Elixir in k8s

Why?

Because I can. Because this was the only way to set up service communication.

And as I’m an Elixir writer, I’ll give code examples in Elixir (just joking, there’ll be no code examples).

Source materials

I used these articles:

Erlang (and Elixir) distribution without epmd (more to understand what’s going on)
Clustering Elixir/Erlang applications in Kubernetes (as an example for the setup)

Let’s begin

So, this is the situation: there’s a production app with N services (N ≤ 10) deployed to k8s (OpenShift actually, but that doesn’t matter) as a set of DeploymentConfigs. Some deployments have k8s services and routes pointed at them. Some services (not k8s services) use redis/memcached/PG/kafka/whatever to exchange data.

But suddenly (as it usually goes) a need for direct service-to-service communication appeared. For an Elixir app there is more than one way (two, actually) to do that:

Using “third-party” protocols (grpc and others)
Using OTP

I decided to go with OTP for these reasons:

~~Too lazy~~ not enough time to implement (even with libraries) grpc and others.
Dude, that’s erlang, c’mon, we’re fashion-driven programmers, aren’t we?

The Erlang part of things is really simple, but the infrastructure caused a bit of pain.

Fairytale-case scenario

In the fairytale case, every instance of each service gets a fixed DN (domain name) that is set up automatically on deploy, and all services have full network connectivity to each other (TCP, ofc).

Then we’ll just start the node:

ERL_OPTIONS="-name ${SNAME}@${HOSTNAME} -setcookie ${ERLANG_COOKIE}"
elixir --erl "${ERL_OPTIONS}" -S mix run --no-halt

And run:

Node.ping(:"some_other_node@some.other.domain.name")
# => :pong

But I don’t work in a fairytale infrastructure with ponies, rainbows and respectful infrastructure engineers. Ugh.

DN discoverability

So, first problem: with a regular DeploymentConfig, DNs are dynamic. And using some sort of service discovery won’t work properly, because an Erlang node wants to know its full DN at start.

Or something like that.

If we use k8s services, we can have one DN per DeploymentConfig. But what if we have multiple instances? Where will the connection go?

To sort this out we have to divide services in two groups:

Waiting for connection
Initiating connection

Waiting for connection

This one’s simple but has its limitations.

We just don’t scale these services (urghhh…).

So the only instance will be available at the service DN, like exclusive-service.project.svc.cluster.local.

Then we start the application with a known DN (I decided not to set HOSTNAME, but to use a separate CLUSTER_HOSTNAME variable):

ERL_OPTIONS="-name ${SNAME}@${CLUSTER_HOSTNAME} -setcookie ${ERLANG_COOKIE}"
elixir --erl "${ERL_OPTIONS}" -S mix run --no-halt

Initiating connection

This one’s even simpler: just start the process with any FQDN (so that Erlang starts in FQDN mode, i.e. with long names). I just added a little loophole for debugging.

Pods are available (through services) at <pod-name>.<service-dn> addresses, like pod-12345-qwerty.non-exclusive-service.project.svc.cluster.local. But we only know service-dn beforehand, and pod-name is put into the HOSTNAME variable at start.

What should we do? Build effective DN at start:

ERL_OPTIONS="-name ${SNAME}@${HOSTNAME}.${CLUSTER_HOSTNAME} -setcookie ${ERLANG_COOKIE}"
elixir --erl "${ERL_OPTIONS}" -S mix run --no-halt

So if a service has two instances, we’ll be able to reach each instance by its own name. This method could be used with service discovery. I just use it for debugging.
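The two naming schemes above can be folded into one small helper. This is only a sketch, not something from my actual setup: build_node_name is a made-up function, and its arguments stand in for SNAME, CLUSTER_HOSTNAME and (optionally) HOSTNAME.

```shell
#!/bin/sh
# Hypothetical helper: build the value passed to `-name`.
#   $1 = short node name (SNAME)
#   $2 = service DN (CLUSTER_HOSTNAME)
#   $3 = optional pod name (HOSTNAME); leave empty for a non-scaled service
build_node_name() {
  bn_sname="$1"
  bn_service_dn="$2"
  bn_pod="${3:-}"
  if [ -n "$bn_pod" ]; then
    # "Initiating connection" case: <sname>@<pod-name>.<service-dn>
    printf '%s@%s.%s\n' "$bn_sname" "$bn_pod" "$bn_service_dn"
  else
    # "Waiting for connection" case: <sname>@<service-dn>
    printf '%s@%s\n' "$bn_sname" "$bn_service_dn"
  fi
}

build_node_name app exclusive-service.project.svc.cluster.local
build_node_name app non-exclusive-service.project.svc.cluster.local pod-12345-qwerty
```

The result would then go into ERL_OPTIONS in place of the hand-written ${SNAME}@... part.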

Ports

Problem #2: port forwarding. Two ports are required: one for epmd and one for the Erlang process itself.

epmd

Again, pretty simple.

epmd uses 4369 as listening port. So we need to forward it in services:

apiVersion: v1
kind: Service
# ...
spec:
  ports:
    - name: epmd
      port: 4369
      protocol: TCP
      targetPort: 4369
  selector:
    deploymentconfig: some-service
# ...

Erlang process

That one’s a bit tricky. Every Erlang node (the OS process) listens on some random port for distribution connections and registers that port with epmd. For an outgoing connection, the connecting node first contacts the epmd on the target host and asks “which port does this node listen on?”
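To see which port a node actually registered, you can ask epmd yourself with epmd -names. Below is a rough sketch of pulling the port out of that output. epmd_port_of is a made-up helper, and the sample text is hard-coded here for illustration rather than captured from a real pod.

```shell
#!/bin/sh
# Hypothetical helper: read `epmd -names` output on stdin and print the
# distribution port registered for the node whose short name is $1.
epmd_port_of() {
  awk -v n="$1" \
    '$1 == "name" && $2 == n && $3 == "at" && $4 == "port" { print $5 }'
}

# Illustrative sample in the format `epmd -names` prints.
sample='epmd: up and running on port 4369 with data:
name some_service at port 43691'

printf '%s\n' "$sample" | epmd_port_of some_service
```

Inside a running container this would be used as epmd -names | epmd_port_of "${SNAME}".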

And while the epmd problem is easily solved, the “random” port requires some more handling.

Forwarding all 65535 ports is not a good idea for many reasons (including me not finding a “forward everything, I don’t care” directive).

To let Erlang processes talk to each other, we have to forward one exact port and force the Erlang process to listen on that port.

The first part is, again, simple:

apiVersion: v1
kind: Service
# ...
spec:
  ports:
    - name: erlang-process
      port: 43691
      protocol: TCP
      targetPort: 43691
  selector:
    deploymentconfig: some-service
# ...

For the second part we can use the inet_dist_listen_min and inet_dist_listen_max start options, which set the listening port range, to limit the Erlang process to exactly one port:

ERL_PORT=43691
ERL_KERNEL_OPTIONS="-kernel inet_dist_listen_min ${ERL_PORT} inet_dist_listen_max ${ERL_PORT}"
ERL_OPTIONS="-name ${SNAME}@${CLUSTER_HOSTNAME} -setcookie ${ERLANG_COOKIE} ${ERL_KERNEL_OPTIONS}"
elixir --erl "${ERL_OPTIONS}" -S mix run --no-halt

And, voila, erlang processes are running and communicating!
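Since a typo in the port silently brings back the “random port” behaviour or breaks startup, it may be worth validating the value before handing it to Erlang. A minimal sketch, assuming a hypothetical dist_port_opts helper that is not part of my real startup script:

```shell
#!/bin/sh
# Hypothetical helper: validate a port number and print the kernel options
# that pin the distribution listener to exactly that port.
dist_port_opts() {
  dp_port="$1"
  case "$dp_port" in
    ''|*[!0-9]*)
      echo "invalid port: '$dp_port'" >&2
      return 1
      ;;
  esac
  if [ "$dp_port" -lt 1 ] || [ "$dp_port" -gt 65535 ]; then
    echo "port out of range: $dp_port" >&2
    return 1
  fi
  # Same min and max means a range of exactly one port.
  printf '%s\n' "-kernel inet_dist_listen_min $dp_port inet_dist_listen_max $dp_port"
}

dist_port_opts 43691
```

The output can be assigned to ERL_KERNEL_OPTIONS, and the script can bail out early if the function returns non-zero.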

Obviously, running multiple Erlang OS processes in one container is impossible with this approach. But don’t we use k8s precisely for that “one process per container”?

Of course, for a single-process setup we could drop epmd altogether, but that requires copy-pasting some actual Erlang code.

Conclusion

This approach is for DeploymentConfigs. There’s an alternative using StatefulSets, which, in theory, looks cooler, but it would have required a complete re-setup of an already running production app (no downtime allowed, ofc).