Over the past few months, I’ve been digging into what it means to work with a distributed container service. Inspired by Werner Vogel’s latest post about ECS, I decided to show an architecture for deploying a container service in Eucalyptus. As part of my investigations into containers, I have looked at the following platforms that provide the ability to manage container based services:
Each of these provide you with a symmetrical (all components run on all hosts) and scalable (hosts can be added after initial deployment) system for hosting your containerized workloads. They also include mechanisms for service discovery and load balancing. Deis and Flynn are both what I would call a “lightweight PaaS” akin to a private Heroku. Mesos, however, is a more flexible and open ended platform, which comes as a blessing and a curse. I was able to deploy many more applications in Mesos but it took me far longer to get a working platform up and running.
Deis and Flynn are both “batteries included” type systems that once deployed allow you to immediately push your code or container image into the system and have it run your application. Deis and Flynn also install all of their dependencies for you through automated installers. Mesos on the other hand requires you to deploy its prerequisites on your own in order to get going, then requires you to install frameworks on top of it to make it able to schedule and run your applications.
I wanted to make a Mesos implementation that felt as easy to make useful as Deis and Flynn. I have been working with chef-provisioning to deploy clustered applications for a while now so I figured I would use my previous techniques in order to automate the process of deploying a functional and working N node Mesos/Marathon cluster. Over the last month, I have also been able to play with Mesosphere’s DCOS so was able to get a better idea of what it takes to really make Mesos useful to end users. The “batteries included” version of Mesos is architected as follows:
Each of the machines in our Mesos cluster will run all of these components, giving us a nice symmetrical architecture for deployment. Mesos and many of its dependencies rely on a working Zookeeper as a distributed key value store. All of the state for the cluster is stored here. Luckily, for this piece of the deployment puzzle I was able to leverage the Chef community’s Exhibitor cookbook which got my ZK cluster up in a snap. Once Zookeeper was deployed, I was able to get my Mesos masters and slaves connected together and was able to see my CPU, memory and disk resources available within the Mesos cluster.
Mesos itself does not handle creating applications as services so we need to deploy a service management layer. In my case, I chose Marathon as it is intended to manage long running services like the ones I was most interested in deploying (Elasticsearch, Logstash, Kibana, Chronos). Marathon is run outside of Mesos and acts as the bootstrapper for the rest of the services that we would like to use, our distributed init system.
Once applications are deployed into Marathon it is necessary to have a mechanism to discover where other services are running. Although it is possible to pin particular services to particular nodes through the Marathon application definition, I would prefer not to have to think about IP addressing in order to connect applications. The preferred method of service discovery in the Mesos ecosystem is to use Mesos DNS and host it as a service in Marathon across all of your nodes. Each slave node can then use itself as a DNS resolver, wherein queries for services get handled internally and all others are recursed to an upstream DNS server.
Now that the architecture of the container service is laid out for you, you can get to deploying your stack by heading over to the README. This deployment procedure will not only deploy Mesos+Marathon but will also deploy a full ELK into the cluster to demonstrate connecting various services together in order to provide a higher order one.