Building a container service with Mesos and Eucalyptus

    Over the past few months, I’ve been digging into what it means to work with a distributed container service. Inspired by Werner Vogels’ latest post about ECS, I decided to show an architecture for deploying a container service in Eucalyptus. As part of my investigations into containers, I have looked at the following platforms that provide the ability to manage container-based services:
Each of these provides you with a symmetrical (all components run on all hosts) and scalable (hosts can be added after initial deployment) system for hosting your containerized workloads. They also include mechanisms for service discovery and load balancing. Deis and Flynn are both what I would call a “lightweight PaaS” akin to a private Heroku. Mesos, however, is a more flexible and open-ended platform, which comes as a blessing and a curse. I was able to deploy many more applications in Mesos but it took me far longer to get a working platform up and running.
     Deis and Flynn are both “batteries included” type systems that once deployed allow you to immediately push your code or container image into the system and have it run your application. Deis and Flynn also install all of their dependencies for you through automated installers. Mesos on the other hand requires you to deploy its prerequisites on your own in order to get going, then requires you to install frameworks on top of it to make it able to schedule and run your applications.
     I wanted to make a Mesos implementation that felt as easy to make useful as Deis and Flynn. I have been working with chef-provisioning to deploy clustered applications for a while now, so I figured I would use my previous techniques to automate the process of deploying a functional N node Mesos/Marathon cluster. Over the last month, I have also been able to play with Mesosphere’s DCOS, so I was able to get a better idea of what it takes to really make Mesos useful to end users. The “batteries included” version of Mesos is architected as follows:
Each of the machines in our Mesos cluster will run all of these components, giving us a nice symmetrical architecture for deployment. Mesos and many of its dependencies rely on a working Zookeeper as a distributed key value store. All of the state for the cluster is stored here. Luckily, for this piece of the deployment puzzle I was able to leverage the Chef community’s Exhibitor cookbook which got my ZK cluster up in a snap. Once Zookeeper was deployed, I was able to get my Mesos masters and slaves connected together and was able to see my CPU, memory and disk resources available within the Mesos cluster.
    Mesos itself does not handle creating applications as services so we need to deploy a service management layer. In my case, I chose Marathon as it is intended to manage long running services like the ones I was most interested in deploying (Elasticsearch, Logstash, Kibana, Chronos). Marathon is run outside of Mesos and acts as the bootstrapper for the rest of the services that we would like to use, our distributed init system.
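To give a rough idea of what handing a long running service to Marathon looks like, the sketch below builds a minimal app definition of the kind accepted by Marathon's /v2/apps REST endpoint. The app id, command, resource sizes and host name are placeholders I picked for illustration; they are not taken from this deployment.

```python
import json


def marathon_app(app_id, cmd, cpus=0.5, mem=256, instances=1):
    """Build a minimal Marathon app definition (all values are illustrative)."""
    return {
        "id": app_id,            # unique name of the service in Marathon
        "cmd": cmd,              # command Marathon keeps running
        "cpus": cpus,            # CPUs requested from Mesos per instance
        "mem": mem,              # memory in MB requested per instance
        "instances": instances,  # number of copies Marathon keeps alive
    }


app = marathon_app("/kibana", "./bin/kibana", instances=2)
payload = json.dumps(app, sort_keys=True)
print(payload)
# The payload would then be POSTed to http://<marathon-host>:8080/v2/apps
# (the host is a placeholder; 8080 is assumed to be the Marathon HTTP port).
```

If an instance dies, Marathon notices the missing task and asks Mesos for resources to start a replacement, which is what makes it behave like an init system for the cluster.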
     Once applications are deployed into Marathon it is necessary to have a mechanism to discover where other services are running. Although it is possible to pin particular services to particular nodes through the Marathon application definition, I would prefer not to have to think about IP addressing in order to connect applications. The preferred method of service discovery in the Mesos ecosystem is to use Mesos DNS and host it as a service in Marathon across all of your nodes. Each slave node can then use itself as a DNS resolver: queries for services get handled internally and all others are recursed to an upstream DNS server.
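As a small sketch of what those names look like, the helper below derives the A and SRV record names that Mesos DNS publishes for a simple (non-nested) Marathon app id, assuming the default "mesos" domain. The app id used here is only an example.

```python
def mesos_dns_names(app_id, framework="marathon", domain="mesos"):
    """Return the (A record, SRV record) names for a simple app id.

    Assumes the default Mesos DNS domain and a non-nested app id.
    """
    name = app_id.strip("/")
    a_record = "{0}.{1}.{2}".format(name, framework, domain)
    srv_record = "_{0}._tcp.{1}.{2}".format(name, framework, domain)
    return a_record, srv_record


a_name, srv_name = mesos_dns_names("/elasticsearch")
print(a_name)
print(srv_name)
```

On a node that uses Mesos DNS as its resolver, an application can then connect to the A record name directly instead of carrying IP addresses around in its configuration.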
     Now that the architecture of the container service is laid out for you, you can get to deploying your stack by heading over to the README. This deployment procedure will not only deploy Mesos+Marathon but will also deploy a full ELK stack into the cluster to demonstrate connecting various services together in order to provide a higher order one.

EucaLoader: Load Testing Your Eucalyptus Cloud



After provisioning a cloud that will be used by many users, it is best practice to do load or burn-in testing to ensure that it meets your stability and scale requirements. These activities can be performed manually, for example by running commands to launch many instances or create many volumes. In order to perform sustained long-term tests it is beneficial to have an automated tool that will not only perform the test actions but also allow you to analyze and interpret the results in a simple way.


Over the last year, I have been working with Locust to provide a load testing framework for Eucalyptus clouds. Locust is generally used for load testing web pages but allows for customizable clients which allowed me to hook in our Eutester library in order to generate load. Once I had created my client, I was able to create Locust “tasks” that map to activities on the cloud. Tasks are user interactions like creating a bucket or deleting a volume. Once the tasks were defined I was able to compose them into user profiles that define which types of actions each simulated user will be able to run as well as weighting their probability so that the load can most closely approximate a real world use case. In order to make the deployment of EucaLoader as simple as possible, I have baked the entire deployment into a CloudFormation template. This means that once you have the basics of your deployment done, you can start stressing your cloud and analyzing the results with minimal effort.
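To make the weighting idea concrete, here is a plain-Python sketch (it deliberately does not use Locust or Eutester) of a user profile whose tasks are picked with different probabilities. The task names and weights are invented for illustration and are not the ones shipped with EucaLoader.

```python
import random
from collections import Counter

# Hypothetical profile: task name -> relative weight.
PROFILE = {"create_bucket": 3, "run_instance": 2, "delete_volume": 1}


def pick_task(profile, rng):
    """Pick one task name, with probability proportional to its weight."""
    names = sorted(profile)
    weights = [profile[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(42)  # seeded so that runs are repeatable
counts = Counter(pick_task(PROFILE, rng) for _ in range(6000))
print(counts)
```

Over many iterations the heavier tasks dominate the mix, which is how a profile can be tuned to resemble what real users of the cloud actually do.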

Using EucaLoader


In order to use EucaLoader you will first need to load up an Ubuntu Trusty image into your cloud as follows:

# wget https://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-amd64-disk1.img
# qemu-img convert -O raw trusty-server-cloudimg-amd64-disk1.img trusty-server-cloudimg-amd64-disk1.raw
# euca-install-image -i trusty-server-cloudimg-amd64-disk1.raw -n trusty -r x86_64 -b trusty --virt hvm

We will also need to clone the EucaLoader repository and install its dependencies:

# git clone https://github.com/viglesiasce/euca-loader
# pip install troposphere

Next we will upload credentials for a test account to our objectstore so that our loader can pull them down for Eutester to use:

# euare-accountcreate loader
# euca_conf --get-credentials  loader.zip --cred-account loader
# s3cmd mb s3://loader
# s3cmd put -P loader.zip s3://loader/admin.zip

Launching the stack

Once inside the euca-loader directory we will create our CloudFormation template and then create our stack by passing in the required parameters:

# ./create-locust-cfn-template.py > loader.cfn
# euform-create-stack --template-file loader.cfn loader -p KeyName=<your-keypair-name> -p CredentialURL='http://<your-user-facing-service-ip>:8773/services/objectstorage/loader/admin.zip' -p ImageID=<emi-id-for-trusty> -p InstanceType=m1.large

At this point you should be able to monitor the stack creation with the following commands:

# euform-describe-stacks
# euform-describe-stack-events loader

Once the stack shows as CREATE_COMPLETE, the describe stacks command should show outputs which point you to the Locust web portal (WebPortalUrl) and to your Grafana dashboard for monitoring trends (GrafanaURL).

Starting the tests

In order to start your user simulation, point your web browser to the WebPortalUrl as defined by the describe stacks output. Once there you can enter the number of users you’d like to simulate as well as how quickly those users should “hatch”.


Once you’ve started the test, the statistics for each type of request will begin to show up in the Locust dashboard.


See your results

In order to better visualize the trends in your results, EucaLoader provides a Grafana dashboard that tracks a few of the requests for various metrics. This dashboard is easily customized to your particular test and is meant as a jumping off point.



Eucalyptus & Midokura | AWS VPC on-premise


Nice detailed Euca VPC Tutorial. Great work John!

Originally posted on A sysadmin born in the cloud:

About 2 years ago, AWS moved all new accounts, and migrated existing ones, from “EC2 Classic” instances into a VPC.

A lot of new features came out of this but most importantly, VPC would provide the ability for everyone to have backend applications running in private. No public traffic, no access to and from the internet unless wanted. A keystone for AWS to promote the public cloud as a safe place.

So, Eucalyptus is now taking VPC into the system as one of the key features for years to come, and has decided to go with Midokura to orchestrate and manage networking.

Midokura is an SDN software which is used to manage routing between instances, to the internet, security groups etc. The super cool thing about Midokura is its capacity to be highly available and scalable over time. Of course being originally a networking guy, I also find super…



Introducing HuevOS+RancherOS


Today is an exciting day in Santa Barbara. We are very pleased to introduce our latest innovation to the world of DevOps. 

HuevOS – the Docker-based open-source operating system for tomorrow’s IT and Dev/Ops professional.  HuevOS 1.0 (SunnySide) is the open-source/free-range/gluten-free solution that forms the perfect complement to RancherOS.  In addition, we’re delighted to begin development on our proprietary blend of Services and Language Software as a Service (SaLSaaS) which, when overlaid atop a HuevOS+RancherOS stack, provides a complete and delicious solution around which your whole day can be centered.

Try HuevOS+RancherOS today and let us know which SaLSaaS we should work on first to ensure your hunger for DevOps is quenched thoroughly.  

To get your first taste visit the following repository which includes our Chef Recipe and a Vagrantfile to get you HuevOS in short order:


If you already have your RancherOS host up, it’s easy to add our HuevOS to the mix via the Docker Registry:

docker pull viglesiasce/huevos; docker run viglesiasce/huevos

A huge thank you to all involved in getting us to this point and being able to ship a 1.0 version of the HuevOS+RancherOS platform.

Happy clucking!!



Deploying Cassandra and Consul with Chef Provisioning



Chef Provisioning (née Chef Metal) is an incredibly flexible way to deploy infrastructure. Its many plugins allow users to develop a single methodology for deploying an application that can then be repeated against many types of infrastructure (AWS, Euca, OpenStack, etc). Chef Provisioning is especially useful when deploying clusters of machines that make up an application as it allows for machines to be:

  • Staged before deployment
  • Batched for parallelism
  • Deployed in serial when necessary

This level of flexibility means that deploying interesting distributed systems like Cassandra and Consul is a breeze. By leveraging community cookbooks for Consul and Cassandra, we can largely ignore the details of package installation and service management and focus our time on orchestrating the stack in the correct order and configuring the necessary attributes such that our cluster converges properly. For this tutorial we will be deploying:

  • DataStax Cassandra 2.0.x
  • Consul
    • Service discovery via DNS
    • Health checks on a per node basis
  • Consul UI
    • Allows for service health visualization

Once complete we will be able to use Consul’s DNS service to load balance our Cassandra client requests across the cluster as well as use Consul UI in order to keep tabs on our cluster’s health.

In the process of writing up this methodology, I went a step further and created a repository and toolchain for configuring and managing the lifecycle of clustered deployments. The chef-provisioning-recipes repository will allow you to configure your AWS/Euca cloud credentials and images and deploy any of the clustered applications available in the repository.

Steps to reproduce

Install prerequisites

  • Install ChefDK
  • Install package deps (for CentOS 6)
    yum install python-devel gcc git
  • Install Python deps:
    easy_install fabric PyYaml
  • Clone the chef-provisioning-recipes repo:
    git clone https://github.com/viglesiasce/chef-provisioning-recipes

Edit config file

The configuration file (config.yml) contains information about how and where to deploy the cluster. There are two main sections in the file:

  1. Profiles
    1. Which credentials/cloud to use
    2. What image to use
    3. What instance type to use
    4. What username to use
  2. Credentials
    1. Cloud endpoints or region
    2. Cloud access and secret keys

Edit the config.yml file found in the repo such that the default profile points to a CentOS 6 image in your cloud and the default credentials point to the proper cloud.
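To illustrate how the two sections relate to each other, here is a sketch of the same structure as a Python dictionary, along with a helper that resolves a profile to the credentials it points at. All key names and values below are hypothetical and only mirror the outline above; check the config.yml in the repo for the real schema.

```python
# Hypothetical layout: key names are illustrative, not the repo's actual schema.
CONFIG = {
    "profiles": {
        "default": {
            "credentials": "default",      # which credentials/cloud to use
            "image": "emi-XXXXXXXX",       # placeholder CentOS 6 image id
            "instance_type": "m1.large",
            "username": "root",
        }
    },
    "credentials": {
        "default": {
            "endpoint": "http://compute.cloud:8773/services/compute",
            "access_key": "XXXXXXXXXXXXXXX",
            "secret_key": "YYYYYYYYYYYYYYYYYYYYYYYYYY",
        }
    },
}


def resolve_profile(config, name="default"):
    """Merge a profile with the credentials entry it refers to."""
    profile = dict(config["profiles"][name])
    creds = config["credentials"][profile.pop("credentials")]
    profile.update(creds)
    return profile


settings = resolve_profile(CONFIG)
print(sorted(settings))
```

Keeping credentials separate from profiles means several profiles (different images or instance types) can share one set of cloud credentials.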

Run the deployment

Once the deployer has been configured we simply need to run it and tell it which cluster we would like to deploy. In this case we’d like to deploy Cassandra so we will run the deployer as follows:

./deployer.py cassandra

This will now automate the following process:

  1. Create a chef repository
  2. Download all necessary cookbooks
  3. Create all necessary instances
  4. Deploy Cassandra and Consul

Once this is complete you should be able to see your instances running in your cloud tagged as follows: cassandra-default-N. In order to access your Consul UI dashboard go to http://instance-pub-ip:8500

You should now also be able to query any of your Consul servers for the IPs of your Cassandra cluster:

nslookup cassandra.service.paas.home <instance-pub-ip>
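The same information is also available from Consul's HTTP API on the port that serves the UI. As a sketch, the snippet below parses a response of the shape returned by /v1/catalog/service/<name>. The sample payload, node names, addresses and port are made up so that the example runs without a live cluster.

```python
import json


def service_addresses(catalog_json):
    """Extract one address per service instance from a catalog response."""
    addresses = []
    for entry in json.loads(catalog_json):
        # ServiceAddress is empty when the service uses the node's own address
        addresses.append(entry.get("ServiceAddress") or entry["Address"])
    return addresses


# Made-up response for illustration only
sample = json.dumps([
    {"Node": "cassandra-default-1", "Address": "10.0.0.11",
     "ServiceName": "cassandra", "ServiceAddress": "", "ServicePort": 9042},
    {"Node": "cassandra-default-2", "Address": "10.0.0.12",
     "ServiceName": "cassandra", "ServiceAddress": "", "ServicePort": 9042},
])
print(service_addresses(sample))
```

Against a running cluster the JSON would come from http://<instance-pub-ip>:8500/v1/catalog/service/cassandra instead of the sample string.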

In order to tear down the cluster simply run:

./deployer.py cassandra --op destroy

Chef Metal with Eucalyptus


My pull request to chef-metal-fog was recently accepted and released in version 0.8.0 so a quick post on how to get up and running on your Eucalyptus Cloud seemed appropriate.

Chef Metal is a new way to provision your infrastructure using Chef recipes. It allows you to use the same convergent design as normal Chef recipes. You can now define your cloud or bare metal deployment in a Chef recipe then deploy, update and destroy it with chef-client. This flexibility is incredibly useful for both development of new Chef cookbooks and in exploring various topologies of distributed systems.

Game time

First, install the Chef Development Kit. This will install chef-client and a few other tools to get you well on your way to Chef bliss.

Once you have installed the Chef DK on your workstation, install the chef-metal gem into the Chef Ruby environment:

chef gem install chef-metal

You will need to create your Chef repo. This repository will contain all the information about how and where your application gets deployed using Chef Metal. In this case we are naming our app “euca-metal”.

chef generate app euca-metal

You should now see a directory structure as follows:

├── README.md
└── cookbooks
 └── euca-metal
   ├── Berksfile
   ├── chefignore
   ├── metadata.rb
   └── recipes
     └── default.rb

Now that the skeleton of our application has been created, let’s edit cookbooks/euca-metal/recipes/default.rb to look like this:

require 'chef_metal_fog'

### Arbitrary name of our deployment
deployment_name = 'chef-metal-test'

### Use the AWS provider to provision the machines
### Here is where we set our endpoint URLs and keys for our Eucalyptus deployment
with_driver 'fog:AWS', :compute_options => { :aws_access_key_id => 'XXXXXXXXXXXXXXX',
                                             :aws_secret_access_key => 'YYYYYYYYYYYYYYYYYYYYYYYYYY',
                                             :ec2_endpoint => 'http://compute.cloud:8773/services/compute',
                                             :iam_endpoint => 'http://euare.cloud:8773/services/objectstorage'
}

### Create a keypair named after our deployment
fog_key_pair deployment_name do
  allow_overwrite true
end

### Use the key created above to login as root, all machines below
### will be run using these options
with_machine_options ssh_username: 'root', ssh_timeout: 60, :bootstrap_options => {
  :image_id => 'emi-A6EA57D5',
  :flavor_id => 't1.micro',
  :key_name => deployment_name
}

### Launch an instance and name it after our deployment
machine deployment_name do
  ### Install Java on the instance using the Java recipe
  recipe 'java'
end

Once we have defined our deployment we will need to create a local configuration file for chef-client:

mkdir -p .chef; echo 'local_mode true' > .chef/knife.rb

Now that we have defined the deployment and set up chef-client, let’s run the damn thing!

chef-client -z cookbooks/euca-metal/recipes/default.rb

You can now see Chef create your keypair, launch your instance, and then attempt to run the “java” recipe as we specified. Unfortunately this has failed. We never told our euca-metal cookbook that it required the Java cookbook nor did we download that cookbook for it to use. Let’s fix that.

First we will tell our euca-metal cookbook that we need it to pull in the ‘java’ cookbook in order to provision the node. We need to add the ‘depends’ line to our cookbook’s metadata.rb file which can be found here: cookbooks/euca-metal/metadata.rb

name 'euca-metal'
maintainer ''
maintainer_email ''
license ''
description 'Installs/Configures euca-metal'
long_description 'Installs/Configures euca-metal'
version '0.1.0'
depends 'java'

Next we will need to actually download that Java cookbook that we now depend on. To do that we need to:

# Change to the euca-metal cookbook directory
cd cookbooks/euca-metal/
# Use berkshelf to download our cookbook dependencies
berks vendor
# Move the berks downloaded cookbooks to our main cookbook repository
# Note that it won't overwrite our euca-metal cookbook
mv berks-cookbooks/* ..
cd ../..
# Rerun our chef-client to deploy Java for realz
chef-client -z cookbooks/euca-metal/recipes/default.rb

You will notice that the machine is not reprovisioned (YAY convergence!). The Java recipe should now be running happily on your existing instance. You can find your ssh keys in the .chef/keys directory.

Happy AWS Compatible Private Cloud Cheffing!!!!

Many thanks to John Keiser for his great work on chef-metal.


Using Comcast CMB for SQS and SNS on Eucalyptus


As part of a service oriented infrastructure there comes a need to coordinate work between services. AWS provides a couple of services which allow for application components to communicate with each other and their users/administrators in a decoupled fashion.

The Simple Queue Service (SQS) is a mechanism for an application’s producers to distribute work to their consumers in a scalable, reliable and fault tolerant way. The basic lifecycle in SQS is as follows:

  1. A queue is created
  2. Producers send arbitrary text messages into the queue
  3. Consumers are constantly listing the messages in a queue and when one is available they “check out” the work by reading the message
  4. Once the message is read a timer kicks in that makes the message unreadable by other consumers for a certain period of time (called the visibility timeout).
  5. The consumer can then perform the necessary task described by the message and then delete the message from the queue
  6. If the consumer does not complete the task in time or fails for some other reason the message is made visible again in the queue and picked up by another consumer
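
The lifecycle above can be sketched with a tiny in-memory queue. This is not the SQS API, just a stand-in I wrote to show how the visibility timeout hides a message after it is read and makes it reappear if nobody deletes it. The class name, method names and timings are all invented for illustration.

```python
import itertools


class TinyQueue(object):
    """In-memory stand-in for a queue with a visibility timeout."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._ids = itertools.count(1)
        self._messages = {}  # id -> [body, time at which it is visible again]

    def send(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, 0]
        return msg_id

    def receive(self, now):
        """Return (id, body) of the oldest visible message and hide it, or None."""
        for msg_id in sorted(self._messages):
            body, visible_at = self._messages[msg_id]
            if visible_at <= now:
                self._messages[msg_id][1] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        self._messages.pop(msg_id, None)


queue = TinyQueue(visibility_timeout=30)
queue.send("convert image 1")
first = queue.receive(now=0)    # consumer A checks out the message
hidden = queue.receive(now=10)  # consumer B sees nothing while the timer runs
retry = queue.receive(now=31)   # A never deleted it, so it is visible again
queue.delete(retry[0])          # B finishes the task and deletes the message
empty = queue.receive(now=100)  # nothing left to do
print(first, hidden, retry, empty)
```

Passing the current time in explicitly keeps the sketch deterministic; a real consumer would simply poll and rely on the service's clock.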

One simple example of using this service would be for a web application front end to take image conversion orders from a user and then throw the image conversion task into a queue that can then be serviced by a fleet of worker nodes that do the actual image processing (i.e. the compute heavy portion).

The Simple Notification Service (SNS) is a service that allows for the coordination of messages that have one or more recipient subscribing endpoints. In this service users create a topic and then other services and users can subscribe to the topic in order to receive notifications about its goings on. In this model the sender of the message does not have to know where messages are actually being sent, only that all subscribers (i.e. people/apps who need the message) will receive the message in the form that they have requested. Subscriptions to topics can be made through various transport mechanisms:

  1. HTTP
  2. HTTPS
  3. SMS
  4. Email
  5. Email-json
  6. SQS

By publishing a message to a topic with multiple subscribers you can ensure that both applications and the people managing them are all on the same page.
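
The fan-out model can be sketched in a few lines of plain Python. Again, this is not the SNS API: the Topic class, the topic name and the two subscribers are hypothetical and only show that the publisher never needs to know who is listening.

```python
class Topic(object):
    """In-memory stand-in for a notification topic."""

    def __init__(self, name):
        self.name = name
        self._subscribers = []  # list of (protocol, deliver callable)

    def subscribe(self, protocol, deliver):
        self._subscribers.append((protocol, deliver))

    def publish(self, message):
        """Deliver the message to every subscriber and return how many got it."""
        for _protocol, deliver in self._subscribers:
            deliver(message)
        return len(self._subscribers)


inbox, work_queue = [], []
alerts = Topic("deploy-events")
alerts.subscribe("email", inbox.append)     # a person following along
alerts.subscribe("sqs", work_queue.append)  # an application consuming work
delivered = alerts.publish("build finished")
print(delivered, inbox, work_queue)
```

Adding a third subscriber later requires no change on the publishing side, which is the decoupling the service is meant to provide.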

Eucalyptus currently does not implement SQS and SNS but the folks over at Comcast have created an incredibly useful open source project that mirrors the APIs with absolutely incredible fidelity. Not only did they ensure that their API coverage was accurate and useful but they built the application stack on top of Cassandra and Redis making it not only horizontally scalable but extremely performant to boot. For more information: Comcast CMB.

Running CMB in your Eucalyptus cloud

In order to simplify the process of installing and bootstrapping CMB, I have created an image that you can install on your cloud with all the requisite services in place. All instructions here should be performed from your Eucalyptus CLC with your admin credentials sourced.

  1. Download the image and decompress it
    1. curl http://eucalyptus-images.s3.amazonaws.com/public/cmb.raw.xz > cmb.raw.xz
    2. xz -d cmb.raw.xz
  2. Install the image
    1. euca-install-image --virt hvm -i cmb.raw -r x86_64 -b CMB -n CMB
  3. Launch the image
    1. euca-run-instances -k <my-keypair> <emi-from-step-2>
  4. Once the image is launched login to the admin portal to create your first user and get your credentials
    1. Go to http://<instance-public-ip>:6059/webui
    2. Login with: cns_internal/cns_internal
    3. Create a new user
    4. Take note of the Access and Secret keys for your new user
  5. Start using your new services with your favorite SDK

Example: Interacting with SQS using Boto

In the example below, change the following variables to fit your environment:

  • cmb_host – Hostname or IP of your CMB server
  • access_key – Taken from step 4D above
  • secret_key – Taken from step 4D above

from boto.sqs.regioninfo import SQSRegionInfo
from boto.sqs.connection import SQSConnection

cmb_host = 'instance-ip'
access_key = 'your-access-key-from-step-4D'
secret_key = 'your-secret-key-from-step-4D'
cmb_sqs_port = 6059

# Point boto at the CMB endpoint instead of AWS
sqs_region = SQSRegionInfo(endpoint=cmb_host, name='home')
cmb_sqs = SQSConnection(aws_access_key_id=access_key, aws_secret_access_key=secret_key,
                        region=sqs_region, is_secure=False,
                        port=cmb_sqs_port)

# Create a queue and write a message into it
queue = cmb_sqs.create_queue('test')
msg = queue.new_message('Hello World')
queue.write(msg)

# List every queue along with the messages currently readable in it
all_queues = cmb_sqs.get_all_queues()
print('Current queues: ' + str(all_queues))
for queue in all_queues:
    print('Messages in queue: ' + str([msg.get_body() for msg in queue.get_messages()]))