

Sunday, February 21, 2016

Setting up Nomad and Consul in EC2 instead of Mesosphere

Notes on setting up a set of Nomad and Consul servers for development in EC2, as an alternative to Mesosphere.

Server nodes for Nomad and Consul

Three EC2 Medium (triad) machines.
Each server runs
  • consul server agent,
  • nomad server agent
  • No Docker here

Worker Nodes (aka client nodes)

Three to X client agent nodes (EC2 Large or better).
Each client agent node runs
  • consul client agent
  • nomad client agent
  • Docker daemon
These connect to the server triad (consul and nomad).

Prod cluster

Server nodes for Nomad and Consul

Five EC2 Large machines (a five-server cluster rather than a triad).
Each server runs
  • consul server agent,
  • nomad server agent
  • No Docker here

Worker Nodes (aka client nodes)

Three to X client agent nodes (as large as we need, at least one machine per AZ).
Each client agent node runs
  • consul client agent
  • nomad client agent
  • Docker daemon
These connect to the server cluster (consul and nomad).

Implementation details

You have four roles
  • server1
  • server2
  • server3
  • worker-node
All worker-nodes are ephemeral. They can get blown away.
The servers server1, server2, and server3 form a triad cluster. Any triad member can die and be replaced. A replacement should be started back up with the same basic IP address info.
Since we are running consul, nomad, and zookeeper on the triad, there is no advantage to using consul for nomad server node discovery, because consul is installed on the same triad of servers.
The server server1 is a bit special because it has the script to connect the servers into a cluster. Otherwise, server1 is identical to the others.
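The claim that any one triad member can die comes down to majority arithmetic: Consul and Nomad servers both use Raft, which needs a majority of servers to stay up. Here is a small sketch of that arithmetic for the dev (3) and prod (5) server counts. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# quorum N: smallest majority of N servers.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: servers that can be lost while a majority remains.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 5; do
  echo "$n servers: quorum $(quorum "$n"), tolerates $(tolerated_failures "$n") failure(s)"
done
```

So a triad survives the loss of one server, and the five-server prod cluster survives the loss of two.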

Server 1

Server1 Vagrantfile

# -*- mode: ruby -*-
# vi: set ft=ruby :

$script = <<SCRIPT
# Update apt and get dependencies
sudo apt-get update
sudo apt-get install -y unzip curl wget vim

# Download Nomad
echo Fetching Nomad...
cd /tmp/
curl -sSL https://releases.hashicorp.com/nomad/0.2.3/nomad_0.2.3_linux_amd64.zip -o nomad.zip

echo Installing Nomad...
unzip nomad.zip
sudo chmod +x nomad
sudo mv nomad /usr/bin/nomad
sudo mkdir -p /etc/nomad.d
sudo chmod a+w /etc/nomad.d
sudo mkdir -p /opt/nomad/data
sudo mkdir -p /var/log/nomad
sudo chmod a+w /var/log/nomad
sudo cp /vagrant/server.hcl /etc/nomad.d/

echo Fetching Consul...
curl -sSL https://releases.hashicorp.com/consul/0.6.3/consul_0.6.3_linux_amd64.zip -o consul.zip
echo Installing Consul...
unzip consul.zip
sudo chmod +x consul
sudo mv consul /usr/bin/consul
sudo mkdir -p /etc/consul.d
sudo chmod a+w /etc/consul.d
sudo mkdir -p /opt/consul/data
sudo mkdir -p /var/log/consul
sudo chmod a+w /var/log/consul
sudo cp /vagrant/consul.json /etc/consul.d/

echo Starting nomad
cd ~
sudo nohup nomad agent -config /etc/nomad.d/server.hcl &>nomad.log  &

echo Starting Consul
sudo nohup consul agent -config-file /etc/consul.d/consul.json &>consul.log  &
SCRIPT

Vagrant.configure(2) do |config|
  config.vm.box = "base-box"
  config.vm.hostname = "nomad"
  config.vm.provision "shell", inline: $script, privileged: false
  config.vm.network "private_network", ip: ""

  # Increase memory for Parallels Desktop
  config.vm.provider "parallels" do |p, o|
    p.memory = "1024"
  end

  # Increase memory for Virtualbox
  config.vm.provider "virtualbox" do |vb|
    vb.memory = "1024"
  end

  # Increase memory for VMware
  ["vmware_fusion", "vmware_workstation"].each do |p|
    config.vm.provider p do |v|
      v.vmx["memsize"] = "1024"
    end
  end

  config.vm.provider :aws do |aws, override|
   aws.keypair_name = "my-app-key"
   aws.region = "us-west-2"
   # Ubuntu public Amazon EC2 image for Ubuntu 64 bit
   aws.ami = "ami-9abea4fb"
   override.ssh.username = "ubuntu"
   override.ssh.private_key_path = "/opt/aws/my-app-key.pem"

   aws.tags = {
     'Name' => 'my-app-cluster-server-1'
   }

   # vpc-d14dacb5
   aws.subnet_id = "subnet-abc123ab"
   aws.security_groups = "sg-abc123ab"
   override.vm.hostname = "ip-21-10-0-10"
   # override.ssh.host = "" //NOT EXPOSED TO VPN traffic yet
   # We have to use public IP address because we don't have the VPC tied to vpn traffic
   aws.associate_public_ip = true
  end
end


server.hcl server1 nomad config file

bind_addr = ""

advertise {
  # Advertise this host's own IP so that other nodes
  # in the cluster can reach it.
  rpc = ""
}

# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/opt/nomad/data"

# Enable the server
server {
    enabled = true
    start_join = ["", "", ""]
    retry_join = ["", "", ""]
    retry_interval = "15s"
}

server-bootstrap.hcl server1 nomad bootstrap config file

bind_addr = ""

advertise {
  # Advertise this host's own IP so that other nodes
  # in the cluster can reach it.
  rpc = ""
}

# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/opt/nomad/data"

# Enable the server
server {
    enabled = true

    # Self-elect, should be 3 or 5 for production
    bootstrap_expect = 3
}

consul.json server1

{
  "data_dir": "/opt/consul/data",
  "log_level": "DEBUG",
  "node_name": "master1",
  "server": true,
  "start_join" : ["", "", ""],
  "retry_join" : ["", "", ""],
  "retry_interval" : "15s"
}


consul-bootstrap.json server1

{
  "data_dir": "/opt/consul/data",
  "log_level": "DEBUG",
  "node_name": "master1",
  "server": true,
  "bootstrap_expect" : 3
}


Script on server1 to connect the servers into a cluster

sudo pkill consul
sudo pkill nomad
sleep 5s

sudo rm -rf /opt/consul/data/*
sudo rm -rf /opt/nomad/data/*
sleep 5s

sudo nohup consul agent -config-file /vagrant/consul-bootstrap.json &>consul.log  &
sudo nohup nomad agent -config /vagrant/server-bootstrap.hcl &>nomad.log  &
sleep 5s

sudo nomad server-join -address=
sudo nomad server-join -address=
sudo nomad server-members -address=
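The fixed `sleep 5s` pauses in the script above assume the agents are ready within five seconds. One possible alternative is to poll until a command succeeds. This is only a sketch: `wait_for` is a helper I made up for illustration, not part of Nomad or Consul.

```shell
#!/usr/bin/env bash
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0, at most ATTEMPTS times, sleeping
# DELAY_SECONDS after each failed try. Returns 1 if it never succeeds.
wait_for() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Trivial demonstration with a command that always succeeds.
wait_for 3 0 true && echo "ready"
```

In the script above, something like `wait_for 10 3 sudo nomad server-members` (with the same `-address=` argument) could stand in for the last `sleep 5s`.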

Servers 2 and 3 are simpler: they have no bootstrap configs and no cluster-connect script.

Server 2

Vagrantfile server2

# -*- mode: ruby -*-
# vi: set ft=ruby :

$script = <<SCRIPT
# Update apt and get dependencies
sudo apt-get update
sudo apt-get install -y unzip curl wget vim

# Download Nomad
echo Fetching Nomad...
cd /tmp/
curl -sSL https://releases.hashicorp.com/nomad/0.2.3/nomad_0.2.3_linux_amd64.zip -o nomad.zip

echo Installing Nomad...
unzip nomad.zip
sudo chmod +x nomad
sudo mv nomad /usr/bin/nomad
sudo mkdir -p /etc/nomad.d
sudo chmod a+w /etc/nomad.d
sudo mkdir -p /opt/nomad/data
sudo mkdir -p /var/log/nomad
sudo chmod a+w /var/log/nomad
sudo cp /vagrant/server.hcl /etc/nomad.d/

echo Fetching Consul...
curl -sSL https://releases.hashicorp.com/consul/0.6.3/consul_0.6.3_linux_amd64.zip -o consul.zip
echo Installing Consul...
unzip consul.zip
sudo chmod +x consul
sudo mv consul /usr/bin/consul
sudo mkdir -p /etc/consul.d
sudo chmod a+w /etc/consul.d
sudo mkdir -p /opt/consul/data
sudo mkdir -p /var/log/consul
sudo chmod a+w /var/log/consul
sudo cp /vagrant/consul.json /etc/consul.d/

echo Starting nomad
cd ~
sudo nohup nomad agent -config /etc/nomad.d/server.hcl &>nomad.log  &

echo Starting Consul
sudo nohup consul agent -config-file /etc/consul.d/consul.json &>consul.log  &
SCRIPT

Vagrant.configure(2) do |config|
  config.vm.box = "base-box"
  config.vm.hostname = "nomad"
  config.vm.provision "shell", inline: $script, privileged: false
  config.vm.network "private_network", ip: ""

  # Increase memory for Parallels Desktop
  config.vm.provider "parallels" do |p, o|
    p.memory = "1024"
  end

  # Increase memory for Virtualbox
  config.vm.provider "virtualbox" do |vb|
    vb.memory = "1024"
  end

  # Increase memory for VMware
  ["vmware_fusion", "vmware_workstation"].each do |p|
    config.vm.provider p do |v|
      v.vmx["memsize"] = "1024"
    end
  end

  config.vm.provider :aws do |aws, override|
   aws.keypair_name = "my-app-key"
   aws.region = "us-west-2"
   # Ubuntu public Amazon EC2 image for Ubuntu 64 bit
   aws.ami = "ami-9abea4fb"
   override.ssh.username = "ubuntu"
   override.ssh.private_key_path = "/opt/aws/my-app-key.pem"

   aws.tags = {
     'Name' => 'my-app-cluster-server-2'
   }

   # vpc-d14dacb5
   aws.subnet_id = "subnet-abc123ab"
   aws.security_groups = "sg-abc123ab"
   override.vm.hostname = "ip-21-10-0-11"
   #override.ssh.host = "" //NOT EXPOSED TO VPN traffic yet
   # We have to use public IP address because we don't have the VPC tied to vpn traffic
   aws.associate_public_ip = true
  end
end


consul.json server2

{
  "data_dir": "/opt/consul/data",
  "log_level": "DEBUG",
  "node_name": "master2",
  "server": true,
  "start_join" : ["", "", ""],
  "retry_join" : ["", "", ""],
  "retry_interval" : "15s"
}

server.hcl server2 nomad config file

bind_addr = ""

advertise {
  # Advertise this host's own IP so that other nodes
  # in the cluster can reach it.
  rpc = ""
}

# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/opt/nomad/data"

# Enable the server
server {
    enabled = true
    start_join = ["", "", ""]
    retry_join = ["", "", ""]
    retry_interval = "15s"
}

Server 3

server.hcl server3 nomad config file

bind_addr = ""

advertise {
  # Advertise this host's own IP so that other nodes
  # in the cluster can reach it.
  rpc = ""
}

# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/opt/nomad/data"

# Enable the server
server {
    enabled = true
    start_join = ["", "", ""]
    retry_join = ["", "", ""]
    retry_interval = "15s"
}

consul.json server3

{
  "data_dir": "/opt/consul/data",
  "log_level": "DEBUG",
  "node_name": "master3",
  "server": true,
  "start_join" : ["", "", ""],
  "retry_join" : ["", "", ""],
  "retry_interval" : "15s"
}

Vagrantfile server3

# -*- mode: ruby -*-
# vi: set ft=ruby :

$script = <<SCRIPT
# Update apt and get dependencies
sudo apt-get update
sudo apt-get install -y unzip curl wget vim

# Download Nomad
echo Fetching Nomad...
cd /tmp/
curl -sSL https://releases.hashicorp.com/nomad/0.2.3/nomad_0.2.3_linux_amd64.zip -o nomad.zip

echo Installing Nomad...
unzip nomad.zip
sudo chmod +x nomad
sudo mv nomad /usr/bin/nomad
sudo mkdir -p /etc/nomad.d
sudo chmod a+w /etc/nomad.d
sudo mkdir -p /opt/nomad/data
sudo mkdir -p /var/log/nomad
sudo chmod a+w /var/log/nomad
sudo cp /vagrant/server.hcl /etc/nomad.d/

echo Fetching Consul...
curl -sSL https://releases.hashicorp.com/consul/0.6.3/consul_0.6.3_linux_amd64.zip -o consul.zip
echo Installing Consul...
unzip consul.zip
sudo chmod +x consul
sudo mv consul /usr/bin/consul
sudo mkdir -p /etc/consul.d
sudo chmod a+w /etc/consul.d
sudo mkdir -p /opt/consul/data
sudo mkdir -p /var/log/consul
sudo chmod a+w /var/log/consul
sudo cp /vagrant/consul.json /etc/consul.d/

echo Starting nomad
cd ~
sudo nohup nomad agent -config /etc/nomad.d/server.hcl &>nomad.log  &

echo Starting Consul
sudo nohup consul agent -config-file /etc/consul.d/consul.json &>consul.log  &
SCRIPT

Vagrant.configure(2) do |config|
  config.vm.box = "base-box"
  config.vm.hostname = "nomad"
  config.vm.provision "shell", inline: $script, privileged: false
  config.vm.network "private_network", ip: ""

  # Increase memory for Parallels Desktop
  config.vm.provider "parallels" do |p, o|
    p.memory = "1024"
  end

  # Increase memory for Virtualbox
  config.vm.provider "virtualbox" do |vb|
    vb.memory = "1024"
  end

  # Increase memory for VMware
  ["vmware_fusion", "vmware_workstation"].each do |p|
    config.vm.provider p do |v|
      v.vmx["memsize"] = "1024"
    end
  end

  config.vm.provider :aws do |aws, override|
   aws.keypair_name = "my-app-key"
   aws.region = "us-west-2"
   # Ubuntu public Amazon EC2 image for Ubuntu 64 bit
   aws.ami = "ami-9abea4fb"
   override.ssh.username = "ubuntu"
   override.ssh.private_key_path = "/opt/aws/my-app-key.pem"

   aws.tags = {
     'Name' => 'my-app-cluster-server-3'
   }

   # vpc-d14dacb5
   aws.subnet_id = "subnet-abc123ab"
   aws.security_groups = "sg-abc123ab"
   override.vm.hostname = "ip-21-10-0-12"
   #override.ssh.host = "" //NOT EXPOSED TO VPN traffic yet
   # We have to use public IP address because we don't have the VPC tied to vpn traffic
   aws.associate_public_ip = true
  end
end


Worker Node

You may have noticed that the files are almost all the same. The worker nodes are always identical to one another. They start up, connect their consul client and nomad client to the cluster, and can then handle work from the nomad servers.

consul.json for worker node

{
  "data_dir": "/opt/consul/data",
  "log_level": "DEBUG",
  "server": false,
  "start_join" : ["", "", ""],
  "retry_join" : ["", "", ""],
  "retry_interval" : "15s"
}
Note that these are just the initial servers to contact. If/when we add more servers, the instances will learn about them via serf.
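Since the worker's consul.json differs between environments only in the server addresses, it could be generated rather than hand-edited. This is a rough sketch, not what we actually ran: the function name is made up, and the 192.0.2.x addresses are documentation placeholders, not the real server IPs.

```shell
#!/usr/bin/env bash
# render_consul_client_json IP [IP...]
# Prints a consul client config that joins the given servers.
render_consul_client_json() {
  local joined="" ip
  for ip in "$@"; do
    # Build a comma-separated list of quoted addresses.
    joined="${joined:+$joined, }\"$ip\""
  done
  cat <<EOF
{
  "data_dir": "/opt/consul/data",
  "log_level": "DEBUG",
  "server": false,
  "start_join": [$joined],
  "retry_join": [$joined],
  "retry_interval": "15s"
}
EOF
}

# Placeholder addresses for illustration only.
render_consul_client_json 192.0.2.10 192.0.2.11 192.0.2.12
```

The output could be redirected to the consul.json that the Vagrantfile copies into /etc/consul.d/.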

nomad.hcl for worker node

# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/opt/nomad/data"

# Enable the client
client {
    enabled = true
    servers = ["", "", ""]
}
Note that these are just the initial servers to contact. Unlike consul agents, nomad client agents do not take part in the serf gossip pool, so if we add more servers later this list may need to be updated.

Vagrantfile for worker node

# -*- mode: ruby -*-
# vi: set ft=ruby :

$script = <<SCRIPT
# Update apt and get dependencies
sudo apt-get update
sudo apt-get install -y unzip curl wget vim

# Download Nomad
echo Fetching Nomad...
cd /tmp/
curl -sSL https://releases.hashicorp.com/nomad/0.2.3/nomad_0.2.3_linux_amd64.zip -o nomad.zip

echo Installing Nomad...
unzip nomad.zip
sudo chmod +x nomad
sudo mv nomad /usr/bin/nomad
sudo mkdir -p /etc/nomad.d
sudo chmod a+w /etc/nomad.d
sudo mkdir -p /opt/nomad/data
sudo mkdir -p /var/log/nomad
sudo chmod a+w /var/log/nomad
sudo cp /vagrant/nomad.hcl /etc/nomad.d/

echo Fetching Consul...
curl -sSL https://releases.hashicorp.com/consul/0.6.3/consul_0.6.3_linux_amd64.zip -o consul.zip
echo Installing Consul...
unzip consul.zip
sudo chmod +x consul
sudo mv consul /usr/bin/consul
sudo mkdir -p /etc/consul.d
sudo chmod a+w /etc/consul.d
sudo mkdir -p /opt/consul/data
sudo mkdir -p /var/log/consul
sudo chmod a+w /var/log/consul
sudo cp /vagrant/consul.json /etc/consul.d/

echo Starting nomad
cd ~
sudo nohup nomad agent -config /etc/nomad.d/nomad.hcl &>nomad.log  &

echo Starting Consul
export BIND_ADDRESS=`/sbin/ifconfig eth0 | grep 'inet addr:' | cut -d: -f2 | awk '{ print $1}'`
sudo nohup consul agent -bind $BIND_ADDRESS -config-file /etc/consul.d/consul.json &>consul.log  &
SCRIPT

Vagrant.configure(2) do |config|
  config.vm.box = "base-box"
  config.vm.hostname = "nomad"
  config.vm.provision "docker" # Just install it
  config.vm.provision "shell", inline: $script, privileged: false

  # Increase memory for Parallels Desktop
  config.vm.provider "parallels" do |p, o|
    p.memory = "1024"
  end

  # Increase memory for Virtualbox
  config.vm.provider "virtualbox" do |vb|
    vb.memory = "1024"
  end

  # Increase memory for VMware
  ["vmware_fusion", "vmware_workstation"].each do |p|
    config.vm.provider p do |v|
      v.vmx["memsize"] = "1024"
    end
  end

  config.vm.provider :aws do |aws, override|
   aws.keypair_name = "my-app-key"
   aws.region = "us-west-2"
   # Ubuntu public Amazon EC2 image for Ubuntu 64 bit
   aws.ami = "ami-9abea4fb"
   override.ssh.username = "ubuntu"
   override.ssh.private_key_path = "/opt/aws/my-app-key.pem"

   aws.tags = {
     'Name' => 'my-app-cluster-worker-node'
   }

   # vpc-d14dacb5
   aws.subnet_id = "subnet-abc123ab"
   aws.security_groups = "sg-abc123ab"
   aws.instance_type = "c3.8xlarge" # decide what is the best
   aws.associate_public_ip = true
  end
end

Notice that the startup script is nearly identical except that no IP address is hardwired.
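The worker avoids a hardwired address by reading it from `ifconfig eth0` at startup. To show what that pipeline does, here is the same grep/cut/awk chain wrapped in a function and fed a sample of the older `inet addr:` output format. The function name and the sample text (including the 192.0.2.25 address) are made up for illustration.

```shell
#!/usr/bin/env bash
# extract_inet_addr: reads ifconfig-style output on stdin and prints the
# IPv4 address found on the "inet addr:" line.
extract_inet_addr() {
  grep 'inet addr:' | cut -d: -f2 | awk '{ print $1 }'
}

# Illustrative sample in the older net-tools output format.
sample='eth0      Link encap:Ethernet  HWaddr 00:00:5e:00:53:01
          inet addr:192.0.2.25  Bcast:192.0.2.255  Mask:255.255.255.0'

printf '%s\n' "$sample" | extract_inet_addr
```

Note that the pipeline depends on the `inet addr:` label. On systems whose `ifconfig` prints the address in a different format, it returns nothing and `BIND_ADDRESS` would be empty.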

Using Vagrant to work with EC2

The tutorial for Nomad used Vagrant, so we started with Vagrant to set up our cluster.

Install AWS plugin

$ vagrant plugin install vagrant-aws

Setup a base-box for EC2

$ vagrant box add base-box https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box

Startup vagrant

Navigate to the directory corresponding to the role mentioned above.

Startup vagrant instance in EC2

$ vagrant up --provider=aws

SSH into vagrant box

Navigate to the directory corresponding to the role mentioned above.

ssh into vagrant instance in EC2

$ vagrant ssh

destroy instance

Navigate to the directory corresponding to the role mentioned above.

terminate/destroy vagrant instance in EC2

$ vagrant destroy
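Bringing up the whole cluster means repeating the steps above once per role directory. Here is a dry-run sketch that only prints the commands it would run. It assumes one directory per role, named after the roles listed earlier, and `plan_cluster_up` is a made-up helper name.

```shell
#!/usr/bin/env bash
# plan_cluster_up ROLE_DIR [ROLE_DIR...]
# Prints, in order, the command that would start each role in EC2.
plan_cluster_up() {
  local role
  for role in "$@"; do
    echo "(cd $role && vagrant up --provider=aws)"
  done
}

# Servers first, then the worker node.
plan_cluster_up server1 server2 server3 worker-node
```

Printing the commands first makes it easy to review the order before anything is launched in EC2.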

Nomad commands

Deploy a job

$ nomad run example.nomad 
==> Monitoring evaluation "9d05879b-d338-fa99-f2a1-9e8cdf45fc71"
    Evaluation triggered by job "example"
    Allocation "bccc1da9-7b0a-0a4d-17c1-345215044002" created: node "a07b38db-3be1-5a9f-d0dc-2d757991f2c4", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "9d05879b-d338-fa99-f2a1-9e8cdf45fc71" finished with status "complete"

Checking status of a job

$  nomad status example
ID          = example
Name        = example
Type        = service
Priority    = 50
Datacenters = dc1
Status      = <none>

==> Evaluations
ID                                    Priority  TriggeredBy   Status
9d05879b-d338-fa99-f2a1-9e8cdf45fc71  50        job-register  complete

==> Allocations
ID                                    EvalID                                NodeID                                TaskGroup  Desired  Status
bccc1da9-7b0a-0a4d-17c1-345215044002  9d05879b-d338-fa99-f2a1-9e8cdf45fc71  a07b38db-3be1-5a9f-d0dc-2d757991f2c4  cache      run      running
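For scripting, the status output can be checked mechanically. The sketch below counts allocations whose Status column reads `running`. It assumes the text layout shown above, which may differ in other Nomad versions, and the function name is made up. The sample uses shortened IDs to keep it readable.

```shell
#!/usr/bin/env bash
# count_running_allocs: reads `nomad status <job>` output on stdin and
# prints how many rows in the Allocations section end in "running".
count_running_allocs() {
  awk '
    /^==> Allocations/ { in_allocs = 1; next }
    /^==>/             { in_allocs = 0 }
    in_allocs && $NF == "running" { n++ }
    END { print n + 0 }
  '
}

# Sample modeled on the output above, with shortened IDs.
sample_status='==> Evaluations
ID        Priority  TriggeredBy   Status
9d05879b  50        job-register  complete

==> Allocations
ID        EvalID    NodeID    TaskGroup  Desired  Status
bccc1da9  9d05879b  a07b38db  cache      run      running'

printf '%s\n' "$sample_status" | count_running_allocs
```

In practice the input would come from `nomad status example` piped into the function.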

Show active Nomad Master Servers

$ nomad client-config -servers

It should be easy to deploy from Artifactory.
At this point, you can deploy to Nomad like you would to Mesosphere.
