Setting up Nomad instead of Mesosphere. Notes on setting up a set of Nomad servers for development in EC2.
Dev cluster
Server nodes for Nomad and Consul
Three EC2 Medium (triad) machines.
Each server runs
- consul server agent,
- nomad server agent
- No Docker here
Worker Nodes (aka client nodes)
Three to X client Agent Nodes (EC2 LARGE or better)
Each client agent node runs
- consul client agent
- nomad client agent
- Docker daemon
These connect to home server triads (consul and nomad).
Prod cluster
Server nodes for Nomad and Consul
Five EC2 Large machines.
Each server runs
- consul server agent,
- nomad server agent
- No Docker here
Worker Nodes (aka client nodes)
Three to X client Agent Nodes (as large as we need, at least one machine per AZ)
Each client agent node runs
- consul client agent
- nomad client agent
- Docker daemon
These connect to home server triads (consul and nomad).
Implementation details
There are four roles:
- server1
- server2
- server3
- worker-node
All worker-nodes are ephemeral; they can be blown away and recreated at any time.
The servers server1, server2, and server3 form a triad cluster. Any triad member can die and be replaced, but it should be brought back up with the same IP address.
Since we are running consul, nomad, and zookeeper on the triad, there is no advantage to using consul for nomad server node discovery: consul is installed on the same triad of servers anyway.
The server server1 is a bit special because it has the script that connects the servers into a cluster. For the most part, server1 is identical to the others.
Server 1
Server1 Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :
$script = <<SCRIPT
# Update apt and get dependencies
sudo apt-get update
sudo apt-get install -y unzip curl wget vim
# Download Nomad
echo Fetching Nomad...
cd /tmp/
curl -sSL https://releases.hashicorp.com/nomad/0.2.3/nomad_0.2.3_linux_amd64.zip -o nomad.zip
echo Installing Nomad...
unzip nomad.zip
sudo chmod +x nomad
sudo mv nomad /usr/bin/nomad
sudo mkdir -p /etc/nomad.d
sudo chmod a+w /etc/nomad.d
sudo mkdir -p /opt/nomad/data
sudo mkdir -p /var/log/nomad
sudo chmod a+w /var/log/nomad
sudo cp /vagrant/server.hcl /etc/nomad.d/
echo Fetching Consul...
curl -sSL https://releases.hashicorp.com/consul/0.6.3/consul_0.6.3_linux_amd64.zip -o consul.zip
echo Installing Consul...
unzip consul.zip
sudo chmod +x consul
sudo mv consul /usr/bin/consul
sudo mkdir -p /etc/consul.d
sudo chmod a+w /etc/consul.d
sudo mkdir -p /opt/consul/data
sudo mkdir -p /var/log/consul
sudo chmod a+w /var/log/consul
sudo cp /vagrant/consul.json /etc/consul.d/
echo Starting nomad
cd ~
sudo nohup nomad agent -config /etc/nomad.d/server.hcl &>nomad.log &
echo Starting Consul
sudo nohup consul agent -config-file /etc/consul.d/consul.json &>consul.log &
SCRIPT
Vagrant.configure(2) do |config|
config.vm.box = "base-box"
config.vm.hostname = "nomad"
config.vm.provision "shell", inline: $script, privileged: false
config.vm.network "private_network", ip: "10.21.0.10"
# Increase memory for Parallels Desktop
config.vm.provider "parallels" do |p, o|
p.memory = "1024"
end
# Increase memory for Virtualbox
config.vm.provider "virtualbox" do |vb|
vb.memory = "1024"
end
# Increase memory for VMware
["vmware_fusion", "vmware_workstation"].each do |p|
config.vm.provider p do |v|
v.vmx["memsize"] = "1024"
end
end
config.vm.provider :aws do |aws, override|
aws.keypair_name = "my-app-key"
aws.region = "us-west-2"
# Ubuntu public Amazon EC2 image for Ubuntu 64 bit
aws.ami = "ami-9abea4fb"
override.ssh.username = "ubuntu"
override.ssh.private_key_path = "/opt/aws/my-app-key.pem"
aws.tags = {
'Name' => 'my-app-cluster-server-1'
}
# vpc-d14dacb5
aws.subnet_id = "subnet-abc123ab"
aws.security_groups = "sg-abc123ab"
aws.private_ip_address="10.21.0.10"
override.vm.hostname = "ip-10-21-0-10"
# override.ssh.host = "10.20.0.10" //NOT EXPOSED TO VPN traffic yet
# We have to use public IP address because we don't have the VPC tied to vpn traffic
aws.associate_public_ip = true
end
end
server.hcl server1 nomad config file
bind_addr = "10.21.0.10"
advertise {
# We need to specify our host's IP because we can't
# advertise 0.0.0.0 to other nodes in our cluster.
rpc = "10.21.0.10:4647"
}
# Increase log verbosity
log_level = "DEBUG"
# Setup data dir
data_dir = "/opt/nomad/data"
# Enable the server
server {
enabled = true
start_join = ["10.21.0.11", "10.21.0.10", "10.21.0.12"]
retry_join = ["10.21.0.11", "10.21.0.10", "10.21.0.12"]
retry_interval = "15s"
}
server-bootstrap.hcl
bind_addr = "10.21.0.10"
advertise {
# We need to specify our host's IP because we can't
# advertise 0.0.0.0 to other nodes in our cluster.
rpc = "10.21.0.10:4647"
}
# Increase log verbosity
log_level = "DEBUG"
# Setup data dir
data_dir = "/opt/nomad/data"
# Enable the server
server {
enabled = true
# Self-elect, should be 3 or 5 for production
bootstrap_expect = 3
}
consul.json server1
{
"data_dir": "/opt/consul/data",
"log_level": "DEBUG",
"node_name": "master1",
"server": true,
"start_join" : ["10.21.0.11", "10.21.0.10", "10.21.0.12"],
"retry_join" : ["10.21.0.11", "10.21.0.10", "10.21.0.12"],
"retry_interval" : "15s"
}
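As a quick sanity check (our own habit, not part of the provisioning scripts), once all three consul agents are up you can list the cluster members from any of the servers; all three nodes (master1, master2, master3) should show up as alive servers:
$ consul members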
consul-bootstrap.json
{
"data_dir": "/opt/consul/data",
"log_level": "DEBUG",
"node_name": "master1",
"server": true,
"bootstrap_expect" : 3
}
connect-cluster.sh
sudo pkill consul
sudo pkill nomad
sleep 5s
sudo rm -rf /opt/consul/data/*
sudo rm -rf /opt/nomad/data/*
sleep 5s
sudo nohup consul agent -config-file /vagrant/consul-bootstrap.json &>consul.log &
sudo nohup nomad agent -config /vagrant/server-bootstrap.hcl &>nomad.log &
sleep 5s
sudo nomad server-join -address=http://10.21.0.10:4646 10.21.0.11
sudo nomad server-join -address=http://10.21.0.10:4646 10.21.0.12
sudo nomad server-members -address=http://10.21.0.10:4646
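Note that connect-cluster.sh only joins the Nomad servers explicitly; the consul agents on server2 and server3 find server1 on their own through their retry_join settings. If they have not joined yet, the equivalent manual step run from server1 would look roughly like this (illustrative):
$ consul join 10.21.0.11 10.21.0.12
$ consul members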
Servers 2 and 3 are simpler
Server 2
Vagrantfile server2
# -*- mode: ruby -*-
# vi: set ft=ruby :
$script = <<SCRIPT
# Update apt and get dependencies
sudo apt-get update
sudo apt-get install -y unzip curl wget vim
# Download Nomad
echo Fetching Nomad...
cd /tmp/
curl -sSL https://releases.hashicorp.com/nomad/0.2.3/nomad_0.2.3_linux_amd64.zip -o nomad.zip
echo Installing Nomad...
unzip nomad.zip
sudo chmod +x nomad
sudo mv nomad /usr/bin/nomad
sudo mkdir -p /etc/nomad.d
sudo chmod a+w /etc/nomad.d
sudo mkdir -p /opt/nomad/data
sudo mkdir -p /var/log/nomad
sudo chmod a+w /var/log/nomad
sudo cp /vagrant/server.hcl /etc/nomad.d/
echo Fetching Consul...
curl -sSL https://releases.hashicorp.com/consul/0.6.3/consul_0.6.3_linux_amd64.zip -o consul.zip
echo Installing Consul...
unzip consul.zip
sudo chmod +x consul
sudo mv consul /usr/bin/consul
sudo mkdir -p /etc/consul.d
sudo chmod a+w /etc/consul.d
sudo mkdir -p /opt/consul/data
sudo mkdir -p /var/log/consul
sudo chmod a+w /var/log/consul
sudo cp /vagrant/consul.json /etc/consul.d/
echo Starting nomad
cd ~
sudo nohup nomad agent -config /etc/nomad.d/server.hcl &>nomad.log &
echo Starting Consul
sudo nohup consul agent -config-file /etc/consul.d/consul.json &>consul.log &
SCRIPT
Vagrant.configure(2) do |config|
config.vm.box = "base-box"
config.vm.hostname = "nomad"
config.vm.provision "shell", inline: $script, privileged: false
config.vm.network "private_network", ip: "10.21.0.11"
# Increase memory for Parallels Desktop
config.vm.provider "parallels" do |p, o|
p.memory = "1024"
end
# Increase memory for Virtualbox
config.vm.provider "virtualbox" do |vb|
vb.memory = "1024"
end
# Increase memory for VMware
["vmware_fusion", "vmware_workstation"].each do |p|
config.vm.provider p do |v|
v.vmx["memsize"] = "1024"
end
end
config.vm.provider :aws do |aws, override|
aws.keypair_name = "my-app-key"
aws.region = "us-west-2"
# Ubuntu public Amazon EC2 image for Ubuntu 64 bit
aws.ami = "ami-9abea4fb"
override.ssh.username = "ubuntu"
override.ssh.private_key_path = "/opt/aws/my-app-key.pem"
aws.tags = {
'Name' => 'my-app-cluster-server-2'
}
# vpc-d14dacb5
aws.subnet_id = "subnet-abc123ab"
aws.security_groups = "sg-abc123ab"
aws.private_ip_address="10.21.0.11"
override.vm.hostname = "ip-10-21-0-11"
#override.ssh.host = "10.20.0.10" //NOT EXPOSED TO VPN traffic yet
# We have to use public IP address because we don't have the VPC tied to vpn traffic
aws.associate_public_ip = true
end
end
consul.json server2
{
"data_dir": "/opt/consul/data",
"log_level": "DEBUG",
"node_name": "master2",
"server": true,
"start_join" : ["10.21.0.11", "10.21.0.10", "10.21.0.12"],
"retry_join" : ["10.21.0.11", "10.21.0.10", "10.21.0.12"],
"retry_interval" : "15s"
}
server.hcl server2 nomad config file
bind_addr = "10.21.0.11"
advertise {
# We need to specify our host's IP because we can't
# advertise 0.0.0.0 to other nodes in our cluster.
rpc = "10.21.0.11:4647"
}
# Increase log verbosity
log_level = "DEBUG"
# Setup data dir
data_dir = "/opt/nomad/data"
# Enable the server
server {
enabled = true
start_join = ["10.21.0.11", "10.21.0.10", "10.21.0.12"]
retry_join = ["10.21.0.11", "10.21.0.10", "10.21.0.12"]
retry_interval = "15s"
}
Server 3
server.hcl server3 nomad config file
bind_addr = "10.21.0.12"
advertise {
# We need to specify our host's IP because we can't
# advertise 0.0.0.0 to other nodes in our cluster.
rpc = "10.21.0.12:4647"
}
# Increase log verbosity
log_level = "DEBUG"
# Setup data dir
data_dir = "/opt/nomad/data"
# Enable the server
server {
enabled = true
start_join = ["10.21.0.11", "10.21.0.10", "10.21.0.12"]
retry_join = ["10.21.0.11", "10.21.0.10", "10.21.0.12"]
retry_interval = "15s"
}
consul.json server3
{
"data_dir": "/opt/consul/data",
"log_level": "DEBUG",
"node_name": "master3",
"server": true,
"start_join" : ["10.21.0.11", "10.21.0.10", "10.21.0.12"],
"retry_join" : ["10.21.0.11", "10.21.0.10", "10.21.0.12"],
"retry_interval" : "15s"
}
Vagrantfile server3
# -*- mode: ruby -*-
# vi: set ft=ruby :
$script = <<SCRIPT
# Update apt and get dependencies
sudo apt-get update
sudo apt-get install -y unzip curl wget vim
# Download Nomad
echo Fetching Nomad...
cd /tmp/
curl -sSL https://releases.hashicorp.com/nomad/0.2.3/nomad_0.2.3_linux_amd64.zip -o nomad.zip
echo Installing Nomad...
unzip nomad.zip
sudo chmod +x nomad
sudo mv nomad /usr/bin/nomad
sudo mkdir -p /etc/nomad.d
sudo chmod a+w /etc/nomad.d
sudo mkdir -p /opt/nomad/data
sudo mkdir -p /var/log/nomad
sudo chmod a+w /var/log/nomad
sudo cp /vagrant/server.hcl /etc/nomad.d/
echo Fetching Consul...
curl -sSL https://releases.hashicorp.com/consul/0.6.3/consul_0.6.3_linux_amd64.zip -o consul.zip
echo Installing Consul...
unzip consul.zip
sudo chmod +x consul
sudo mv consul /usr/bin/consul
sudo mkdir -p /etc/consul.d
sudo chmod a+w /etc/consul.d
sudo mkdir -p /opt/consul/data
sudo mkdir -p /var/log/consul
sudo chmod a+w /var/log/consul
sudo cp /vagrant/consul.json /etc/consul.d/
echo Starting nomad
cd ~
sudo nohup nomad agent -config /etc/nomad.d/server.hcl &>nomad.log &
echo Starting Consul
sudo nohup consul agent -config-file /etc/consul.d/consul.json &>consul.log &
SCRIPT
Vagrant.configure(2) do |config|
config.vm.box = "base-box"
config.vm.hostname = "nomad"
config.vm.provision "shell", inline: $script, privileged: false
config.vm.network "private_network", ip: "10.21.0.12"
# Increase memory for Parallels Desktop
config.vm.provider "parallels" do |p, o|
p.memory = "1024"
end
# Increase memory for Virtualbox
config.vm.provider "virtualbox" do |vb|
vb.memory = "1024"
end
# Increase memory for VMware
["vmware_fusion", "vmware_workstation"].each do |p|
config.vm.provider p do |v|
v.vmx["memsize"] = "1024"
end
end
config.vm.provider :aws do |aws, override|
aws.keypair_name = "my-app-key"
aws.region = "us-west-2"
# Ubuntu public Amazon EC2 image for Ubuntu 64 bit
aws.ami = "ami-9abea4fb"
override.ssh.username = "ubuntu"
override.ssh.private_key_path = "/opt/aws/my-app-key.pem"
aws.tags = {
'Name' => 'my-app-cluster-server-3'
}
# vpc-d14dacb5
aws.subnet_id = "subnet-abc123ab"
aws.security_groups = "sg-abc123ab"
aws.private_ip_address="10.21.0.12"
override.vm.hostname = "ip-10-21-0-12"
#override.ssh.host = "10.20.0.10" //NOT EXPOSED TO VPN traffic yet
# We have to use public IP address because we don't have the VPC tied to vpn traffic
aws.associate_public_ip = true
end
end
Worker Node
You may have noticed that the files are nearly all the same. The worker nodes really are all the same: they start up, connect their consul client agent and nomad client agent to the cluster, and then handle work handed out by the nomad servers.
consul.json for worker node
{
"data_dir": "/opt/consul/data",
"log_level": "DEBUG",
"server": false,
"start_join" : ["10.21.0.11", "10.21.0.10", "10.21.0.12"],
"retry_join" : ["10.21.0.11", "10.21.0.10", "10.21.0.12"],
"retry_interval" : "15s"
}
Note that these are just the initial servers to contact. If/when we add more servers, the consul agents will learn about the rest of the cluster via serf (gossip).
nomad.hcl for worker node
# Increase log verbosity
log_level = "DEBUG"
# Setup data dir
data_dir = "/opt/nomad/data"
# Enable the client
client {
enabled = true
servers=["10.21.0.10:4647", "10.21.0.11:4647", "10.21.0.12:4647"]
}
Again, these are just the initial nomad servers to contact; if/when we add more servers, the clients will learn about them once connected to the cluster.
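Once a worker comes up, a quick way to confirm that its nomad client registered is to list the client nodes against one of the triad servers (a by-hand check, not part of the provisioning script); the new node should show up with status ready, and consul members on any server should also list the new consul client agent:
$ nomad node-status -address=http://10.21.0.10:4646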
Vagrantfile for worker node
# -*- mode: ruby -*-
# vi: set ft=ruby :
$script = <<SCRIPT
# Update apt and get dependencies
sudo apt-get update
sudo apt-get install -y unzip curl wget vim
# Download Nomad
echo Fetching Nomad...
cd /tmp/
curl -sSL https://releases.hashicorp.com/nomad/0.2.3/nomad_0.2.3_linux_amd64.zip -o nomad.zip
echo Installing Nomad...
unzip nomad.zip
sudo chmod +x nomad
sudo mv nomad /usr/bin/nomad
sudo mkdir -p /etc/nomad.d
sudo chmod a+w /etc/nomad.d
sudo mkdir -p /opt/nomad/data
sudo mkdir -p /var/log/nomad
sudo chmod a+w /var/log/nomad
sudo cp /vagrant/nomad.hcl /etc/nomad.d/
echo Fetching Consul...
curl -sSL https://releases.hashicorp.com/consul/0.6.3/consul_0.6.3_linux_amd64.zip -o consul.zip
echo Installing Consul...
unzip consul.zip
sudo chmod +x consul
sudo mv consul /usr/bin/consul
sudo mkdir -p /etc/consul.d
sudo chmod a+w /etc/consul.d
sudo mkdir -p /opt/consul/data
sudo mkdir -p /var/log/consul
sudo chmod a+w /var/log/consul
sudo cp /vagrant/consul.json /etc/consul.d/
echo Starting nomad
cd ~
sudo nohup nomad agent -config /etc/nomad.d/nomad.hcl &>nomad.log &
echo Starting Consul
export BIND_ADDRESS=`/sbin/ifconfig eth0 | grep 'inet addr:' | cut -d: -f2 | awk '{ print $1}'`
sudo nohup consul agent -bind $BIND_ADDRESS -config-file /etc/consul.d/consul.json &>consul.log &
SCRIPT
Vagrant.configure(2) do |config|
config.vm.box = "base-box"
config.vm.hostname = "nomad"
config.vm.provision "docker" # Just install it
config.vm.provision "shell", inline: $script, privileged: false
# Increase memory for Parallels Desktop
config.vm.provider "parallels" do |p, o|
p.memory = "1024"
end
# Increase memory for Virtualbox
config.vm.provider "virtualbox" do |vb|
vb.memory = "1024"
end
# Increase memory for VMware
["vmware_fusion", "vmware_workstation"].each do |p|
config.vm.provider p do |v|
v.vmx["memsize"] = "1024"
end
end
config.vm.provider :aws do |aws, override|
aws.keypair_name = "my-app-key"
aws.region = "us-west-2"
# Ubuntu public Amazon EC2 image for Ubuntu 64 bit
aws.ami = "ami-9abea4fb"
override.ssh.username = "ubuntu"
override.ssh.private_key_path = "/opt/aws/my-app-key.pem"
aws.tags = {
'Name' => 'my-app-cluster-worker-node'
}
# vpc-d14dacb5
aws.subnet_id = "subnet-abc123ab"
aws.security_groups = "sg-abc123ab"
aws.instance_type ="c3.8xlarge" # decide what is the best
aws.associate_public_ip = true
end
end
Notice that the startup script is nearly identical to the servers' scripts, except that no IP address is hard-coded.
Using Vagrant to work with EC2
The Nomad tutorial used Vagrant, so we started with Vagrant to set up our cluster.
Install AWS plugin
$ vagrant plugin install vagrant-aws
Setup a base-box for EC2
$ vagrant box add base-box https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box
Start up Vagrant
Navigate to the directory corresponding to the role mentioned above.
Start up the Vagrant instance in EC2
$ vagrant up --provider=aws
SSH into the Vagrant box
Navigate to the directory corresponding to the role mentioned above.
SSH into the Vagrant instance in EC2
$ vagrant ssh
Destroy the instance
Navigate to the directory corresponding to the role mentioned above.
Terminate/destroy the Vagrant instance in EC2
$ vagrant destroy
Nomad commands
Deploy a job
$ nomad run example.nomad
==> Monitoring evaluation "9d05879b-d338-fa99-f2a1-9e8cdf45fc71"
Evaluation triggered by job "example"
Allocation "bccc1da9-7b0a-0a4d-17c1-345215044002" created: node "a07b38db-3be1-5a9f-d0dc-2d757991f2c4", group "cache"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "9d05879b-d338-fa99-f2a1-9e8cdf45fc71" finished with status "complete"
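The example.nomad file itself is not reproduced in these notes; it is most likely the stock job that nomad init generates in the Nomad tutorial. A minimal sketch of such a job for the Docker driver (the image, resource numbers, and exact stanza syntax are illustrative and vary a bit across Nomad versions):
job "example" {
  datacenters = ["dc1"]
  type = "service"

  group "cache" {
    count = 1

    task "redis" {
      driver = "docker"

      config {
        # Docker image pulled and run by the worker node's Docker daemon
        image = "redis:latest"
      }

      resources {
        cpu    = 500 # MHz
        memory = 256 # MB
      }
    }
  }
}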
Checking status of a job
$ nomad status example
ID = example
Name = example
Type = service
Priority = 50
Datacenters = dc1
Status = <none>
==> Evaluations
ID Priority TriggeredBy Status
9d05879b-d338-fa99-f2a1-9e8cdf45fc71 50 job-register complete
==> Allocations
ID EvalID NodeID TaskGroup Desired Status
bccc1da9-7b0a-0a4d-17c1-345215044002 9d05879b-d338-fa99-f2a1-9e8cdf45fc71 a07b38db-3be1-5a9f-d0dc-2d757991f2c4 cache run running
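To drill into the single allocation listed above, nomad also has an alloc-status command (using the allocation ID from the output above):
$ nomad alloc-status bccc1da9-7b0a-0a4d-17c1-345215044002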
Show the active Nomad servers known to a client
$ nomad client-config -servers
10.21.0.11:4647
10.21.0.12:4647
10.21.0.10:4647
It should be easy to deploy artifacts from Artifactory.
At this point, you can deploy to Nomad much as you would to Mesosphere.