Automating the test of your infrastructure code

Infrastructure test automation

Quality must be everyone’s responsibility, and every team must be aware of it. Assuring the quality of the code we produce is not the kind of task we should delegate. We must take ownership of our work and deliver it with quality.

Infrastructure test automation, alongside application test automation, is an important part of the code delivery process. Every change you make to your Ansible playbooks, or to any file of your infrastructure project, must be followed by a test of the entire project.

The tests can be done either manually or automatically. The advantage of automating them is obvious: you save time and make the tests reproducible at any moment. Although you have to invest some time in developing the automation, you get rid of manually repeating the tests. With automation, running them becomes a matter of a single click.

The test script

You can use a tool like Molecule to test your Ansible playbooks, or simply use shell scripts. The test.sh file below is an example of using a shell script to test an entire Ansible project. You can find more about the project in the article How to deal with the same configuration file with different content in different environments. You can also clone the Codeyourinfra repository and take a look at the same_cfgfile_diff_content directory.

#!/bin/sh
tmpfile=$(mktemp)

teardown()
{
	vagrant destroy -f
	rm -rf .vagrant/ *.retry "$tmpfile"
}

. ../common/test-library.sh

# turn on the environment
vagrant up

# check the solution playbook syntax
checkPlaybookSyntax playbook.yml hosts

# execute the solution
ansible-playbook playbook.yml -i hosts | tee ${tmpfile}
assertEquals 3 $(tail -5 ${tmpfile} | grep -c "failed=0")

# validate the solution
ansible qa -i hosts -m shell -a "cat /etc/conf" | tee ${tmpfile}
assertEquals "prop1=Aprop2=B" $(awk '/qa1/ {for(i=1; i<=2; i++) {getline; printf "%s", $0}}' ${tmpfile})
assertEquals "prop1=Cprop2=D" $(awk '/qa2/ {for(i=1; i<=2; i++) {getline; printf "%s", $0}}' ${tmpfile})
assertEquals "prop1=Eprop2=F" $(awk '/qa3/ {for(i=1; i<=2; i++) {getline; printf "%s", $0}}' ${tmpfile})

# turn off the environment and exit
teardown
exit 0

The script is quite simple. It basically turns on the environment required for testing, runs the tests, and turns the environment off. If everything goes as expected, the script exits with code 0; otherwise, the exit code is 1. (Here is a great article about exit codes.)

The environment for testing is managed by Vagrant. The up command turns the environment on, while the destroy command turns it off. Vagrant can manage both local virtual machines and AWS EC2 instances. When the test is done in the cloud, there’s an additional step of gathering the IP addresses from AWS, because Ansible requires these IPv4 addresses in order to connect to the remote hosts through SSH. If you want more details, please take a look at the previous article Bringing the Ansible development to the cloud.

Notice that the environment is turned off and all the auxiliary files are removed in the teardown function. Other functions are also used within the script, loaded from the test-library.sh file. They are as follows:

  • checkPlaybookSyntax – uses the --syntax-check option of the ansible-playbook command in order to validate the playbook YAML file;
  • assertEquals – compares an expected value with the actual one in order to validate what was supposed to happen;
  • assertFileExists – checks if a required file exists.
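The test-library.sh functions aren’t reproduced here, but the assertion helpers could be implemented roughly like this (a minimal sketch; the actual function bodies in the repository may differ):

```shell
#!/bin/sh
# Minimal sketch of the test-library.sh assertion helpers (hypothetical bodies).

# Validate the playbook syntax; abort the test run on failure.
checkPlaybookSyntax()
{
	ansible-playbook "$1" -i "$2" --syntax-check || exit 1
}

# Compare an expected value with the actual one; exit 1 on mismatch.
assertEquals()
{
	if [ "$1" != "$2" ]; then
		echo "Assertion failed: expected [$1], got [$2]"
		exit 1
	fi
}

# Check that a required file exists.
assertFileExists()
{
	if [ ! -f "$1" ]; then
		echo "Assertion failed: file $1 not found"
		exit 1
	fi
}
```

On mismatch the helpers exit with code 1, which is what makes the test script as a whole fail fast.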

The script also creates a temporary file, to which the tee command writes the output of each ansible-playbook execution. Right after each execution, some assertions are made, in order to check that everything went as expected. The example below shows the output of the playbook.yml execution.

ansible-playbook playbook.yml -i hosts

PLAY [qa] **************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************************************************************************
ok: [qa1]
ok: [qa2]
ok: [qa3]

TASK [Set the configuration file content] ******************************************************************************************************************************************************************
changed: [qa1] => (item={'key': u'prop1', 'value': u'A'})
changed: [qa3] => (item={'key': u'prop1', 'value': u'E'})
changed: [qa2] => (item={'key': u'prop1', 'value': u'C'})
changed: [qa1] => (item={'key': u'prop2', 'value': u'B'})
changed: [qa2] => (item={'key': u'prop2', 'value': u'D'})
changed: [qa3] => (item={'key': u'prop2', 'value': u'F'})

PLAY RECAP *************************************************************************************************************************************************************************************************
qa1                        : ok=2    changed=1    unreachable=0    failed=0   
qa2                        : ok=2    changed=1    unreachable=0    failed=0   
qa3                        : ok=2    changed=1    unreachable=0    failed=0

The tail command gets the last five (-5) lines of the temporary file, and the grep command counts (-c) how many of them contain “failed=0”. Ansible outputs the play recap at the end, and success (failed=0) is expected for the tasks performed on all of the three target hosts, hence the expected count of 3.
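The counting logic can be tried out in isolation. Given a file ending with a play recap like the one above, the same pipeline yields 3 (the sample recap lines are reproduced here just for illustration):

```shell
#!/bin/sh
# Reproduce the assertion's counting logic against a sample play recap.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
PLAY RECAP *********************************************************************
qa1                        : ok=2    changed=1    unreachable=0    failed=0
qa2                        : ok=2    changed=1    unreachable=0    failed=0
qa3                        : ok=2    changed=1    unreachable=0    failed=0
EOF
# Take the last five lines and count how many contain "failed=0".
count=$(tail -5 "$tmpfile" | grep -c "failed=0")
echo "$count"   # prints 3
rm -f "$tmpfile"
```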

In a single execution, Ansible is able to perform tasks on multiple hosts. The Ansible ad-hoc command below executes the command cat /etc/conf on each of the hosts that belong to the test environment (qa1, qa2 and qa3). The goal is to validate the prior playbook execution: the content of the configuration file on each host must be as defined in the config.json file.

ansible qa -i hosts -m shell -a "cat /etc/conf"

qa2 | SUCCESS | rc=0 >>
prop1=C
prop2=D

qa3 | SUCCESS | rc=0 >>
prop1=E
prop2=F

qa1 | SUCCESS | rc=0 >>
prop1=A
prop2=B

The awk command finds a specific line by a pattern (the hostname, e.g. /qa1/) and concatenates the two lines below it into a single line. This way it’s possible to compare the configuration file content obtained from each host with the expected content.
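The extraction can be tried out with a sample of the ad-hoc command’s output (using qa1 as the pattern):

```shell
#!/bin/sh
# Reproduce the awk extraction against a sample of the ad-hoc command output.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
qa1 | SUCCESS | rc=0 >>
prop1=A
prop2=B
EOF
# Find the line matching the hostname, then print the next two lines
# concatenated, with no newline between them.
content=$(awk '/qa1/ {for(i=1; i<=2; i++) {getline; printf "%s", $0}}' "$tmpfile")
echo "$content"   # prints prop1=Aprop2=B
rm -f "$tmpfile"
```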

Conclusion

Every Codeyourinfra project’s solution has its own automated tests. You can check it out by navigating through the repository directories. The test.sh file of each folder does the job, including those which are in the aws subdirectories. In this case, the test environment is turned on in an AWS region of your choice.

Shell scripting is just an example of how you can implement your infrastructure test automation. You can also use Docker containers instead of virtual machines managed by Vagrant. The important thing is having a consistent and reproducible way to guarantee the quality of your infrastructure code.

The next step is creating a continuous integration process for developing your infrastructure. But that’s the subject of the next article. Stay tuned!

Before I forget, I must reinforce it: the purpose of the Codeyourinfra project is to help you. So don’t hesitate to tell me about the problems you face as a sysadmin.

Bringing the Ansible development to the cloud

How to use Vagrant to smoothly bring the Ansible development environment to the cloud

It’s very important that you, as a sysadmin, have your own environment, where you can develop and test Ansible playbooks. Like any developer’s environment, it must be totally under your control, because you will certainly need to recreate it from scratch many times. The environment must not be shared either, so there’s no risk of finding it in an unexpected state after someone else’s intervention.

Vagrant is an excellent tool to manage the Ansible development environment. Its default integration with VirtualBox, amongst other hypervisors, allows you to have virtual machines in your own host. Through its simple command-line interface, you are able to create and destroy them, over and over, at your will.

Vagrant uses specific images, known as boxes. Most of them you can find in Vagrant Cloud. There are Vagrant boxes for several providers, like VirtualBox. In addition, there are boxes of all sorts of Linux distributions, as well as boxes with other open source software installed. You can also provision your local virtual machine with software and configuration, package it as a Vagrant box and share it on Vagrant Cloud, as explained in the article Choosing between baked and fried provisioning.

Besides handling your local virtual machines, Vagrant can also manage EC2 instances on AWS. If you have hardware limitations, why not benefit from the cloud? If your environment demands more resources than those available on your host, it’s a great idea to bring the Ansible development environment to AWS. Don’t forget that AWS charges you if you are not eligible for the AWS Free Tier or if you have exceeded your free tier usage limit.

AWS Setup

In order to bring the Ansible development environment to AWS, you must follow some steps:

  1. First of all, you must have an AWS account;
  2. Then, you must create a user with full access to EC2 resources (e.g. the AmazonEC2FullAccess permission) through the IAM (Identity and Access Management) console;
  3. After that, you must create an access key in the Security credentials tab. Set the local environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY with the respective values of the access key id and the secret access key generated during the access key creation;
  4. Alternatively, install the AWS CLI (Command Line Interface) tool and configure your local environment through the command aws configure. This command prompts for information about the access key and the default profile’s region, storing them inside the .aws directory of your home folder.
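For the environment variable approach (step 3), the setup could look like the snippet below. The values shown are the placeholder example credentials from the AWS documentation; use your own access key instead:

```shell
#!/bin/sh
# Placeholder credentials (AWS documentation example values, not real keys).
export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
```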

AWS Region Setup

On AWS, you can create EC2 instances in different regions, and you must choose the specific region where your EC2 instances will be created. It can be done by defining the AWS_REGION or the EC2_REGION environment variable, or through the cited aws configure command. The environment variables take precedence when both configurations are made.
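The precedence can be sketched in shell: the environment variables win, and only then would a configured default be consulted (the region names below are illustrative; in a real setup the fallback would come from aws configure):

```shell
#!/bin/sh
# AWS_REGION takes precedence over EC2_REGION; fall back to a default
# region when neither variable is set (illustrative values).
EC2_REGION="us-east-1"
AWS_REGION="sa-east-1"
region="${AWS_REGION:-${EC2_REGION:-us-east-1}}"
echo "$region"   # prints sa-east-1
```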

Before creating the EC2 instances, you must create a security group and a key pair in the chosen region. You can do it:

  • manually, through the EC2 console;
  • through the command line, by using the AWS CLI tool;
  • programmatically, by using the Amazon EC2 API;
  • automatically, by using Ansible AWS modules!

The playbook-aws-region-configuration.yml file below is a good example of using Ansible to automate the configuration of a specific AWS region. The playbook is responsible for creating the required resources and gathering information from the AWS region for later use by Vagrant. If you want to run the Codeyourinfra project’s solutions on AWS, you must execute the playbook beforehand, for your chosen AWS region.

---
- hosts: localhost
  connection: local
  gather_facts: false
  vars_prompt:
    - name: "aws_region"
      prompt: "AWS Region"
      default: "sa-east-1"
      private: no
  tasks:
  - name: Create the AWS directory if it doesn't exist
    file:
      path: '{{aws_region}}'
      state: directory
  - name: Get the VPCs
    ec2_vpc_net_facts:
      region: '{{aws_region}}'
    register: ec2_vpc_net_facts_results
  - name: Create the Vagrant security group
    ec2_group:
      name: vagrant
      description: Security Group for EC2 instances managed by Vagrant
      region: '{{aws_region}}'
      vpc_id: '{{default_vpc.id}}'
      rules:
        - proto: tcp
          ports:
            - 22
            - 80
            - 3000
            - 8080
            - 8086
          cidr_ip: 0.0.0.0/0
    vars:
      default_vpc: '{{(ec2_vpc_net_facts_results|json_query("vpcs[?is_default]"))[0]}}'
    register: ec2_group_result
  - name: Store the security group's data
    copy:
      content: '{{ec2_group_result|to_nice_json}}'
      dest: '{{aws_region}}/security-group.json'
  - name: Get the default VPC's subnets
    ec2_vpc_subnet_facts:
      region: '{{aws_region}}'
      filters:
        vpc-id: '{{ec2_group_result.vpc_id}}'
    register: ec2_vpc_subnet_facts_results
  - name: Store the VPC subnet's data
    copy:
      content: '{{(ec2_vpc_subnet_facts_results.subnets|sort(attribute="availability_zone"))[0]|to_nice_json}}'
      dest: '{{aws_region}}/subnet.json'
  - name: Create the key pairs
    ec2_key:
      name: codeyourinfra-aws-key
      region: '{{aws_region}}'
    register: ec2_key_result
  - name: Store the private key
    copy:
      content: '{{ec2_key_result.key.private_key}}'
      dest: '{{aws_region}}/codeyourinfra-aws-key.pem'
      mode: 0400
    when: ec2_key_result.key.private_key is defined
  - name: Find Ubuntu Server 14.04 LTS AMIs
    ec2_ami_find:
      name: 'ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*'
      region: '{{aws_region}}'
      owner: 099720109477
      sort: name
      sort_order: descending
      sort_end: 1
    register: ec2_ami_find_result
  - name: Store the Ubuntu AMI's data
    copy:
      content: '{{ec2_ami_find_result.results[0]|to_nice_json}}'
      dest: '{{aws_region}}/ubuntu-ami.json'
    when: ec2_ami_find_result.results[0] is defined

Behind the scenes, the Ansible modules used interact with the Amazon EC2 API. Here are some details about the playbook:

  • the ec2_vpc_net_facts module is used to get the default VPC (Virtual Private Cloud) of the chosen region;
  • the ec2_group module is used to create the required security group, whose data is stored in the security-group.json file, for later use by Vagrant;
  • the ec2_vpc_subnet_facts module is used to select a subnet, and store its data in the subnet.json file, for later use by Vagrant;
  • the ec2_key module is used to create the required key pairs, and store the private key in the codeyourinfra-aws-key.pem file, for later use by Vagrant;
  • the ec2_ami_find module is used to select an Ubuntu AMI (Amazon Machine Image), and store its data in the ubuntu-ami.json file, for later use by Vagrant.

If you haven’t cloned the Codeyourinfra project’s repository yet, do it right now 🙂 You will find the playbook-aws-region-configuration.yml file inside the cloud/aws directory. Go to the folder and run the following command, informing your AWS region of preference when prompted:

ansible-playbook playbook-aws-region-configuration.yml

Vagrant up

In order to make Vagrant manage EC2 instances, you must install the AWS plugin, by executing the command vagrant plugin install vagrant-aws. You can find details about the plugin on its GitHub repository.

Every Codeyourinfra project’s solution has an aws subdirectory, where the specific Vagrantfile for managing EC2 instances is placed. One example is the Vagrantfile below, which creates on AWS the Ansible development environment of the solution explained in the article How to unarchive different files in different servers in just one shot.

Notice that the Vagrantfile handles the environment variables introduced in the Codeyourinfra project’s release 1.4. The APPEND_TIMESTAMP and the PROVISIONING_OPTION environment variables are explained in detail by the blog post Choosing between fried and baked provisioning.

If you initialize the EC2 environment with the baked provisioning option, the São Paulo region (sa-east-1) will be used, because it is the region where the repo server AMI (ami-b86627d4) is available. Otherwise, the AWS region where the EC2 instances will be created is taken from either the AWS_REGION or the EC2_REGION environment variable.

# -*- mode: ruby -*-
# vi: set ft=ruby :

load File.join("..", "..", "common", "timestamp-appender.rb")

provisioning_option = ENV['PROVISIONING_OPTION'] || "fried"
if provisioning_option != "baked" && provisioning_option != "fried"
  puts 'PROVISIONING_OPTION must be \'baked\' or \'fried\'.'
  abort
end

if provisioning_option == "baked"
  aws_region = "sa-east-1"
else
  aws_region = ENV['AWS_REGION'] || ENV['EC2_REGION'] || "sa-east-1"
end
relative_path = File.join("..", "..", "cloud", "aws", aws_region)
security_group  = JSON.parse(File.read(File.join(relative_path, "security-group.json")))
subnet = JSON.parse(File.read(File.join(relative_path, "subnet.json")))
ubuntu_ami = JSON.parse(File.read(File.join(relative_path, "ubuntu-ami.json")))
ec2_instances = JSON.parse('[{"name": "repo", "role": "repository"}, {"name": "server1", "role": "server"}, {"name": "server2", "role": "server"}]')

Vagrant.configure("2") do |config|
  config.vm.box = "dummy"
  config.vm.box_url = "https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box"
  config.vm.synced_folder ".", "/vagrant", disabled: true

  ec2_instances.each do |ec2_instance|
    config.vm.define ec2_instance["name"] do |ec2_config|
      ec2_config.vm.provider "aws" do |aws, override|
        aws.region = aws_region
        if ec2_instance["name"] == "repo" && provisioning_option == "baked"
          aws.ami = "ami-b86627d4"
        else
          aws.ami = ubuntu_ami['ami_id']
        end
        aws.instance_type = "t2.micro"
        aws.keypair_name = "codeyourinfra-aws-key"
        aws.security_groups = security_group['group_id']
        aws.subnet_id = subnet['id']
        aws.tags = {"Name" => ec2_instance["name"], "Role" => ec2_instance["role"], "Solution" => "unarchive_from_url_param"}
        override.ssh.username = "ubuntu"
        override.ssh.private_key_path = File.join(relative_path, "codeyourinfra-aws-key.pem")
        override.nfs.functional = false
      end
      if ec2_instance["name"] == "repo" && provisioning_option == "fried"
        ec2_config.vm.provision "ansible" do |ansible|
          ansible.playbook = File.join("..", "playbook-repo.yml")
        end
      end
    end
  end
end

Besides the region, other required AWS provider-specific configuration options are defined:

  • ami – the AMI id to boot. If you initialize the environment with the baked provisioning option, the AMI is the one previously prepared, as mentioned. (If you have installed the AWS CLI tool and would like to know the AMIs provided by the Codeyourinfra project, just execute the command aws ec2 describe-images --owners 334305766942.) Otherwise, the AMI is the one selected during the AWS Region Setup phase, obtained from the ubuntu-ami.json file.
  • instance_type – AWS provides a wide range of EC2 instance types, for different use cases. For our testing purposes, the T2 instances are more than sufficient. Besides that, the t2.micro instance type is eligible for the AWS Free Tier.
  • keypair_name – the name of the key pair created during the AWS Region Setup phase, when the playbook-aws-region-configuration.yml playbook was executed. The path of the stored private key file (codeyourinfra-aws-key.pem) is then configured by overriding the default Vagrant ssh.private_key_path configuration.
  • security_groups – the id of the security group created during the AWS Region Setup phase, obtained from the security-group.json file.  The security group was created exclusively for EC2 instances managed by Vagrant.
  • subnet_id – the id of the subnet selected during the AWS Region Setup phase, obtained from the subnet.json file. The subnet was selected from the default VPC’s subnets list, ordered by availability zone.
  • tags – a hash of tags to set on the EC2 instance. The tags are very useful for later EC2 instances identification.

Now that you have checked out the Vagrantfile, which is in the unarchive_from_url_param/aws directory of the Codeyourinfra project’s repository, stay there and run the command below in order to see the magic happen!

vagrant up

Ansible inventory

The Ansible inventory file is where you group your machines. Ansible needs the information placed there in order to connect to the hosts through SSH. That’s what enables the tool’s agentless nature and makes it possible to execute a task on several servers in a single run.
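A minimal inventory in Ansible’s INI format could look like this (the group and host names mirror the Vagrantfile above; the IP addresses are illustrative):

```
[ec2_instances]
repo    ansible_host=54.94.0.10 ansible_user=ubuntu
server1 ansible_host=54.94.0.11 ansible_user=ubuntu
server2 ansible_host=54.94.0.12 ansible_user=ubuntu
```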

When you use AWS, each time you create an EC2 instance it gets a different IP address. Unlike with local virtual machines, you are not able to define the IP address in the Vagrantfile. The Vagrant AWS plugin highlights this in its output: “Warning! The AWS provider doesn’t support any of the Vagrant high-level network configurations (‘config.vm.network‘). They will be silently ignored.”

For that reason, inside the aws directory below every Codeyourinfra project folder in the repository, you will find two more files: playbook-ec2-instances-inventory.yml and ec2_hosts_template. The Ansible playbook is responsible for discovering the IP addresses of the just created EC2 instances and, based on the template, generating the ec2_hosts file.

You must execute the playbook right after the EC2 instances have been created. Just run:

ansible-playbook playbook-ec2-instances-inventory.yml

Once the ec2_hosts file is generated, you can use it as the inventory option (-i) of either the ansible or the ansible-playbook command. For example, run the following Ansible ad-hoc command:

ansible ec2_instances -m ping -i ec2_hosts

It uses the ping module to test the connection to all of the EC2 instances that are up and running.

Ansible development

Finally, you have your Ansible development environment on AWS. You can edit any playbook and test it against the created EC2 instances. If something goes wrong, you can recreate the entire environment from scratch. You have the autonomy to do what you want, because the environment is yours.

The next step is automating the tests of your infrastructure code. But that’s the subject of the next article. Stay tuned!

Before I forget, I must reinforce it: the purpose of the Codeyourinfra project is to help you. So don’t hesitate to tell me about the problems you face as a sysadmin.

Choosing between baked and fried provisioning

Eggs to be baked or fried, like provisioning

Provisioning always requires resources from somewhere. The resources are packages in remote repositories or compressed files from Internet addresses; they come in all sizes and formats. Depending on where they are and on the available bandwidth, the download process can last longer than expected. If provisioning is a repetitive task, like in automated tests, you might want to use baked images in order to save time.

Baked images

Baked images are previously prepared with software and configuration. For this reason, they are usually bigger than the ones used in fried provisioning. In order to maintain a baked images repository, storage is really a point of consideration, mainly if the images are versioned. Downloading and uploading baked images also has its cost, so it’s better to minimize it as much as possible.

Analogously to baked eggs, baked images are ready to be consumed; there’s no need to add anything special. For sure it requires some effort in advance, but it pays off if you have to use a virtual machine right away.

Baked images also empower the use of immutable servers, because most of the time they don’t require extra intervention after instantiation. In addition, if something goes wrong with an image instance, it’s better to recreate it than to repair it. That makes baked images preferable for autoscaling, since they are rapidly instantiated and ready.

Fried provisioning

On the other hand, fried provisioning is based on raw images, usually with just the operating system installed. These lightweight images, once instantiated, must be provisioned with all the required software and configuration in order to reach the ready-to-use state. Analogously to fried eggs, you must follow the recipe and combine all the ingredients to the point where they are ready to be consumed.

One concern about fried provisioning, when it is executed repeatedly, is avoiding breakage. During the process, a package manager like apt is usually used to install the required software. Unless you are specific about which version the package manager must install, the latest one will be installed, and unexpected behaviors can happen with untested newer versions, including a break in the provisioning process. For that reason, always be specific about which version must be installed.

Codeyourinfra provisioning options

Since version 1.4.0 of the Codeyourinfra project on GitHub, the development environment can be initialized with either provisioning option: fried, the default, or baked. It means that the original base image, a minimized version of a Vagrant box with Ubuntu 14.04.3 LTS, can now be replaced by a baked one. The baked images are available on Vagrant Cloud, and can be downloaded not only by those who want to use the Codeyourinfra development environment, but also by anyone who wants a ready-to-use image.

Choosing one provisioning option or the other is quite simple. If you want to use the baked image, set the environment variable PROVISIONING_OPTION to baked; otherwise, leave it unset, because the fried option is the default, or set the environment variable explicitly to fried.
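The defaulting behavior can be sketched in shell; a Vagrantfile run reads the variable the same way (falling back to fried when it is unset):

```shell
#!/bin/sh
# The fried option is the default when the variable is unset.
unset PROVISIONING_OPTION
option="${PROVISIONING_OPTION:-fried}"
echo "$option"   # prints fried

# Setting the variable switches to the baked image.
PROVISIONING_OPTION=baked
option="${PROVISIONING_OPTION:-fried}"
echo "$option"   # prints baked
```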

Baking the image

The process of building the baked images was simple. I could have used a tool like Packer to automate it, but I manually followed these steps:

1.  vagrant up <machine>, where <machine> is the name of the VM defined in the Vagrantfile. The VM is then initialized from the minimal/trusty64 Vagrant box and provisioned by Ansible.

2. vagrant ssh <machine>, in order to connect to the VM through SSH. The user is vagrant. The VM is ready to use, which is the perfect moment to take a snapshot of the image. Before that, in order to get a smaller image, it’s recommended to free up disk space:

sudo apt-get clean
sudo dd if=/dev/zero of=/EMPTY bs=1M
sudo rm -f /EMPTY
cat /dev/null > ~/.bash_history && history -c && exit

3. vagrant package <machine> --output baked.box, for finally creating the baked image file, which was then uploaded to Vagrant Cloud.

The initialization duration

By default, Vagrant does not show a timestamp on each step of the command’s output, hence you are not able to easily know how long the environment initialization takes. In order to overcome this limitation, another environment variable was introduced: APPEND_TIMESTAMP. If it is set to true, the current datetime is prepended to every output line, so you can measure the initialization duration.

Each Vagrantfile, when executed, now loads right at the beginning the Ruby code below, which overrides the default Vagrant output behavior if the APPEND_TIMESTAMP flag is turned on. Actually, Vagrant already has an issue on GitHub addressing such an enhancement, where this code was presented as a workaround.

append_timestamp = ENV['APPEND_TIMESTAMP'] || 'false'
if append_timestamp != 'true' && append_timestamp != 'false'
  puts 'APPEND_TIMESTAMP must be \'true\' or \'false\'.'
  abort
end
if append_timestamp == 'true'
  def $stdout.write string
    log_datas=string
    if log_datas.gsub(/\r?\n/, "") != ''
      log_datas=::Time.now.strftime("%d/%m/%Y %T")+" "+log_datas.gsub(/\r\n/, "\n")
    end
    super log_datas
  end
  def $stderr.write string
    log_datas=string
    if log_datas.gsub(/\r?\n/, "") != ''
      log_datas=::Time.now.strftime("%d/%m/%Y %T")+" "+log_datas.gsub(/\r\n/, "\n")
    end
    super log_datas
  end
end

Feel free to experiment with the provisioning options, along with the timestamp appending flag set to true! You now have a better environment to try the Codeyourinfra project solutions.

And don’t forget to tell me your problem! For sure we can find a solution together 🙂
How to get metrics for alerting in advance and preventing trouble

Eventual warning symbol of a monitoring service

Although we all have to deal with unexpected events, we also have tools to prevent them. As mentioned in the last post, log files must be accessible upfront, otherwise troubleshooting is compromised. Before any issue occurs, there’s a lot we can do in order to be aware of what’s going on, act proactively and not let problems become reality.

Most companies have already implemented a monitoring solution. Usually my sysadmin friends are the people in charge of such solutions. If you have this responsibility, you know how difficult it is to gather all the metrics, show them in fancy dashboards, and properly send alerts to the ones who must react in case of some evidence of trouble. Maybe, more often than you would like, you have to justify why some metric wasn’t considered, or wasn’t shown, or why some alert wasn’t sent. The bigger the monitoring service, the more likely this kind of situation is to happen.

Don’t let the task of avoiding problems become a problem itself. You can use open source tools and get a monitoring server ready to do the job. Once it is up and running, you will be able to easily plug any other server into the monitoring service, with no need for an installed agent. In addition, you will be able to send alert notifications through instant messaging apps, like Slack and Telegram, instead of by email.

The solution combines InfluxDB, a high performance time series database, Grafana, a time series analytics and monitoring tool, and Ansible, an agentless automation tool. With Ansible it’s possible to constantly extract the servers’ hardware metrics and store them in the InfluxDB database. With Grafana it’s possible to connect to the InfluxDB database, show the metrics in dashboards, define thresholds and configure alerts. The solution can be checked out on GitHub, and the details are shown right below.

The development environment

The monitored environment was reproduced using local VirtualBox machines: one representing the monitoring server (monitor) and the other two representing servers that can be plugged into the monitoring service (server1 and server2). Vagrant was used to manage this development environment. With the Vagrantfile below, it’s possible to smoothly turn on and provision the monitoring server by executing the command vagrant up monitor. Notice that the VMs server1 and server2 are also defined, but they can be booted up later, if you want to plug just one or both into the monitoring service.

Vagrant.configure("2") do |config|
  config.vm.box = "minimal/trusty64"

  config.vm.define "monitor" do |monitor|
    monitor.vm.hostname = "monitor.local"
    monitor.vm.network "private_network", ip: "192.168.33.10"
    monitor.vm.provision "ansible" do |ansible|
      ansible.playbook = "playbook-monitor.yml"
    end
  end

  (1..2).each do |i|
    config.vm.define "server#{i}" do |server|
      server.vm.hostname = "server#{i}.local"
      server.vm.network "private_network", ip: "192.168.33.#{i+1}0"
    end
  end
end

The monitoring server provisioning is done by Ansible, and is divided into two basic parts: installation of the tools (InfluxDB, Grafana and Ansible) and configuration of the monitoring service. Notice that Ansible is used to install Ansible! The playbook-monitor.yml file below shows that.

Besides, rather than putting all the tasks in one big file, each tool installation’s tasks were placed in a specific YAML file, in order to keep the code clean, organized and easy to understand. The grouped tasks can then be dynamically included in the main playbook through the include_tasks statement.

---
- hosts: monitor
  become: yes
  gather_facts: no
  tasks:
  - name: Install apt-transport-https (required for the apt_repository task)
    apt:
      name: apt-transport-https
      update_cache: yes
    tags:
      - installation
  - name: Install InfluxDB
    include_tasks: influxdb-installation.yml
    tags:
      - installation
  - name: Install Grafana
    include_tasks: grafana-installation.yml
    tags:
      - installation
  - name: Install Ansible
    include_tasks: ansible-installation.yml
    tags:
      - installation
  - name: Configure monitoring
    include_tasks: monitoring-configuration.yml
    tags:
      - configuration

The monitoring service configuration

The monitoring service configuration is composed of a few steps, as shown in the monitoring-configuration.yml file below. First and foremost, the InfluxDB database, named monitor, is created. InfluxDB provides a very useful API, which can be used for a variety of database operations. For interacting with web services, the Ansible uri module is the most suitable. All the metrics extracted from the monitored servers are stored in the monitor database.

After that, the Grafana data source that connects to the InfluxDB database is created. That way Grafana is able to access all the stored metrics data. Like InfluxDB, Grafana has an API which allows making most, if not all, of the configuration through JSON-formatted content. Besides the data source creation, the Slack notification channel and the first dashboard are also created. Notice that, in order to consider the task successful when the playbook is executed again, and thus guarantee idempotence, response statuses other than 200 are accepted as well.

The configured Slack notification channel points to a test Slack workspace. Of course you can join it, but I'm pretty sure you will want to create your own, and invite the troubleshooting guys to join. Don't forget to create an incoming webhook in your Slack workspace and replace the JSON url field value with the generated webhook URL.

The initial dashboard shows the used memory percentage metric. Other metrics can be added to it, or you can create new dashboards at will. A threshold of 95% was defined, so you can visually know when the metric exceeds that limit. An alert was also defined: a notification is sent to the configured Slack channel when the last five metric values are greater than or equal to the 95% limit. The alert also sends a notification when the server health is restored.

With Ansible you can perform tasks on several servers at the same time. It's possible because everything is done through SSH from a master host, even if it's your own machine. Besides that, Ansible knows the target servers through the inventory file (/etc/ansible/hosts), where they are defined and also grouped. During the monitoring service configuration, the group monitored_servers is created in the inventory file. Every server in this group is automatically monitored. Plugging a server into the monitoring service is as simple as adding a line to the file. The first monitored server is the monitoring server itself (localhost).
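After the configuration, the inventory file would contain a block like the one below. The first line comes from the provisioning itself; the second is an illustrative example of a server plugged in later, with hypothetical credentials:

```ini
[monitored_servers]
localhost ansible_connection=local
192.168.33.20 ansible_user=vagrant ansible_ssh_pass=vagrant
```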

In order to prevent Ansible from checking the SSH keys of the servers plugged into the monitoring service, it's necessary to disable this default behavior in the Ansible configuration file (/etc/ansible/ansible.cfg). This way Ansible won't have problems collecting metrics from any new server through SSH.
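The resulting setting in /etc/ansible/ansible.cfg, as written by the ini_file task shown further below, looks like this:

```ini
[defaults]
host_key_checking = False
```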

Finally, an Ansible playbook (playbook-get-metrics.yml) is used to connect to all monitored servers and extract all the relevant metrics. It's placed in the /etc/ansible/playbooks directory and configured in cron to be executed every minute. To sum up: every minute the metrics are collected, stored and shown, and, in case of evidence of trouble, an alert is sent. Isn't that awesome?

---
- name: Create the InfluxDB database
  uri:
    url: http://localhost:8086/query
    method: POST
    body: "q=CREATE DATABASE monitor"
- name: Create the Grafana datasource
  uri:
    url: http://localhost:3000/api/datasources
    method: POST
    user: admin
    password: admin
    force_basic_auth: yes
    body: "{{lookup('file','monitor-datasource.json')}}"
    body_format: json
  register: response
  failed_when: response.status != 200 and response.status != 409
- name: Create the Slack notification channel
  uri:
    url: http://localhost:3000/api/alert-notifications
    method: POST
    user: admin
    password: admin
    force_basic_auth: yes
    body: "{{lookup('file','slack-notification-channel.json')}}"
    body_format: json
  register: response
  failed_when: response.status != 200 and response.status != 500
- name: Create the Grafana dashboard
  uri:
    url: http://localhost:3000/api/dashboards/db
    method: POST
    user: admin
    password: admin
    force_basic_auth: yes
    body: "{{lookup('file','used_mem_pct-dashboard.json')}}"
    body_format: json
  register: response
  failed_when: response.status != 200 and response.status != 412
- name: Add localhost to Ansible inventory
  blockinfile:
    path: /etc/ansible/hosts
    block: |
      [monitored_servers]
      localhost ansible_connection=local
- name: Disable SSH key host checking
  ini_file:
    path: /etc/ansible/ansible.cfg
    section: defaults
    option: host_key_checking
    value: False
- name: Create the Ansible playbooks directory if it doesn't exist
  file:
    path: /etc/ansible/playbooks
    state: directory
- name: Copy the playbook-get-metrics.yml
  copy:
    src: playbook-get-metrics.yml
    dest: /etc/ansible/playbooks/playbook-get-metrics.yml
    owner: root
    group: root
    mode: 0644
- name: Get metrics from monitored servers every minute
  cron:
    name: "get metrics"
    job: "ansible-playbook /etc/ansible/playbooks/playbook-get-metrics.yml"

Collecting the metrics

The playbook-get-metrics.yml file below is responsible for extracting from the monitored_servers all the important metrics and storing them in the monitor database. Initially the only extracted metric is the used memory percentage, but you can easily extract more metrics by adding tasks to the playbook.

Notice that the InfluxDB write API is used to store the metric in the monitor database. 192.168.33.10 is the IP address of the monitoring server and 8086 is the port InfluxDB listens on. The used memory percentage is stored under the key used_mem_pct, and you must choose an appropriate key for each metric you start to extract.

Ansible by default collects information about the target host, as an initial step before the tasks' execution. The collected data is then available to be used by the tasks. The hostname (ansible_hostname) is one of those facts, essential to differentiate the server the metric is extracted from. By the way, the used memory percentage is calculated using two other facts gathered by Ansible: the used real memory in megabytes (ansible_memory_mb.real.used) and the total real memory, also in megabytes (ansible_memory_mb.real.total). If you want to see all of this data, execute the command ansible monitor -m setup -u vagrant -k -i hosts, and type vagrant when prompted for the SSH password. Notice that the information is JSON-formatted, and the values can be accessed through dot notation.

---
- hosts: monitored_servers
  tasks:
  - name: Used memory percentage
    uri:
      url: http://192.168.33.10:8086/write?db=monitor
      method: POST
      body: "used_mem_pct,host={{ansible_hostname}} value={{ansible_memory_mb.real.used / ansible_memory_mb.real.total * 100}}"
      status_code: 204
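The percentage computation and the request body built by the task above can be sketched in plain Python. The fact values here are hypothetical stand-ins for what Ansible would gather:

```python
# Hypothetical values for the Ansible facts used by the playbook
ansible_hostname = "monitor"      # ansible_hostname (assumed)
used_mb, total_mb = 312, 490      # ansible_memory_mb.real.used / .total (assumed)

used_mem_pct = used_mb / total_mb * 100

# InfluxDB line protocol: <measurement>,<tag_key>=<tag_value> value=<field_value>
body = "used_mem_pct,host={} value={}".format(ansible_hostname, used_mem_pct)
print(body)
```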

Plugging a server into the monitoring service

Probably you’ve already executed the command vagrant up monitor, in order to get the monitoring server up and running. If not, do it right now. It takes some time, depending on how fast your Internet connection is. You can follow the output and see each step of the server provisioning.

When it's finished, open your browser and access the Grafana web application by typing the URL http://192.168.33.10:3000. The user and the password to log in are the same: admin. Click the used_mem_pct dashboard link, and take a look at the values concerning the monitoring server in the presented line chart. You may need to wait a few minutes until there are enough values to track.

Ok, you may now want to plug another server into the monitoring service and see its values in the line chart too. So turn on server1, for example, by executing the command vagrant up server1. After that, execute the Ansible playbook below through the command ansible-playbook playbook-add-server.yml -u vagrant -k -i hosts. The -u argument defines the SSH user, the -k argument prompts for the password input (vagrant, too), and the -i argument points to the hosts file, where the monitoring server is defined.

You will be prompted for the new server’s IP address and the SSH credentials, in order to enable Ansible to connect to the server. That’s enough to plug the server into the monitoring service, simply by inserting a line into the monitoring server’s /etc/ansible/hosts file. The next time cron executes the playbook-get-metrics.yml, one minute later, server1 will also be considered a monitored server, so its metrics will be extracted, stored and shown in the dashboard too.

---
- hosts: monitor
  become: yes
  gather_facts: no
  vars_prompt:
  - name: "host"
    prompt: "Enter host"
    private: no
  - name: "user"
    prompt: "Enter user"
    private: no
  - name: "password"
    prompt: "Enter password"
    private: yes
  tasks:
  - name: Add the server into the monitored_servers group
    lineinfile:
      path: /etc/ansible/hosts
      insertafter: "[monitored_servers]"
      line: "{{host}} ansible_user={{user}} ansible_ssh_pass={{password}}"

Conclusion

Monitoring is key in high performance organizations. It’s one of the pillars of DevOps. Better monitoring solutions shorten feedback cycles and foster continuous learning and continuous improvement.

Among the variety of monitoring solutions, the one just described aims to be cheap, flexible and easy to implement. Some benefits of its adoption are:

  • the solution does not require installing an agent on every monitored server, taking advantage of the agentless nature of Ansible;
  • it stores all the metrics data in InfluxDB, a high performance time series database;
  • it centralizes the data presentation and the alerts configuration in Grafana, a powerful data analytics and monitoring tool.

I hope this solution can solve at least one of the pain points in your monitoring tasks. Experiment with it, improve it and share it at will.

Finally, if you want my help in automating something, please give me more details, tell me your problem. It may be a problem of someone else too.

How to check log files in a server without logging in the server

Accessing log files for troubleshooting purposes

My sysadmin friends spend part of their time helping the developers with troubleshooting. Sometimes, when there’s a big problem, that time increases a lot. When it happens, it’s not difficult to feel overwhelmed, by the pressure of solving the problem itself, and unfortunately by the setbacks faced throughout the troubleshooting process.

Many companies have strict security policies that prevent the developers from accessing servers through SSH. The problem arises when they need to check log files that exist on such servers, during an outage, for example. When a crisis happens, there’s no time to spend on bureaucracy; the log files must be accessible right away for troubleshooting.

One solution is to provide the log files to the developers, or anyone in charge of troubleshooting, with no need to log in to the servers. The security policies are followed and the required availability of the log files is met. It’s possible by installing and configuring the Apache HTTP Server in a way that makes the log files accessible through a web browser.

The solution can be checked out on Github. It uses Ansible to automate the task of making the log files accessible, and Vagrant + VirtualBox to create the development and testing environment for such automation.

The development environment

The development environment is very important to create, and it must be created locally, on your own computer. There’s no better way to develop and test Ansible playbooks. You might ask why not use some server for such a task, but be aware that servers are usually shared, and someone may accidentally mess with your stuff.

Furthermore, coding is very dynamic. You need an environment to experiment in and make mistakes (the trial-and-error method). Some code you will surely throw away until you find the solution. So imagine if you test your code against a real server and leave it in a state that’s hard to roll back. With your own environment you can easily recreate VMs and retest your code from scratch, over and over, at will.

Vagrant is an awesome tool for building your development environment. Its default integration with VirtualBox simplifies managing VMs a lot. Through the command line, you can create, provision, connect via SSH to, and destroy VMs, to name just a few operations. The command vagrant up, for example, puts your environment up and running, based on the Vagrantfile, like the one below.

Vagrant.configure("2") do |config|
  config.vm.define "jenkins" do |jenkins|
    jenkins.vm.box = "minimal/trusty64"
    jenkins.vm.hostname = "jenkins.local"
    jenkins.vm.network "private_network", ip: "192.168.33.10"
    jenkins.vm.provision "ansible" do |ansible|
      ansible.playbook = "playbook-jenkins.yml"
    end
  end
end

In order to simulate a server where an application runs and writes data into log files, only one VM was used. It’s important to have a VM as similar as possible to your real servers. For that reason, use VMs with the same OS and even the same basic configuration. Packer is a great tool for creating VM images that resemble your servers. In the solution’s scope, a reduced version of an Ubuntu VM was used (minimal/trusty64).

Notice that the VM is provisioned during its boot. Vagrant integrates with several provisioners, including Ansible. Basically, Oracle Java and Jenkins are installed in the VM, in that order. Jenkins is an open source automation server, broadly used for delivering software, and, with the adoption of Infrastructure as Code, it can be used for delivering infrastructure as well. If your delivery process is done by Jenkins, for sure you will need to take a look at the tool’s log files once in a while.

---
- hosts: jenkins
  become: yes
  gather_facts: no
  tasks:
  - name: Install apt-transport-https (required for the apt_repository task)
    apt:
      name: apt-transport-https
      update_cache: yes
  - name: Install Oracle Java 8 (required for Jenkins installation)
    include_tasks: oracle-java8-installation.yml
  - name: Install Jenkins
    include_tasks: jenkins-installation.yml

During the playbook-jenkins.yml execution, the tasks related to the Oracle Java installation (oracle-java8-installation.yml) and the ones concerning the Jenkins installation (jenkins-installation.yml) are included dynamically through the include_tasks statement. It’s a good code organization practice, since it keeps everything in its right place and maintains the playbook files as small as possible. Moreover, it’s a great way of enabling code reuse.

The solution implementation

Right after the Jenkins server is turned on, you can open your web browser and type the URL http://192.168.33.10:8080. You will see the Jenkins initial configuration page. It asks for the auto-generated administrator password, found in the jenkins.log file. Please don’t get the password by accessing the VM through SSH. Remember, that’s what we want to prevent. So keep calm and implement the solution first.

Jenkins stores its log files in the /var/log/jenkins directory. So we must configure the Apache HTTP Server to expose that folder. This is done by using the apache-logs.conf file shown below. It’s a template that can be used for any directory you want to make visible through the web browser.

If you want more details on how this configuration works, take a look at the Directory and the Alias directives documentation. For now, all we need to know is that the {{directory}} and the {{alias}} will be replaced respectively by the log files folder and the alias required to complement the URL address.

<Directory "{{directory}}">
    Options Indexes FollowSymLinks
    AllowOverride None
    Require all granted
</Directory>

Alias "{{alias}}" "{{directory}}"
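The substitution performed by Ansible's template module can be sketched in plain Python. Ansible actually renders the file with Jinja2; simple string replacement is used here just to keep the sketch self-contained:

```python
# Template content from apache-logs.conf (abridged to the two placeholders)
template = '''<Directory "{{directory}}">
    Require all granted
</Directory>

Alias "{{alias}}" "{{directory}}"'''

# Fill in the placeholders with the playbook's variable values
rendered = (template
            .replace("{{directory}}", "/var/log/jenkins")
            .replace("{{alias}}", "/logs/jenkins"))
print(rendered)
```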

The variables defined in the playbook-jenkins-logs.yml below are used in that replacement. Notice that the directory variable points to the cited Jenkins log files folder, and the alias value is /logs/jenkins. The other variable (conf) defines the name of the resulting configuration file, which will be placed in one of the Apache folders reserved for configuration files (/etc/apache2/conf*).

The Ansible playbook can be easily adapted to meet your needs. If some developer comes to you asking for help, because he or she has to check inaccessible log files, just change the variable values and execute the playbook against the server where the files are.

Ok, let’s finally implement the solution. Execute the command ansible-playbook playbook-jenkins-logs.yml -u vagrant -k -i hosts. The -u argument defines the SSH user, the -k argument prompts for the password input (vagrant, too), and the -i argument points to the hosts file, where Ansible can find the Jenkins server IP address.

---
- hosts: jenkins
  become: yes
  gather_facts: no
  vars:
  - directory: /var/log/jenkins
  - alias: /logs/jenkins
  - conf: jenkins-logs.conf
  tasks:
  - name: Install Apache 2
    apt:
      name: apache2
      update_cache: yes
  - name: Config Apache logs
    template:
      src: apache-logs.conf
      dest: /etc/apache2/conf-available/{{conf}}
      owner: root
      group: root
      mode: 0644
  - name: Enable new config
    file:
      src: ../conf-available/{{conf}}
      dest: /etc/apache2/conf-enabled/{{conf}}
      owner: root
      group: root
      state: link
  - name: Restart Apache 2
    service:
      name: apache2
      state: restarted

During the execution, the Apache HTTP Server is installed and the configuration file is placed, with the right values, in /etc/apache2/conf-available. The file content can be verified through the Ansible ad-hoc command ansible jenkins -m shell -a "cat /etc/apache2/conf-available/jenkins-logs.conf" -u vagrant -k -i hosts. After that, the configuration is enabled by creating a symbolic link in the /etc/apache2/conf-enabled folder, pointing right to the configuration file. Lastly, the Apache HTTP Server is restarted.

Now open a new tab in your web browser and type the URL http://192.168.33.10/logs/jenkins. You will see all the content of the Jenkins server’s /var/log/jenkins folder, including the jenkins.log file! Notice that the URL contains the configured /logs/jenkins alias. You can finally open the log file in order to get the auto-generated administrator password. Just copy it, go back to the Jenkins initial configuration page, paste the password and continue.

Conclusion

Despite the fact that we must follow the company’s security policies, we must facilitate the troubleshooting process too. DevOps also means one person’s problem is everyone’s problem, so let’s work together in order to solve all of them. If you enjoyed the solution, share it right now!

Before I forget, if you want my help in automating something, please give me more details, tell me your problem. It may be a problem of someone else too.

How to deal with the same configuration file with different content in different environments

Configuration in multiple servers at once

Different from the previous post, this time the demand came from a dev friend. His application required a specific properties file in order to get the database connection string, a URL used to connect to the MongoDB instance. The problem was that each environment had its own MongoDB instance, so the properties file content was different, depending on where it was placed.

The common approach to such a problem is to have different versions of the same file, each version with the appropriate content for the related environment. What differentiates one file from another is the directory in the filesystem or the branch in the SCM repository where the file is put, because those are named after the environments. When this approach is adopted, the right version of the configuration file is usually embedded into the application package during the deployment process.

The solution tried to eliminate that complexity, decoupling the configuration from the application, and centralizing all the needed configuration in just one file. The solution can be checked out on Github. It was developed using Ansible, and tested in a VM environment built using Vagrant and the VirtualBox hypervisor. The details are shown right below.

The test environment

In order to simulate my friend’s QA environment, with different servers where the application is deployed, 3 VMs were booted up locally: qa1, qa2 and qa3. This way it was possible to test the Ansible playbook during its development, before executing it against the real servers.

The Vagrantfile below was used to build such a test environment. Notice this is Ruby: each VM is defined within a loop and receives an IP address. The VM image (box) used was minimal/trusty64, a reduced version of Ubuntu, for a faster first-time download and setup during the vagrant up command execution.

Vagrant.configure("2") do |config|
  config.vm.box = "minimal/trusty64"

  (1..3).each do |i|
    config.vm.define "qa#{i}" do |qa|
      qa.vm.hostname = "qa#{i}.local"
      qa.vm.network "private_network", ip: "192.168.33.#{i}0"
    end
  end
end

The playbook execution

With Ansible you can perform tasks on several servers at the same time. It’s possible because everything is done through SSH from a master host, even if it’s your own machine. Besides that, Ansible knows the target servers through the inventory file (hosts), where they are defined and also grouped. In the hosts file below, the QA servers were defined inside the group qa.

[qa]
192.168.33.10
192.168.33.20
192.168.33.30

The core of the solution is undoubtedly the config.json file. It concentrates all the needed configuration for each QA server. If my friend’s application requires more parameters, they can be easily added. The host element identifies the target server, and the items are the properties the application must have in order to run appropriately.

[
  {
    "host": "qa1",
    "items": [
      {
        "key": "prop1",
        "value": "A"
      },
      {
        "key": "prop2",
        "value": "B"
      }
    ]
  },
  {
    "host": "qa2",
    "items": [
      {
        "key": "prop1",
        "value": "C"
      },
      {
        "key": "prop2",
        "value": "D"
      }
    ]
  },
  {
    "host": "qa3",
    "items": [
      {
        "key": "prop1",
        "value": "E"
      },
      {
        "key": "prop2",
        "value": "F"
      }
    ]
  }
]

In the solution, the configuration file is /etc/conf, but it could have any name and be placed in any directory of the application server. The /etc folder requires root permissions, so the SSH user must be able to become root (become: yes).

The playbook.yml below points to the qa group previously defined in the hosts file (hosts: qa). Ansible can then execute it against the 3 VMs: qa1, qa2 and qa3. Each one is identified during the fact-gathering phase, when the hostname variable is set.

The config variable holds the config.json file content, and the items_query variable is necessary to find, inside the JSON content, the property key/value pairs of the respective server. The task ensures that there will be a line in the configuration file for each property.

---
- hosts: qa
  become: yes
  vars:
    hostname: "{{ansible_hostname}}"
    config: "{{lookup('file', 'config.json')}}"
    items_query: "[?host=='{{hostname}}'].items"
  tasks:
  - name: Set the configuration file content
    lineinfile:
      path: /etc/conf
      create: yes
      regexp: "^{{item.key}}=.*$"
      line: "{{item.key}}={{item.value}}"
    with_items: "{{config|json_query(items_query)}}"
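The json_query filter evaluates a JMESPath expression. A rough Python equivalent of what `[?host=='qa1'].items` yields for the task's loop, using a plain list comprehension instead of JMESPath, and an abridged copy of config.json:

```python
import json

# Abridged config.json content from the article
config = json.loads("""
[
  {"host": "qa1", "items": [{"key": "prop1", "value": "A"},
                            {"key": "prop2", "value": "B"}]},
  {"host": "qa2", "items": [{"key": "prop1", "value": "C"},
                            {"key": "prop2", "value": "D"}]}
]
""")

def items_for(hostname):
    # Rough stand-in for json_query("[?host=='<hostname>'].items"),
    # flattened to the list the task iterates over with with_items
    return [item for entry in config if entry["host"] == hostname
            for item in entry["items"]]

# The lines the lineinfile task would ensure in /etc/conf on qa1
lines = ["{}={}".format(i["key"], i["value"]) for i in items_for("qa1")]
print(lines)  # ['prop1=A', 'prop2=B']
```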

The execution of playbook.yml has the following output. The -u parameter defines the SSH user and the -k parameter prompts for the vagrant user’s password (vagrant, too). All Vagrant boxes have the vagrant user. Finally, the -i parameter points to the hosts file where the QA servers were defined.

Notice that the changes are made by Ansible in parallel across the servers. If the ansible-playbook command is executed several times, you will get differently ordered outputs, because Ansible forks the main process in order to perform the tasks simultaneously on the servers.

ansible-playbook playbook.yml -u vagrant -k -i hosts
SSH password: 

PLAY [qa] **************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************************************************************************
ok: [192.168.33.10]
ok: [192.168.33.30]
ok: [192.168.33.20]

TASK [Set the configuration file content] ******************************************************************************************************************************************************************
changed: [192.168.33.30] => (item={'value': u'E', 'key': u'prop1'})
changed: [192.168.33.20] => (item={'value': u'C', 'key': u'prop1'})
changed: [192.168.33.10] => (item={'value': u'A', 'key': u'prop1'})
changed: [192.168.33.20] => (item={'value': u'D', 'key': u'prop2'})
changed: [192.168.33.30] => (item={'value': u'F', 'key': u'prop2'})
changed: [192.168.33.10] => (item={'value': u'B', 'key': u'prop2'})

PLAY RECAP *************************************************************************************************************************************************************************************************
192.168.33.10              : ok=2    changed=1    unreachable=0    failed=0   
192.168.33.20              : ok=2    changed=1    unreachable=0    failed=0   
192.168.33.30              : ok=2    changed=1    unreachable=0    failed=0

Finally, you can validate the playbook execution by using Ansible ad-hoc commands, like the one shown below. The command cat /etc/conf was used to ensure that each configuration file’s content is as expected. Ad-hoc commands are excellent for finding out whatever you want to know about several servers in just one shot.

ansible qa -m shell -a "cat /etc/conf" -u vagrant -k -i hosts
SSH password: 
192.168.33.30 | SUCCESS | rc=0 >>
prop1=E
prop2=F

192.168.33.10 | SUCCESS | rc=0 >>
prop1=A
prop2=B

192.168.33.20 | SUCCESS | rc=0 >>
prop1=C
prop2=D

One interesting aspect of this solution is that the playbook can be executed over and over, keeping the same results. In other words, even if someone inadvertently changes the configuration file content, it will be fixed the next time the playbook is executed. It’s called idempotence.
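The replace-or-append behavior behind that idempotence can be sketched in a few lines of Python. The ensure_line helper below is a hypothetical, simplified imitation of what the lineinfile module does with its regexp and line parameters:

```python
import re

def ensure_line(lines, regexp, line):
    """Replace the first line matching regexp, or append line (lineinfile-style)."""
    pattern = re.compile(regexp)
    for i, existing in enumerate(lines):
        if pattern.match(existing):
            lines[i] = line  # drifted content is put back in the desired state
            return lines
    lines.append(line)       # missing property gets created
    return lines

# /etc/conf drifted: someone changed prop1 by hand
conf = ["prop1=X", "prop2=B"]
for _ in range(2):  # running the "playbook" twice yields the same result
    ensure_line(conf, r"^prop1=.*$", "prop1=A")
    ensure_line(conf, r"^prop2=.*$", "prop2=B")
print(conf)  # ['prop1=A', 'prop2=B']
```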

Conclusion

Once again, I helped a friend, and I’m happy for that. Instead of maintaining several files, he maintains a single one, which makes the configuration much simpler.

This solution can be applied in many use cases, so share it, because you will certainly help someone else. And don’t forget to tell me your problem; I want to help you too.

How to unarchive different files in different servers in just one shot

Unarchive multiple files in just one shot

It would be simpler if you had to unarchive just one file on several servers, but what about different files on different servers? A sysadmin friend of mine reached out to me with such a challenge, since quite often he had to place specific files on a bunch of servers, for monitoring purposes.

He had a routine to package all the needed files, for each server, into TAR.GZ files. After the packaging step, he put all the tarball files on an Apache server, in a way they could be downloaded, each one through a URL. Finally, no matter how long it took, he logged in server by server, downloaded the specific compressed file, and extracted it to a directory. Needless to say, there was a better way.

The solution can be checked out on Github. It was developed using Ansible, and tested in a VM environment built using Vagrant and the VirtualBox hypervisor. The details are shown right below.

The environment

In order to simulate my friend’s environment, 3 VMs were used: one representing the Apache server, called repo, and two representing the different servers: server1 and server2. Each one received an IP address, and the communication between them was established through a private network. Vagrant was the VM management tool used to turn them all on with just one command: vagrant up. The Vagrantfile below was required by Vagrant to do such a task.

Vagrant.configure("2") do |config|
  config.vm.box = "minimal/trusty64"

  config.vm.define "repo" do |repo|
    repo.vm.hostname = "repo.local"
    repo.vm.network "private_network", ip: "192.168.33.10"
    repo.vm.provision "ansible" do |ansible|
      ansible.playbook = "playbook-repo.yml"
    end
  end

  config.vm.define "server1" do |server1|
    server1.vm.hostname = "server1.local"
    server1.vm.network "private_network", ip: "192.168.33.20"
  end

  config.vm.define "server2" do |server2|
    server2.vm.hostname = "server2.local"
    server2.vm.network "private_network", ip: "192.168.33.30"
  end
end

Notice that in the Vagrantfile were defined:

  • The VM image (box) to be used: minimal/trusty64 (requires the Oracle VM VirtualBox Extension Pack), with a reduced version of Ubuntu (faster download and boot);
  • The hostname and the IP of each VM, including how they communicate with each other: private_network;
  • The provisioning of the repo VM, done by Ansible, an automation tool required to be installed on the Vagrant host machine beforehand.

The repo server provisioning

The repo server is provisioned by Ansible during the vagrant up execution. The Apache HTTP Server is installed and 2 compressed files are obtained from the Internet. The objective is to make the files available for downloading internally, by their URLs. The playbook-repo.yml below is executed by Ansible in order to do such a task.

---
- hosts: repo
  become: yes
  gather_facts: no
  tasks:
  - name: Install Apache 2
    apt:
      name: apache2
      update_cache: yes
  - name: Download files
    get_url:
      url: "{{item.url}}"
      dest: "/var/www/html/{{item.dest}}"
    with_items: [{"url": "https://archive.apache.org/dist/maven/maven-3/3.5.0/binaries/apache-maven-3.5.0-bin.tar.gz", "dest": "server1.tar.gz"},
                 {"url": "https://archive.apache.org/dist/ant/binaries/apache-ant-1.10.1-bin.zip", "dest": "server2.zip"}]

Some details about the playbook-repo.yml execution:

  • The VM user must become root in order to install the Apache Server, hence the become: yes;
  • Ansible by default collects information about the target host, as an initial step before the tasks’ execution. When such information is not necessary, the step can be bypassed. The gather_facts: no in this case is also recommended to save time;
  • The installation of the Apache Server was done through apt, the package management tool of Ubuntu. If the OS were CentOS, for example, it could be installed through yum;
  • Both files are downloaded in just one task. It’s possible because Ansible allows the use of loops, through the with_items statement.

The playbook-servers.yml execution

Ansible can be used for executing tasks on several target hosts in just one shot. It’s possible because of the inventory file, where groups of hosts can be defined. In the hosts file below, the servers group was defined, composed of server1 (192.168.33.20) and server2 (192.168.33.30).

[repo]
192.168.33.10

[servers]
192.168.33.20
192.168.33.30

An important part of the solution was separating all the needed parameters into a specific file, called params.json. In this file, each server has its compressed file URL defined, as well as its target directory, where the downloaded file will be extracted, as shown below. Notice that both URLs point to the repo server (192.168.33.10), and each one to a file previously provided during the provisioning phase.

[
  {
    "host": "server1",
    "url": "http://192.168.33.10/server1.tar.gz",
    "target": "/var/target"
  },
  {
    "host": "server2",
    "url": "http://192.168.33.10/server2.zip",
    "target": "/var/target"
  }
]

With the environment up and the parameters defined, we can finally unarchive different files on different servers in one shot, by executing the command ansible-playbook playbook-servers.yml -u vagrant -k -i hosts. The -u argument defines the SSH user, the -k argument prompts for the SSH password (vagrant as well), and the -i argument points to the hosts file shown earlier, instead of the default /etc/ansible/hosts.

---
- hosts: servers
  become: yes
  vars:
    hostname: "{{ansible_hostname}}"
    params: "{{lookup('file', 'params.json')}}"
    url_query: "[?host=='{{hostname}}'].url"
    url_param: "{{(params|json_query(url_query))[0]}}"
    target_query: "[?host=='{{hostname}}'].target"
    target_param: "{{(params|json_query(target_query))[0]}}"
  tasks:
  - name: Create the target directory if it doesn't exist
    file:
      path: "{{target_param}}"
      state: directory
  - name: Install unzip
    apt:
      name: unzip
      update_cache: yes
    when: url_param | match(".*\.zip$")
  - name: Unarchive from url
    unarchive:
      src: "{{url_param}}"
      dest: "{{target_param}}"
      remote_src: yes

Some details about the playbook-servers.yml execution:

  • By pointing to the servers group (hosts: servers), Ansible executes the same playbook on both servers: server1 and server2;
  • The parameters of each server are obtained through variables:
    • hostname – the name of the current host, found by Ansible during the fact-gathering phase;
    • params – the params.json file content, returned by the lookup plugin;
    • url_query – the query that finds the URL parameter defined for the current host;
    • url_param – the URL parameter defined for the current host, returned by the json_query filter;
    • target_query – the query that finds the target parameter defined for the current host;
    • target_param – the target directory defined for the current host, returned by the json_query filter.
  • The target directory is created if it doesn't exist yet. It's required by the unarchive task; otherwise an error occurs;
  • The unzip tool is installed only if the remote file has the .zip extension, which is the case of server2's file. It's needed because the subsequent unarchive task delegates the extraction to the appropriate tool, depending on the compression format. If the when condition is not met, the task is skipped;
  • Finally, the compressed file is downloaded from the repo server and extracted into the target directory.
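
The json_query lookups and the when condition above can be sketched in plain Python. json_query evaluates a JMESPath expression, which for this data amounts to filtering the list by host name; the JSON below is the params.json content shown earlier:

```python
import json
import re

# Plain-Python equivalent of the playbook's json_query lookups and of
# the when condition; the data is the params.json content.
params = json.loads("""
[
  {"host": "server1", "url": "http://192.168.33.10/server1.tar.gz", "target": "/var/target"},
  {"host": "server2", "url": "http://192.168.33.10/server2.zip", "target": "/var/target"}
]
""")

def param_for(hostname, key):
    # Equivalent of the JMESPath query [?host=='<hostname>'].<key>,
    # followed by taking index 0, as the playbook variables do
    return [p[key] for p in params if p["host"] == hostname][0]

for hostname in ("server1", "server2"):
    url = param_for(hostname, "url")
    target = param_for(hostname, "target")
    # Equivalent of: when: url_param | match(".*\\.zip$")
    needs_unzip = re.match(r".*\.zip$", url) is not None
    print(f"{hostname}: url={url} target={target} needs_unzip={needs_unzip}")
```

For server1 the unzip installation would be skipped (tar.gz), while for server2 it would run, matching the skipping/changed lines in the output below.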
The complete output of the execution is shown below:

ansible-playbook playbook-servers.yml -u vagrant -k -i hosts
SSH password: 

PLAY [servers] *********************************************************************************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************************************************************************
ok: [192.168.33.30]
ok: [192.168.33.20]

TASK [Create the target directory if it doesn't exist] *****************************************************************************************************************************************************
changed: [192.168.33.20]
changed: [192.168.33.30]

TASK [Install unzip] ***************************************************************************************************************************************************************************************
skipping: [192.168.33.20]
changed: [192.168.33.30]

TASK [Unarchive from url] **********************************************************************************************************************************************************************************
changed: [192.168.33.20]
changed: [192.168.33.30]

PLAY RECAP *************************************************************************************************************************************************************************************************
192.168.33.20              : ok=3    changed=2    unreachable=0    failed=0   
192.168.33.30              : ok=4    changed=3    unreachable=0    failed=0
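
The test.sh script shown at the beginning asserts on exactly this PLAY RECAP, by grepping for "failed=0". The same check can be sketched in Python, parsing each recap line (copied from the output above) and verifying that no host reported failures:

```python
import re

# Verify the PLAY RECAP the same way test.sh does: every host line
# must report failed=0. The lines are copied from the output above.
recap = [
    "192.168.33.20              : ok=3    changed=2    unreachable=0    failed=0",
    "192.168.33.30              : ok=4    changed=3    unreachable=0    failed=0",
]

for line in recap:
    host = line.split()[0]
    failed = int(re.search(r"failed=(\d+)", line).group(1))
    print(f"{host}: failed={failed}")
    assert failed == 0  # any non-zero value means a task failed on that host
```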

Conclusion

My friend became really happy to save so much of his time with this automation, and I'm sure other sysadmins with the same or similar tasks can benefit from it too. So, if you enjoyed the solution, or think it may be useful for a friend of yours, don't hesitate to share it.

Regardless of its utility, bear in mind that this solution is a work in progress, so feel free to collaborate and improve it. After all, that's the open source way.

Finally, if you want my help in automating something, please give me the details and tell me about your problem. It may be someone else's problem too.