How to get metrics for alerting in advance and preventing trouble

Eventual warning symbol of a monitoring service

Although we all have to deal with unexpected events, we also have tools to prevent them. Like mentioned in the last post, log files must be accessible upfront, otherwise the troubleshooting is compromised. Before any issue occurs, there’s a lot we can do, in order to be aware of what’s going on, act proactively and don’t let the problem become reality.

Most of the companies have already implemented a monitoring solution. Usually my sysadmin friends are the people in charge of such solutions. If you have this responsibility, you know how difficult is gather all the metrics, show them in fancy dashboards, and properly send alerts to the ones who must react in case of some evidence of trouble. Maybe, more often than you would like to, you have to justify why some metric wasn’t considered, or wasn’t shown, or some alert wasn’t sent. The bigger the monitoring service, the more likely to happen this kind of situation.

Don’t let your avoiding problems task become a problem itself. You can use open source tools and get a monitoring server ready to do the job. Once up and running, you will be able to easily plug any other server into the monitoring service, with no need of an installed agent. In addition, you will be able to send alert notifications through instant messaging apps, like Slack and Telegram, instead of by email.

The solution combines InfluxDB, a high performance time series database, Grafana, a time series analytics and monitoring tool, and Ansible, an agentless automation tool. With Ansible is possible to extract constantly the servers’ hardware metrics and store them in the InfluxDB database. With Grafana is possible to connect to InfluxDB database and show the metrics in dashboards, define thresholds and configure alerts. The solution can be checked out on Github, and the details are shown right below..

The development environment

The monitored environment was reproduced using local VirtualBox machines, one representing the monitoring server (monitor) and the other two as servers that could be plugged into the monitoring service (server1 and server2). Vagrant was used to manage this development environment. With the Vagrantfile below, it’s possible to smoothly turn on and provision the monitoring server, by executing the command vagrant up monitor. Notice that the VMs server1 and server2 are also defined, but they can be booted up later, if you want to plug just one or both into the monitoring service.

Vagrant.configure("2") do |config|
  config.vm.box = "minimal/trusty64"

  config.vm.define "monitor" do |monitor|
    monitor.vm.hostname = "monitor.local"
    monitor.vm.network "private_network", ip: "192.168.33.10"
    monitor.vm.provision "ansible" do |ansible|
      ansible.playbook = "playbook-monitor.yml"
    end
  end

  (1..2).each do |i|
    config.vm.define "server#{i}" do |server|
      server.vm.hostname = "server#{i}.local"
      server.vm.network "private_network", ip: "192.168.33.#{i+1}0"
    end
  end
end

The monitoring server provisioning is done by Ansible, and is divided in two basic parts: installation of the tools (InfluxDB, Grafana and Ansible) and configuration of the monitoring service. Notice that Ansible is used to install Ansible! The playbook-monitor.yml below shows that.

Besides, rather than putting all the tasks in a big unique file, each tool installation’s tasks were placed in a specific YML file, in order to get the code clean, organized and easy to understand. The grouped tasks then can be dynamically included in the main playbook through the include_tasks statement.

---
- hosts: monitor
  become: yes
  gather_facts: no
  tasks:
  - name: Install apt-transport-https (required for the apt_repository task)
    apt:
      name: apt-transport-https
      update_cache: yes
    tags:
      - installation
  - name: Install InfluxDB
    include_tasks: influxdb-installation.yml
    tags:
      - installation
  - name: Install Grafana
    include_tasks: grafana-installation.yml
    tags:
      - installation
  - name: Install Ansible
    include_tasks: ansible-installation.yml
    tags:
      - installation
  - name: Configure monitoring
    include_tasks: monitoring-configuration.yml
    tags:
      - configuration

The monitoring service configuration

The monitoring service configuration is composed by some steps, as shown in the monitoring-configuration.yml file below. First and foremost, the InfluxDB database, named monitor, is created. InfluxDB provides a very useful API, which can be used for a variety of database operations. For interacting with webservices, the Ansible uri module is the most indicated. All the metrics extracted from the monitored servers are stored in the monitor database.

After that, the Grafana data source that connects to the InfluxDB database is created. That way Grafana is able to access all the stored metrics data. Like InfluxDB, Grafana has an API which allows make most if not all of the configuration, through JSON-formatted content.  Besides the data source creation, the Slack notification channel and the first dashboard are also created. Notice that, in order to assume as successful the task when the playbook is executed again, and guarantee the idempotence, responses statuses other than 200 are considered as well.

The configured Slack notification channel points to a test Slack workspace. Of course you can join, but I’m pretty sure you will want to create your own, and invite the troubleshooting guys to join. Don’t forget to create in your Slack workspace a incoming webhook and replace the JSON url field value by the generated webhook URL.

The initial dashboard shows the used memory percentage metric. Other metrics can be added to it, or you can create new dashboards, at your will. A threshold of 95% was defined, so you can visually know when the metric exceeded such limit. An alert was also defined, and a notification is sent to the configured Slack channel when the last five metric values are greater than or equal to the limit of 95%. The alert also send a notification when the server health is restabilized.

With Ansible you can perform tasks in several servers at the same time. It’s possible because everything is done through SSH from a master host, even if it’s your own machine. Besides that, Ansible knows the target servers through the inventory file (/etc/ansible/hosts), where they are defined and also grouped. During the monitoring service configuration, the group monitored_servers is created in the inventory file. Every server once in this group is automatically monitored. Plugging a server into the monitoring service is as simple as adding a line in the file. The first server monitored is the monitoring server itself (localhost).

In order to prevent Ansible from checking the SSH key of the servers plugged into the monitoring service, it’s necessary to disable the default behavior in the Ansible configuration file (/etc/ansible/ansible.cfg). This way Ansible won’t have problems in collecting metrics from any new server through SSH.

Finally, an Ansible playbook (playbook-get-metrics.yml) is used to connect to all monitored servers and extract all the relevant metrics needed. It’s placed in the /etc/ansible/playbooks directory and configured in CRON to be executed every minute. Just to sum up, every minute the metrics are collected, stored, shown and in case of evidence of trouble, an alert is sent. Isn’t it awesome!

---
- name: Create the InfluxDB database
  uri:
    url: http://localhost:8086/query
    method: POST
    body: "q=CREATE DATABASE monitor"
- name: Create the Grafana datasource
  uri:
    url: http://localhost:3000/api/datasources
    method: POST
    user: admin
    password: admin
    force_basic_auth: yes
    body: "{{lookup('file','monitor-datasource.json')}}"
    body_format: json
  register: response
  failed_when: response.status != 200 and response.status != 409
- name: Create the Slack notification channel
  uri:
    url: http://localhost:3000/api/alert-notifications
    method: POST
    user: admin
    password: admin
    force_basic_auth: yes
    body: "{{lookup('file','slack-notification-channel.json')}}"
    body_format: json
  register: response
  failed_when: response.status != 200 and response.status != 500
- name: Create the Grafana dashboard
  uri:
    url: http://localhost:3000/api/dashboards/db
    method: POST
    user: admin
    password: admin
    force_basic_auth: yes
    body: "{{lookup('file','used_mem_pct-dashboard.json')}}"
    body_format: json
  register: response
  failed_when: response.status != 200 and response.status != 412
- name: Add localhost to Ansible inventory
  blockinfile:
    path: /etc/ansible/hosts
    block: |
      [monitored_servers]
      localhost ansible_connection=local
- name: Disable SSH key host checking
  ini_file:
    path: /etc/ansible/ansible.cfg
    section: defaults
    option: host_key_checking
    value: False
- name: Create the Ansible playbooks directory if it doesn't exist
  file:
    path: /etc/ansible/playbooks
    state: directory
- name: Copy the playbook-get-metrics.yml
  copy:
    src: playbook-get-metrics.yml
    dest: /etc/ansible/playbooks/playbook-get-metrics.yml
    owner: root
    group: root
    mode: 0644
- name: Get metrics from monitored servers every minute
  cron:
    name: "get metrics"
    job: "ansible-playbook /etc/ansible/playbooks/playbook-get-metrics.yml"

Collecting the metrics

The playbook-get-metrics.yml file below is responsible for extracting from the monitored_servers all the important metrics and storing them in the monitor database. Initially the only extracted metric is the used memory percentage, but you can easily start to extract more metrics adding tasks in the playbook.

Notice that the InfluxDB writing data API is used to store the metric in the monitor database. 192.168.33.10 is the IP address of the monitoring server and 8086 is the port where InfluxDB is on. The used memory percentage has the key used_mem_pct in the database, and you must choose an appropriate key for each metric you start to extract.

Ansible by default collects information about the target host. It’s an initial step before the tasks execution. The collected data is then available to be used by the tasks. The hostname (ansible_hostname) is one of those, essential to differentiate the server from where the metric is extracted. By the way, the used memory percentage is calculated also using two of the data gathered by Ansible: the used real memory in megabytes (ansible_memory_mb.real.used) and the total real memory in megabytes too (ansible_memory_mb.real.total). If you want to know all of such data, execute the command ansible monitor -m setup -u vagrant -k -i hosts, and type vagrant when prompted the SSH password. Notice that the information is JSON-formatted, and the values can be accessed through dot-notation.

---
- hosts: monitored_servers
  tasks:
  - name: Used memory percentage
    uri:
      url: http://192.168.33.10:8086/write?db=monitor
      method: POST
      body: "used_mem_pct,host={{ansible_hostname}} value={{ansible_memory_mb.real.used / ansible_memory_mb.real.total * 100}}"
      status_code: 204

Plugging a server into the monitoring service

Probably you’ve already executed the command vagrant up monitor, in order to get the monitoring server up and running. If not, do it right now. It demands some time, depending on how fast is your Internet connection. You can follow the output and see each step of the server provisioning.

When finished, open your browser and access the Grafana web application by typing the URL http://192.168.33.10:3000.  The user and the password to log in are the same: admin. Click in the used_mem_pct dashboard link, and take a look at the values concerning the monitoring server in the presented line chart. You may need to wait a few minutes until having enough values to track.

Ok, you may now want to plug another server into the monitoring service, and see its values in the line chart too. So, turn on the server1, for example, executing the command vagrant up server1. After that, execute the Ansible playbook below through the command ansible-playbook playbook-add-server.yml -u vagrant -k -i hosts. The -u argument defines the SSH user, the -k argument prompts for password input (vagrant, too), and the -i argument points to the hosts file, where the monitoring server is defined.

You will be prompted to inform the new server’s IP address and the SSH credentials, in order to enable Ansible to connect to the server. That’s enough to plug the server into the monitoring service, simply by inserting a line in the monitoring server’s /etc/ansible/hosts file. The next time CRON execute the playbook-get-metrics.yml, one minute later, server1 will be also considered a monitored server, so its metrics will be extracted, stored and shown in the dashboard too.

---
- hosts: monitor
  become: yes
  gather_facts: no
  vars_prompt:
  - name: "host"
    prompt: "Enter host"
    private: no
  - name: "user"
    prompt: "Enter user"
    private: no
  - name: "password"
    prompt: "Enter password"
    private: yes
  tasks:
  - name: Add the server into the monitored_servers group
    lineinfile:
      path: /etc/ansible/hosts
      insertafter: "[monitored_servers]"
      line: "{{host}} ansible_user={{user}} ansible_ssh_pass={{password}}"

Conclusion

Monitoring is key in high performance organizations. It’s one of the pillars of DevOps. Better monitoring solutions shorten feedback cycles, foster the continuous learning and the continuous improving.

Among the variety of monitoring solutions, the one just described aims to be cheap, flexible and easy to implement. Some benefits of its adoption are:

  • the solution does not require installing an agent in every monitored server, taking advantage from the agentless feature of Ansible;
  • it stores all the metrics data in InfluxDB, a high performance time series database;
  • it centralizes the data presentation and the alerts configuration in Grafana, a powerful data analytics and monitoring tool.

I hope this solution can solve at least one of your pain points in your monitoring tasks. Experiment it and improve it and share it at your will.

Finally, if you want my help in automating something, please give me more details, tell me your problem. It may be a problem of someone else too.

How to check log files in a server without logging in the server

Accessing log files for troubleshooting purposes

My sysadmin friends spend part of their time helping the developers in troubleshooting. Sometimes, when there’s a big problem, it increases a lot. When it happens, it’s not difficult to feel overwhelmed, by the pressure of solving the problem itself, and unfortunately by the setbacks are faced throughout the troubleshooting process.

Many companies have strict security policies that prevent the developers from accessing servers through SSH. The problem is when they need to check log files that exist in such servers, during an outage, for example. When a crisis happens, there’s no time to spend with bureaucracies, the log files must be accessible right away for troubleshooting.

One solution to that is provide the log files to the developers or anyone in charge of troubleshooting with no need of logging in the servers. The security policies are followed and the required availability of the log files is met. It’s possible by installing and configuring the Apache HTTP Server in a way that the log files are accessible through a web browser.

The solution can be checked out on Github. It uses Ansible to automate the task of making the log files accessible, and Vagrant + VirtualBox to create the development and testing environment for such automation.

The development environment

The development environment is very important to create. It must be created locally in your own computer. It’s needless to develop and test Ansible playbooks other way. You might ask why not use some server to do such task, but be aware servers are usually shared, and someone may accidentally mess your stuff.

Furthermore, coding is very dynamic. You need an environment to experiment, and make mistakes (trial-and-error method). Some code you will sure throw away until find the solution. So imagine if you test your code against a real server and leave it in a state hard to rollback? With your own environment you can easily recreate VMs and retest your code from the scratch, over and over, at your will.

Vagrant is an awesome tool to build your development environment. Its default integration with VirtualBox simplifies a lot managing VMs. Through command line, you can create, provision, connect via SSH to and destroy VMs, just a few operations. The command vagrant up, for example, puts your environment up and running, based on the Vagrantfile, like the one below.

Vagrant.configure("2") do |config|
  config.vm.define "jenkins" do |jenkins|
    jenkins.vm.box = "minimal/trusty64"
    jenkins.vm.hostname = "jenkins.local"
    jenkins.vm.network "private_network", ip: "192.168.33.10"
    jenkins.vm.provision "ansible" do |ansible|
      ansible.playbook = "playbook-jenkins.yml"
    end
  end
end

In order to simulate a server where an application runs and adds data into log files, only one VM was used. It’s important to have a VM as similar as possible to your real servers. For that reason, use VMs with the same OS and even with the same basic configuration. Packer is a great tool to create VM images that are alike your servers. In the solution scope, a reduced version of an Ubuntu VM was used (minimal/trusty64).

Notice that the VM is provisioned during its booting up. Vagrant has integration with several provisioners, including Ansible. In the VM is basically installed the Oracle Java and Jenkins, in this order. Jenkins is an open source automation server, broadly used for delivering software, and with the adoption of Infrastructure as Code, can be used for delivering infrastructure as well. If your delivering process is done by Jenkins, for sure you will need to take a look to the tool log files once in a while.

---
- hosts: jenkins
  become: yes
  gather_facts: no
  tasks:
  - name: Install apt-transport-https (required for the apt_repository task)
    apt:
      name: apt-transport-https
      update_cache: yes
  - name: Install Oracle Java 8 (required for Jenkins installation)
    include_tasks: oracle-java8-installation.yml
  - name: Install Jenkins
    include_tasks: jenkins-installation.yml

During the playbook-jenkins.yml execution, the tasks related to the Oracle Java installation (oracle-java8-installation.yml) and the ones concerning the Jenkins installation (jenkins-installation.yml) are included dynamically through the include_tasks statement. It’s a good practice of code organizing, once keeps everything in its right place, and maintain the playbook files as small as possible. Moreover, it’s a great way of enabling code reusing.

The solution implementation

Right after the Jenkins server is turned on, you can open your web browser and type the URL http://192.168.33.10:8080. You will see the Jenkins configuration initial page. It asks for the auto-generated administrator password, informed in the jenkins.log file. Please don’t get the password, accessing the VM through SSH. Remember that’s what we want to prevent. So keep calm and implement the solution before.

Jenkins stores its log files in the /var/log/jenkins directory.  Then, we must to configure the Apache HTTP Server to expose such folder. This is done by using the apache-logs.conf file shown below. This is a template that can be used for any directory you want to make visible through the web browser.

If you want more details on how this configuration works, take a look at the Directory and the Alias directives documentation. For now, all we need to know is that the {{directory}} and the {{alias}} will be replaced respectively by the log files folder and the alias required to complement the URL address.

<Directory "{{directory}}">
    Options Indexes FollowSymLinks
    AllowOverride None
    Require all granted
</Directory>

Alias "{{alias}}" "{{directory}}"

The variables defined in the playbook-jenkins.logs.yml below are used in such replacement. Notice that the directory variable points to the cited Jenkins log files folder, and the alias value is /logs/jenkins. The other variable (conf) defines the configuration file resultant that will be placed in the Apache folders reserved for configuration files (/etc/apache2/conf*).

The Ansible playbook can be easily adapted to meet your needs. If some developer come to you asking for help, because he or she have to check inaccessible log files, just change the variables values, and execute the playbook against the server where the files are.

Ok, let’s finally implement the solution. Execute the command ansible-playbook playbook-jenkins-logs.yml -u vagrant -k -i hosts.  The -u argument defines the SSH user, the -k argument prompts for password input (vagrant, too), and the -i argument points to the hosts file, where Ansible can find the Jenkins server IP address.

---
- hosts: jenkins
  become: yes
  gather_facts: no
  vars:
  - directory: /var/log/jenkins
  - alias: /logs/jenkins
  - conf: jenkins-logs.conf
  tasks:
  - name: Install Apache 2
    apt:
      name: apache2
      update_cache: yes
  - name: Config Apache logs
    template:
      src: apache-logs.conf
      dest: /etc/apache2/conf-available/{{conf}}
      owner: root
      group: root
      mode: 0644
  - name: Enable new config
    file:
      src: ../conf-available/{{conf}}
      dest: /etc/apache2/conf-enabled/{{conf}}
      owner: root
      group: root
      state: link
  - name: Restart Apache 2
    service:
      name: apache2
      state: restarted

During the execution the Apache HTTP Server is installed, and the configuration file is placed with the right values in the /etc/apache2/conf-available. The file content can be verified through the Ansible ad-hoc command ansible jenkins -m shell -a “cat /etc/apache2/conf-available/jenkins-logs.conf” -u vagrant -k -i hosts. After that, the configuration is enabled by creating a symbolic link in /etc/apache2/conf-enabled folder, pointing right to the configuration file. Lastly, the Apache HTTP server is restarted.

Now open a new tab in your web browser and type the URL http://192.168.33.10/logs/jenkins. You will see all the content of the Jenkins server /var/log/jenkins folder, including the jenkins.log file! Notice that the URL has the /logs/jenkins configured alias. You can after all open the log file in order to get the auto-generated administrator password. Just copy it, go back to the Jenkins configuration initial page, paste the password and continue.

Conclusion

Despite the fact we must follow the company security policies, we must facilitate the troubleshooting process too. DevOps also means one problem is everyone’s problem, so let’s work together in order to solve all of them. If you enjoyed the solution, share it right now!

Before I forget, if you want my help in automating something, please give me more details, tell me your problem. It may be a problem of someone else too.

How to deal with the same configuration file with different content in different environments

Configuration in multiples services at once

Different from the previous post, in this case it was a demand of a dev friend. His application required a specific properties file in order to get the database connection string, an URL to connect to the MongoDB instance. The problem was that each environment had its own MongoDB instance, so the properties file content was different, depending on where it was placed.

The common approach for such problem is to have different versions of the same file, each version with the appropriate content for the related environment. What differentiates one file from another are the directories in the filesystem or the branches in the SCM repository where the files are put in, because they are named based on the environments’ names. When this approach is adopted, the right version of the configuration file is usually embedded to the application package during the deployment process.

The solution tried to eliminate that complexity, decoupling the configuration from the application, and centralizing all the needed configuration in just one file. The solution can be checked out on Github. It was developed using Ansible, and tested in a VM environment built using Vagrant and the VirtualBox hypervisor. The details are shown right below.

The test environment

In order to simulate my friend’s QA environment, with different servers where the application is deployed, 3 VMs were booted up locally: qa1, qa2 and qa3. This way it was possible to test the Ansible playbook during its development, before executing it directly to the real servers.

The Vagrantfile below was used to build such test environment. Notice this is Ruby, each VM was defined within a loop, and received an IP address. The VM image (box) used was minimal/trusty64, a reduced version of Ubuntu, for a faster first-time download and set up during the vagrant up command execution.

Vagrant.configure("2") do |config|
  config.vm.box = "minimal/trusty64"

  (1..3).each do |i|
    config.vm.define "qa#{i}" do |qa|
      qa.vm.hostname = "qa#{i}.local"
      qa.vm.network "private_network", ip: "192.168.33.#{i}0"
    end
  end
end

The playbook execution

With Ansible you can perform tasks in several servers at the same time. It’s possible because everything is done through SSH from a master host, even if it’s your own machine. Besides that, Ansible knows the target servers through the inventory file (hosts), where they are defined and also grouped. In the hosts file below the QA servers were defined inside the group qa.

[qa]
192.168.33.10
192.168.33.20
192.168.33.30

The core of the solution is undoubtedly the config.json file. It concentrates all the needed configuration for each QA server. If my friend’s application requires more parameters, they can be easily added. The host element identifies the target server, and the items are the properties the application has to have in order to run appropriately.

[
  {
    "host": "qa1",
    "items": [
      {
        "key": "prop1",
        "value": "A"
      },
      {
        "key": "prop2",
        "value": "B"
      }
    ]
  },
  {
    "host": "qa2",
    "items": [
      {
        "key": "prop1",
        "value": "C"
      },
      {
        "key": "prop2",
        "value": "D"
      }
    ]
  },
  {
    "host": "qa3",
    "items": [
      {
        "key": "prop1",
        "value": "E"
      },
      {
        "key": "prop2",
        "value": "F"
      }
    ]
  }
]

In the solution, the configuration file is /etc/conf, but it could have any name and could be placed in any directory of the application server. The etc folder has root permissions, so it requires that the SSH user is able to become root (become: yes).

The playbook.yml below is pointing to the qa group previously defined in the hosts file (hosts: qa). Ansible then can execute it against the 3 VMs: qa1, qa2 and qa3. Each one is found out during the gathering facts phase, when the hostname variable is set.

The config variable points to the config.json file content, and the items_query variable is necessary to find inside the JSON content the properties key/value pairs of the respective server. The task ensures that there will be a line in the configuration file for each property.

---
- hosts: qa
  become: yes
  vars:
    hostname: "{{ansible_hostname}}"
    config: "{{lookup('file', 'config.json')}}"
    items_query: "[?host=='{{hostname}}'].items"
  tasks:
  - name: Set the configuration file content
    lineinfile:
      path: /etc/conf
      create: yes
      regexp: "^{{item.key}}=.*$"
      line: "{{item.key}}={{item.value}}"
    with_items: "{{config|json_query(items_query)}}"

The execution of the playbook.yml has the following output. The -u parameter defines the SSH user and the -k parameter prompts for vagrant password (vagrant too). All Vagrant boxes have the vagrant user. Finally, the -i parameter points to the hosts file where the QA servers were defined.

Notice that the changes are made by Ansible in parallel in the servers. If the ansible-playbook command is executed several times, you will have different outputs, because Ansible forks the main process in order to perform the tasks simultaneously in the servers.

ansible-playbook playbook.yml -u vagrant -k -i hosts
SSH password: 

PLAY [qa] **************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************************************************************************
ok: [192.168.33.10]
ok: [192.168.33.30]
ok: [192.168.33.20]

TASK [Set the configuration file content] ******************************************************************************************************************************************************************
changed: [192.168.33.30] => (item={'value': u'E', 'key': u'prop1'})
changed: [192.168.33.20] => (item={'value': u'C', 'key': u'prop1'})
changed: [192.168.33.10] => (item={'value': u'A', 'key': u'prop1'})
changed: [192.168.33.20] => (item={'value': u'D', 'key': u'prop2'})
changed: [192.168.33.30] => (item={'value': u'F', 'key': u'prop2'})
changed: [192.168.33.10] => (item={'value': u'B', 'key': u'prop2'})

PLAY RECAP *************************************************************************************************************************************************************************************************
192.168.33.10              : ok=2    changed=1    unreachable=0    failed=0   
192.168.33.20              : ok=2    changed=1    unreachable=0    failed=0   
192.168.33.30              : ok=2    changed=1    unreachable=0    failed=0

Finally, you can validate the playbook execution by using Ansible ad-hoc commands, like the one shown below. The command cat /etc/conf was used to ensure that each configuration file content is as expected. Ad-hoc commands are excellent to know what you want about several servers in just one shot.

ansible qa -m shell -a "cat /etc/conf" -u vagrant -k -i hosts
SSH password: 
192.168.33.30 | SUCCESS | rc=0 >>
prop1=E
prop2=F

192.168.33.10 | SUCCESS | rc=0 >>
prop1=A
prop2=B

192.168.33.20 | SUCCESS | rc=0 >>
prop1=C
prop2=D

One interesting aspect of this solution is the capacity of the playbook be executed over and over keeping the same results. In other words, even if someone inadvertently change the configuration file content, it will be fixed right in the next time the playbook is once more executed. It’s called idempotence.

Conclusion

Once again, I helped a friend, and I’m happy for that. Instead of maintaining several files, he maintains a single one, and it turns the configuration much simpler.

This solution can be applied in many use cases, so share it because certainly you will help someone else. And don’t forget to tell me your problem, I want to help you too.

How to unarchive different files in different servers in just one shot

Unarchive multiple files in just one shotIt would be simpler if you had to unarchive just one file in several servers, but what about different files in different servers? A sysadmin friend of mine reached out me with such challenge, once quite often he had to place specific files in a bunch of servers, for monitoring purposes.

He had a routine to package all the needed files, for each server, in TAR.GZ files. After the packaging step, he put all the tarball files in an Apache server, in a way they could be accessed for downloading, each one by an URL. Finally, no matter how long it would take, he logged in server by server, downloaded the specific compressed file, and extracted it to a directory.  It was needless to say there was a better way.

The solution can be checked out on Github. It was developed using Ansible, and tested in a VM environment built using Vagrant and the VirtualBox hypervisor. The details are shown right below.

The environment

In order to simulate my friend’s environment, 3 VMs were used: 1 representing the Apache server, called repo, and 2 representing the different servers: server1 and server2. Each one received an IP address, and the communication between them was established through a private network. Vagrant was the VM management tool used to turn them all on in just one command: vagrant up.  The Vagrantfile below was required by Vagrant to do such task.

Vagrant.configure("2") do |config|
  config.vm.box = "minimal/trusty64"

  config.vm.define "repo" do |repo|
    repo.vm.hostname = "repo.local"
    repo.vm.network "private_network", ip: "192.168.33.10"
    repo.vm.provision "ansible" do |ansible|
      ansible.playbook = "playbook-repo.yml"
    end
  end

  config.vm.define "server1" do |server1|
    server1.vm.hostname = "server1.local"
    server1.vm.network "private_network", ip: "192.168.33.20"
  end

  config.vm.define "server2" do |server2|
    server2.vm.hostname = "server2.local"
    server2.vm.network "private_network", ip: "192.168.33.30"
  end
end

Notice that in the Vagrantfile were defined:

  • The VM image (box) to be used: minimal/trusty64 (requires the Oracle VM VirtualBox Extension Pack), with a reduced version of Ubuntu (faster download and boot);
  • The hostname and the IP of each VM, including how they communicate with each other: private_network;
  • The provisioning of the repo VM, done by Ansible, automation tool required to be installed in the Vagrant host machine beforehand.

The repo server provisioning

The repo server is provisioned by Ansible during the vagrant up execution. The Apache HTTP Server is installed and 2 compressed files are obtained from the Internet. The objective is make the files available for downloading internally, by their URLs. The playbook-repo.yml below is executed by Ansible in order to do such task.

---
- hosts: repo
  become: yes
  gather_facts: no
  tasks:
  - name: Install Apache 2
    apt:
      name: apache2
      update_cache: yes
  - name: Download files
    get_url:
      url: "{{item.url}}"
      dest: "/var/www/html/{{item.dest}}"
    with_items: [{"url": "https://archive.apache.org/dist/maven/maven-3/3.5.0/binaries/apache-maven-3.5.0-bin.tar.gz", "dest": "server1.tar.gz"},
                 {"url": "https://archive.apache.org/dist/ant/binaries/apache-ant-1.10.1-bin.zip", "dest": "server2.zip"}]

Some details about the playbook-repo.yml execution:

  • The VM user must become root, in order to install the Apache Server, hence the become: yes;
  • Ansible by default collects information about the target host. It’s an initial step before the tasks execution. When such information is not necessary, the step can be bypassed. The gather_facts : no in this case is recommended to save time, too;
  • The installation of the Apache Server was done through apt_get, the package management tool of Ubuntu. If the OS were CentOS, for example, it could be installed through yum;
  • Both files are downloaded in just one task. It’s possible because Ansible allows the use of loops, through the with_items statement.

The playbook-servers.yml execution

Ansible can be used for executing tasks in several target hosts in just one shot. It’s possible because of the inventory file, where groups of hosts can be defined. In the hosts file below was defined the servers group, composed by  server1 (192.168.33.20) and server2 (192.168.33.30).

[repo]
192.168.33.10

[servers]
192.168.33.20
192.168.33.30

An important part of the solution was separate all the needed parameters in a specific file, called params.json. In this file, each server has its compressed file URL defined, as long as its target directory, where the downloaded file will be extracted, like shown below. Notice that both URLs point to the repo server (192.168.33.10), and each one to the file previously provided during the provisioning phase.

[
  {
    "host": "server1",
    "url": "http://192.168.33.10/server1.tar.gz",
    "target": "/var/target"
  },
  {
    "host": "server2",
    "url": "http://192.168.33.10/server2.zip",
    "target": "/var/target"
  }
]

With the environment up and the parameters defined, we can finally unarchive different files in different servers in just one shot, executing the command ansible-playbook playbook-servers.yml -u vagrant -k -i hosts. The -u argument defines the SSH user, the -k argument prompts for password input (vagrant, too), and the -i argument points to the hosts file, commented earlier, instead of the default /etc/ansible/hosts.

---
- hosts: servers
  become: yes
  vars:
    hostname: "{{ansible_hostname}}"
    params: "{{lookup('file', 'params.json')}}"
    url_query: "[?host=='{{hostname}}'].url"
    url_param: "{{(params|json_query(url_query))[0]}}"
    target_query: "[?host=='{{hostname}}'].target"
    target_param: "{{(params|json_query(target_query))[0]}}"
  tasks:
  - name: Create the target directory if it doesn't exist
    file:
      path: "{{target_param}}"
      state: directory
  - name: Install unzip
    apt:
      name: unzip
      update_cache: yes
    when: url_param | match(".*\.zip$")
  - name: Unarchive from url
    unarchive:
      src: "{{url_param}}"
      dest: "{{target_param}}"
      remote_src: yes

Some details about the playbook-servers.yml execution:

  • By pointing to the group servers (hosts: servers), Ansible is able to execute the same playbook for both servers: server1 and server2;
  • The parameters of each server are obtained through variables:
    • hostname – the name of the current host found by Ansible during the gathering facts phase;
    • params – the params.json file content, returned by the lookup function;
    • url_query – the query to find the URL parameter defined for the current host;
    • url_param – the URL parameter defined for the current host, returned by the json_query filter;
    • target_query – the query to find the target parameter defined for the current host;
    • target_param – the target directory defined for the current host, returned by the json_query filter.
  • The target directory is created, if it doesn’t exist yet. It’s required by the unarchive task. Otherwise an error occurs;
  • The unzip tool is installed, only if the remote file has the extension ZIP. This step is necessary because that’s the case of the server2’s remote file, and the subsequent unarchive task can extract files compressed through different algorithms. If the when statement condition is not met, the task is skipped;
  • Finally, the compressed file is downloaded from the repo server and extracted to the target directory.
ansible-playbook playbook-servers.yml -u vagrant -k -i hosts
SSH password: 

PLAY [servers] *********************************************************************************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************************************************************************
ok: [192.168.33.30]
ok: [192.168.33.20]

TASK [Create the target directory if it doesn't exist] *****************************************************************************************************************************************************
changed: [192.168.33.20]
changed: [192.168.33.30]

TASK [Install unzip] ***************************************************************************************************************************************************************************************
skipping: [192.168.33.20]
changed: [192.168.33.30]

TASK [Unarchive from url] **********************************************************************************************************************************************************************************
changed: [192.168.33.20]
changed: [192.168.33.30]

PLAY RECAP *************************************************************************************************************************************************************************************************
192.168.33.20              : ok=3    changed=2    unreachable=0    failed=0   
192.168.33.30              : ok=4    changed=3    unreachable=0    failed=0

Conclusion

My friend became really happy to save a lot of his time using such automation, and I’m sure other sysadmins with the same or similar tasks can benefit from it. So, if you enjoyed the solution, or think it’s useful for some friend of yours, don’t hesitate and share it.

Regardless its utility, bear in mind this solution is a work in progress, so feel free to collaborate and to improve it. After all, that’s the open source way.

Finally, if you want my help in automating something, please give me more details, tell me your problem. It may be a problem of someone else too.