Ansible
Automating software deployment and server configuration is a top priority in most organizations. Today there are more than twenty configuration management systems; the best known are Chef, CFEngine, and Puppet, but Ansible, a relative latecomer from 2012, has become the most popular. The reasons are a low barrier to entry, a very simple workflow, and strong security: it's agentless on remote hosts, and everything runs over SSH. You typically set up key-based, passwordless authentication for access; LDAP and Kerberos are supported as well.
You can run a single command, a script, or virtually any task on remote hosts—things you’d normally do by hand: status checks, installing or removing packages, creating accounts and setting permissions, copying data, managing the system and services, and much more. Most capabilities are provided through modules that simplify writing tasks; you can call system commands directly if needed, but that’s recommended only when it’s truly necessary. A list of modules is available on the project site.
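For instance, a couple of ad-hoc calls might look like this (inventory.ini and the htop package are just placeholders): the first runs a raw command through the shell module, the second installs a package through the apt module with privilege escalation:
$ ansible all -i inventory.ini -m shell -a "uptime"
$ ansible all -i inventory.ini -m apt -a "name=htop state=present" --become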
Besides Linux, Ansible can manage other operating systems, including Windows (which it talks to over WinRM rather than SSH). It supports cloud services such as AWS, Azure, and DigitalOcean, as well as network equipment from several vendors; messaging and notification modules are available too. Tasks can be executed on nodes one at a time or in parallel across them.
In Ansible you typically work with two files: an inventory that lists hosts organized into groups, and a playbook that defines the tasks to run. Projects are usually kept in separate directories. By default, Ansible uses /etc/ansible/hosts as the inventory, but you can override it on the command line, so when you have multiple projects it’s common to keep the inventory inside the project directory and specify it with the -i option. You can group hosts however you like—by role, purpose, or location.
[Web]
192.168.1.1
192.168.1.2
db.example.com
[Mail]
192.168.1.1
mail.example.com
Besides SSH, non-SSH connection types such as local and docker are also available. Ansible supports nested groups and group variables, as well as generating the host list dynamically via a script. Connection variables let you specify a non-standard port, the account to run commands under, a login key, and so on:
db.example.com ansible_port=1234 ansible_host=192.0.0.5 ansible_user=user
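Nested groups and group-level variables are sketched below; the group names are purely illustrative:
[frontend]
192.168.1.1

[backend]
db.example.com

[production:children]
frontend
backend

[production:vars]
ansible_user=deploy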
All the magic happens in the playbooks. Playbooks use the YAML data serialization format, which is easy to read. For example, installing nginx using the apt module on Ubuntu/Debian looks like this:
- name: Install the nginx packages
  apt:
    name: nginx
    state: present
    update_cache: yes
  when: ansible_os_family == "Debian"
The when conditional lets you attach arbitrary checks to a task.
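For completeness, a sketch of the equivalent task for the RedHat family would use the yum module with the same kind of guard:
- name: Install the nginx packages
  yum:
    name: nginx
    state: present
  when: ansible_os_family == "RedHat"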
Getting started is easy thanks to Ansible Galaxy, the community hub that offers contributed roles for nearly any task. Just pick one, fetch it from GitHub manually or via the ansible-galaxy tool, and you've got a solid starting point.
ansible-galaxy install username.rolename
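The installed role is then referenced from a play; a minimal sketch, reusing the placeholder name from the command above and the Web group from the inventory:
- hosts: Web
  become: yes
  roles:
    - username.rolename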
Each role includes a list of tasks, along with templates and files to be copied. Template variables let you inject different parameters when the templates are rendered and copied to hosts. Tasks run sequentially: the next one starts only after the current one completes successfully. If a task fails on a host, that host is dropped from the rest of the run while execution continues on the remaining hosts. This keeps Ansible from pushing further changes onto a host that is already broken, but for interdependent tasks (e.g., deploying a cluster), losing a host can make further execution pointless. For example, let's check the reachability of all hosts defined in inventory.ini using the ping module:
$ ansible all -i inventory.ini -m ping
List the hosts and validate the playbook syntax:
$ ansible-playbook -i inventory.ini playbook.yml --list-hosts
$ ansible-playbook -i inventory.ini playbook.yml --syntax-check
Run:
$ ansible-playbook -i inventory.ini playbook.yml
Ansible lets you fully embrace Infrastructure as Code and hand off typically complex operations to a non-specialist, who after initial setup only needs to run a single command. That said, the person writing the playbook still has to understand the process: installing one or two simple roles is usually trouble-free, but once you’ve got a dozen, execution order matters. For example, if a clustered service relies on GlusterFS, it makes sense to install GlusterFS first and only then the service.
Best practices also don’t always fit every application. You’re generally advised to handle service restarts via handlers rather than issuing restart commands directly, so they run at sensible times. But when deploying a MariaDB master–master cluster, it’s often better to control restarts manually, because handlers have an unfortunate tendency to bounce services right when they’re synchronizing.
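For reference, the handler pattern mentioned above looks roughly like this: a task notifies a handler, and the restart runs once at the end of the play instead of immediately (the template name is hypothetical):
- hosts: Web
  tasks:
    - name: Deploy nginx config
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: Restart nginx
  handlers:
    - name: Restart nginx
      service:
        name: nginx
        state: restarted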
The documentation is very comprehensive and includes numerous examples.


Prometheus + Grafana
Without a monitoring system, any application is a black box. As load increases, it’s hard to tell what’s happening inside. Most apps are no longer monoliths: components talk to each other via APIs, and their operation depends not just on LAMP but also on services like Elasticsearch and RabbitMQ. Metrics let you see how components behave over time and identify bottlenecks.
One of the best-fit solutions for metrics collection and monitoring in today's dynamic networks is the combination of Prometheus and Grafana. Prometheus uses a decentralized, pull-based architecture that makes it easy to add services and servers: exporters installed on remote hosts expose metrics over HTTP, and Prometheus's service discovery can pick up targets automatically, including those in virtualized environments, which greatly simplifies administration. It supports alerting and basic charts for visualizing collected data. Exporters are available for the host itself (node_exporter), MySQL, Memcached, HAProxy, Consul, Blackbox, SNMP, and more. Prometheus can also ingest metrics from third-party clients. The most popular is Telegraf, which supports around 80 plugins to collect metrics from Apache, Nginx, Varnish, DBMSs, Docker, Kubernetes, logparser, and so on.
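If you go the Telegraf route, its prometheus_client output plugin exposes the collected metrics over HTTP for Prometheus to scrape; a minimal telegraf.conf fragment might look like this (the port is arbitrary):
[[outputs.prometheus_client]]
  listen = ":9273"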
Prometheus is available in the repositories of major distributions, but those packages are often outdated. It's better to use the official binary releases from the project's website. You then define the scrape targets in /etc/prometheus/prometheus.yml, in the following form:
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          host: 'prometheus'
You group hosts with the same settings under job_name, and use labels to filter metrics later by additional parameters.
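For example, the label defined above can then be used in a query expression to narrow results to a particular host; the built-in up metric is a handy way to try it:
up{host="prometheus"}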
After configuration, verify there are no errors using the promtool utility.
$ promtool check-config /etc/prometheus/prometheus.yml
And run it once in the console to see the output:
$ prometheus -config.file /etc/prometheus/prometheus.yml
By opening localhost:9090 in your browser, you can view Prometheus’s operational status, inspect the collected metrics as raw data or graphs, and check the status of agents.
Prometheus collects a lot of metrics, but its built-in visualization interface is quite limited. This is where Grafana comes in: out of the box it can display metrics from Prometheus, Graphite, InfluxDB, Elasticsearch, AWS, and many others via plugins. After selecting your Data Sources, you set up dashboards. It supports multiple chart types, and its query language lets you retrieve any data you need. The project website also provides ready-made dashboards (JSON) that are easy to import and edit if necessary. To find the right metrics, use the Metric lookup search located to the right of the Query field. Most importantly, templating is supported (Manage Dashboard → Templating). For example, by defining a host variable (e.g., $host = label_values(host)), you can then use it in metrics instead of a node name or IP:
cpu_usage_system{host="$host"}
After that, just select the desired node in the dashboard. Alerts are available with delivery via email, HipChat, Slack, Telegram, and others. To set one up, define the metric threshold by entering a value manually or dragging the heart icon to the right of the chart, then enable the notification method under Alerting → Notification List. However, in version 4.2 alerts don’t support templated dashboards. You’ll need to create a separate, non-templated dashboard and define only the alerts you want to receive there. Typically, you simply duplicate an existing panel and remove the variables.
The project provides source code and builds for Linux, Windows, macOS, and a Docker image. For Ubuntu, there’s a ready-made package and repository, so installation is straightforward. By default, SQLite is used to store settings and data. For larger deployments with many nodes, consider using MySQL or PostgreSQL.


Concourse CI
Automatic project builds (Continuous Integration) triggered by code updates save a lot of time because you can immediately see the outcome: whether there's progress and whether any errors popped up. Docker makes this even easier by letting you test across multiple environments. There are plenty of CI tools today, but many are paid and fairly complex, often requiring a specialist to set them up. Concourse CI lets you roll out continuous integration quickly; it's easy to deploy and doesn't have a steep learning curve. You can figure out how to build Docker images on Git changes in just a couple of hours. It also supports AWS S3 integration, email and HipChat notifications, running commands, and more.
Concourse CI is built around three core concepts: tasks, resources, and jobs. A task is, broadly, any command executed during a build. A resource is any external object whose state and version can be tracked—which is what enables automation. Git is the default example, but it could just as well be a timer. The full list of official and community resource types is available on the site: https://concourse.ci/resource-types.html. A job defines what runs when tracked resources change or when triggered manually. The actual steps are defined in a build plan—running tests, executing commands, building a Docker image. Resources and jobs are connected via pipelines. Pipeline metadata and build logs are stored in PostgreSQL, so you can always audit who did what.
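A minimal pipeline sketch tying a Git resource to a job might look like the following; the repository URI, image, and test script are placeholders:
resources:
  - name: app-source
    type: git
    source:
      uri: https://github.com/example/app.git
      branch: master

jobs:
  - name: run-tests
    plan:
      - get: app-source
        trigger: true
      - task: unit-tests
        config:
          platform: linux
          image_resource:
            type: docker-image
            source: {repository: alpine}
          inputs:
            - name: app-source
          run:
            path: sh
            args: ["-c", "cd app-source && ./run-tests.sh"]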
You can set parameters either directly on the command line or in a YAML configuration file. The website provides plenty of examples to help you get up to speed. For managing and manually triggering jobs, use the fly CLI; you can also view results and start jobs through the web interface.
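With fly, a typical round trip is logging in to the target, uploading the pipeline, and triggering a job; the target alias, URL, and names below are examples:
$ fly -t ci login -c https://ci.example.com
$ fly -t ci set-pipeline -p app -c pipeline.yml
$ fly -t ci unpause-pipeline -p app
$ fly -t ci trigger-job -j app/run-tests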
The project provides binaries for Linux, macOS, and Windows, along with prebuilt Docker and Vagrant images.

Functional Testing
Deploying a project is only half the job; without verifying that the application works, it's almost pointless. That's why any halfway serious project includes a QA engineer who runs a test suite and records the results. If there's just a single application, you can probably get away with doing some basic checks by hand on interim builds. But if you're producing a dozen images for a variety of configurations, you'll need at least some automation. Among web app testing tools, Selenium is especially popular and has effectively become the standard. At its core is the Selenium WebDriver browser automation library (the successor to the older Selenium RC), which includes client libraries in multiple languages and browser drivers. Today there are drivers for Firefox, Chrome, IE, Opera, Safari, and a range of mobile platforms. They're at different stages of maturity and therefore require varying levels of care. The project also provides Selenium IDE, a Firefox extension that lets you record, save, and replay test scenarios for any browser-accessible application. Recorded scenarios are saved as an editable HTML table. You can export them into formats understood by other test frameworks (NUnit, TestNG, JUnit), although, truth be told, practitioners rarely use auto-generated tests in those environments and usually write their own.
The workflow is straightforward. After installing Selenium IDE, you’ll find its launcher under the Tools menu (Ctrl + Alt + S). Open the target site in your browser and start recording, then perform the required steps in sequence. Once recorded, you can run the script either manually or on a schedule. You can also set breakpoints, adjust execution speed, and more.
For small projects, this is usually sufficient, though it’s not the whole story of Selenium automation. Another component in the ecosystem is Selenium Server, which executes browser commands driven by a test script running on a local or remote machine. At this point, it’s also worth getting familiar with the Behat testing framework—you’ll likely need it. Multiple Selenium servers can be organized into a distributed network (Selenium Grid), making it easy to scale your automation setup by running different tests in parallel on different remote hosts, which cuts overall test time. A bonus of this approach is that you can start Selenium Server instances with different parameters, and the appropriate node will be selected for each test automatically. The core topics are covered in the project’s documentation.
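Standing up a grid with the standalone server takes two commands: one for the hub and one per node registering with it (the host name and jar path are illustrative):
$ java -jar selenium-server-standalone.jar -role hub
$ java -jar selenium-server-standalone.jar -role node -hub http://hub.example.com:4444/grid/register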

Supervisor
On a server—especially a development box—you often need to run a bunch of programs that aren’t installed via the system package manager, but must stay up continuously and restart on failure or after a reboot. Think Node.js apps, Selenium, and custom scripts. You could write init/systemd units, but that usually takes more time and doesn’t always give you the control you want. A solid way to handle this is the Supervisor process manager, which provides a simple, reliable way to manage such applications. The supervisord daemon starts processes as children, allowing it to monitor them and automatically restart when needed. For monitoring and live configuration changes, you can use the supervisorctl console utility and a web interface (enabled via inet_http_server). The required package is already in most distro repositories, so installation should be straightforward:
$ sudo apt install supervisor
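If you want the web interface mentioned above, a minimal sketch of the inet_http_server section (added, for example, to /etc/supervisor/supervisord.conf; the port and credentials are placeholders):
[inet_http_server]
port = 127.0.0.1:9001
username = admin
password = changeme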
Configuration files live under /etc/supervisor/conf.d and must use the .conf extension. By convention, each service is configured in its own file. That said, if you need to run multiple instances with different settings, it’s fine to define them all in a single file for convenience. As an example, let’s configure Selenium to run under Supervisor.
$ sudo nano /etc/supervisor/conf.d/selenium.conf
[program:selenium]
command=java -Dwebdriver.chrome.driver=/usr/local/bin/chromedriver -jar /opt/selenium/selenium-server-standalone.jar -port %(ENV_SELENIUM_PORT)s
priority=10
user=selenium
directory=/home/selenium
environment=HOME="/home/selenium"
autostart=true
autorestart=true
The parameters are straightforward. In program:selenium you set the name under which the service will appear in supervisorctl. command specifies the full launch command with all arguments, and user is the account under which the program will run. Next is the working directory it will start in (it must already exist). The autostart and autorestart options control whether the program starts at OS boot and whether it restarts if it stops. If autorestart is true, the program will always be restarted after it exits, even if it finished cleanly; if you only want it restarted on failure, set autorestart=unexpected. The documentation lists many more options for different scenarios. Reload the configuration:
$ sudo supervisorctl reread
If there are no errors, apply the settings:
$ sudo supervisorctl update
If you need to disable a service, it’s best to do it via the interactive mode:
$ sudo supervisorctl
A prompt will appear; type help to list all available commands:
supervisor> help
Let’s restart Selenium.
supervisor> restart selenium
supervisor> status selenium
supervisor> quit
Conclusion
These tools are essential to know, but they're just the tip of the iceberg: over the past eight years, DevOps has grown a vast ecosystem of tools and technologies, and the expectations placed on specialists vary. So you'll need to keep learning continuously.