Automation is probably the hottest word in IT. It encompasses significantly more than just 'using Ansible'... anything we do to take a previously manual task and hand it off to a machine counts as automation. In that sense, this article doesn't have to reference Ansible at all, but fair warning, the examples I'll use will be about Ansible. Just remember that the statements hold true regardless of your automation methodology.

Even though our automation tools have generally been around for 5-10 years, it can be a real challenge to apply modern automation techniques properly to our problems. Part of the difficulty is that the focus lands on the manual tasks themselves rather than on the intent behind them. The talented people asked to automate things take the request from whoever wants the thing automated, and they act on it. The requester likely has no idea how the automation works or what it can do. So what you often get is an automated process that does the EXACT set of tasks that were previously manual... with NO value added. Sure, we saved some time, because Bob no longer has to go touch all 10 of those servers every 6 months.

But if we understand how the automation works, what it CAN do, what it CAN'T do, etc., we can take a step back and ask: what is the ultimate goal behind this task? Sometimes a specific task takes on a life of its own because of events that happened over the years. If we peel that back, we may find that the REASON Bob was logging into each of the database servers every Friday and deleting logs... is that there's no log rotation software in use. Rather than writing automation that just deletes the logs, it likely makes more sense to use system components (such as logrotate) that rotate the logs automatically. Maybe the better automation task would be to capture those rotated logs and copy them off to a central log store for analysis.

Take compliance as another example. Instead of a practice that runs a system scanner twice a year, generates a report, sends it off for review, and eventually produces a remediation plan, automation lets us define a desired 'compliant' configuration and ENFORCE it every day. This requires changing the policy itself, because the policy is built around the scan/report/remediate process. Getting to a 'continuous compliance' practice requires us to step back from the current multi-team process and ask, "What are we ultimately trying to accomplish?" Is the goal to create a report that gets turned into a remediation plan? Or is the goal to keep the machine compliant? If it's the latter, would it make more sense to automate report generation and communication to remove 2 steps from the process, or to have automation that runs on a schedule, knows the desired configuration, checks every system to make sure it's in that state, and fixes it if it isn't?
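
To make that "check, and if it's wrong, fix it" loop concrete, here's a minimal sketch. It's written in Python rather than Ansible YAML so the logic is visible. It assumes the desired baseline can be expressed as simple key/value settings. The setting names in DESIRED_STATE are hypothetical examples, and the "system" is just a dictionary standing in for real configuration. In practice a scheduler would run something like this daily against every managed system.

```python
from typing import Dict, List

# Hypothetical "compliant" baseline. The setting names and values are
# illustrative examples, not taken from any real compliance profile.
DESIRED_STATE: Dict[str, str] = {
    "PermitRootLogin": "no",
    "PasswordAuthentication": "no",
    "MaxAuthTries": "4",
}


def enforce(current: Dict[str, str], desired: Dict[str, str]) -> List[str]:
    """Bring `current` in line with `desired`, modifying it in place.

    Returns the settings that had drifted and were corrected; an empty
    list means the system was already compliant and nothing was touched.
    """
    corrected = []
    for key, wanted in desired.items():
        if current.get(key) != wanted:  # check first...
            current[key] = wanted       # ...and only fix what has drifted
            corrected.append(key)
    return corrected


if __name__ == "__main__":
    # Simulated system (a stand-in for real config) that drifted on one setting.
    system = {"PermitRootLogin": "yes", "PasswordAuthentication": "no", "MaxAuthTries": "4"}
    print(enforce(system, DESIRED_STATE))  # ['PermitRootLogin']: today's run fixes the drift
    print(enforce(system, DESIRED_STATE))  # []: tomorrow's run finds nothing to do
```

The list of corrected settings doubles as the report. The report still exists, but it becomes a by-product of enforcement rather than the goal.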

Even where there are known-good shell scripts that have been in use for years, those scripts likely predate the automation platform and don't take advantage of its capabilities. If we take the time to rewrite them in the native automation language, we gain significant advantages. Using Ansible as my example: most native Ansible modules are written to be idempotent (the generic command and shell modules are the notable exceptions). That's a big word that basically means running a task once has the same effect as running it over and over, so you can re-run it without worrying about breaking anything. The module checks whether the task needs to be done before it does anything. If the system is already in the desired state, it reports 'ok' (the green light), changes nothing, and moves on to the next task in the play.
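
Here's a rough Python illustration of the difference, assuming the task is "make sure a config line is present." The script-style version appends the line every time it runs, so running it twice leaves a duplicate. The idempotent version checks first and only touches the file when something actually needs to change, which is roughly what Ansible's lineinfile module does for you. Both function names are hypothetical.

```python
from pathlib import Path


def append_line(path: Path, line: str) -> None:
    """Script-style: blindly append, so every run adds another copy."""
    with path.open("a") as handle:
        handle.write(line + "\n")


def ensure_line(path: Path, line: str) -> bool:
    """Idempotent: make sure `line` is present in the file at `path`.

    Returns True if the file was modified ("changed") and False if the
    line was already there ("ok"), in which case nothing is written.
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False                      # already in the desired state
    if text and not text.endswith("\n"):
        text += "\n"                      # don't glue onto an unterminated last line
    path.write_text(text + line + "\n")
    return True
```

Calling ensure_line twice with the same line returns True and then False, and the file ends up with exactly one copy. Calling append_line twice leaves two copies.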

This becomes even more important when we build complex automations that chain multiple tasks together, for example:

  • register a system
  • set up software repositories
  • install application software
  • create users
  • create groups
  • configure application software
  • integrate application with central identity management

When your tasks are idempotent, you don't need to go back and comment out the first five steps if they completed successfully but the sixth failed. Just fix the cause of the failure in step 6 and re-run the whole playbook. It will check the first five tasks, see that they're already done, change nothing, and pick up with the sixth.
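
Here's a rough Python simulation of that re-run behavior, using the seven steps from the list above. Every task is hypothetical and only sets a key in a dictionary that stands in for the host. Step 6 is rigged to fail until its "template" is supplied. On the second run, the first five tasks report ok because there's nothing left to do, and the work carries on from step 6.

```python
from typing import Callable, Dict, List, Tuple

# Each task inspects shared state, returns True if it changed something and
# False if the state was already correct, and raises an exception on failure.
TaskFn = Callable[[Dict[str, object]], bool]


def ensure(key: str, value: object) -> TaskFn:
    """Build an idempotent task that makes sure state[key] == value."""
    def task(state: Dict[str, object]) -> bool:
        if state.get(key) == value:
            return False          # already done: report "ok", change nothing
        state[key] = value
        return True               # had to act: report "changed"
    return task


def configure_app(state: Dict[str, object]) -> bool:
    """Step 6. Fails until its (hypothetical) config template is available."""
    if "config_template" not in state:
        raise RuntimeError("configure application software: template missing")
    return ensure("app_configured", True)(state)


# The seven steps from the list above; the state keys are illustrative only.
TASKS: List[Tuple[str, TaskFn]] = [
    ("register a system", ensure("registered", True)),
    ("set up software repositories", ensure("repos", True)),
    ("install application software", ensure("app_installed", True)),
    ("create users", ensure("users", True)),
    ("create groups", ensure("groups", True)),
    ("configure application software", configure_app),
    ("integrate with central identity management", ensure("idm", True)),
]


def run_playbook(tasks: List[Tuple[str, TaskFn]],
                 state: Dict[str, object]) -> List[Tuple[str, str]]:
    """Run every task in order and record "ok" or "changed" for each.

    An exception from a failing task propagates and stops the run.
    """
    results = []
    for name, task in tasks:
        results.append((name, "changed" if task(state) else "ok"))
    return results


if __name__ == "__main__":
    host: Dict[str, object] = {}
    try:
        run_playbook(TASKS, host)
    except RuntimeError as err:
        print("run 1 failed:", err)       # steps 1-5 were already applied
    host["config_template"] = "fixed"     # fix the cause of the step 6 failure
    for name, status in run_playbook(TASKS, host):  # re-run the whole thing
        print(f"{status:8} {name}")
```

Nothing here records where the first run stopped. The re-run works only because every task re-checks the host before acting.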

Ansible also runs these tasks across a selected inventory of hosts, rather than leaving us to make scripts designed for one host work against many.
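
A minimal Python sketch of that idea, reusing the log rotation example from earlier: one idempotent task is applied to every host in an inventory, and each host reports its own result. The hostnames and the simulated "facts" dictionaries are hypothetical stand-ins for real machines. Ansible handles the connections (typically over SSH) and works on several hosts in parallel, whereas this sketch just loops over them.

```python
from typing import Dict

# Hypothetical inventory: per-host facts, simulated here as dictionaries.
# A real tool would connect to each host instead of reading a dict.
INVENTORY: Dict[str, Dict[str, str]] = {
    "db01": {"logrotate": "installed"},
    "db02": {},
    "db03": {},
}


def ensure_installed(facts: Dict[str, str], package: str) -> str:
    """Idempotently 'install' a package on one host and report ok/changed."""
    if facts.get(package) == "installed":
        return "ok"
    facts[package] = "installed"
    return "changed"


def run_on_inventory(inventory: Dict[str, Dict[str, str]], package: str) -> Dict[str, str]:
    """Apply the same task to every host and collect a per-host result."""
    return {host: ensure_installed(facts, package) for host, facts in inventory.items()}


if __name__ == "__main__":
    # First run: db01 is already fine, db02 and db03 get fixed.
    print(run_on_inventory(INVENTORY, "logrotate"))
    # Second run: every host reports "ok".
    print(run_on_inventory(INVENTORY, "logrotate"))
```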

Summary

So two ideas to take away from this:

  • before blindly automating legacy tasks, take a step back to identify what the task is ultimately trying to accomplish, and focus on automating THAT. It may end up being the same thing that was previously scripted... or it may go in a completely new direction.
  • if the old scripts really are what needs to run, rewriting them in the native automation language will provide new efficiencies (idempotent re-runs, execution across a whole inventory) and integration points that did not exist previously.