Skip to content

corndeladmin/DevOps-Unit-6-Workshop

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

DevOps Unit 6 Workshop

Welcome to the Unit 6 Workshop!

Our objectives

The objectives for today are:

  • Get experience connecting to virtual machines
  • Gain familiarity in working with virtual machines at the command line
  • Become comfortable using configuration management tools to provision virtual machines

Pre-flight checks

Today we'll be using SSH to connect to a virtual machine, operate it remotely using command-line tools, perform some setup on that virtual machine - representative of setup you might need to perform as part of enabling various features you might want in your application in a real-world scenario, and then use Ansible to automate that setup.

The virtual machines we're going to be working with are going to be spun up using a tool called CloudFormation. This is an AWS tool that lets you specify resources and infrastructure you want using a YAML file. This YAML file describes the things you want, and CloudFormation will take this YAML file and create the resources for you. Sound familiar? That's right, it's infrastructure as code in action!

That said, we won't look in too much detail yet at the CloudFormation template we'll be deploying - it involves a few cloud specific concepts that we haven't covered yet. This preamble is mostly to explain that the bulk of the technical stuff that'll happen today will be happening on remote virtual machines, and in order for you to complete this workshop you'll only need:

  • A terminal with the ability to SSH into an EC2 VM (you could use the ACG terminal if needed)
  • An ACG account to spin up an AWS sandbox with (or an AWS account you're happy to create resources in)

If you're using an ACG account please don't forget that your AWS account automatically gets cleaned up after 4 hours. You can extend its lifespan 1 hour before cleanup from the cloud sandboxes page (it might be worth setting an alarm for this!).

Using SSH

If you haven't yet done so, you should make sure you have created SSH keys and are able to use them to connect to a remote machine (or if you are unable to, that you are happy to use an alternative like ACG terminal). Try creating an ACG cloud server and SSHing into it!

Github has a comprehensive guide for all major operating systems: Connecting to Github with SSH

Morning (Part 1)

Setting up the environment

We're going to start by spinning up our workshop's virtual machines using CloudFormation.

  • Log into your ACG account and create an AWS cloud sandbox.
  • Using an incognito window and the credentials provided, log into your AWS cloud sandbox and navigate to the CloudFormation service. You can do this by clicking in the giant search box at the top of the AWS console and typing "CloudFormation".
  • The CloudFormation service has a "Create stack" button. Click this and upload the chimera-cloudformation-template.yml file from this repository. You can clone this repo locally to your machine first if that's your preferred way of getting this file on your machine.
    • After this workshop, it might be worth having a quick look through this file - just to get a bit of an idea how the structure of YAML helps describe resources, and to see how much of it you think you can understand, or make educated guesses at how it works.
  • The template will ask you for a few parameters. You can leave these as default, but you'll need to specify a name and resource key - these can be an alphanumeric string of your choice, something like "my-unit-6-workshop" is sufficient, for example. If you like, you can choose the password you'd like the virtual machine user to be configured with. Either way, make a note of the password before proceeding.
  • Follow the prompts to create the stack. You don't need to make any further changes.
  • Once you've started, you'll be taken to a list of "stacks", where you should be able to see the new one you created, labelled with "CREATE_IN_PROGRESS". This process will take a few minutes, so it's not a bad time to make a cup of tea! Normally it should take a bit less than 5 minutes. If it takes more than 10, you should consider trying again by creating a new stack. You can click the refresh button in the "Events" tab to see the progress of the stack creation.
  • After a bit, the progress of the stack creation will change from "CREATE_IN_PROGRESS" to "CREATE_COMPLETE". At this point, you want to navigate to the list of EC2 instances that exist, so that you can find out what's been created, and what their IP addresses are. Use the AWS console search box at the top to navigate to EC2.
  • The EC2 dashboard should show "Instances (running): 3" and "Auto Scaling Groups: 2" amongst other pieces of information. We want to see a list of our EC2 instances, so click on "Instances (running)".
  • You should see a list of 3 EC2 instances, 1 of which belongs to Part 1 of this workshop, and 2 of which belong to Part 2. By clicking on any of them, you'll see a panel that provides lots of information about them - including their IP address (you're looking for "Public IPv4 address"). You want to make a note of the IP addresses for each of these machines - note down the IP address for your Part 1 VM, as well as the 2 IP addresses for your Part 2 VMs.
  • Let's test these VMs! Open your terminal and try doing ping 100.100.100.100 (except use the IP address for your Part 1 VM instead of 100.100.100.100) - this command sends out packets to the VM that the VM should respond to, and by getting these responses, you learn (1) that the VM is awake and connected to the internet, and (2) from how long it took the VM to respond, you know something about the quality of your network performance between your computer and the VM, or something about how far away it is (as a rule of thumb, pinging equipment inside your own network should be sub-10ms, pinging servers inside Europe should be a 2 digit number of milliseconds, and pinging across the Atlantic to the US will likely take at least 100ms).

SSH to your VM

You now have a virtual machine for Part 1. In order to connect to this VM, you'll need (1) credentials, and (2) a hostname or IP address. Fortunately, you noted down the IP address of your VM, and the password that you chose for it (unless you left it at the default) was specified at the time you created the CloudFormation stack.

  • In addition to those, we can tell you that the username you can connect to the VMs with is ec2-user.
  • Use these credentials to connect to your VM over ssh.
  ssh user@hostname

You should now be connected to the remote VM. Try a few simple shell commands to confirm you're connected (pwd, ls /, uname -a for example).

The preferred method of connecting is actually with SSH keys. This removes the need to provide a username and password every time, is more secure and is easier to administrate. Let's learn how to do this now.

If you don't have an SSH key you can set one up following this guide

  • Only follow the "Generating a new SSH key" section
  • For the purposes of the workshop we suggest you do not use a password on your key (leave it blank)

You should add both your public and private keys to your ~/.ssh folder (the ssh-keygen CLI tool will do this by default).

You should add your public key on a new line of the authorized_keys file on the remote VM.

# Opening the authorized_keys file for editing
vim ~/.ssh/authorized_keys

Reminder: How to edit and save a file with vim:

  • You start in Command mode and cannot simply type text, though you can still paste from the clipboard, with Shift + Insert.
  • Press i to enter Insert mode to type text. Press Escape to return to Command mode.
  • From Command mode you can save & exit by typing :wq and then pressing Enter.

Once this is complete try disconnecting and reconnecting to the VM over SSH (if you are not prompted for a password you know it has worked!).

Getting started with Chimera

This workshop revolves around fixing up a tool called Chimera, which comprises a web application and a command line tool. The web application displays data on a map, and the command line tool generates datasets for the web application to display.

The VM has a (partially) working version of Chimera on it. There is pre-existing data that you should be able to find if You can see this data by visiting the website that is hosted by the web app. Simply go to your web browser, and navigate to the URL http://100.100.100.100 (except with your Part 1 VM's IP address rather than 100.100.100.100).

Make sure you use HTTP and not HTTPS (i.e. no s before the :// in the URL)!

Your web browser will find and connect to the virtual machine with that IP address, ask for the webpage, and return it to you and render it. If you navigate to this URL, you should be able to see the page it produces. This page should say "Chimera Reporting Server" at the top with a world map taking up most of the page.

The webapp part appears to be functioning correctly, so let's focus on the other part of Chimera - the command line app.

On the VM you should be able to find cliapp. This is a command line program with minimal documentation (see cliapp_reference.md). Its purpose is to generate "datasets" for the webapp to display. You should be able to run it like so:

cliapp --version

This should print the version number of the program to the console. If you see this, you can move onto the next section.

Input Data

If you run cliapp without any arguments it will print its output to the console. You should see something like this:

{
  "datasetName": "edge-throw-except",
  "data": [],
  "generationTime": 1596458170057,
  "centre": "[0, 0]",
  "zoom": null
}

This is an empty dataset with a randomly generated name. To produce a useful dataset we need to provide an input file in the correct format.

Create a new file using the touch command and pass it to cliapp using the -i flag.

touch some_file_name
cliapp -i some_file_name

The result should be very similar to running the program without any arguments. This isn't surprising as our input file is currently blank.

Open your input file using vim and add the following line.

59.0916|-137.8717|111 km WSW of Covenant Life, Alaska|6.4

Now run cliapp -i some_file_name again. What happens?

Hopefully cliapp generated a non-empty dataset. If you copy the dataset name and paste it onto the end of your URI in your web browser, you should see something on the map.

http://{hostname}/{datasetName}

Data Manipulation

Now we know how to feed data into Chimera, we need to work out how to do this more efficiently.

We want to use USGS as our source. They have various regularly updated feeds. Start off using the hourly feed. Try using the command line to fetch the data and then convert it to the pipe delimited format that cliapp can read.

Tips:

  • Use curl to save a copy of the JSON feed locally (it's quicker than downloading it every time). Use the -o option to specify a file to save it to.
  • The command line program jq is very useful for manipulating JSON data. You can find its manual here.
    • There is also this Cheatsheet that provides helpful examples.
    • String interpolation is a convenient way to build a string. When operating on some data like: {"stringProp": "foobar", "intProp": 1}, then the jq expression "\(.stringProp) raw text \(.intProp)" will evaluate to "foobar raw text 1".
    • The jq command itself has an option -r to return string outputs "raw", meaning without wrapping it in quotes.
  • Use > to direct the output of a command to a file. E.g. echo "something" > output.txt
If you're interested in experimenting with jq and working with this dataset, you can take on this task by yourself by looking at the jq documentation in order to build the right command. Otherwise, or if you get stuck, you can open this panel to receive hints, or the answer.

Read your json file and feed it into jq:

cat earthquakes.json | jq
Next step

The downloaded JSON contains far more data than you need. The outer object has a field called "features" which is the actual list of earthquakes.

cat earthquakes.json | jq '.features'
Next step

To loop over the list and evaluate an expression on each, you can use [] and jq's own pipe operator.

cat earthquakes.json | jq '.features[] | .geometry'
Next step

Access nested values by chaining dots. Combine values with string interpolation.

cat earthquakes.json | jq '.features[] | "\(.geometry.coordinates[0])|\(.properties.place)"'
Complete command
cat earthquakes.json | jq -r '.features[] | "\(.geometry.coordinates[1])|\(.geometry.coordinates[0])|\(.properties.place)|\(.properties.mag)"' > earthquakes.psv

When you have a working command, put it in a bash script. This can simply be a text file containing your commands. but the file extension should be .sh.

You will often see a "shebang" such as #!/bin/bash -e as the first line in a .sh file. This explicitly specifies what should run the file, and the -e is a useful option that means the script will exit if any command exits with an error.

Make it executable (run chmod +x your_script.sh) and then try executing it (run ./your_script.sh) to check it works. The script should:

  • Fetch fresh data
  • Convert it to the correct format
  • Use cliapp to generate a dataset that the web app can display.

Automation: Stage 1

When you've written your script we need to automate executing it. You'll want to use crontab to call the script on a schedule.

The VMs do not come with crontab installed so you'll need to install & setup crontab with the commands:

sudo yum install cronie cronie-anacron
sudo service crond restart

A summary of crontab and some tips:

  • Use crontab -e to edit the crontab, which is simply a file listing scheduled jobs. This will open the file in vim.
  • Each line is a job, consisting of a cron expression (i.e. when it should run) followed by a shell command to run.
  • Each user has their own crontab. It will execute as them, and in their home directory.
  • An important difference between the cronjob execution and running the command yourself in bash is that the cronjob will not have run the ~/.bash_profile file beforehand, which sets up some important environment variables.
  • You can experiment with the meaning of cron expressions here.
  • You might be wondering how to check the output of your cron jobs. After it runs, and you next interact with the terminal, you will see a message in the terminal saying "You have new mail" and a filepath. You can read the file with cat or tail. E.g. cat /var/spool/mail/ec2-user.
Troubleshooting Hints for Crontab

If you're not seeing your cronjob succeeding, do you know if it's running at all?

How can we tell if the job is being attempted?

There are a few things we could try:

  • Keep an eye out for messages stating you have mail - if you do then take a look to see if it has any hints on what's happening!
  • Check any outputs that your script has - are they updating? For example, you can see the modified time of files using ls -l
  • If you're not seeing either of the above, check that your cron expression is described by crontab guru as what you'd expect
How can I output the logs to my own log file?

Good question, let's ask the internet!

The job is running but it fails

First of all, double check that the script runs successfully when you run it directly. If it doesn't, start by fixing it up.

If it's running directly but not through cron, check the error messages that you receive. Clearly we have some differences in how cron is running the job; we may need to provide some extra specifics within our script to help cron know how to run the job.

In particular, the following may be of assistance:

  • The whereis command can help you find the absolute path for commands
  • You can set environment variables within a script using export VAR="value"

New Requirements (stretch goal)

We've got some requirements from the CEO:

  • A dataset should be generated every five minutes, containing earthquakes in the last hour. It should be displayed on the site at /latest.
    Hint Use the --dataset-name option mentioned in the cliapp_reference.md to specify a dataset name of "latest".
  • The same data should also be available with a dataset name containing the date and time it was generated.
    HintUse the date command. E.g. date +"%y" would give you the year.
  • Any datasets older than 24 hours should be automatically deleted. More recent ones should be kept accessible.
    HintUse the find command on the folder containing the datasets. It has options to filter by date/time last modified, and an option to delete the files it finds. The DATA_FOLDER environment variable will tell you where the datasets are stored.

When you think your cronjob is working, check! Either look at the "dataset generated" timestamp on the website, or check the logs in /var/spool/mail/ec2-user.

Afternoon (Part 2)

If you haven't finished the "Data Manipulation" step yet, do so before starting the next steps - use the provided jq command in the hints there to proceed

Scaling up

Because of your great work, the Chimera website has become hugely popular - it's been featured on the TV news and you have thousands of users every day. In fact, it's become so popular that it's starting to slow down due to the sheer number of users.

We need to scale up!

We're going to set up two more VMs so we can share the website traffic between them.
For this exercise, we're going to focus on setting up the new VMs. We won't deal with how to share the traffic between them.

You should have hostnames (IP addresses) for the two new VMs. These are the 2 VMs for Part 2 that we saw in the list of EC2 instances earlier. The credentials to log in are the same as for your first VM.

Now, we could set up these VMs by SSHing into them and running some commands, like we did with the first VM in the morning. But, instead we're going to use Ansible.

Why Ansible?

The main reasons for using Ansible are:

  • Ansible makes it just as easy to set up 100 machines as it is to setup the first 1. So, when we need to scale up further, this will be simple.

  • Ansible is declarative rather than imperative.

    • Declarative: we say what we want the end state to be
    • Imperative: we say the steps we want to take

    This makes it easier to manage. It gives us a lot of reassurance because Ansible files describe the state of the servers and can be managed in source control, like code. When it runs, Ansible can tell us what steps it has already taken, whether steps succeeded or failed (we'll see this in practice later).

How to Ansible

Ansible needs two machines (either physical machines or VMs) to run:

  • Control node
    Any machine with Ansible installed. You can run Ansible commands and playbooks by invoking the ansible or ansible-playbook command from any control node. You can use any computer that has a Python installation as a control node - laptops, shared desktops, and servers can all run Ansible.
    Note: However, you cannot use a Windows machine as a control node.

  • Managed nodes
    The servers you manage with Ansible. Managed nodes are also sometimes called “hosts”. Ansible is not installed on managed nodes. They only need a way for the control node to connect (e.g. SSH). Most operations require Python but Ansible can use the "raw module" to install Python itself.

This morning, you created 3 VMs: the 1 original VM and the 2 new VMs.
Control Node: Use the original VM for Part 1 as the Control Node.
Managed Nodes: Use the 2 new VMs as the Managed Nodes.

If you use Mac or Linux, you can even choose to use your own computer as the Control Node instead (you'll need to install Ansible on it first).
If you use Windows, you'll have to use the original VM (or some other VM you may happen to have) as the Control Node, since Ansible doesn't run on Windows.

Let's get going

Step 1. SSH into your Control Node (the original VM) if you aren't already connected.

Step 2. Check that Ansible is installed

Run the command ansible --version. If this prints out a version number + some other info, then Ansible is installed.
Don't worry if it says [DEPRECATION WARNING] etc.

If Ansible isn't installed, then we can install it with sudo pip install ansible.
This might take a minute to run. It might show a warning/error message at the end.
Even if it does show a warning/error, run ansible --version to check if it succeeded.

Step 3. Test the SSH connection from the Control Node to each of the Managed Nodes.
You are currently SSH'd from your own machine to the Control Node.
You are now going to SSH from the Control Node into each of the Managed Nodes in turn.

                                      .--------------->  Managed
                                    .'   SSH tunnel       Node 1
 Your   ----------------->  Control
laptop      SSH tunnel       Node  .     
                                    `----------------->  Managed
                                         SSH tunnel       Node 2
Hint: From your SSH session on the control node, SSH onto the Managed Node:
ssh ec2-user@HOSTNAME-OR-IP-ADDRESS-OF-MANAGED-NODE
It should ask you for a password (it's the same as the password to the first VM)

To exit each SSH session, run:
exit

Step 4. Create an SSH key and add it to the Managed Nodes.

All this typing in of passwords is a bit boring, isn't it!
Let's save ourselves some hassle by creating an SSH key instead. This is the same as what we did at the start of the workshop but this time we'll create the key on the Control Node and copy the public part onto the two Managed Nodes.

That way, the Managed Nodes will allow the Control Node to connect.

  • SSH from your laptop onto the Control Node (if you aren't already connected).
  • Create an SSH key with ssh-keygen (accept the default options)
  • Copy the SSH key onto the Managed Nodes, but rather than doing it manually, use the ssh-copy-id command this time. Run this command twice, once for each node:
    ssh-copy-id ec2-user@HOSTNAME-OF-MANAGED-NODE
    You will be asked for the password to the Managed Node.

Step 5. Tell Ansible which machines we want it to control.

Ansible works against multiple Managed Nodes or “hosts” in your infrastructure at the same time, using a list known as Inventory. The most common formats are INI and YAML.

We will create our own inventory file.
Here's an example INI format inventory:

[my_group_name]
host-name-of-server-1.example.com
host-name-of-server-2.example.com
3.10.179.156

In an Ansible inventory file, you can create groups, each with a group name.
Within each group, you add lines, each with a hostname or IP address.

Have a go creating your own inventory file - perhaps save it in your home directory and name it my-ansible-inventory

Hint: SSH from your laptop onto the Control Node (if you aren't already connected).

Check you're in the home directory with the command pwd. This stands for "print working directory" and should print /home/ec2-user. If not, you can change to your home directory with cd ~ and then run pwd again to check.

Create the inventory file with vim or nano: vim my-ansible-inventory

Add the inventory details:

  • Create a group called "webservers"
  • Add the two hostnames (or IP addresses) of your new VMs
[webservers]
1.2.3.4
5.6.7.8

Step 6. Give Ansible our instructions

We give Ansible instructions by creating a Playbook.
If you need to execute a task with Ansible more than once, write a playbook and put it under source control. Then you can use the playbook to push out new configuration or confirm the configuration of remote systems.

First, create the playbook, let's call it my-ansible-playbook.yml

Hint:

Create the file in the home directory of your Control Node, next to your inventory file. Again, use vim or nano, e.g. vim my-ansible-playbook.yml

Playbooks are expressed in YAML format.

A playbook is composed of one or more ‘plays’ in an ordered list.
For now you just need one playbook, containing one ‘play’. In each play, we specify:

  • the name of the play
  • the group of hosts to run the play on (using the group name from the Inventory)
  • the remote_user that Ansible should use to connect to the hosts (Ansible connects to the hosts using SSH)

For example:

- name: Install Chimera on new web servers
  hosts: webservers
  remote_user: ec2-user

Each play runs one or more tasks. Each task calls an Ansible module.
We will have several tasks in our playbook.
A playbook runs plays in order from top to bottom. Within each play, tasks also run in order from top to bottom.

Let's create our first task, then inspect what it's doing:

- name: Install Chimera on new web servers
  hosts: webservers
  remote_user: ec2-user

  tasks:
  - name: Create log folder
    ansible.builtin.file:
      path: /var/log/chimera
      state: directory
      mode: '777'
    become: yes

Each task has a name (e.g. "Create log folder").
The task then picks a Module (e.g. "ansible.builtin.file") and uses that as a key.

There are a huge number of Ansible Modules, grouped into Collections.
Here is the list of all Ansible Collections.
And here is the list of Modules within the "ansible.builtin" collection.

In this case, we are using the ansible.builtin.file module, which can manage files, folders and file properties.

Each Module performs a specific task and takes a different set of parameters.
You can check on the Ansible website to find out which parameters to use.

In this case:

  • path: /var/log/chimera means... act on the file or folder /var/log/chimera
  • state: directory means... make this path be a directory
  • mode: '777' means... give everyone full permissions to this directory

become: yes is an option on all Modules.
It tells Ansible to run the Module as the root user (like putting sudo before a shell command). See the Ansible website for more details about "become".

Step 7. Time to run the playbook for the first time

Run your ansible playbook using this command:
ansible-playbook my-ansible-playbook.yml -i my-ansible-inventory

The output should look a bit like this:

PLAY [Install Chimera on new web servers] ********************

TASK [Gathering Facts] ********************
ok: [3.10.179.156]
ok: [35.178.196.136]

TASK [Create log folder] ********************
changed: [35.178.196.136]
changed: [3.10.179.156]

PLAY RECAP *********************
35.178.196.136 : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
3.10.179.156 : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Notice that, within the "Create log folder" task, it says changed for both nodes.
This is how we can tell that Ansible has made a change.

Now run the playbook again...
ansible-playbook my-ansible-playbook.yml -i my-ansible-inventory

Now, for the "Create log folder" task, it says ok instead of changed:

PLAY [Install Chimera on new web servers] ********************

TASK [Gathering Facts] ********************
ok: [3.10.179.156]
ok: [35.178.196.136]

TASK [Create log folder] ********************
ok: [35.178.196.136]
ok: [3.10.179.156]

But why?
Simply because the folder already exists (Ansible created it the first time you ran the playbook).

So, Ansible checks the state of the folder before trying to create it.
If the folder already exists, and has the correct permissions, it won't make any changes. Isn't that reassuring? 😊

You can run the playbook just to check that everything is in the correct state, safe in the knowledge that it won't change anything that doesn't need changing.

Step 8. Write the rest of the playbook

Now it's over to you to write the rest of the Ansible playbook.
I'll give you a list of tasks to add below.

For each task, you'll need to:

  • Work out which Module you need
  • Work out which options you need for the module
    You can look this up on the Ansible docs
  • Work out whether you need to run the module as root user (become: yes)
    Hint: you can just try without first - it will fail and tell you if it doesn't have permission to perform the task

For reference, here's the Ansible Modules that we'll be using:

Write your playbook one task at a time.
After each new task, run the playbook so you can check if it works and you can see Ansible making the changes.
Notice how each task shows changed the first time and ok subsequently.

Here's the list of tasks you'll need to add:

Task a: Create folder for the Chimera webapp binary
We want to create the folder /opt/chimera/bin

Hints:
Hint 1: Give your task a name The name could be Create folder for the Chimera webapp binary
Show me the code
      - name: Create folder for the Chimera webapp binary

Hint 2: Which module should you use? We want to create a folder.
So, we should use the ansible.builtin.file module.
Show me the code
  - name: Create folder for the Chimera webapp binary
    ansible.builtin.file:

Hint 3: Which folder do we want to create? We want to create the folder /opt/chimera/bin
Show me the code
  - name: Create folder for the Chimera webapp binary
    ansible.builtin.file:
      path: /opt/chimera/bin

Hint 4: What state do we want the folder to be in? We need to specify that it's a folder that we're creating, not a file
Show me the code
  - name: Create folder for the Chimera webapp binary
    ansible.builtin.file:
      path: /opt/chimera/bin
      state: directory

Hint 5: Do we need to run this as a root user? Try it out? If it fails, try adding become: yes
Show me the code
  - name: Create folder for the Chimera webapp binary
    ansible.builtin.file:
      path: /opt/chimera/bin
      state: directory
    become: yes


Task b: Copy webapp program from Control Node to Managed Nodes
The Control Node (the VM that you're running Ansible on) has the webapp program running on it.
It's file name is webapp and it's in the folder /opt/chimera/bin/.
We want to copy this file to the Managed Nodes.
We want to put it in the same folder on the Managed Nodes (/opt/chimera/bin/).
The file we're copying is a program, so make sure it is executable.

Hints:
Hint 1: Give your task a name The name could be Copy webapp program from Control Node to Managed Nodes
Show me the code
  - name: Copy webapp program from Control Node to Managed Nodes

Hint 2: Which module should you use? We want to copy a file from the local to remote machine.
So, we should use the ansible.builtin.copy module.
Show me the code
  - name: Copy webapp program from Control Node to Managed Nodes
    ansible.builtin.copy:

Hint 3: Which file do we want to copy? It's file name is `webapp` and it's in the folder `/opt/chimera/bin/`.
You'll need to join the folder and file name to give the full path
Show me the code
  - name: Copy webapp program from Control Node to Managed Nodes
    ansible.builtin.copy:
      src: /opt/chimera/bin/webapp

Hint 4: Where do we want the file to end up? We want to put it in the same folder on the Managed Nodes: /opt/chimera/bin/
Show me the code
  - name: Copy webapp program from Control Node to Managed Nodes
    ansible.builtin.copy:
      src: /opt/chimera/bin/webapp
      dest: /opt/chimera/bin/

Hint 5: Does the file need any special permissions? Remember to make the file executable.
The docs for this aren't very clear - a nice way of doing this is using the mode '+x'
Show me the code
  - name: Copy webapp program from Control Node to Managed Nodes
    ansible.builtin.copy:
      src: /opt/chimera/bin/webapp
      dest: /opt/chimera/bin/
      mode: '+x'

Hint 6: Do we need to run this as a root user? Try it out? If it fails, try adding become: yes
Show me the code
  - name: Copy webapp program from Control Node to Managed Nodes
    ansible.builtin.copy:
      src: /opt/chimera/bin/webapp
      dest: /opt/chimera/bin/
      mode: '+x'
    become: yes


Task c: Copy start-webapp.sh from Control Node to Managed Nodes
Here's another file to copy from the Control Node to the Managed Nodes.
It's file name is start-webapp.sh and it's in the folder /opt/chimera/bin/.
We want to put it in the same folder on the Managed Nodes (/opt/chimera/bin/).
The file we're copying is a program, so make sure it is executable.

Hints: This task will look a lot like the previous one. You just need a different "name" for the task and the correct "src" filename

Task d: Copy webapp.service from Control Node to Managed Nodes
Here's another file to copy from the Control Node to the Managed Nodes.
It's file name is webapp.service and it's in the folder /opt/chimera/bin/.
This time we want to put it in a different folder
We want to put it in folder (/usr/lib/systemd/system/) on the Managed Nodes.
The file we're copying is not a program, so it doesn't need to be executable.

Hints:
Hint 1: Give your task a name The name could be Copy webapp.service from Control Node to Managed Nodes
Show me the code
  - name: Copy webapp.service from Control Node to Managed Nodes

Hint 2: Which module should you use? We want to copy a file from the local to remote machine.
So, we should use the ansible.builtin.copy module.
Show me the code
  - name: Copy webapp.service from Control Node to Managed Nodes
    ansible.builtin.copy:

Hint 3: Which file do we want to copy? It's file name is `webapp.service` and it's in the folder `/opt/chimera/bin/`.
You'll need to join the folder and file name to give the full path
Show me the code
  - name: Copy webapp.service from Control Node to Managed Nodes
    ansible.builtin.copy:
      src: /opt/chimera/bin/webapp.service

Hint 4: Where do we want the file to end up? We want to put it in folder /usr/lib/systemd/system/ on the Managed Nodes.
Show me the code
  - name: Copy webapp.service from Control Node to Managed Nodes
    ansible.builtin.copy:
      src: /opt/chimera/bin/webapp.service
      dest: /usr/lib/systemd/system/

Hint 5: Does the file need any special permissions? The file we're copying **is not** a program, so it doesn't need to be executable.
So, there's nothing to add here.

Hint 6: Do we need to run this as a root user? Try it out? If it fails, try adding become: yes
Show me the code
  - name: Copy webapp.service from Control Node to Managed Nodes
    ansible.builtin.copy:
      src: /opt/chimera/bin/webapp.service
      dest: /usr/lib/systemd/system/
    become: yes


Task e: Start the webapp service
The previous task copied webapp.service into the /usr/lib/systemd/system/ folder.
This installs webapp as a systemd service. We now want to start this service.

systemd is a tool on most Linux distributions for managing user processes. We are using it to start the webapp more reliably than simply running the executable directly. If we needed to restart the app after a change, we would be able to do so easily with the systemd module.

Hints:
Hint 1: Give your task a name The name could be Start the webapp service
Show me the code
  - name: Start the webapp service

Hint 2: Which module should you use? We want to start a systemd service.
So, we should use the ansible.builtin.systemd module.
Show me the code
  - name: Start the webapp service
    ansible.builtin.systemd:

Hint 3: Which service do we want to start? The service name is webapp.service (which we can shorten to webapp if we like.).
Show me the code
  - name: Start the webapp service
    ansible.builtin.systemd:
      name: webapp

Hint 4: What state should the service end up in? We want the service to be "started".
If the service is already running, then we want to leave it running (no need to restart it). You could set state to "restarted" instead, if you want to ensure it is restarted after a change to the app or its configuration.
Show me the code
  - name: Start the webapp service
    ansible.builtin.systemd:
      name: webapp
      state: started

Hint 6: Do we need to run this as a root user? Try it out? If it fails, try adding become: yes
Show me the code
  - name: Start the webapp service
    ansible.builtin.systemd:
      name: webapp
      state: started
    become: yes


Step 9 Check the webapp is working!

Once Ansible says everything has succeeded, the webapp should be working on each of your 2 new VMs. Let's check this.
Try to visit the hostname (or public IP address) of your managed nodes and you should hopefully see the earthquake map (try this for each of your 2 managed nodes).

Stretch Goals

You've done the core part of the exercise so feel free to pick and choose stretch goals that interest you more.

Ansible: Further Steps

You now have the webapp running on the 2 new VMs but not with any real data. Let's fix that, but note the cronjob to generate datasets is not something we want to scale in the same way: we want the same datasets available on all of the replica webservers. So the data processing job should only run once. Then after it generates datasets, Ansible can synchronise the data folder from that machine with any other webservers.

Update your playbook:

  • Add a new task to create a folder at /opt/chimera/data on each host (to store datasets). You can create it similarly to the log folder.
  • Run your updated playbook.

Write a new playbook

This will be for generating and sharing datasets, so call it something like "datasets-playbook.yml".

In the new playbook, add a play that runs on just one server. It should do the following:

  • Copy two files from the control node onto the target server: the cliapp executable and the script you wrote earlier today.
  • Use the ansible.builtin.yum module to install jq.
  • Run your shell script for generating the "latest" dataset. Note that you may need to set DATA_FOLDER and PATH variables for your script to work.
  • Use the ansible.posix.synchronize module to copy the data from the host onto the control node. Alternatively, you could try running an rsync command directly with the "ansible.builtin.command" module.

Add a second play to that playbook. It should run on all webservers and just needs one task - copy the up to date datasets folder from the control node onto each host.

Test it out

Running this playbook should generate a "latest" dataset and synchronise the same data folder across all hosts. Try it out with an ansible-playbook command! Check it works by opening up "/latest" in your browser using the hosts' addresses.

If your shell script also deletes old datasets, you will want the "delete" option of syncrhonize/rsync to make sure that they get deleted on the control node/other hosts.

Run it on a schedule

To run this playbook on a schedule, you could manually modify your existing crontab, but let's keep going with Ansible. Use the ansible.builtin.cron module to manage a cronjob.

Hint

Add a second play to your original playbook. It should target the control node itself and just needs one task, using that "cron" module to manage a cronjob. The cronjob should run your new "datasets-playbook.yml" playbook via Ansible. You should manually remove any existing cronjobs on your control node.

Yes, your control node can manage itself! You can do this by specifying hosts: localhost when writing a play.

Automation: Stage 2

The CEO has come back with some more requirements for you.

  • At midnight everyday the script should create a summary of the last 24 hours. It should be called yesterday.
  • Every hour during the day a new dataset called today should be produced. It should show all earthquakes since the most recent midnight.

Hints

  • The USGS has more than just the hourly data feed.
  • You might want to add some parameters to your script, if you haven't already.

Document cliapp

If you run cliapp --help you'll see lots of different options that can be passed to the program. We've documented some of these in cliapp_reference.md but some are a complete mystery.

Can you work out what they do and complete the documentation?

Create an API

We only suggest this stretch goal if you would particularly like more experience creating APIs, and provoking thoughts about API design.

While command line programs are very useful in certain circumstances they do have their limitations. We would like to create an API that makes the data produced by cliapp available as JSON, rather than through the rather clunky webapp.

Use your existing knowledge of Flask (and documentation from the internet) to create an API to run alongside the webapp. You can work on it locally to start with (remembering any extra requirements e.g. to set a DATA_FOLDER environment variable). Your initial goal should be to create an endpoint to GET a dataset based on its name:

// e.g. GET the dataset called `edge-throw-except`
GET http://localhost/dataset/edge-throw-except

To deploy it to the Ansible hosts, you could publish your app to a public repository and then get Ansible to pull it onto the hosts, install dependencies and start up the app.

Once you have this working consider other improvements. A GET endpoint to list all the currently available datasets? A POST endpoint that accepts input data and options and invokes cliapp with them? There are lots of possibilities.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •