This repository is for learners on Corndel's DevOps apprenticeship.
You will need to get the sample Globex database running in your browser.
Open this repo in GitHub Codespaces and follow the set-up instructions.
We're going to start off with exploring the database and using some SQL. Take a look at the questions.
Now you are familiar with the database structure you might have noticed that it's not exactly following best practices. In particular there are some potential GDPR violations, which could get Globex into a lot of trouble.
Conduct a review of the database and identify as many potential problems as possible.
Things to look out for:
- Personal data that may be covered by GDPR
- Other sensitive data that might be being stored incorrectly
- Any other structural problems
Grade your issues by severity and prioritise them as high, medium or low priorities.
Once you've identified the problems it's time to decide how to fix them. You can't fix these all with technical changes to the database (although you will need some of these). You'll also have to think about whether you'll need to change business processes; communicate with your customers; or some combination of these and other things.
Work through your issues in priority order and come up with a plan to fix them.
Database normalisation is an important topic in relational databases. While this is beyond the core scope of a DevOps course it is a useful topic to be aware of. There are a huge number of normal forms but the first three (1NF, 2NF, 3NF) are by far the most well know.
Question 1: Why does the database satisfy the first normal form?
Question 2: Does the database satisfy the second normal form?
- For simplicity here assume the database tables have many more entries than they currently do, i.e. millions (to avoid there being any "accidental" dependencies and candidate keys)
Question 3: Why doesn't the Employee table satisfy the third normal form?
The database we've been using so far has contained a lot of live, sensitive data. For local development and testing we want a database that is close to live but without any sensitive data.
How would you go about masking/redacting the data? Would randomly generating test data based off of production statistics be any better?
One limitation of the current system is that each order can only contain a single product.
Write an SQL script to change the database so that an individual order can contain multiple products. Your script should migrate the existing data so that previous orders aren't lost.
This part of the workshop should take the form of a discussion. See Solutions Design for more details.