Welcome! My name is Allan Ruivo Wildner, and this profile is intended to present my portfolio of personal projects. Here, you will find the projects I have developed throughout my journey of learning and professional growth in the field of technology.
- Email: allanruivo@outlook.com
- 1 - Data from IBGE using airflow + EC2 + docker + streamlit
- 2 - MTG cards price via Scryfall API
- 3 - Book collection with wishlist and price scraping from Estante Virtual
Using WSL: Key Steps and Commands
WSL (Windows Subsystem for Linux) lets you run a full Linux environment directly on Windows without using a virtual machine or dual boot.
Here’s a quick guide to setting up and managing WSL (Windows Subsystem for Linux), along with some essential commands:
- `wsl --install`: Enables WSL on Windows.
- `wsl --list --verbose`: Lists all installed Linux distributions with detailed info.
- `wsl --list --online`: Shows available distributions you can install.
- `wsl --install --distribution <distro>`: Installs a specific Linux distribution.
- `wsl --unregister <distro>`: Uninstalls a distribution and deletes its data.
- `wsl --set-default <distro>`: Sets the default distribution for WSL sessions.
- `wsl --update`: Updates the WSL system.
- `wsl --status`: Displays the current WSL configuration and status.
- `wsl --help`: Opens the help menu with a list of all commands.
- `df -h /`: Shows disk usage within the Linux environment (run inside the distribution).
- `free -h`: Displays memory and swap usage (run inside the distribution).
- `wsl --manage <distro> --resize <size>`: Resizes the virtual disk of a distribution. Memory limits are configured separately, in the `.wslconfig` file.
- `wsl --shutdown`: Shuts down all running distributions and the WSL 2 virtual machine.
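To confirm from inside a distribution that a shell is really running under WSL, a common heuristic is to look for "microsoft" in the kernel version string. The sketch below assumes that convention holds for your kernel; the helper names `is_wsl` and `running_under_wsl` are hypothetical, and this is a heuristic rather than an official API.

```python
from pathlib import Path


def is_wsl(version_text: str) -> bool:
    """Return True if a kernel version string looks like a WSL kernel."""
    # Heuristic: WSL kernels typically mention "microsoft" in /proc/version.
    return "microsoft" in version_text.lower()


def running_under_wsl(proc_version: Path = Path("/proc/version")) -> bool:
    """Read /proc/version if it exists and apply the heuristic above."""
    try:
        return is_wsl(proc_version.read_text())
    except OSError:
        # No readable /proc/version (for example on Windows or macOS).
        return False


if __name__ == "__main__":
    print("Running under WSL:", running_under_wsl())
```

Keeping the string check separate from the file read makes the heuristic easy to try against any version string.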
Basic Linux Commands
Linux is a free, open-source operating system known for its stability, security, and use across servers, desktops, and embedded systems.
Here are some commonly used Linux commands for navigating and managing files and directories:
- `ls`: Lists directories and files in the current path.
- `ls -a`: Also shows hidden files and directories.
- `cd <path>`: Navigates to the specified directory.
- `mv <source> <destination>`: Moves or renames a file or directory.
- `rm <file>`: Deletes a specific file.
- `rm -rf <directory>`: Deletes a directory and its contents recursively, without asking for confirmation.
- `mkdir <directory>`: Creates a new directory.
- `sudo <command>`: Runs a command with superuser (admin) privileges.
- `chmod <mode> <file>`: Changes the permissions of a file (for example, `chmod +x <file>` makes it executable).
- `pkill -f "<process>"`: Kills a running process whose command line matches the given name or pattern.
- `lsof -i :<port>`: Finds the process using the given port.
- `kill -9 <pid>`: Forcibly kills the process with the given PID.
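As a small illustration of what changing a file's permissions with `chmod` means, the sketch below does the equivalent of `chmod u+x <file>` from Python using only the standard library. The helper name `add_user_execute` is hypothetical, and the behavior described assumes a Linux (POSIX) file system such as the one inside WSL.

```python
import os
import stat
import tempfile


def add_user_execute(path: str) -> int:
    """Add the owner-execute bit to `path` (like `chmod u+x`) and return the new mode."""
    current = stat.S_IMODE(os.stat(path).st_mode)  # permission bits only
    os.chmod(path, current | stat.S_IXUSR)         # keep existing bits, add u+x
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    # Demonstrate on a throwaway file so nothing real is modified.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        script = tmp.name
    os.chmod(script, 0o644)
    print(oct(add_user_execute(script)))
    os.remove(script)
```

The mode is combined with a bitwise OR so that the existing read and write bits are preserved, which is also how the symbolic `+x` form behaves.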
Python Setup
Python is a versatile, high-level programming language known for its readability and wide range of applications.
- Download and install Python from the official website.
During installation, make sure to:
- Run the installer as administrator.
- Select the option to add Python to the system PATH.
- After installation, verify that Python is accessible from your WSL environment by running `python` or `python3`.
If the command is not recognized, add the Python installation path manually via Windows Environment Variables.
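A quick way to see which interpreter is actually being used, and whether `python` and `python3` resolve on the PATH, is a short script like the sketch below. The function name `python_report` is hypothetical; it relies only on the standard library.

```python
import shutil
import sys


def python_report() -> dict:
    """Collect the running interpreter's version and the executables found on PATH."""
    return {
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "executable": sys.executable,
        # shutil.which returns None when the command is not on PATH.
        "python_on_path": shutil.which("python"),
        "python3_on_path": shutil.which("python3"),
    }


if __name__ == "__main__":
    for key, value in python_report().items():
        print(f"{key}: {value}")
```

If `python_on_path` prints `None` while `python3_on_path` has a value, only the `python3` command is available in that shell.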
Python Virtual Environment Setup
A Python virtual environment is an isolated folder that lets you manage dependencies for a specific project without affecting others.
- `python3 -m venv <env_name>`: Creates a virtual environment in your project directory.
- `source <env_name>/bin/activate`: Activates the environment.
- `deactivate`: Deactivates the environment.
- `pip install -r <path_to_requirements.txt>`: Installs dependencies from a requirements.txt file (individual packages can also be installed directly with `pip install <package>`).
- `pip freeze > requirements.txt`: Writes the currently installed packages to requirements.txt.
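The same environment can also be created from Python with the standard-library `venv` module, which is what `python3 -m venv` runs. The sketch below is a minimal illustration; the helper name `create_env` and the directory name `demo_env` are hypothetical, and the demo passes `with_pip=False` only to keep the example fast.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir and return the path to its pyvenv.cfg."""
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    # venv writes pyvenv.cfg at the root of every environment it creates.
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "demo_env", with_pip=False)
        print(cfg.read_text())
```

The `pyvenv.cfg` file records which base interpreter the environment was created from, which is useful when an environment behaves unexpectedly.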
VS Code Setup
Visual Studio Code (VS Code) is a lightweight, open-source code editor with built-in support for debugging, version control, and extensions across many programming languages.
- Install Visual Studio Code from the Microsoft Store.
- `code`: Run it in the WSL terminal to open VS Code.
Git Setup
Git is a free and open-source distributed version control system that allows developers to track changes in source code, collaborate on projects, and manage different versions of files efficiently and securely.
- `sudo apt update && sudo apt install git -y`: Installs Git.
- `git config --global user.name "<your_name>"`: Sets the author name recorded in your commits.
- `git config --global user.email "<your_email>"`: Sets the author email recorded in your commits.
- `git init -b <branch_name>`: Initializes a local repository in the current directory with the given initial branch name.
- Set up SSH authentication for GitHub:
  - `ssh-keygen -t ed25519 -C "your_email@example.com"`: Generates an SSH key pair (press Enter three times to accept the defaults).
  - `eval "$(ssh-agent -s)"`: Starts the SSH agent.
  - `ssh-add ~/.ssh/id_ed25519`: Adds the SSH private key to the agent.
  - `cat ~/.ssh/id_ed25519.pub`: Prints the public key so you can copy it.
  - On GitHub, go to Settings → SSH and GPG Keys → New SSH Key and paste the copied key.
- `git clone <repository_url>`: Clones an existing repository.
Git Commands
- `git status`: Checks the current status of your working directory and staging area.
- `git add <file1> <file2> <fileN>`: Adds specific files to the staging area.
- `git add -A`: Adds all changes (new, modified, deleted files) to the staging area.
- `git commit -m "<message>"`: Commits staged changes with a message.
- `git log`: Shows the commit history of the current branch.
- `git log --all`: Displays the commit history across all branches.
- `git branch`: Lists all local branches.
- `git branch <new-branch>`: Creates a new branch.
- `git checkout <branch>`: Switches to an existing branch.
- `git checkout -b <branch>`: Creates and switches to a new branch.
- `git merge <source-branch>`: Merges a branch into the current one. To cancel a merge in progress, use `git merge --abort`.
- `git checkout <commit-hash>`: Checks out a specific commit (detached HEAD).
- `git push <remote> <branch>`: Sends local commits to a remote branch.
- `git remote -v`: Lists the connected remote repositories.
- `git remote add origin <url>`: Connects your local repo to a remote one.
- `git push <remote> --delete <branch>`: Deletes a remote branch.
- `git fetch`: Downloads changes from the remote repository without merging.
- `git pull`: Fetches and merges changes from the remote repository into the current branch.
- `git rebase <target-branch>`: Reapplies commits on top of another branch.
- `git checkout <commit_id> -- <file>`: Restores a specific file to its state at the given commit, discarding local changes to it.
- `git restore --staged <file1> <file2>`: Unstages files that were added with `git add`.
Commit Standardization (Commitizen)
To standardize commit messages, you can use the Commitizen library:
- `pip install -U commitizen`: Installs Commitizen.
- `cz commit`: Starts an interactive prompt that builds a formatted commit message.
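By default, Commitizen produces messages in the Conventional Commits style, `<type>(<scope>): <subject>`. As a rough sketch of what a standardized header looks like, the checker below validates that shape with a regular expression. The list of types is an assumed, commonly used set rather than Commitizen's exact configuration, and `is_conventional` is a hypothetical helper, not part of the Commitizen API.

```python
import re

# Assumed set of commit types; adjust it to match your own configuration.
COMMIT_TYPES = ("build", "chore", "ci", "docs", "feat", "fix",
                "perf", "refactor", "style", "test")

# Shape: <type>(<optional scope>)<optional !>: <subject>
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<subject>\S.*)$"
)


def is_conventional(header: str) -> bool:
    """Return True if the first line of a commit message has the expected shape."""
    return HEADER_RE.match(header) is not None


if __name__ == "__main__":
    for header in ("feat(api): add price endpoint", "Updated stuff"):
        print(f"{header!r}: {is_conventional(header)}")
```

A check like this could run in a Git hook, although Commitizen ships its own `cz check` command for that purpose.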
AWS CLI
The AWS CLI (Command Line Interface) is a tool that lets you manage and automate AWS services directly from your terminal using simple text commands.
- `curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"`: Downloads the installation package.
- `unzip awscliv2.zip`: Unzips the file.
- `sudo ./aws/install`: Installs the AWS CLI.
- In your AWS account, configure an IAM user with the necessary permissions.
- `aws configure sso`: Configures SSO. Provide the following details when prompted:
  - SSO session name (Recommended): `<session_name>`
  - SSO start URL [None]: `<IAM_start_URL>`
  - SSO region [None]: `<AWS_region>`
  - SSO registration scopes [None]: `sso:account:access`
- `aws sso login --profile default`: Logs in to your AWS session.
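The answers given to `aws configure sso` are stored in `~/.aws/config` as an `sso-session` section plus a profile that refers to it. The sketch below renders a file of that shape so the result can be inspected or compared. All values in the demo (session name, start URL, account ID, role name, regions) are placeholders, and `build_sso_config` is a hypothetical helper, not part of the AWS CLI.

```python
import configparser
import io


def build_sso_config(profile, session, start_url, sso_region,
                     account_id, role_name, region):
    """Render an AWS config file with one SSO session and one profile that uses it."""
    config = configparser.ConfigParser(interpolation=None)
    config[f"sso-session {session}"] = {
        "sso_start_url": start_url,
        "sso_region": sso_region,
        "sso_registration_scopes": "sso:account:access",
    }
    # The default profile uses the section name "default"; others use "profile <name>".
    section = "default" if profile == "default" else f"profile {profile}"
    config[section] = {
        "sso_session": session,
        "sso_account_id": account_id,
        "sso_role_name": role_name,
        "region": region,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_sso_config(
        profile="default",
        session="my-sso",                                # placeholder
        start_url="https://example.awsapps.com/start",   # placeholder
        sso_region="us-east-1",                          # placeholder
        account_id="123456789012",                       # placeholder
        role_name="ExampleRole",                         # placeholder
        region="us-east-1",                              # placeholder
    ))
```

Comparing this output with your real `~/.aws/config` is a quick way to spot a missing key when `aws sso login` fails.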
Terraform Setup
Terraform is an open-source Infrastructure as Code (IaC) tool that allows you to provision, manage, and version cloud infrastructure using declarative configuration files.
- `sudo apt-get install terraform`: Installs Terraform (this assumes the HashiCorp apt repository has already been added).
- `terraform init`: Initializes your Terraform project (downloads the necessary providers and sets up the working directory).
- `terraform plan`: Creates an execution plan (previews changes without applying them).
- `terraform apply`: Applies the configuration to provision the infrastructure.
AWS EC2 Instance
Amazon EC2 (Elastic Compute Cloud) is a scalable virtual server service that allows you to run applications in the cloud. It's commonly used to host websites, run backend services, or test environments on-demand.
To deploy an EC2 instance using Terraform, refer to the main.tf file, which defines all necessary infrastructure as code.
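The project's `main.tf` is not reproduced here. As a purely illustrative sketch of the kind of resource such a file declares, the script below builds an equivalent configuration in Terraform's JSON syntax, which Terraform reads from files ending in `.tf.json`. The resource name `example`, the region, the AMI ID, and the instance type are placeholders, and `ec2_config` is a hypothetical helper; none of them come from the project's actual configuration.

```python
import json


def ec2_config(region: str, ami: str, instance_type: str, key_name: str) -> dict:
    """Build a Terraform configuration (JSON syntax) describing one EC2 instance."""
    return {
        "provider": {"aws": {"region": region}},
        "resource": {
            "aws_instance": {
                # "example" is a hypothetical local resource name.
                "example": {
                    "ami": ami,
                    "instance_type": instance_type,
                    "key_name": key_name,
                }
            }
        },
    }


if __name__ == "__main__":
    config = ec2_config(
        region="us-east-1",        # placeholder region
        ami="ami-REPLACE_ME",      # placeholder: use a real AMI ID for your region
        instance_type="t3.micro",  # placeholder instance type
        key_name="ec2-key",        # assumed name of an existing EC2 key pair
    )
    # Redirect the output to a file such as main.tf.json before running terraform.
    print(json.dumps(config, indent=2))
```

Hand-written Terraform normally uses the HCL syntax in `.tf` files; the JSON form is shown only because it is easy to generate from a script.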
Manual Steps (if needed):
- Create an EC2 instance via the AWS Console, making sure to configure an SSH key pair during setup.
- Configure Security Group rules, such as opening port 22 for SSH access.
- `ssh -i ~/.ssh/ec2-key.pem ec2-user@<ec2-public-dns>`: Connects to the EC2 instance. The default username depends on the AMI (for example, `ec2-user` on Amazon Linux and `ubuntu` on Ubuntu).
PostgreSQL
PostgreSQL is a free and open-source relational database management system known for its reliability, extensibility, and full compliance with SQL standards.
Adjustments necessary to enable remote access to your PostgreSQL instance on EC2:
- Enable external listening: in postgresql.conf, set `listen_addresses = '*'` (remove the leading "#").
- Allow external connections: in pg_hba.conf, add `host all all 0.0.0.0/0 md5`.
- Restart PostgreSQL: restart the service to apply the configuration changes.
- Open firewall access: make sure the EC2 Security Group allows inbound traffic on port 5432 from your IP or from all IPs (for testing only).
- Verify that PostgreSQL is running and listening externally: use netstat to confirm it is listening on 0.0.0.0:5432.
- Check credentials and connection IP: confirm the host IP and that the database, user, and permissions are properly set.
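The two file edits described above can be sketched as small text transformations. This is an illustration under stated assumptions, not a tool from the project: `enable_remote_listen` and `hba_rule` are hypothetical helpers, they work on text in memory rather than on the real files, and the sample configuration line is made up for the demo.

```python
import re

# Matches an active or commented-out listen_addresses line, up to the end of the line.
LISTEN_RE = re.compile(r"^[ \t]*#?[ \t]*listen_addresses[ \t]*=.*$", re.MULTILINE)


def enable_remote_listen(conf_text: str) -> str:
    """Return postgresql.conf text with listen_addresses set to '*'."""
    new_line = "listen_addresses = '*'"
    if LISTEN_RE.search(conf_text):
        # Replace the first matching line, dropping the leading "#" if present.
        return LISTEN_RE.sub(new_line, conf_text, count=1)
    # No such setting yet: append it on its own line.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"


def hba_rule(cidr: str, method: str = "md5") -> str:
    """Build a pg_hba.conf line allowing all users and databases from `cidr`."""
    return f"host    all    all    {cidr}    {method}"


if __name__ == "__main__":
    sample = "#listen_addresses = 'localhost'\nport = 5432\n"  # made-up sample
    print(enable_remote_listen(sample))
    print(hba_rule("0.0.0.0/0"))
```

Passing a narrower range to `hba_rule`, such as a single address with a `/32` suffix, limits access to that address instead of every IP.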
- `sudo apt update && sudo apt install -y postgresql-14`: Installs PostgreSQL.
- `pg_lsclusters`: Checks for an active cluster.
- `psql -U user -d database`: Opens the psql shell (the default database created at installation is `postgres`).
- `CREATE SCHEMA schema_name;`: Creates a schema.
- `CREATE DATABASE my_bank WITH OWNER = my_user TEMPLATE = template1 ENCODING = 'UTF8' TABLESPACE = pg_default CONNECTION LIMIT = 100;`: Creates a database.
- `CREATE TABLE my_table (<field1> <data type>, <field2> <data type>, <field3> <data type>);`: Creates a table.
- `CREATE ROLE my_user WITH LOGIN PASSWORD 'my_password' SUPERUSER CREATEDB CREATEROLE;`: Creates a user (here with superuser privileges).
- `\h`: Shows help for SQL commands.
- `\q`: Quits psql.
- `\l`: Lists databases.
- `\dn`: Lists schemas.
- `\dt`: Lists tables.
- `\du`: Lists users (roles).
- `exit`: Exits psql.
- `\c database`: Connects to a database.
- `DROP TABLE table_name;`: Deletes a table.
- `sudo nano /var/lib/pgsql/data/postgresql.conf`: Opens the main configuration file.
- `cd /tmp && sudo -u postgres pg_ctl reload -D /var/lib/pgsql/data`: Reloads the configuration files.
- `sudo nano /var/lib/pgsql/data/pg_hba.conf`: Opens the host-based authentication file.
- `sudo systemctl restart postgresql`: Restarts PostgreSQL.
DBT
dbt (data build tool) is a command-line tool that enables data teams to transform, test, and document data in the warehouse using modular SQL and software engineering practices.
- `pip install dbt-postgres`: Installs dbt with the PostgreSQL adapter.
- `dbt init`: Creates and configures a dbt project.
- `dbt debug`: Checks the configuration and the database connection.
- `cd ~/.dbt && nano profiles.yml`: Edits profiles.yml, the configuration file that stores the connection settings dbt needs to access your data warehouse.
- `dbt run`: Runs the models without tests (use `--select` to run a specific model).
- `dbt build`: Runs all resources, including models, tests, and seeds (use `--select` to pick a specific one).
- `dbt test`: Tests the models (use `--select` to test a specific model).
- `dbt seed`: Loads the seed files into the database (use `--select` to load a specific seed).
- `pip install --upgrade dbt-core`: Updates dbt.
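For the PostgreSQL adapter, `profiles.yml` holds one named profile with at least one output describing the connection. The sketch below renders a minimal file of that shape from a template. The profile name, host, user, database, and schema in the demo are placeholders, `render_profile` is a hypothetical helper, and reading the password from an environment variable named `DBT_PASSWORD` is an assumption made for the example.

```python
from string import Template

# Minimal profiles.yml layout for dbt-postgres; "dev" is an assumed target name.
PROFILE_TEMPLATE = Template("""\
$profile:
  target: dev
  outputs:
    dev:
      type: postgres
      host: $host
      port: $port
      user: $user
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: $dbname
      schema: $schema
      threads: 4
""")


def render_profile(profile, host, user, dbname, schema, port=5432):
    """Fill the template; `profile` must match the `profile:` key in dbt_project.yml."""
    return PROFILE_TEMPLATE.substitute(
        profile=profile, host=host, port=port,
        user=user, dbname=dbname, schema=schema,
    )


if __name__ == "__main__":
    print(render_profile(
        profile="my_project",  # placeholder
        host="localhost",      # placeholder
        user="my_user",        # placeholder
        dbname="postgres",     # placeholder
        schema="public",       # placeholder
    ))
```

After saving the result as `~/.dbt/profiles.yml`, `dbt debug` reports whether dbt can connect with those settings.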
Docker
- `sudo mkdir -m 0755 -p /etc/apt/keyrings && curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg`: Adds Docker's GPG key.
- `echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null`: Adds the official Docker repository.
- `sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin`: Updates the package index and installs Docker Engine.
- `sudo usermod -aG docker $USER`: Gives the current user permission to run Docker without sudo (log out and back in for it to take effect).
- `docker compose up`: Starts the services defined in the Compose file.
- `docker compose down`: Stops and removes the containers of those services.
- `docker compose -f docker_compose.yml ps`: Lists the running containers.
- `docker compose -f docker_compose.yml logs -f docker_name`: Follows the logs of a service (replace `docker_name` with the service name).
- `docker exec -it docker-airflow-webserver-1 bash`: Opens a shell inside the container.
Airflow
Apache Airflow is an open-source platform used to programmatically author, schedule, and monitor workflows—especially data pipelines—by defining them as code using Python.
- `airflow dags list`: Lists all discovered DAGs.
- `airflow dags unpause <dag>`: Unpauses (activates) a specific DAG.
- `docker exec docker-airflow-webserver-1 ls /opt/airflow/dags`: Checks whether the DAG files are visible inside the Airflow container.
- `docker exec -it docker-airflow-webserver-1 airflow dags list-import-errors`: Shows the DAG import errors.

Python script to verify DAG imports manually:

    from airflow.models import DagBag

    dagbag = DagBag()            # parses the configured DAGs folder
    print(dagbag.dags.keys())    # IDs of the DAGs that loaded successfully
    print(dagbag.import_errors)  # maps each failing file to its error message
Streamlit
Streamlit is an open-source Python framework that allows you to quickly build and share interactive web apps for data science and machine learning projects using simple Python scripts.
Installation
- `pip install streamlit`: Installs Streamlit.
- Some configurations may need to be done in the browser.

Visualization and Charts
- `plotly` and `matplotlib`: Libraries for building charts.
- `plotly_chart`: Displays interactive charts created with Plotly.
- `PIL`: Library used for image manipulation and adding icons to charts.

State and Configuration
- `session_state['variable']`: Stores the application state, allowing data to persist across user interactions.
- `set_page_config`: Configures the page title, layout, and icon.

Interface and Navigation
- `sidebar`: Creates elements in the side navigation menu.
- `columns`: Creates columns in the interface.
- `expander`: Creates expandable sections with additional content.
- `header`: Displays titles in the interface.

Content Display
- `image`: Displays images.
- `markdown`: Displays formatted text using Markdown.
- `metric`: Displays highlighted metrics.

Interactivity
- `button`: Creates interactive buttons.
- `selectbox`, `multiselect`, `slider`: Implement filters and sorting options.
- `text_input`: Captures text input from the user.
- `number_input`: Captures numeric input from the user.
- `form`: Creates interactive forms.
- `file_uploader`: Allows users to upload files.
- `data_editor`: Allows users to edit data in an interactive table.

Messages and Control
- `warning`: Displays warning messages.
- `success`: Displays success messages.
- `error`: Displays error messages.
- `stop`: Stops script execution when a specific condition is met.

Data Manipulation
- `pandas`: Used for reading and manipulating data (CSV, Excel, filtering, sorting, aggregation).
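As a rough sketch of how a few of these elements fit together, the script below combines `set_page_config`, `sidebar`, `selectbox`, `columns`, `metric`, `expander`, `markdown`, and `warning` in one small page. The sample rows are hypothetical stand-ins for real data, and `filter_by_status` and `render` are illustrative names; the `try`/`except` around the import is only there so the filtering helper can be used without Streamlit installed.

```python
try:
    import streamlit as st
except ImportError:  # allows the pure helper below to be used without Streamlit
    st = None

# Hypothetical sample rows standing in for a real dataset.
BOOKS = [
    {"title": "Book A", "status": "owned", "price": 30.0},
    {"title": "Book B", "status": "wishlist", "price": 45.5},
    {"title": "Book C", "status": "wishlist", "price": 12.0},
]


def filter_by_status(rows, status):
    """Return all rows for "all", otherwise only rows with the given status."""
    if status == "all":
        return list(rows)
    return [row for row in rows if row["status"] == status]


def render():
    """Draw the page: a sidebar filter, two metrics, and an expandable list."""
    st.set_page_config(page_title="Book collection", layout="wide")
    st.header("Book collection")
    status = st.sidebar.selectbox("Status", ["all", "owned", "wishlist"])
    rows = filter_by_status(BOOKS, status)
    left, right = st.columns(2)
    left.metric("Books", len(rows))
    right.metric("Total price", f"{sum(row['price'] for row in rows):.2f}")
    with st.expander("Details"):
        for row in rows:
            st.markdown(f"- **{row['title']}** ({row['status']})")
    if not rows:
        st.warning("No books match the selected filter.")


if st is not None:
    render()
```

Saved as, say, `app.py`, it would be started with `streamlit run app.py`; Streamlit reruns the whole script on every interaction, which is why the filtering logic is kept in a plain function.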
Jenkins
Jenkins is an open-source automation server that helps developers build, test, and deploy their software continuously. In this project, Jenkins is used only to trigger the Airflow DAGs manually.
- `sudo cat /var/lib/jenkins/secrets/initialAdminPassword`: Shows the initial admin password.
- `docker exec docker-jenkins-1 cat /var/jenkins_home/secrets/initialAdminPassword`: Shows the same password when Jenkins runs in Docker.
N8N
N8N is an open-source workflow automation tool that lets you connect apps, services, and custom logic to automate tasks and data flows—without needing to write full applications.
- `sudo apt install nodejs`: Installs Node.js (required by N8N).
- `sudo apt install npm`: Installs npm (required by N8N).
- `npm install n8n -g`: Installs N8N.
- `n8n`: Starts N8N.