For any active developer or data scientist, a large portfolio of projects on GitHub can quickly become disorganized. Manually adding and updating descriptive topics for each repository is a tedious and inconsistent process. This lack of proper tagging reduces the discoverability of projects, making it difficult for recruiters, collaborators, or even my future self to quickly grasp the scope and technologies of a given project.
The goal of this project was to create a robust automation script that streamlines the organization of my GitHub profile. The tool is designed to scan a local directory of Git repositories, intelligently generate a set of relevant topics based on the project's folder structure and README content, and apply these tags directly to the corresponding GitHub repositories via the API.
I developed a Python script encapsulated in a `GitHubTopicUpdater` class that automates the entire tagging process end to end. The workflow is as follows:
- Secure User Authentication: The script begins by securely and interactively prompting for the user's GitHub username and a Personal Access Token (PAT) using Python's `getpass` library to ensure credentials are not exposed on the command line. It also requests the root path of the local projects directory.
- Local Repository Scanning: The tool recursively walks through the specified directory, identifying valid Git projects by the presence of a `.git` subfolder.
- Intelligent Topic Generation: For each repository found, the script compiles a list of potential topics from multiple sources (a sketch of this logic follows the list):
  - Folder Structure: It parses the names of parent directories and the repository folder itself, cleaning them into valid topic formats (e.g., `_` and spaces are converted to `-`). This leverages a hierarchical folder structure (e.g., `/machine-learning/time-series-forecasting/`) to derive contextual tags.
  - README Content Analysis: The script reads the `README.md` file within each project and searches for keywords from a comprehensive, predefined list of over 100 relevant technologies, libraries, and concepts (`TECH_KEYWORDS`) in the AI and Machine Learning space.
  - Filtering and Cleaning: All generated topics are standardized to lowercase. A `BLACKLIST_TOPICS` set is used to filter out generic, non-descriptive words (e.g., "project", "data", "model"), ensuring the final tags are meaningful and specific.
- User Review and Confirmation: Before any changes are made, the script presents a clear, formatted summary of all repositories and the topics it has generated for each one. It requires an explicit 'yes' confirmation from the user to proceed, preventing accidental updates.
- GitHub API Integration: Upon confirmation, the script iterates through each repository and uses the `requests` library to send a `PUT` request to the GitHub API, updating the repository's topics; a minimal sketch of this call appears further below. A one-second delay is implemented between API calls to respect rate limits.
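To make the scanning and topic-generation steps concrete, here is a minimal sketch of that logic. It is an illustration under stated assumptions, not the script's actual internals: the helper names (`clean_name`, `derive_topics`, `scan_repos`) are hypothetical, and the two constants are abbreviated stand-ins for the script's full `TECH_KEYWORDS` and `BLACKLIST_TOPICS` lists.

```python
import os
import re

# Abbreviated stand-ins for the script's full lists (illustrative only).
TECH_KEYWORDS = {"python", "tensorflow", "fastapi", "pandas", "scikit-learn", "xgboost"}
BLACKLIST_TOPICS = {"project", "data", "model"}

def clean_name(name: str) -> str:
    """Normalize a folder name into a valid GitHub topic (lowercase, hyphenated)."""
    name = name.lower().replace("_", "-").replace(" ", "-")
    return re.sub(r"[^a-z0-9-]", "", name)

def derive_topics(repo_path: str, root: str) -> list[str]:
    """Build candidate topics from the folder hierarchy and README keywords."""
    # Folder structure: every directory between the root and the repo becomes a tag.
    rel_parts = os.path.relpath(repo_path, root).split(os.sep)
    topics = {clean_name(part) for part in rel_parts}

    # README content analysis: match known keywords as whole words.
    readme = os.path.join(repo_path, "README.md")
    if os.path.exists(readme):
        with open(readme, encoding="utf-8", errors="ignore") as f:
            text = f.read().lower()
        topics |= {kw for kw in TECH_KEYWORDS if re.search(rf"\b{re.escape(kw)}\b", text)}

    # Filtering and cleaning: drop blacklisted or empty entries.
    return sorted(t for t in topics if t and t not in BLACKLIST_TOPICS)

def scan_repos(root: str) -> dict[str, list[str]]:
    """Walk the root directory and map each Git repo's name to its derived topics."""
    repos = {}
    for dirpath, dirnames, _ in os.walk(root):
        if ".git" in dirnames:
            repos[os.path.basename(dirpath)] = derive_topics(dirpath, root)
            dirnames.clear()  # do not descend into a repository's own subfolders
    return repos
```

Because the real `TECH_KEYWORDS` list has over 100 entries, most of the specific tags in practice come from README matches rather than folder names.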
The implementation relies on a handful of Python libraries:

- `os`: Utilized for file system navigation, such as walking through project directories and constructing file paths.
- `requests`: Essential for making HTTP API calls to the GitHub REST API to update repository topics.
- `re`: Leveraged for regular expression operations to clean folder and repository names and to reliably search for keywords within README files.
- `getpass`: Implemented for securely prompting the user for their GitHub Personal Access Token without displaying it in the terminal.
- `time`: Used to add a `time.sleep(1)` delay between API requests to prevent rate-limiting issues.
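The update call itself is compact. Below is a sketch of the `PUT` request built on GitHub's documented `/repos/{owner}/{repo}/topics` endpoint; the function name, header choices, and timeout are my own framing rather than the script's exact code.

```python
import time
import requests

def update_repo_topics(username: str, pat: str, repo: str, topics: list[str]) -> bool:
    """Replace a repository's full topic set via the GitHub REST API."""
    url = f"https://api.github.com/repos/{username}/{repo}/topics"
    headers = {
        "Authorization": f"Bearer {pat}",
        "Accept": "application/vnd.github+json",
    }
    # PUT replaces all existing topics; the payload key must be "names".
    response = requests.put(url, headers=headers, json={"names": topics}, timeout=10)
    time.sleep(1)  # pause between calls to respect GitHub's rate limits
    return response.status_code == 200
```

In the script, this request is only issued after the user types an explicit 'yes' at the review step.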
- Prerequisites:
  - Python 3 installed on your system.
  - A GitHub Personal Access Token (PAT) with the full `repo` scope. You can create one at https://github.com/settings/tokens.
  - A local directory where you have cloned the GitHub repositories you wish to tag.
- Installation:
  - Save the script as `run_all.py`.
  - Install the necessary Python library: `pip install requests`
- Execution:
  - Open your terminal or command prompt.
  - Navigate to the directory where you saved `run_all.py`.
  - Run the script using the following command: `python run_all.py`
  - Follow the on-screen prompts to enter your GitHub username, PAT, and the path to your projects directory.
  - Carefully review the proposed topics for each repository.
  - Type `yes` and press Enter to confirm and begin the update process on GitHub.
The script performs its function reliably, successfully automating a manual and error-prone task. The primary result is a significant improvement in the organization and discoverability of projects on my GitHub profile. It effectively bridges the gap between local project organization and the public-facing presentation on GitHub, ensuring that repository topics are consistent, comprehensive, and up-to-date. The tool provides clear, real-time feedback on the success or failure of each API update, allowing for easy monitoring.
Here is an example of the script's output during an execution cycle, showing the user prompts, the review list, and the final update status messages.
```
--- GitHub Auto-Tagger Configuration ---
Enter your GitHub username: imehranasgari
Enter your GitHub Personal Access Token (PAT):
Enter the full path to your projects directory: /Users/mehran/Documents/GitHub_Projects
----------------------------------------
STEP 1: Scanning local folders and reading READMEs...
SUCCESS: Found 2 repositories. Here is the generated list:
----------------------------------------------------------------------
"API-for-Deep-Learning-Model": ['api', 'deep-learning', 'deployment', 'fastapi', 'python', 'tensorflow'],
"Customer-Churn-Prediction": ['classification', 'deep-learning', 'machine-learning', 'pandas', 'python', 'scikit-learn', 'xgboost'],
----------------------------------------------------------------------
>>> Review the list. Type 'yes' to update GitHub, or anything else to cancel: yes
STEP 2: Starting to update topics on GitHub...
Attempting to update 'API-for-Deep-Learning-Model'...
✅ SUCCESS: Topics updated for 'API-for-Deep-Learning-Model'.
Attempting to update 'Customer-Churn-Prediction'...
✅ SUCCESS: Topics updated for 'Customer-Churn-Prediction'.
All done!
```
Building this tool provided valuable hands-on experience in several key areas of software development:
- API Integration: It was a practical exercise in consuming a major third-party REST API (GitHub), including handling authentication (with PATs), structuring requests correctly, and interpreting responses.
- Secure Coding Practices: The deliberate choice to use `getpass` instead of a standard `input()` call for the PAT reflects an understanding of the importance of handling sensitive data securely.
- User-Centric Design: The script was designed with the user in mind, incorporating a final review and confirmation step. This is a critical feature for any automation tool that performs write operations, as it provides a safeguard against unintended changes.
- Code Modularity and Reusability: By creating a well-defined class and internal methods (`_scan_local_repos`, `_update_github_repo`), the logic is clean, organized, and easy to maintain or extend in the future. The use of customizable lists for keywords and blacklisted topics also makes the tool highly adaptable (a skeleton of this structure is sketched below).
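As a rough illustration of that structure, the skeleton below shows how the pieces fit together. The two private method names come from the script itself; the constructor signature, the `run` driver, and the prompts are assumptions added for the sketch (the earlier sketches would supply the method bodies).

```python
import getpass

class GitHubTopicUpdater:
    """Skeleton: scan local repos, present topics for review, then update GitHub."""

    def __init__(self, username: str, pat: str, root_path: str):
        self.username = username
        self.pat = pat
        self.root_path = root_path

    def _scan_local_repos(self) -> dict[str, list[str]]:
        """Map repo names to derived topics (plug in the scanning sketch above)."""
        return {}  # placeholder

    def _update_github_repo(self, repo: str, topics: list[str]) -> bool:
        """Send the PUT request for one repository (plug in the API sketch above)."""
        return False  # placeholder

    def run(self) -> None:
        """End-to-end driver: scan, present for review, confirm, then update."""
        repos = self._scan_local_repos()
        for name, topics in sorted(repos.items()):
            print(f'"{name}": {topics}')
        if input(">>> Type 'yes' to update GitHub: ").strip().lower() == "yes":
            for name, topics in repos.items():
                self._update_github_repo(name, topics)

if __name__ == "__main__":
    # getpass keeps the PAT out of terminal echo and shell history.
    updater = GitHubTopicUpdater(
        username=input("Enter your GitHub username: "),
        pat=getpass.getpass("Enter your GitHub Personal Access Token (PAT): "),
        root_path=input("Enter the full path to your projects directory: "),
    )
    updater.run()
```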
Email: imehranasgari@gmail.com.
GitHub: https://github.com/imehranasgari.
This project is licensed under the Apache 2.0 License – see the `LICENSE` file for details.