The goal of this project is to create an AI agent that can assist in the maintenance and updating of HSF Training modules.
The agent will ingest training modules, review their content, and suggest updates or improvements that reflect the latest developments in the field.
The project must support multiple LLM providers. The default provider is Google Gemini, which offers a free tier (subject to usage limits). Support for other providers, such as OpenAI GPT and Anthropic Claude, is planned.
- Automated Content Analysis: Uses Google Gemini AI to analyze training materials
- GitHub Integration: Reads repository content and creates issues/PRs for suggestions
- Multi-Format Support: Handles Markdown files and Jupyter notebooks
- Structured Suggestions: Provides categorized, prioritized update recommendations
- CLI Interface: Easy-to-use command-line interface with rich output
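Jupyter notebooks are JSON documents in which each cell carries a cell_type and a source field. A minimal sketch of how a processor might flatten a notebook into plain text before sending it to the model, assuming the standard nbformat 4 layout; notebook_to_text is a hypothetical name, not the agent's actual API:

```python
import json


def notebook_to_text(raw: str) -> str:
    """Flatten the markdown and code cells of a notebook into plain text.

    Hypothetical helper for illustration; assumes the nbformat 4 JSON layout.
    """
    notebook = json.loads(raw)
    parts = []
    for cell in notebook.get("cells", []):
        kind = cell.get("cell_type")
        if kind not in ("markdown", "code"):
            continue  # skip "raw" cells and anything unexpected
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        text = "".join(source) if isinstance(source, list) else source
        parts.append(f"[{kind}]\n{text}")
    return "\n\n".join(parts)


example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Intro\n", "Welcome."]},
        {"cell_type": "code", "source": "import ROOT"},
    ],
    "nbformat": 4,
    "nbformat_minor": 5,
})
print(notebook_to_text(example))
```

Tagging each cell with its type keeps the model aware of which text is narrative and which is executable code.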
# Clone the repository
git clone <repository_url>
cd hsf-training-ai-maintenance
# Install dependencies
pip install -r requirements.txt
Create a .env file based on the example:
cp .env.example .env
Edit .env with your API keys:
GEMINI_API_KEY=your_gemini_api_key_here
GITHUB_TOKEN=your_github_token_here
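The .env file is a list of KEY=value lines. Libraries such as python-dotenv are commonly used to load it, and the agent may well rely on one; the stdlib-only parser below is merely an illustration of the format, and parse_env_lines is a hypothetical name, not part of the agent:

```python
from __future__ import annotations


def parse_env_lines(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments (illustrative only)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


sample = """
# API keys
GEMINI_API_KEY=your_gemini_api_key_here
GITHUB_TOKEN="your_github_token_here"
"""
print(parse_env_lines(sample))
```

Real .env loaders also handle escapes, multi-line values, and variable expansion, which this sketch deliberately ignores.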
Analyze a complete repository:
python hsf_training_agent/main.py analyze https://github.com/hsf-training/hsf-training-example
Analyze a single file:
python hsf_training_agent/main.py analyze-file https://github.com/hsf-training/hsf-training-example path/to/file.md
To obtain a Gemini API key:
- Go to Google AI Studio
- Create a new API key
- Add it to your .env file as GEMINI_API_KEY
To obtain a GitHub token:
- Go to GitHub Settings → Developer settings → Personal access tokens
- Generate a new token with repo permissions
- Add it to your .env file as GITHUB_TOKEN
Run a full analysis of a repository:
python hsf_training_agent/main.py analyze https://github.com/hsf-training/hsf-training-example
This will:
- Analyze all training content
- Create GitHub issues with suggestions
- Generate a summary issue
To analyze without creating individual issues or the summary issue:
python hsf_training_agent/main.py analyze --no-issues --no-summary https://github.com/hsf-training/hsf-training-example
To preview the results (dry run):
python hsf_training_agent/main.py analyze --dry-run https://github.com/hsf-training/hsf-training-example
This shows exactly which issues would be created, without creating them.
To save the results to a JSON file:
python hsf_training_agent/main.py analyze -o results.json https://github.com/hsf-training/hsf-training-example
To analyze a single file:
python hsf_training_agent/main.py analyze-file https://github.com/hsf-training/hsf-training-example _episodes/01-introduction.md
The agent can be configured via environment variables or a .env file:
| Variable | Description | Default |
|---|---|---|
| GEMINI_API_KEY | Google Gemini API key | Required |
| GITHUB_TOKEN | GitHub API token | Optional |
| LOG_LEVEL | Logging level | INFO |
| MAX_FILE_SIZE_MB | Maximum file size to process (MB) | 10 |
| ANALYSIS_TIMEOUT_SECONDS | Analysis timeout (seconds) | 300 |
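A minimal sketch of how the variables in the configuration table could be loaded with their defaults, assuming a plain os.environ lookup. Settings, load_settings, and within_size_limit are hypothetical names, and treating "MB" as 1024 × 1024 bytes is an assumption:

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object mirroring the configuration table."""
    gemini_api_key: str
    github_token: Optional[str] = None
    log_level: str = "INFO"
    max_file_size_mb: int = 10
    analysis_timeout_seconds: int = 300


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    key = env.get("GEMINI_API_KEY")
    if not key:
        # GEMINI_API_KEY is the only required variable
        raise ValueError("No Gemini API key: set GEMINI_API_KEY in the environment or .env")
    return Settings(
        gemini_api_key=key,
        github_token=env.get("GITHUB_TOKEN") or None,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "10")),
        analysis_timeout_seconds=int(env.get("ANALYSIS_TIMEOUT_SECONDS", "300")),
    )


def within_size_limit(size_bytes: int, settings: Settings) -> bool:
    # Assumption: MAX_FILE_SIZE_MB is interpreted as mebibytes
    return size_bytes <= settings.max_file_size_mb * 1024 * 1024


settings = load_settings({"GEMINI_API_KEY": "abc", "LOG_LEVEL": "debug"})
print(settings)
```

Passing the environment as a parameter keeps the loader easy to test without touching the real process environment.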
The agent focuses on identifying:
- Software Updates: Outdated versions, libraries, frameworks
- Best Practices: Evolved methodologies and improved practices
- Recent Developments: New developments in high-energy physics (HEP) and computational science
- Resource Updates: Broken links, outdated documentation
- Technical Accuracy: Outdated or incorrect technical information
- Example Improvements: More current examples and case studies
Each suggestion is assigned one of the following categories:
- software_update: Outdated software versions or tools
- best_practice: Improved methodologies or practices
- recent_development: Recent field developments
- resource_update: Link updates or resource changes
- technical_accuracy: Technical corrections
- example_improvement: Better examples or exercises
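Because model output is free text, category labels need to be validated against the fixed set listed above. A minimal sketch using a string-valued enum; SuggestionCategory and parse_category are hypothetical names, not necessarily how the agent represents categories:

```python
from enum import Enum


class SuggestionCategory(str, Enum):
    """Hypothetical enum of the six suggestion categories."""
    SOFTWARE_UPDATE = "software_update"
    BEST_PRACTICE = "best_practice"
    RECENT_DEVELOPMENT = "recent_development"
    RESOURCE_UPDATE = "resource_update"
    TECHNICAL_ACCURACY = "technical_accuracy"
    EXAMPLE_IMPROVEMENT = "example_improvement"


def parse_category(value: str) -> SuggestionCategory:
    # Normalizes whitespace and case; raises ValueError for unknown labels
    return SuggestionCategory(value.strip().lower())


print(parse_category(" Best_Practice "))
```

Subclassing str lets members compare equal to their plain string values, which is convenient when they are written to JSON.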
The agent creates structured GitHub issues with:
- Clear titles and descriptions
- Categorized suggestions with priorities
- Specific change recommendations
- Educational context preservation
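A minimal sketch of turning one suggestion into an issue title and Markdown body. The Suggestion fields, the priority levels, and the layout are assumptions for illustration; the agent's real issue format may differ:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Suggestion:
    """Hypothetical shape of one suggestion; field names are assumptions."""
    file_path: str
    category: str  # e.g. "software_update"
    priority: str  # assumed levels such as "high", "medium", "low"
    description: str
    proposed_change: str


def format_issue(s: Suggestion) -> tuple[str, str]:
    """Return (title, body) for a GitHub issue written in Markdown."""
    title = f"[{s.category}] {s.file_path}: {s.description[:60]}"
    body = "\n".join([
        f"**File:** `{s.file_path}`",
        f"**Category:** {s.category}",
        f"**Priority:** {s.priority}",
        "",
        "### Description",
        s.description,
        "",
        "### Suggested change",
        s.proposed_change,
    ])
    return title, body


title, body = format_issue(Suggestion(
    "_episodes/01-introduction.md", "software_update", "high",
    "Python 3.6 is end-of-life",
    "Update the install instructions to a supported Python version",
))
print(title)
print(body)
```

Truncating the description in the title keeps issue lists readable, while the full text stays in the body.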
When the -o option is used, results are saved as JSON containing:
- Complete analysis results
- Structured suggestion data
- Repository metadata
- Processing statistics
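The saved JSON can be post-processed with the standard library. A minimal sketch that counts suggestions per category, assuming the file has a top-level "suggestions" list whose entries carry a "category" key; these key names are assumptions, so check them against your actual output:

```python
import json
from collections import Counter


def summarize(report: dict) -> Counter:
    """Count suggestions per category (assumes report["suggestions"] is a list of dicts)."""
    return Counter(s.get("category", "unknown") for s in report.get("suggestions", []))


# In practice: with open("results.json") as f: report = json.load(f)
report = json.loads("""
{
  "repository": {"url": "https://github.com/hsf-training/hsf-training-example"},
  "suggestions": [
    {"category": "software_update", "priority": "high"},
    {"category": "resource_update", "priority": "low"},
    {"category": "software_update", "priority": "medium"}
  ]
}
""")
print(summarize(report).most_common())
```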
Project structure:
hsf_training_agent/
├── src/
│ ├── ai/ # AI integration (Gemini)
│ ├── github/ # GitHub API clients
│ ├── processors/ # Content processors
│ ├── config/ # Configuration management
│ ├── analyzer.py # Main orchestrator
│ ├── cli.py # Command line interface
│ └── utils.py # Utilities and error handling
├── main.py # Entry point
└── requirements.txt # Dependencies
To add support for new file formats:
- Create a processor in src/processors/
- Implement content analysis methods
- Register in the main analyzer
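The three steps above amount to a plug-in pattern: a common processor interface plus a registry keyed by file extension. A minimal sketch under that assumption; ContentProcessor, RestructuredTextProcessor, register, and processor_for are hypothetical names, and the real interface in src/processors/ may differ:

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import PurePosixPath


class ContentProcessor(ABC):
    """Hypothetical base class for format-specific processors."""
    extensions: tuple[str, ...] = ()

    @abstractmethod
    def extract_text(self, raw: str) -> str:
        """Return the text that should be sent to the model for analysis."""


class RestructuredTextProcessor(ContentProcessor):
    extensions = (".rst",)

    def extract_text(self, raw: str) -> str:
        # Illustrative only: drop explicit-markup lines (those starting with "..",
        # i.e. reST comments and directive headers) and keep everything else
        return "\n".join(
            line for line in raw.splitlines() if not line.lstrip().startswith("..")
        )


PROCESSORS: dict[str, ContentProcessor] = {}


def register(processor: ContentProcessor) -> None:
    for ext in processor.extensions:
        PROCESSORS[ext] = processor


def processor_for(path: str) -> ContentProcessor | None:
    # Look up by lower-cased file suffix; None means the format is unsupported
    return PROCESSORS.get(PurePosixPath(path).suffix.lower())


register(RestructuredTextProcessor())
print(processor_for("docs/intro.rst"))
```

Keying the registry by extension keeps the analyzer unaware of individual formats: adding a format means writing one class and one register call.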
To customize the analysis, modify src/ai/prompt_templates.py to:
- Add new analysis prompts
- Customize suggestion categories
- Adjust analysis focus areas
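A minimal sketch of a parameterized analysis prompt, assuming prompts are plain text with placeholders for the file, focus areas, and allowed categories. ANALYSIS_PROMPT and build_prompt are hypothetical; the actual templates in src/ai/prompt_templates.py will differ:

```python
from string import Template

# Hypothetical template; placeholders use $name syntax
ANALYSIS_PROMPT = Template(
    "You are reviewing an HSF training module.\n"
    "File: $file_path\n"
    "Focus areas: $focus_areas\n"
    "Allowed categories: $categories\n"
    "Return suggestions as JSON objects with category, priority and description.\n\n"
    "$content"
)


def build_prompt(file_path: str, content: str, focus_areas: list, categories: list) -> str:
    # substitute() raises KeyError if any placeholder is left unfilled
    return ANALYSIS_PROMPT.substitute(
        file_path=file_path,
        content=content,
        focus_areas=", ".join(focus_areas),
        categories=", ".join(categories),
    )


print(build_prompt(
    "_episodes/01-introduction.md",
    "Install Python 3.6 ...",
    ["Software Updates", "Resource Updates"],
    ["software_update", "resource_update"],
))
```

string.Template is used here because its $-placeholders do not clash with literal braces, which prompts containing JSON examples often include; substituted values such as the module content are inserted verbatim, even if they contain "$".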
"No Gemini API key"
- Ensure GEMINI_API_KEY is set in .env
- Verify that the key is valid and has remaining quota
"GitHub API rate limit"
- Add GITHUB_TOKEN for higher rate limits
- Consider using --no-issues for analysis-only mode
"File not found"
- Check that the repository URL has the form https://github.com/<owner>/<repo>
- Verify file paths are relative to repository root
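Both checks can be done before any API call is made. A minimal sketch, assuming repositories are given as https://github.com/<owner>/<repo> URLs as in the examples above; parse_repo_url and check_relative_path are hypothetical helpers, not functions the agent is known to provide:

```python
from __future__ import annotations

from pathlib import PurePosixPath
from urllib.parse import urlparse


def parse_repo_url(url: str) -> tuple[str, str]:
    """Split https://github.com/<owner>/<repo> into (owner, repo)."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc not in ("github.com", "www.github.com") or len(parts) < 2:
        raise ValueError(f"Not a GitHub repository URL: {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]  # tolerate clone-style URLs
    return owner, repo


def check_relative_path(path: str) -> str:
    """Reject absolute paths and '..' segments; paths must be relative to the repository root."""
    p = PurePosixPath(path)
    if p.is_absolute() or ".." in p.parts:
        raise ValueError(f"Path must be relative to the repository root: {path!r}")
    return str(p)


print(parse_repo_url("https://github.com/hsf-training/hsf-training-example"))
print(check_relative_path("_episodes/01-introduction.md"))
```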
Run with debug logging:
python hsf_training_agent/main.py --log-level DEBUG analyze <repo_url>
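The --log-level option and the LOG_LEVEL variable presumably map onto Python's standard logging levels. A minimal sketch of that mapping, assuming the agent uses the stdlib logging module; configure_logging is a hypothetical name:

```python
import logging


def configure_logging(level_name: str = "INFO") -> int:
    """Map a name such as "DEBUG" to a logging level and apply it to the root logger."""
    level = getattr(logging, level_name.upper(), None)
    if not isinstance(level, int):
        # Rejects unknown names and non-level attributes such as "basicConfig"
        raise ValueError(f"Unknown log level: {level_name!r}")
    # force=True (Python 3.8+) replaces any handlers configured earlier
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
        force=True,
    )
    return level


configure_logging("debug")
logging.getLogger("hsf_training_agent").debug("debug logging enabled")
```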
When contributing:
- Follow the existing code structure
- Add tests for new functionality
- Update documentation for new features
- Ensure error handling is comprehensive