Article Scout 📚

Article Scout is an AI-powered research paper evaluation system that helps students and researchers assess the relevance and quality of academic papers for a TCC (Trabalho de Conclusão de Curso, the final undergraduate project in Brazil) or other research work.

🌟 Features / Funcionalidades

🔍 PDF Text Extraction

Multi-method PDF text extraction (PyPDF2, pdfminer.six, PyMuPDF)
Automatic text truncation for API limits
Support for various PDF formats
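
The multi-method extraction listed above can be pictured as a fallback chain: try one backend, and move to the next if it fails or returns nothing. The sketch below is only an illustration of that idea, not the code in utils/pdf_extractor.py. The function name `extract_with_fallback` and the three stand-in backends are hypothetical; in practice each backend would wrap PyPDF2, pdfminer.six, or PyMuPDF.

```python
from typing import Callable, Iterable, Optional

# An extractor takes a PDF path and returns the extracted text.
Extractor = Callable[[str], str]


def extract_with_fallback(pdf_path: str, extractors: Iterable[Extractor]) -> Optional[str]:
    """Try each extractor in order and return the first non-empty result."""
    for extract in extractors:
        try:
            text = extract(pdf_path)
        except Exception:
            # This backend could not handle the file; try the next one.
            continue
        if text and text.strip():
            return text
    return None  # every backend failed or returned only whitespace


# Stand-in backends for demonstration only (hypothetical, no real PDF parsing).
def broken_backend(path: str) -> str:
    raise ValueError("cannot parse")


def empty_backend(path: str) -> str:
    return "   "


def working_backend(path: str) -> str:
    return f"text from {path}"


if __name__ == "__main__":
    chain = [broken_backend, empty_backend, working_backend]
    print(extract_with_fallback("paper.pdf", chain))
```

Passing the backends in as a list keeps the order of preference explicit and makes each one easy to test in isolation.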

🤖 AI-Powered Evaluation

Comprehensive paper evaluation using Groq LLM
Multiple evaluation criteria:
- Relevance to research theme
- Originality and novelty
- Methodology quality
- Results and discussion quality
- Potential impact
- Writing clarity
- References timeliness

📊 Interactive Web Interface

Streamlit-based web application
Real-time evaluation results
User-friendly interface
Detailed explanations for each criterion

🧪 Comprehensive Testing

Unit tests for PDF extraction
Integration tests for complete workflow
Performance testing
Error handling validation

🚀 Quick Start / Início Rápido

Prerequisites / Pré-requisitos

# Python 3.12+
# Groq API Key
# Required packages (see requirements.txt)

Installation / Instalação

# Clone the repository
git clone https://github.com/yourusername/article-scout.git
cd article-scout

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # Linux/Mac
# or
.venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp env.example .env
# Edit .env and add your GROQ_API_KEY

Usage / Uso

Web Application / Aplicação Web

# Start Streamlit app
streamlit run streamlit_app.py

Command Line / Linha de Comando

# Test PDF extraction
python3 -m pytest tests/test_pdf_extraction.py -v

# Test complete integration
python3 -m pytest tests/test_integration.py -v

# Test Streamlit integration
python3 -m pytest tests/test_streamlit_integration.py -v

📁 Project Structure / Estrutura do Projeto

article_scout/
├── 📁 input_files/              # PDF files for testing
├── 📁 utils/
│   ├── __init__.py
│   └── pdf_extractor.py         # PDF text extraction module
├── 📁 tests/
│   ├── __init__.py
│   ├── test_pdf_extraction.py   # PDF extraction tests
│   ├── test_integration.py      # Integration tests
│   └── test_streamlit_integration.py  # Streamlit flow tests
├── article_scout_agent.py       # Main evaluation engine
├── streamlit_app.py             # Web interface
├── requirements.txt             # Python dependencies
├── pyproject.toml              # Project configuration
└── README.md                   # This file

🔧 Configuration / Configuração

Environment Variables / Variáveis de Ambiente

Create a .env file in the project root:

GROQ_API_KEY=your_groq_api_key_here

API Limits / Limites da API

Input limit: 5000 characters (configurable in article_scout_agent.py)
Model: llama-3.1-8b-instant (Groq)
Temperature: 0.3 (for consistent results)
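
As a rough illustration of the input limit, the sketch below truncates text to 5000 characters and backs up to a word boundary when one is close. It is an assumption about how truncation could work, not the logic in article_scout_agent.py; the names `MAX_INPUT_CHARS` and `truncate_for_api` are hypothetical.

```python
MAX_INPUT_CHARS = 5000  # limit quoted in this README; the constant name is hypothetical


def truncate_for_api(text: str, limit: int = MAX_INPUT_CHARS) -> str:
    """Return text cut to at most `limit` characters, preferring a word boundary."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_space = cut.rfind(" ")
    # Back up to the last space only if that discards less than 20% of the budget.
    if last_space > limit * 0.8:
        cut = cut[:last_space]
    return cut


if __name__ == "__main__":
    sample = "word " * 2000  # 10,000 characters
    print(len(truncate_for_api(sample)))
```

Note that the limit here counts characters, not model tokens, which matches how the limit is stated above.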

🧪 Testing / Testes

Running Tests / Executando Testes

# Run all tests
python3 -m pytest tests/ -v

# Run specific test categories
python3 -m pytest tests/test_pdf_extraction.py -v      # PDF extraction
python3 -m pytest tests/test_integration.py -v         # Integration
python3 -m pytest tests/test_streamlit_integration.py -v  # Streamlit flow

# Run with detailed output
python3 -m pytest tests/ -v -s

Test Coverage / Cobertura de Testes

✅ PDF text extraction with multiple methods
✅ Article Scout Agent evaluation workflow
✅ Streamlit integration flow
✅ Error handling and edge cases
✅ Performance testing
✅ API limit handling

📊 Evaluation Criteria / Critérios de Avaliação

Article Scout evaluates papers on seven weighted criteria, whose weights sum to 100%:

Criterion / Critério	Weight / Peso	Description / Descrição
Relevance	20%	How well the paper aligns with your research theme
Originality	15%	Novelty and innovation of the work
Methodology	15%	Quality and robustness of research methods
Results & Discussion	15%	Clarity and soundness of findings
Potential Impact	15%	Significance and implications of the work
Writing Clarity	10%	Readability and communication quality
References	10%	Timeliness and relevance of citations
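
To make the weights concrete, here is a minimal sketch of how a single overall score could be derived from them. This README does not say how the project combines the per-criterion results, so the weighted average, the 0 to 10 score scale, and the names `WEIGHTS` and `weighted_score` are all assumptions.

```python
# Weights copied from the table above; the dictionary keys are hypothetical names.
WEIGHTS = {
    "relevance": 0.20,
    "originality": 0.15,
    "methodology": 0.15,
    "results_discussion": 0.15,
    "potential_impact": 0.15,
    "writing_clarity": 0.10,
    "references": 0.10,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (assumed 0-10) into one weighted average."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    scores = {name: 5.0 for name in WEIGHTS}
    scores["relevance"] = 10.0
    print(weighted_score(scores))  # 0.20 * 10 + 0.80 * 5
```

Because relevance carries the largest weight, a paper that is off-topic loses more of its overall score than one with, say, dated references.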

🔄 Workflow / Fluxo de Trabalho

1. 📁 PDF Upload
   ↓
2. 🔍 Text Extraction (utils/pdf_extractor.py)
   ↓
3. 🤖 AI Evaluation (article_scout_agent.py)
   ↓
4. 📊 Results Display (streamlit_app.py)

Detailed Flow / Fluxo Detalhado

1. PDF Upload: User uploads a research paper PDF
2. Text Extraction: System extracts text using multiple methods
3. Text Truncation: If needed, text is truncated to fit API limits
4. AI Evaluation: Article Scout Agent evaluates the paper
5. Results Formatting: Results are formatted for display
6. Web Display: Results are shown in the Streamlit interface
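
The steps above can be sketched as one function that wires the stages together. This is a simplified outline under stated assumptions, not the project's actual workflow: `run_pipeline` and `EvaluationReport` are hypothetical names, and the extraction and evaluation stages are passed in as plain callables so the flow can be run without a PDF library or an API key.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvaluationReport:
    source: str        # path of the uploaded PDF
    chars_sent: int    # number of characters passed to the evaluator
    truncated: bool    # True if the text exceeded the input limit
    evaluation: dict   # whatever the evaluator returned


def run_pipeline(
    pdf_path: str,
    extract: Callable[[str], str],
    evaluate: Callable[[str], dict],
    limit: int = 5000,
) -> EvaluationReport:
    """Extract text, truncate it to the input limit, evaluate it, and package the result."""
    text = extract(pdf_path)
    if not text.strip():
        raise ValueError(f"no text could be extracted from {pdf_path}")
    truncated = len(text) > limit
    payload = text[:limit]
    evaluation = evaluate(payload)
    return EvaluationReport(pdf_path, len(payload), truncated, evaluation)


if __name__ == "__main__":
    # Fake stages stand in for the real extractor and the LLM call.
    report = run_pipeline(
        "paper.pdf",
        extract=lambda path: "x" * 6000,
        evaluate=lambda text: {"relevance": 7},
    )
    print(report)
```

Keeping the `truncated` flag in the report lets the interface warn the user that only part of the paper was evaluated.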

🛠️ Development / Desenvolvimento

Adding New Features / Adicionando Novas Funcionalidades

PDF Extraction Methods:
- Add new method in utils/pdf_extractor.py
- Update fallback chain in try_pdfminer() or try_pymupdf()
Evaluation Criteria:
- Add new criterion in article_scout_agent.py
- Update State TypedDict and workflow
- Add corresponding test cases
Web Interface:
- Modify streamlit_app.py for new features
- Update result formatting functions
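
As an illustration of the "Evaluation Criteria" steps above, the sketch below adds a made-up criterion to a state dictionary and writes a node function that fills it in. The real State TypedDict is not shown in this README, so every field name here is hypothetical, as is the `reproducibility` criterion; the keyword heuristic stands in for what would presumably be an LLM call in the real agent.

```python
from typing import TypedDict


class State(TypedDict, total=False):
    paper_text: str
    relevance: float
    reproducibility: float  # hypothetical new criterion


def evaluate_reproducibility(state: State) -> State:
    """Hypothetical node: score 0-10 by counting reproducibility keywords.

    Returns only the field it updates, which the caller merges into the state.
    """
    text = state["paper_text"].lower()
    keywords = ("code available", "github.com", "dataset", "open source")
    hits = sum(1 for kw in keywords if kw in text)
    return {"reproducibility": 10.0 * hits / len(keywords)}


if __name__ == "__main__":
    state: State = {"paper_text": "Code available at github.com/example/repo"}
    state.update(evaluate_reproducibility(state))
    print(state["reproducibility"])
```

A matching test case would feed the node a short text with known keywords and assert on the returned field, as the third step in the list suggests.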

Code Style / Estilo de Código

Follow PEP 8 guidelines
Use type hints
Add docstrings for all functions
Write comprehensive tests

🐛 Troubleshooting / Solução de Problemas

Common Issues / Problemas Comuns

PDF Extraction Fails / Falha na Extração de PDF

# Print the extracted text; empty or near-empty output usually means the PDF is image-based (scanned)
python3 -c "from utils.pdf_extractor import extract_text_from_pdf; print(extract_text_from_pdf('your_file.pdf'))"
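
A quick way to flag a scanned PDF is to compare the amount of extracted text with the page count. The heuristic below is only a sketch: `looks_image_based` is a hypothetical helper, and the threshold of 100 non-whitespace characters per page is an assumption that may need tuning for your documents.

```python
from typing import Optional


def looks_image_based(text: Optional[str], page_count: int, min_chars_per_page: int = 100) -> bool:
    """Guess whether a PDF is scanned, based on how little text was extracted."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    # Count non-whitespace characters only, so blank lines do not inflate the total.
    chars = len("".join((text or "").split()))
    return chars / page_count < min_chars_per_page


if __name__ == "__main__":
    print(looks_image_based("", page_count=10))
    print(looks_image_based("a" * 5000, page_count=10))
```

If the check comes back true, the file most likely needs OCR before any text-based extractor can read it.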

API Key Issues / Problemas com Chave da API

# Verify environment variable
echo $GROQ_API_KEY
# or
python3 -c "import os; print(os.getenv('GROQ_API_KEY'))"
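
The commands above print the full key to the terminal, and `echo` only sees variables exported in the current shell, so a key that lives only in .env may show up empty there. As an alternative, this small sketch reports whether the key is visible to Python without revealing it. The helper name `describe_api_key` is hypothetical.

```python
import os


def describe_api_key(name: str = "GROQ_API_KEY") -> str:
    """Report whether the variable is set, showing only its last 4 characters."""
    value = os.getenv(name, "").strip()
    if not value:
        return f"{name} is not set"
    return f"{name} is set ({len(value)} chars, ends with ...{value[-4:]})"


if __name__ == "__main__":
    print(describe_api_key())
```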

Import Errors / Erros de Importação

# Check Python path
python3 -c "import sys; print(sys.path)"
# Ensure you're in the project root directory

📈 Performance / Performance

Benchmarks / Benchmarks

PDF Extraction: < 10 seconds for most files
AI Evaluation: < 60 seconds for standard papers
Total Workflow: < 2 minutes end-to-end
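
To check these figures on your own machine, each stage can be timed and compared against its budget. The sketch below is one way to do that with the standard library; `timed`, `over_budget`, and `BUDGET_SECONDS` are hypothetical names, and the budgets simply restate the numbers above.

```python
import time
from contextlib import contextmanager
from typing import Iterator

# Budgets in seconds, taken from the benchmark figures above.
BUDGET_SECONDS = {"extraction": 10.0, "evaluation": 60.0, "total": 120.0}


@contextmanager
def timed(stage: str, results: dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[stage] = time.perf_counter() - start


def over_budget(results: dict[str, float]) -> list[str]:
    """Return the stages whose measured time exceeds their budget."""
    return [s for s, t in results.items() if t > BUDGET_SECONDS.get(s, float("inf"))]


if __name__ == "__main__":
    results: dict[str, float] = {}
    with timed("extraction", results):
        time.sleep(0.01)  # stand-in for real extraction work
    print(results, over_budget(results))
```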

Optimization Tips / Dicas de Otimização

Use smaller PDFs when possible
Consider pre-processing large documents
Cache evaluation results for repeated papers
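
For the caching tip, one option is to key each result on a hash of the PDF bytes plus the research theme, since relevance depends on the theme. The sketch below stores results as JSON files; it is an assumption about how caching could be added, not an existing feature, and `cache_key` and `cached_evaluation` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(pdf_bytes: bytes, theme: str) -> str:
    """Hash the file contents together with the normalized research theme."""
    digest = hashlib.sha256()
    digest.update(pdf_bytes)
    digest.update(b"\x00")  # separator between file bytes and theme
    digest.update(theme.strip().lower().encode("utf-8"))
    return digest.hexdigest()


def cached_evaluation(
    pdf_bytes: bytes,
    theme: str,
    evaluate: Callable[[bytes, str], dict],
    cache_dir: Path,
) -> dict:
    """Return a stored result if one exists; otherwise evaluate and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{cache_key(pdf_bytes, theme)}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    result = evaluate(pdf_bytes, theme)
    cache_file.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Hashing the contents rather than the filename means a renamed copy of the same paper still hits the cache, while the same paper evaluated against a different theme does not.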

🤝 Contributing / Contribuindo

How to Contribute / Como Contribuir

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

Development Setup / Configuração de Desenvolvimento

# Install development dependencies
pip install -r requirements-dev.txt

# Run linting
flake8 .

# Run type checking
mypy .

# Run all tests
python3 -m pytest tests/ -v --cov=.

📄 License / Licença

This project is licensed under the MIT License - see the LICENSE file for details.

Este projeto está licenciado sob a Licença MIT - veja o arquivo LICENSE para detalhes.

🙏 Acknowledgments / Agradecimentos

Groq for providing the LLM API
Streamlit for the web framework
PyPDF2 and pdfminer.six for PDF processing
LangGraph for workflow orchestration

📞 Support / Suporte

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: your.email@example.com

🔄 Changelog / Histórico de Versões

v1.0.0 (2024-01-XX)

✅ Initial release
✅ PDF text extraction with multiple methods
✅ AI-powered paper evaluation
✅ Streamlit web interface
✅ Comprehensive test suite
✅ Integration testing framework

Made with ❤️ for the academic community

Feito com ❤️ para a comunidade acadêmica
