krzemienski/awesome-researcher

Awesome-List Researcher

A Docker-based tool that automatically finds and suggests new resources for GitHub Awesome Lists.

Overview

This tool ingests any GitHub Awesome-style repository, parses its README.md, and uses OpenAI's agents to discover new, non-duplicate, spec-compliant resources that can be added to the list. The result is an updated Markdown file that passes awesome-lint checks.
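
As a rough illustration of the parsing step, the sketch below extracts link entries from an Awesome-style README. It is not the tool's actual parser: the function name `parse_entries`, the regular expression, and the output fields are all assumptions, and it only handles the common `- [Name](https://url) - Description.` bullet shape.

```python
import re

# Assumed bullet shape: "- [Name](https://url) - Description."
# Only http(s) links match, so table-of-contents anchors such as (#tools) are skipped.
ENTRY_RE = re.compile(
    r"^\s*[-*]\s+\[(?P<name>[^\]]+)\]\((?P<url>https?://[^)\s]+)\)"
    r"(?:\s+-\s+(?P<description>.+))?\s*$"
)


def parse_entries(markdown: str) -> list:
    """Return one dict per link bullet, tagged with the nearest heading above it."""
    entries = []
    category = None
    for line in markdown.splitlines():
        if line.startswith("#"):
            # Remember the most recent heading as the entry's category.
            category = line.lstrip("#").strip()
            continue
        match = ENTRY_RE.match(line)
        if match:
            entries.append({"category": category, **match.groupdict()})
    return entries
```

Entries without a description come back with `description` set to `None`, which a later validation step could flag.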

Features

  • Fully containerized - runs completely inside Docker
  • Enforces cost and time constraints
  • Configurable OpenAI model selection
  • Deduplication of existing resources
  • Validation of new resources (HTTPS, accessibility, etc.)
  • Structured logging of all operations
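
To make the deduplication and HTTPS features concrete, here is a minimal sketch of how candidate links might be filtered against the existing list. The helper names `normalize_url` and `filter_candidates` and the normalization rules are assumptions, not the tool's implementation. The accessibility check (presumably whether the link can be reached) is left out because it needs network access.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Build a comparison key: lowercase host, no 'www.' prefix, no trailing slash."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return f"{host}{path}"


def filter_candidates(existing_urls, candidate_urls):
    """Keep candidates that use HTTPS and are not already present in the list."""
    seen = {normalize_url(u) for u in existing_urls}
    accepted = []
    for url in candidate_urls:
        if urlsplit(url.strip()).scheme != "https":
            continue  # reject non-HTTPS links
        key = normalize_url(url)
        if key in seen:
            continue  # duplicate of an existing entry or an earlier candidate
        seen.add(key)
        accepted.append(url)
    return accepted
```

This key ignores query strings and fragments and keeps the path case-sensitive. A stricter or looser rule may suit a particular list better.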

Requirements

  • Docker
  • OpenAI API key

Usage

# Build and run the tool
./build-and-run.sh --repo_url https://github.com/username/awesome-repo \
  --wall_time 600 \
  --cost_ceiling 10.00 \
  --output_dir runs/ \
  --model_planner gpt-4.1 \
  --model_researcher o3 \
  --model_validator o3

Parameters

| Parameter | Description | Default |
|---|---|---|
| --repo_url | GitHub URL of the Awesome list (required) | - |
| --wall_time | Maximum execution time in seconds | 600 |
| --cost_ceiling | Maximum OpenAI API cost in USD | 10.00 |
| --output_dir | Directory for output artifacts | runs/ |
| --seed | Random seed for deterministic behavior | random |
| --model_planner | Model for planning research queries | gpt-4.1 |
| --model_researcher | Model for researching new resources | o3 |
| --model_validator | Model for validating new resources | o3 |
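
For readers who want to see the parameters as code, the sketch below mirrors them with Python's `argparse`. It is an illustration only: the README does not say how the tool parses its arguments, the `prog` name is hypothetical, and treating an unset `--seed` as `None` (meaning "choose one at random") is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and their defaults."""
    parser = argparse.ArgumentParser(prog="awesome-researcher")
    parser.add_argument("--repo_url", required=True,
                        help="GitHub URL of the Awesome list")
    parser.add_argument("--wall_time", type=int, default=600,
                        help="Maximum execution time in seconds")
    parser.add_argument("--cost_ceiling", type=float, default=10.00,
                        help="Maximum OpenAI API cost in USD")
    parser.add_argument("--output_dir", default="runs/",
                        help="Directory for output artifacts")
    # Assumption: None stands for the documented default of a random seed.
    parser.add_argument("--seed", type=int, default=None,
                        help="Random seed for deterministic behavior")
    parser.add_argument("--model_planner", default="gpt-4.1")
    parser.add_argument("--model_researcher", default="o3")
    parser.add_argument("--model_validator", default="o3")
    return parser
```

Calling `build_parser().parse_args(["--repo_url", "https://github.com/username/awesome-repo"])` yields a namespace with every other field at its documented default.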

Environment Variables

  • OPENAI_API_KEY (required): Your OpenAI API key
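
Since the key is required, a wrapper script may want to fail fast when it is missing. The following is a small sketch of such a check. The function name `require_api_key` and the error message are assumptions, not part of the tool.

```python
import os


def require_api_key(env=None) -> str:
    """Return the OpenAI API key, or raise if OPENAI_API_KEY is unset or blank."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running the tool.")
    return key
```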

Output

All outputs are saved under <output_dir>/<ISO-TIMESTAMP>/ (runs/<ISO-TIMESTAMP>/ by default) with the following artifacts:

  • original.json: Parsed content of the original README
  • plan.json: Research plan generated by the planner agent
  • candidate_*.json: Candidate resources found by research agents
  • new_links.json: Validated new resources after deduplication
  • updated_list.md: Final updated Markdown list
  • agent.log: Detailed log of all operations
  • research_report.md: Summary of the research process
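
After a run, it can be handy to check which of the artifacts listed above were actually written. The sketch below does that using only the file names from this section. The helper names are hypothetical, and it assumes the timestamped directory names all share one format so that they sort chronologically as plain strings.

```python
from pathlib import Path
from typing import Optional

# File names taken from the artifact list; candidate_*.json is matched by pattern.
EXPECTED = (
    "original.json",
    "plan.json",
    "new_links.json",
    "updated_list.md",
    "agent.log",
    "research_report.md",
)


def latest_run(output_dir) -> Optional[Path]:
    """Return the most recent run directory, or None if there are no runs."""
    runs = sorted(p for p in Path(output_dir).iterdir() if p.is_dir())
    return runs[-1] if runs else None


def summarize_run(run_dir) -> dict:
    """Report missing fixed-name artifacts and the candidate files that exist."""
    run_dir = Path(run_dir)
    return {
        "missing": [name for name in EXPECTED if not (run_dir / name).is_file()],
        "candidate_files": sorted(p.name for p in run_dir.glob("candidate_*.json")),
    }
```

For example, `summarize_run(latest_run("runs/"))` returns the missing file names and the candidate files for the newest run, provided at least one run directory exists.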

Model Selection Strategy

The tool assigns a different model to each task to balance cost and quality:

  • Planner Agent: gpt-4.1 - Deep reasoning to create high-quality search queries
  • Category Researcher: o3 - Cost-effective option for large volume of research tasks
  • Validator: o3 - Lightweight description cleanup and validation

Running Tests

To run the end-to-end test:

# Make sure the OPENAI_API_KEY is set
export OPENAI_API_KEY=your_api_key_here
# Run the test
./tests/run_e2e.sh

Add the --keep flag to preserve test outputs:

./tests/run_e2e.sh --keep

License

MIT
