Multithreaded Archon Benchmark Quality/Time/Cost Estimation Framework #1

yash94404 · 2025-04-30T05:53:36Z

Generated 100 Archon operator configs of depth 2-3, following Archon's rules for generator, critic, ranker, and fuser layers (generate_config.py)
Added a temporary per-thread log buffer that is flushed to the log once a thread finishes running a config on a given question (utils.py, benchmark.py)
Added a GPQA_Diamond benchmark execution class to Archon (benchmark.py)
Added log-parsing logic that generates quality/cost/time estimates from the logs and model outputs
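
For reviewers, the per-thread buffering idea can be sketched roughly as below. This is a minimal illustration, not the code in utils.py or benchmark.py: the names `run_config_on_question`, `run_all`, and `_log_lock` are hypothetical, and the model calls are omitted. Each worker writes to its own in-memory buffer and copies it to the shared log in one locked write, so the lines for a given (config, question) pair stay contiguous even when many threads run at once.

```python
import io
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lock guarding the shared log sink.
_log_lock = threading.Lock()


def run_config_on_question(config_name, question_id, sink):
    """Hypothetical worker: buffer log lines locally, flush once at the end."""
    buffer = io.StringIO()  # private to this task, so no locking while writing
    try:
        buffer.write(f"[{config_name}][q{question_id}] start\n")
        # ... the config's model calls on the question would run here ...
        buffer.write(f"[{config_name}][q{question_id}] done\n")
    finally:
        # Single locked write keeps this task's lines together in the log.
        with _log_lock:
            sink.write(buffer.getvalue())


def run_all(config_names, question_ids, sink, max_workers=4):
    """Run every (config, question) pair on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_config_on_question, c, q, sink)
            for c in config_names
            for q in question_ids
        ]
        for future in futures:
            future.result()  # re-raise any worker exception


if __name__ == "__main__":
    log = io.StringIO()
    run_all(["config_a", "config_b"], [1, 2, 3], log)
    print(log.getvalue(), end="")
```

Flushing in a `finally` block means a partially filled buffer still reaches the log if a task fails, which may help when parsing logs for cost and time afterwards.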


yash94404 added 10 commits March 31, 2025 16:40

archon testing framework initial push

cd5168e

Remove large files and ignore them

c591402

Track large files with Git LFS

054b180

Re-added large files with Git LFS

d25c1a9

committing archon operator testing code against benchmarks

ad19b61

cleaning up extraneous files

f021dc7

cleaning up code

6023bfa

adding more log processing files

55c528f

adding 100 operators, tweaked config generation script

05be722
