initial attempt to capture perf experiment stats #114
I've added additional metadata to the `chatlog.jsonl`:

The first two are essentially data tags for splitting or pivoting in future analysis (e.g. o3 vs o4, medium vs high effort). More tags can be added in the future.
In addition, there's a high-level aggregation of the chunk_times data that gets printed to the console after each response:
Additional aggregations can be added now or later.
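
For discussion, here is a minimal sketch of what such an aggregation could look like. It is not the code in this PR: the function name `summarize_chunk_times` and the summary keys are hypothetical, and it assumes `chunk_times` is a list of elapsed seconds (measured from the start of the request) at which each streamed chunk arrived.

```python
import statistics


def summarize_chunk_times(chunk_times):
    """Aggregate chunk arrival times into a small summary dict.

    Assumption: chunk_times holds elapsed seconds since the request
    started, one entry per streamed chunk, in arrival order.
    """
    if not chunk_times:
        return {"chunks": 0}

    # Gaps between consecutive chunks (empty if only one chunk arrived).
    gaps = [later - earlier for earlier, later in zip(chunk_times, chunk_times[1:])]

    summary = {
        "chunks": len(chunk_times),
        "time_to_first_chunk": chunk_times[0],
        "total_time": chunk_times[-1],
    }
    if gaps:
        summary["mean_gap"] = statistics.mean(gaps)
        summary["median_gap"] = statistics.median(gaps)
        summary["max_gap"] = max(gaps)
    return summary
```

Calling `print(summarize_chunk_times(chunk_times))` after each response would give a one-line console summary; further statistics (percentiles, for instance) could be added to the same dict later.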
Example of what shows up in the `chatlog.jsonl` (formatted for readability):

And here's an example of what shows up on the console:
A separate post-processing script that reads many `jsonl` chat logs, then aggregates and plots them, can be written in the future. I would need a better picture of how folks plan to collect the jsonl logs and how they want to analyze the data.

I'm looking for feedback on whether this is: