Description
Is your feature request related to a problem? Please describe.
The data gathered through the use of rai_benchmark presents a great opportunity to create datasets that can be used for fine-tuning smaller L/VLMs. The ability to create such datasets automatically after a benchmark run would be very helpful.
Describe the solution you'd like
A module could be created that automatically builds a dataset from a Langfuse trace after a run of rai_benchmark (however, the module should not be tightly integrated with rai_benchmark itself; in the future it might be beneficial to extend this functionality to non-benchmark deployments, allowing for online and/or imitation learning).
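As a rough sketch of what such a module could do (the trace structure and field names below are hypothetical, not the actual Langfuse schema), each trace could be flattened into prompt/completion training records:

```python
# Hypothetical sketch: convert benchmark traces into fine-tuning records.
# The trace structure below is illustrative, NOT the real Langfuse schema.

def trace_to_records(trace):
    """Flatten one trace into prompt/completion training records."""
    records = []
    for obs in trace["observations"]:
        if obs.get("type") != "GENERATION":
            continue  # keep only LLM calls; skip tool spans etc.
        records.append({
            "prompt": obs["input"],
            "completion": obs["output"],
            # carry metadata so different fine-tuning pipelines can
            # re-template the prompt later
            "metadata": {
                "model": obs.get("model"),
                "trace_id": trace["id"],
            },
        })
    return records

example_trace = {
    "id": "t-001",
    "observations": [
        {"type": "GENERATION", "input": "Pick up the box.",
         "output": "move_arm(...)", "model": "gpt-4o"},
        {"type": "SPAN", "input": "tool call"},
    ],
}
print(trace_to_records(example_trace))
```

Keeping the conversion as a standalone function over plain trace data (rather than calling rai_benchmark internals) is what would allow reuse in non-benchmark deployments later.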
Describe alternatives you've considered
TBD
Additional context
Initial research points to https://docs.unsloth.ai/ as a possible target framework for fine-tuning. Additional information on dataset requirements can be found there.
Note: different models may require, or benefit from, different prompt structures for fine-tuning. The dataset does not necessarily need to take care of that; it may be better to leave it to a pre-processing pipeline during fine-tuning (once that is developed). However, the dataset should include all relevant metadata, so that the largest possible range of models can be fine-tuned.
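To illustrate the note above (all field names and templates are hypothetical, not any model's real format): the dataset could store raw prompt/completion pairs plus metadata as JSONL, with model-specific prompt templating deferred to a separate pre-processing step at fine-tuning time:

```python
import json

# Hypothetical dataset record: raw fields plus metadata, with no
# model-specific prompt formatting baked in.
record = {
    "prompt": "Pick up the box.",
    "completion": "move_arm(target='box')",
    "metadata": {"model": "gpt-4o", "benchmark": "manipulation",
                 "trace_id": "t-001"},
}

# Model-specific templating lives in a later pre-processing step
# (illustrative templates only, not real chat formats).
TEMPLATES = {
    "llama-style": "<s>[INST] {prompt} [/INST] {completion}</s>",
    "chatml-style": "<|user|>\n{prompt}\n<|assistant|>\n{completion}",
}

def render(record, style):
    """Apply one model-specific template to a raw dataset record."""
    fields = {k: record[k] for k in ("prompt", "completion")}
    return TEMPLATES[style].format(**fields)

line = json.dumps(record)  # one JSONL line as it would be stored
print(render(json.loads(line), "chatml-style"))
```

Because the stored record keeps the raw fields and metadata separate from any template, the same dataset can feed fine-tuning pipelines targeting different prompt formats.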