Releases · finitearth/promptolution
Release v2.1.0
What's changed
Added features:
- We added Reward and LLM-as-a-Judge to our task family (see the first sketch after this list):
  - Reward allows you to write a custom function that scores a prediction, without requiring ground truth
  - LLM-as-a-Judge allows you to delegate the scoring of a prediction to a judge LLM, optionally accepting ground truth
- Changes to CAPO, to make it applicable to the new tasks:
  - CAPO now accepts the input parameter "check_fs_accuracy" (default True). For reward tasks the accuracy cannot be evaluated, so the prediction of the downstream_llm is taken as the few-shot target.
  - CAPO also accepts "create_fs_reasoning" (default True): if set to False, few-shot examples are plain input-output pairs taken from df_few_shots.
- Introduced a tag-extraction function to centralize repeated code for extractions like "<final_answer>5</final_answer>" (see the second sketch below)
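As a rough illustration of the new Reward task, here is a minimal sketch of a custom reward function. The wiring at the bottom is hypothetical (`RewardTask` and the `CAPO` constructor shape are stand-ins, not the exact promptolution API); only `check_fs_accuracy` is an actual parameter name from this release:

```python
# Minimal sketch of a custom reward function; it scores a prediction
# directly, no ground truth needed.
def reward_fn(prediction: str) -> float:
    # Toy heuristic: reward answers that parse as an integer, shorter is better.
    try:
        int(prediction.strip())
        return 1.0 / (1.0 + len(prediction))
    except ValueError:
        return 0.0

# Hypothetical wiring -- the real promptolution classes/arguments may differ:
# task = RewardTask(reward_function=reward_fn)
# optimizer = CAPO(task=task, check_fs_accuracy=False)  # no ground truth to check few-shots against
```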
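The tag extraction itself boils down to pulling text out of XML-style markers. A minimal, self-contained sketch of such a helper (the function name `extract_tag` is illustrative, not necessarily the library's):

```python
import re

def extract_tag(text: str, tag: str) -> str | None:
    """Return the content of the first <tag>...</tag> span, or None if absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None

assert extract_tag("The result is <final_answer>5</final_answer>.", "final_answer") == "5"
```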
Further changes:
- We now utilize mypy for automated type checking
- Core functionality of the classification task has been moved to the base task to prevent code duplication for other tasks
- Test coverage has been boosted to above 90%
Full Changelog: here
Release v2.0.1
What's changed
- Updated the Python requirement to >=3.10 (as 3.9 loses support in October 2025)
- Fixed numpy version constraints (thanks to @asalaria-cisco)
- Made dependency groups optional extras
Full Changelog: here
Release v2.0.0
What's changed
Added features
- We welcome CAPO to our family of optimizers! CAPO can leverage few-shot examples to improve prompt performance and additionally implements multiple AutoML approaches. Check out the paper by Zehle et al. (2025) for more details (yep, it's us :))
- The Eval-Cache is now part of the ClassificationTask! This saves a lot of LLM calls, as already evaluated data points are not rerun (see the sketch after this list)
- Similar to the Eval-Cache, we added a Sequence-Cache, which allows reasoning chains to be extracted for few-shot examples
- Introduced evaluation strategies for the ClassificationTask, allowing random subsampling, sequential blocking of the dataset, or retrieving only the scores of data points that were already evaluated on prompts
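Conceptually, the Eval-Cache keys results by (prompt, data point) and only spends an LLM call on cache misses. A minimal sketch of the idea (the class and method names are illustrative, not promptolution's internals):

```python
from typing import Callable

# Illustrative (prompt, datapoint) evaluation cache; not the library's actual internals.
class EvalCache:
    def __init__(self) -> None:
        self._scores: dict[tuple[str, str], float] = {}

    def evaluate(self, prompt: str, datapoint: str,
                 score_fn: Callable[[str, str], float]) -> float:
        key = (prompt, datapoint)
        if key not in self._scores:  # only spend an LLM call on a cache miss
            self._scores[key] = score_fn(prompt, datapoint)
        return self._scores[key]
```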
Further changes
- Rearranged imports and module memberships
- Classificators are now called Classifiers
- Fixed multiple docstrings and variable names
- Simplified testing and extended the test cases to cover the new implementations
- The classification task can now also output a per-datapoint score
- Introduced statistical tests (specifically a paired t-test) for CAPO (see the sketch below)
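The per-datapoint scores are exactly what a paired t-test needs: evaluate two prompts on the same data points and test whether the mean score difference is significant. A quick sketch using scipy, with made-up scores:

```python
from scipy.stats import ttest_rel

# Made-up per-datapoint scores of two candidate prompts on the same data points.
scores_prompt_a = [0.9, 0.7, 0.8, 0.6, 0.9]
scores_prompt_b = [0.8, 0.6, 0.8, 0.5, 0.7]

# Paired t-test: is the mean score difference between the prompts significant?
t_stat, p_value = ttest_rel(scores_prompt_a, scores_prompt_b)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```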
Full Changelog: here
Release v1.4.0
What's changed
Added features
- Reworked APILLM to allow calls to any API that follows the OpenAI API format (see the sketch after this list)
- Added graceful failure handling to optimization runs, allowing results to be obtained after an error
- Reworked the configs into ExperimentConfig, which can parse arbitrary attributes
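"OpenAI API format" here means the standard chat-completions request shape, so any compatible endpoint can be reached, e.g. via the official client with a custom base_url. The URL and model name below are placeholders, and APILLM's own interface may differ:

```python
from openai import OpenAI

# Any OpenAI-compatible endpoint can be targeted; base_url and model are placeholders.
client = OpenAI(base_url="https://my-endpoint.example/v1", api_key="...")
response = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Classify the sentiment: 'great movie!'"}],
)
print(response.choices[0].message.content)
```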
Further changes:
- Reworked the getting-started notebook
- Added tests for the entire package, covering roughly 80% of the codebase
- Reworked the dependency and import structure to allow using only a subset of the package
Full Changelog: here
Release v1.3.2
What's changed
Added features
- Allow configuration and evaluation of system prompts in all LLM classes
- The CSV callback is now FileOutputCallback and can also write Parquet files
- Fixed the LLM call templates in VLLM
- Refined the OPRO implementation to be closer to the paper
Full Changelog: here
Release v1.3.1
What's changed
Added features
- New features for the VLLM wrapper (accepts a seed to ensure reproducibility)
- Fixes in the "MarkerBasedClassificator"
- Fixes in prompt creation and task-description handling
- Generalized the Classificator
- Added verbosity and callback handling in EvoPromptGA
- Added a timestamp to the callback
- Removed datasets from the repo
- Changed task creation (now by default with a dataset)
Full Changelog: here
Release v1.3.0
What's changed
Added features
- New features for the VLLM wrapper (automatic batch-size determination, accepting kwargs)
- Allow callbacks to terminate an optimization run
- Added token-count functionality
- Renamed the "Classificator" predictor to "FirstOccurenceClassificator"
- Introduced the "MarkerBasedClassificator"
- Automatic task-description creation
- Use the task description in prompt creation
- Implemented CSV callbacks
Full Changelog: here
Release v1.2.0
What's changed
Added features
- New LLM wrapper: VLLM, for batched local inference (see the sketch below)
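The wrapper builds on the vllm library; standalone batched generation with vllm itself looks roughly like this (the model name is a placeholder, and the wrapper's own interface may differ):

```python
from vllm import LLM, SamplingParams

# Batched local inference with the vllm library; the model name is a placeholder.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(
    ["Classify the sentiment: 'great movie!'", "Classify the sentiment: 'terrible plot.'"],
    params,
)
for output in outputs:
    print(output.outputs[0].text)
```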
Full Changelog: here
Release v1.1.1
Release v1.1.0
What's changed
Added features
- Enable reading tasks from a pandas DataFrame (see the sketch below)
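A task built from a DataFrame just needs inputs and labels as columns. A minimal sketch, where the column names and the task constructor are assumptions rather than the exact API:

```python
import pandas as pd

# Illustrative DataFrame layout; the column names and the task constructor
# are assumptions, not the exact promptolution API.
df = pd.DataFrame({
    "x": ["great movie!", "terrible plot."],
    "y": ["positive", "negative"],
})
# task = ClassificationTask(df=df)  # hypothetical wiring
```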
Further changes:
- Deleted experiment files (logs, configs, etc.) from the repo folders
- Improved OPRO's meta-prompt
- Added support for Python versions from 3.9 onwards (previously 3.11)