Releases · finitearth/promptolution
Release v2.1.0
What's changed
Added features:
- We added Reward and LLM-as-a-Judge to our task family (see the first sketch after this list):
  - Reward allows you to write a custom function that scores a prediction, without requiring ground truth
  - LLM-as-a-Judge allows you to delegate the scoring of a prediction to a judge LLM, optionally accepting ground truth
- Changes to CAPO, to make it applicable to the new tasks:
  - CAPO now accepts the input parameter "check_fs_accuracy" (default True). For reward tasks the accuracy cannot be evaluated, so the prediction of the downstream_llm is taken as the few-shot target.
  - CAPO also accepts "create_fs_reasoning" (default True): if set to False, few-shot examples are plain input-output pairs taken from df_few_shots.
- Introduced a tag-extraction function to centralize repeated code for extractions like "<final_answer>5</final_answer>" (see the second sketch below)
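As a rough illustration of the new Reward task, here is a minimal sketch of a custom reward function. The wiring at the bottom is hypothetical (`RewardTask` and the `CAPO` constructor shape are stand-ins, not the exact promptolution API); only `check_fs_accuracy` is an actual parameter name from this release:

```python
# Minimal sketch of a custom reward function; it scores a prediction
# directly, no ground truth needed.
def reward_fn(prediction: str) -> float:
    # Toy heuristic: reward answers that parse as an integer, shorter is better.
    try:
        int(prediction.strip())
        return 1.0 / (1.0 + len(prediction))
    except ValueError:
        return 0.0

# Hypothetical wiring -- the real promptolution classes/arguments may differ:
# task = RewardTask(reward_function=reward_fn)
# optimizer = CAPO(task=task, check_fs_accuracy=False)  # no ground truth to check few-shots against
```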
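The tag extraction itself boils down to pulling text out of XML-style markers. A minimal, self-contained sketch of such a helper (the function name `extract_tag` is illustrative, not necessarily the library's):

```python
import re

def extract_tag(text: str, tag: str) -> str | None:
    """Return the content of the first <tag>...</tag> span, or None if absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None

assert extract_tag("The result is <final_answer>5</final_answer>.", "final_answer") == "5"
```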
Further changes:
- We now utilize mypy for automated type checking
- Core functionality of the classification task has been moved to the base task to prevent code duplication for other tasks
- Test coverage has been boosted to above 90%
Full Changelog: here
Release v2.0.1
What's changed
- Updated the Python requirement to >=3.10 (as 3.9 loses support in October 2025)
- Fixed numpy version constraints (thanks to @asalaria-cisco)
- Made dependency groups optional extras
Full Changelog: here
Release v2.0.0
What's changed
Added features
- We welcome CAPO to our family of optimizers! CAPO can leverage few-shot examples to improve prompt performance and additionally implements multiple AutoML approaches. Check out the paper by Zehle et al. (2025) for more details (yep, it's us :))
- The Eval-Cache is now part of the ClassificationTask! This saves a lot of LLM calls, as already evaluated data points are not rerun (see the sketch after this list)
- Similar to the Eval-Cache, we added a Sequence-Cache, which allows reasoning chains to be extracted for few-shot examples
- Introduced evaluation strategies for the ClassificationTask, allowing random subsampling, sequential blocking of the dataset, or retrieving only the scores of data points that were already evaluated on prompts
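Conceptually, the Eval-Cache keys results by (prompt, data point) and only spends an LLM call on cache misses. A minimal sketch of the idea (the class and method names are illustrative, not promptolution's internals):

```python
from typing import Callable

# Illustrative (prompt, datapoint) evaluation cache; not the library's actual internals.
class EvalCache:
    def __init__(self) -> None:
        self._scores: dict[tuple[str, str], float] = {}

    def evaluate(self, prompt: str, datapoint: str,
                 score_fn: Callable[[str, str], float]) -> float:
        key = (prompt, datapoint)
        if key not in self._scores:  # only spend an LLM call on a cache miss
            self._scores[key] = score_fn(prompt, datapoint)
        return self._scores[key]
```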
Further changes
- Rearranged imports and module memberships
- Classificators are now called Classifiers
- Fixed multiple docstrings and variable names
- Simplified testing and extended the test cases to cover the new implementations
- The classification task can now also output a per-datapoint score
- Introduced statistical tests (specifically a paired t-test) for CAPO (see the sketch below)
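The per-datapoint scores are exactly what a paired t-test needs: evaluate two prompts on the same data points and test whether the mean score difference is significant. A quick sketch using scipy, with made-up scores:

```python
from scipy.stats import ttest_rel

# Made-up per-datapoint scores of two candidate prompts on the same data points.
scores_prompt_a = [0.9, 0.7, 0.8, 0.6, 0.9]
scores_prompt_b = [0.8, 0.6, 0.8, 0.5, 0.7]

# Paired t-test: is the mean score difference between the prompts significant?
t_stat, p_value = ttest_rel(scores_prompt_a, scores_prompt_b)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```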
Full Changelog: here
Release v1.4.0
What's changed
Added features
- Reworked APILLM to allow calls to any API that follows the OpenAI API format (see the sketch after this list)
- Added graceful failure handling to optimization runs, allowing results to be obtained after an error
- Reworked the configs into ExperimentConfig, which can parse arbitrary attributes
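"OpenAI API format" here means the standard chat-completions request shape, so any compatible endpoint can be reached, e.g. via the official client with a custom base_url. The URL and model name below are placeholders, and APILLM's own interface may differ:

```python
from openai import OpenAI

# Any OpenAI-compatible endpoint can be targeted; base_url and model are placeholders.
client = OpenAI(base_url="https://my-endpoint.example/v1", api_key="...")
response = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Classify the sentiment: 'great movie!'"}],
)
print(response.choices[0].message.content)
```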
Further changes:
- Reworked the getting-started notebook
- Added tests for the entire package, covering roughly 80% of the codebase
- Reworked the dependency and import structure to allow using only a subset of the package
Full Changelog: here
Release v1.3.2
What's changed
Added features
- Allow configuration and evaluation of system prompts in all LLM classes
- The CSV callback is now FileOutputCallback and can also write Parquet files
- Fixed the LLM call templates in VLLM
- Refined the OPRO implementation to be closer to the paper
Full Changelog: here
Release v1.3.1
What's changed
Added features
- New features for the VLLM wrapper (accepts a seed to ensure reproducibility)
- Fixes in the "MarkerBasedClassificator"
- Fixes in prompt creation and task-description handling
- Generalized the Classificator
- Added verbosity and callback handling in EvoPromptGA
- Added a timestamp to the callback
- Removed datasets from the repo
- Changed task creation (now by default with a dataset)
Full Changelog: here
Release v1.3.0
What's changed
Added features
- New features for the VLLM wrapper (automatic batch-size determination, accepting kwargs)
- Allow callbacks to terminate an optimization run
- Added token-count functionality
- Renamed the "Classificator" predictor to "FirstOccurenceClassificator"
- Introduced the "MarkerBasedClassificator"
- Automatic task-description creation
- Use the task description in prompt creation
- Implemented CSV callbacks
Full Changelog: here
Release v1.2.0
What's changed
Added features
- New LLM wrapper: VLLM, for batched local inference (see the sketch below)
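The wrapper builds on the vllm library; standalone batched generation with vllm itself looks roughly like this (the model name is a placeholder, and the wrapper's own interface may differ):

```python
from vllm import LLM, SamplingParams

# Batched local inference with the vllm library; the model name is a placeholder.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(
    ["Classify the sentiment: 'great movie!'", "Classify the sentiment: 'terrible plot.'"],
    params,
)
for output in outputs:
    print(output.outputs[0].text)
```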
Full Changelog: here
Release v1.1.1
Release v1.1.0
What's changed
Added features
- Enable reading tasks from a pandas DataFrame (see the sketch below)
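A task built from a DataFrame just needs inputs and labels as columns. A minimal sketch, where the column names and the task constructor are assumptions rather than the exact API:

```python
import pandas as pd

# Illustrative DataFrame layout; the column names and the task constructor
# are assumptions, not the exact promptolution API.
df = pd.DataFrame({
    "x": ["great movie!", "terrible plot."],
    "y": ["positive", "negative"],
})
# task = ClassificationTask(df=df)  # hypothetical wiring
```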
Further changes:
- Deleted experiment files (logs, configs, etc.) from the repo folders
- Improved OPRO's meta-prompt
- Added support for Python versions from 3.9 onwards (previously 3.11)