Releases: finitearth/promptolution

Release v2.1.0

03 Sep 18:15
ab458fc

What's changed

Added features:

  • We added Reward and LLM-as-a-Judge to our task family

    • Reward allows you to write a custom function that scores a prediction without requiring ground truth
    • LLM-as-a-Judge allows you to delegate the scoring of a prediction to a judge LLM, optionally using ground truth
  • Changes to CAPO to make it applicable to the new tasks:

    • CAPO now accepts the input parameter "check_fs_accuracy" (default: True). For reward tasks the accuracy cannot be evaluated, so the prediction of the downstream_llm is used as the few-shot target instead.
    • CAPO also accepts "create_fs_reasoning" (default: True): if set to False, only the input-output pairs from df_few_shots are used
  • Introduced a tag-extraction function to centralize repeated code for extractions like "<final_answer>5</final_answer>"

Further changes:

  • We now utilize mypy for automated type checking
  • Core functionality of the classification task has been moved to the base task to prevent code duplication in other tasks
  • Test coverage is now above 90%

Full Changelog: here

Release v2.0.1

04 Aug 13:01
1014ccf

What's changed

  • Updated the Python requirement to >=3.10 (Python 3.9 reaches end of life in October 2025)
  • Fixed numpy version constraints (thanks to @asalaria-cisco)
  • Made dependency groups optional extras

Full Changelog: here

Release v2.0.0

19 May 22:08
d486c61

What's changed

Added features

  • We welcome CAPO to our family of optimizers! CAPO utilizes few-shot examples to improve prompt performance and additionally implements multiple AutoML approaches. Check out the paper by Zehle et al. (2025) for more details (yep, it's us :))
  • The Eval-Cache is now part of the ClassificationTask! This saves a lot of LLM calls, as already evaluated data points are not rerun
  • Similar to the Eval-Cache, we added a Sequence-Cache, which caches reasoning chains extracted for few-shot examples
  • Introduced evaluation strategies for the ClassificationTask, allowing random subsampling, sequential blocking of the dataset, or retrieving only the scores of data points already evaluated on the prompts
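The idea behind such an evaluation cache can be sketched as a lookup keyed by (prompt, datapoint index), so the expensive LLM-backed evaluation only runs on cache misses. The class name and method below are hypothetical, not the library's actual implementation:

```python
class EvalCache:
    """Sketch of an evaluation cache keyed by (prompt, datapoint index)."""

    def __init__(self):
        self._scores: dict[tuple[str, int], float] = {}

    def get_or_evaluate(self, prompt: str, index: int, evaluate) -> float:
        key = (prompt, index)
        if key not in self._scores:
            # Only call the (expensive) LLM-backed evaluator on a cache miss.
            self._scores[key] = evaluate(prompt, index)
        return self._scores[key]
```

Because prompt optimizers repeatedly re-score surviving prompts across iterations, even this simple memoization can eliminate a large share of LLM calls.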

Further changes

  • Rearranged imports and module memberships
  • Classificators are now called Classifiers
  • Fixed multiple docstrings and variable names
  • Simplified testing and extended the test cases to the new implementations
  • The classification task can now also output a per-datapoint score
  • Introduced statistical tests (specifically a paired t-test) for CAPO
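A paired t-test compares two prompts on the same datapoints by testing whether the mean of the per-datapoint score differences is zero. In practice one would use `scipy.stats.ttest_rel`; the stdlib sketch below only computes the test statistic itself and is not promptolution's implementation:

```python
import math


def paired_t_statistic(scores_a: list[float], scores_b: list[float]) -> float:
    """t = mean(d) / (s_d / sqrt(n)) for paired differences d."""
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

The statistic is then compared against the t-distribution with n-1 degrees of freedom; pairing on datapoints removes per-example difficulty as a source of variance, which matters when evaluation subsets are small.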

Full Changelog: here

Release v1.4.0

22 Apr 15:14
6e47cd1

What's changed

Added features

  • Reworked APILLM to allow calls to any API that follows the OpenAI API format
  • Added graceful failure handling in optimization runs, allowing results to be obtained after an error
  • Reworked the configs into ExperimentConfig, which can parse arbitrary attributes
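Targeting the OpenAI API format means any compatible endpoint (local servers included) can be addressed with the same request shape. The helper below is a hypothetical stdlib sketch of building such a request, not promptolution's actual APILLM interface:

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for any endpoint following the OpenAI chat-completions format."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Swapping providers then reduces to changing `base_url` and `model`; the message payload and auth header stay the same.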

Further Changes:

  • Reworked the getting-started notebook
  • Added tests for the entire package, covering roughly 80% of the codebase
  • Reworked the dependency and import structure to allow using a subset of the package

Full Changelog: here

Release v1.3.2

20 Mar 17:42
7c052a9

What's changed

Added features

  • Allow configuration and evaluation of system prompts in all LLM classes
  • The CSV callback is now FileOutputCallback and can also write Parquet files
  • Fixed LLM call templates in VLLM
  • Refined the OPRO implementation to be closer to the paper

Full Changelog: here

Release v1.3.1

12 Mar 18:53
c12ab62

What's changed

Added features

  • New features for the VLLM wrapper (accepts a seed to ensure reproducibility)
  • Fixes in the "MarkerBasedClassificator"
  • Fixes in prompt creation and task description handling
  • Generalized the Classificator
  • Added verbosity and callback handling in EvoPromptGA
  • Added a timestamp to the callback
  • Removed datasets from the repo
  • Changed task creation (now by default with a dataset)

Full Changelog: here

Release v1.3.0

09 Mar 20:51
8ecc6a8

What's changed

Added features

  • New features for the VLLM wrapper (automatic batch size determination, accepting kwargs)
  • Allow callbacks to terminate an optimization run
  • Added token count functionality
  • Renamed the "Classificator" predictor to "FirstOccurenceClassificator"
  • Introduced the "MarkerBasedClassificator"
  • Automatic task description creation
  • Use the task description in prompt creation
  • Implemented CSV callbacks

Full Changelog: here

Release v1.2.0

06 Mar 12:55
0eb5409

What's changed

Added features

  • New LLM wrapper: VLLM for local inference with batches

Full Changelog: here

Release v1.1.1

21 Feb 11:41
b5e8adb

Release v1.1.0

19 Nov 19:16
729983f

What's changed

Added features

  • Enable reading tasks from a pandas DataFrame

Further Changes:

  • Deleted experiment files (logs, configs, etc.) from the repo folders
  • Improved OPRO's meta-prompt
  • Added support for Python versions from 3.9 onwards (previously 3.11)