Open Science Software for Semantic Synthesis and Extraction of Information from Unstructured Sources.
alembica simplifies the use of Large Language Models (LLMs) to extract structured datasets from unstructured corpora of text.
It provides a flexible and scalable framework to process, synthesize, and transform textual information into structured formats suitable for analysis and further processing.
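The kind of transformation described above can be illustrated with a minimal sketch in plain Go. It does not use alembica's API. The `Record` schema, the `parseRecord` helper and the sample answer are hypothetical, and the sketch assumes the model was asked to reply with a single JSON object. It decodes that answer into a typed struct and rejects replies that contain unknown fields or lack a title.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Record is a hypothetical target schema: the fields a user might ask an
// LLM to extract from each unstructured document.
type Record struct {
	Title  string `json:"title"`
	Year   int    `json:"year"`
	Method string `json:"method"`
}

// parseRecord decodes a model's raw JSON answer into a Record. It rejects
// malformed JSON, fields not present in the schema, and a missing title.
func parseRecord(raw string) (Record, error) {
	var r Record
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&r); err != nil {
		return Record{}, fmt.Errorf("invalid model output: %w", err)
	}
	if r.Title == "" {
		return Record{}, fmt.Errorf("invalid model output: missing title")
	}
	return r, nil
}

func main() {
	// Hypothetical model answer for one input text.
	answer := `{"title": "Soil carbon in boreal forests", "year": 2021, "method": "field survey"}`
	rec, err := parseRecord(answer)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", rec)
}
```

Decoding into a typed struct, rather than a generic map, means every extracted row has the same columns and can be written directly to a table for analysis.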
To install alembica in Go, run:
go get github.com/open-and-sustainable/alembica
To use alembica from other programming languages, see the C-Shared Library section of the User Guide.
- User Guide – Learn how to use alembica in different programming languages.
- API Reference – Explore the Go package documentation.
Features:
- Validation of Input – Ensures that queries are correctly formatted before they are sent to the models.
- Cost Assessment – Calculates the token cost of the requested extraction based on each model's pricing.
- Data Extraction – Processes unstructured text and transforms it into structured datasets for further analysis.
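As a rough illustration of validation and cost assessment together, the following sketch checks that each query names a known model and has non-negative token counts, then multiplies the token counts by per-million-token prices and sums them over a batch. It is not alembica's implementation: the `Query` and `Price` types, the `estimateCost` and `totalCost` functions, the model names and the prices are all hypothetical placeholders, and the token counts are assumed to be known in advance.

```go
package main

import (
	"fmt"
)

// Price holds a model's cost in US dollars per million tokens.
// The model names and prices used in main are illustrative placeholders,
// not actual provider pricing.
type Price struct {
	InputPerM  float64
	OutputPerM float64
}

// Query describes one hypothetical extraction request.
type Query struct {
	Model        string
	InputTokens  int
	OutputTokens int
}

// estimateCost validates a query against the price table and returns
// its expected cost in dollars.
func estimateCost(q Query, prices map[string]Price) (float64, error) {
	p, ok := prices[q.Model]
	if !ok {
		return 0, fmt.Errorf("unknown model %q", q.Model)
	}
	if q.InputTokens < 0 || q.OutputTokens < 0 {
		return 0, fmt.Errorf("token counts must be non-negative")
	}
	in := float64(q.InputTokens) / 1e6 * p.InputPerM
	out := float64(q.OutputTokens) / 1e6 * p.OutputPerM
	return in + out, nil
}

// totalCost sums the estimated cost of every query in a batch and stops
// at the first invalid query.
func totalCost(qs []Query, prices map[string]Price) (float64, error) {
	var sum float64
	for i, q := range qs {
		c, err := estimateCost(q, prices)
		if err != nil {
			return 0, fmt.Errorf("query %d: %w", i, err)
		}
		sum += c
	}
	return sum, nil
}

func main() {
	prices := map[string]Price{
		"model-a": {InputPerM: 2.50, OutputPerM: 10.00},
		"model-b": {InputPerM: 0.15, OutputPerM: 0.60},
	}
	batch := []Query{
		{Model: "model-a", InputTokens: 12000, OutputTokens: 800},
		{Model: "model-a", InputTokens: 9000, OutputTokens: 600},
	}
	cost, err := totalCost(batch, prices)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("estimated cost: $%.4f\n", cost)
}
```

Keeping prices in a table separate from the queries makes it cheap to rerun the estimate for several models before choosing one.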
Author: Riccardo Boero - ribo@nilu.no
Contributions are welcome!
alembica is licensed under the GNU Affero General Public License, Version 3 (AGPL-3.0).
Cite as: Boero, R. (2025). alembica - Open Science Software for Semantic Synthesis and Extraction of Information from Unstructured Sources. Zenodo. https://doi.org/10.5281/zenodo.14899666