A open-source scientific zero-shot text classification Python package based on various LLM models.
This project provides a flexible framework for performing zero-shot classification using large language models in Python. It allows you to classify text into categories without requiring explicit training data for those categories. All instructions to LLMs are provided by mere natural language prompts. The framework is designed to support a wide range of text classification tasks including multi-label, multi-class, and single-class classification scenarios.
This package was developed as part of an academic research project to systematically classify political communication. The primary goal was to create an easy-to-use and accessible framework for building adaptable zero-shot classifications with large language models (LLMs) across a wide variety of text analysis tasks. By providing a flexible and intuitive tool, this project aims to empower students and researchers β especially those in social sciences β to explore, evaluate, and harness the potential of zero-shot classification while addressing its challenges in a user-friendly environment. I have no financial interest in this project.
This project is open-source and was developed with no financial interests. It is intended to support academic research and the broader scientific community. Contributions are welcome to help improve the framework and expand its capabilities.
- Flexible prompt-driven zero-shot classification (with few-shot option).
- Supports diverse classification tasks: multi-label/class, sequential (IZSC), and hierarchical (HZSC).
- Option to perform double shots: Perform each step twice and evaluate the performance and uncertainty afterwards.
- Integrates with multiple LLM providers (e.g., OpenAI, Ollama) and pandas DataFrames.
- Provides an easy-to-use command-line interface (CLI) for demonstration purpose.
- Open-source and non-commercial, designed for research and academic use.
- Zero-Shot Classification (via Prompting): The engine performs Zero-Shot Classification by instructing Large Language Models (LLMs) using user-defined natural language prompts. This allows classifying text into new categories without needing specific training examples for those categories; the LLM understands the task solely from the prompt provided through the engine.
- Sequential Classification (IZSC Approach): The Iterative Zero-Shot Classification (IZSC) module enables Sequential Classification. Within this package, it means users can define a series of potentially independent classification steps executed in order. This allows building flexible, multi-stage analysis pipelines where each stage uses potentially different prompts or focuses on different aspects of the text, without requiring strict hierarchical dependencies between stages.
- Hierarchical Classification (HZSC Approach): The Hierarchical Zero-Shot Classification (HZSC) module implements Hierarchical Classification. This allows users to structure classification tasks as a tree or hierarchy defined in the configuration. The package guides the LLM through this predefined structure, making classification decisions at one level contingent upon the outcomes of decisions at parent levels.
- Multi-Prompting: The framework supports a Multi-Prompting strategy by allowing users to define distinct prompts for each classification step (in IZSC) or for different levels/branches within a hierarchy (in HZSC). This flexibility enables tailoring LLM instructions specifically to the sub-task at hand, aiming for improved accuracy and nuance.
- Modular Prompt Design: The package encourages Modular Prompt Design. Prompts can be constructed from reusable text blocks managed within the configuration. While not automated, this structure simplifies the user's process of experimenting with, testing, and refining different prompt components manually to optimize LLM performance.
- Iterative Double Validated Zero-Shot Classification (IZSC): IZSC is the core module to classify texts in an iterative process. It can use a double validation technique to ensure the robustness and accuracy of the classifications.
- Hierarchical Double Validated Zero-Shot Classification (HZSC): HZSC extends the zero-shot classification capabilities to hierarchical category structures. It leverages a double validation approach to maintain accuracy while navigating the complexities of hierarchical classification.
Install the zeroshot-engine package using pip in your Windows Powershell or Linux / Mac Bash Terminal.
pip install zeroshot-engineTest the zeroshot-engine in the HDZSC-scenario by selecting from a wide variety of LLMs and bringing your own text for classification:
On Linux/Mac:
zeroshot-engine demoOn Windows:
zeroshot-engine.exe demoThis command will guide you through an interactive demo where you can:
- Choose an LLM model (e.g., one from OpenAI or Ollama).
- Provide your own text for classification or use a provided example text.
- Observe how the hierarchical classification process works in real-time.
This tutorial provides example code for your first test project, which you can use as a template to build and adapt your own research projects. For more detailed information and advanced usage, please refer to the documentation.
For more detailed information about the framework and its implementation, please refer to the following documentation:
-
Overview of IZSC and HZSC - A comprehensive explanation of the Iterative and Hierarchical Double Zero-Shot Classification approaches, including detailed examples and usage patterns.
-
Performance Evaluation - Benchmark results and performance metrics across different models and classification tasks.
-
In-Depth-Demo-Explanation - Learn how the HZSC works in detail.
-
Tutorial: Get started with your first classification - Create your first projects with prompt, code examples and text to learn how to set up the classifer.
==============================================================
ZEROSHOTENGINE DEMO LABEL DEPENDENCY FLOWCHART
==============================================================
[POLITICAL]
ββ if political = 1:
β [PRESENTATION]
β [ATTACK]
β ββ if attack = 1:
β β [TARGET]
β β β
β β βΌ
β β STOP
β ββ if attack = 0:
β β Skip: target
β STOP
ββ if political = 0:
β Skip: presentation, attack, target
STOP
--------------------------------------------------------------
STOP CONDITIONS EXPLANATION
--------------------------------------------------------------
If political = 0 (absent), the following steps are skipped:
- presentation
- attack
- target
If attack = 0 (absent), the following steps are skipped:
- target
--------------------------------------------------------------
LEGEND
--------------------------------------------------------------
- 1 (present): Proceeds to the next classification step
- 0 (absent): Skips one or more subsequent classifications
LABEL CODES
present: 1
absent: 0
non-coded: 8
empty-list: []
--------------------------------------------------------------
- List of supported LLMs
- Additional tutorial for double validation vs. zero-temp approach.
- Documentation of all relevant functions.
- Create prompting guidelines.
- Better integration and testing of validation metrics.
- Automated Logging System
- Add contribution guidelines.
- Support for more LLMs and APIs.
- Agent-based supervising behaviour in the double validation process, that a second series prompt could evaluate the first prompt or resolves conflicts in two prompts, or a thrid prompt with different model could evaluate the conflict if needed between the first identical prompts.
While the core functionality of zeroshot-engine is already up and running, this project is still under active development.
There may be bugs, incomplete features, or areas for improvement.
If you encounter any issues, have feature requests, or would like to contribute code to the project, please feel free to:
- Open an issue on the GitHub repository.
- Submit a pull request with your contributions.
- Contact the author directly at luc.schwarz@posteo.de.
Contributions are highly appreciated and will help improve the framework for the scientific community!
Authorship and provision of Large Language Models (LLMs) reside entirely with their respective creators. zeroshot-engine claims no authorship over these models. It functions solely as an orchestrator, providing a structured framework to access models from providers like OpenAI (using the official openai Python library) or locally hosted Ollama instances (via the official LangChain Python library).
Note: Utilizing Ollama requires it to be installed and running separately on your system. Internal data handling relies on pandas DataFrames, making it a core dependency (familiar if you work with Python for data analysis).
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
Contributions are welcome! Feel free to open issues for bug reports or feature requests. If you'd like to contribute code directly, please see the contributing guidelines.
Lucas Schwarz
If you use zeroshot-engine in your research, please cite it as follows:
Schwarz, L. (2025). zeroshot-engine: A scientific zero-shot text classification engine based on various LLM models (Version 0.1.5) [Computer software]. https://doi.org/10.5281/zenodo.16890143
@software{Schwarz_zeroshot-engine_A_scientific_2025,
author = {Schwarz, Lucas},
doi = {10.5281/zenodo.16890143},
month = mar,
title = {{zeroshot-engine: A scientific zero-shot text classification engine based on various LLM models}},
url = {https://github.com/TheLucasSchwarz/zeroshotENGINE},
version = {0.1.5},
year = {2025}
}