This is the output of a data project within the context of the Civic Data Lab. The Civic Data Lab team supported the AWO team with the technical implementation of the algorithms.
This repo contains:
- a final static version of democracy funding programs at the federal level, see `data/labeled/final/2025_07_21_Tabelle_Programme_zur_Publ.xlsx`
- code for the development and evaluation of a classifier to automatically detect democracy funding programs (see LICENSE-CODE for the code license)

For the data, we refer to the imprint of foerderdatenbank.de of the Federal Ministry for Economic Affairs and Climate Action, which indicates CC BY-ND 3.0 DE as the license for all texts of the website. The dataset provided in this repository transfers the information on each funding program into a machine-readable format (see LICENSE-DATA for the data license).
The task was to develop a binary text classifier with limited and highly imbalanced data (with many more negatives than positives). Moreover, the concept of what constitutes a positive case is rather complex.
The codebook (see `codebook`) defines what is understood as a democracy funding program within this project. It was developed iteratively through strategic data selection in collaboration with two experts from AWO. Given the low number of positives, it was crucial to first retrieve as many positives as possible. In an initial iteration, we selected a few dozen funding programs based on the results of a semantic search using an early version of the codebook. This selection was extended with results from an inverse semantic search to retrieve potential hard negatives. Subsequent coding iterations were based on predicted positives from early classifier results and on random samples of predicted negatives.
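The retrieval step for candidate positives can be sketched as follows; the embedding model, the query text, and the cut-offs are illustrative assumptions rather than the exact setup used in this project.

```python
# Hypothetical sketch: rank funding programs by semantic similarity to a
# codebook-style query. Model name, query, and cut-offs are assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

codebook_query = "Förderung von Demokratie, Vielfalt und zivilgesellschaftlichem Engagement"
program_texts = [
    "Programm zur Stärkung demokratischer Teilhabe vor Ort",
    "Zuschüsse für den Maschinen- und Anlagenbau",
    "Förderung von Projekten gegen Extremismus",
]

query_emb = model.encode(codebook_query, convert_to_tensor=True)
program_embs = model.encode(program_texts, convert_to_tensor=True)

# rank all programs by cosine similarity to the query
similarities = util.cos_sim(query_emb, program_embs)[0]
ranking = similarities.argsort(descending=True)

candidate_positives = ranking[:50]  # top of the ranking -> sent to coders
# an analogous search with a contrasting query could surface potential hard negatives
```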
The final codebook development and coding iteration incorporated methods from the cleanlab library: AWO experts were asked to review potential label issues identified by cleanlab and to refine the codebook instructions based on unclear cases. Using this final codebook, AWO tasked coders with labeling all previously labeled data, enabling us to measure inter-coder reliability and human labeling performance (see `notebooks/compute_icr.pdf` for details).
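A minimal sketch of this review step, assuming cleanlab's `find_label_issues` on out-of-sample predicted probabilities; the toy data and the logistic-regression stand-in are not the project's actual features or model.

```python
# Sketch: surface potentially mislabeled programs for expert review with cleanlab.
import numpy as np
from cleanlab.filter import find_label_issues
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# toy stand-ins for program descriptions and their current labels
texts = ["Demokratieförderung im ländlichen Raum", "Förderung von Exportmessen"] * 20
labels = np.array([1, 0] * 20)

X = TfidfVectorizer().fit_transform(texts)

# out-of-sample class probabilities, as cleanlab expects
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, labels, cv=5, method="predict_proba"
)

# indices of the most suspicious labels, ranked by the model's self-confidence
issue_idx = find_label_issues(
    labels=labels, pred_probs=pred_probs, return_indices_ranked_by="self_confidence"
)
# these cases were handed to the AWO experts for review and codebook refinement
```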
Based on [McHugh (2012)](https://pmc.ncbi.nlm.nih.gov/articles/PMC3900052/), we used Percent Agreement as the main metric for inter-coder reliability, which was 89.3%, reflecting strong agreement. Category-specific agreement showed that negatives had much higher agreement (93.1%) than positives (75.2%). Treating the dataset, which was labeled and corrected iteratively by the two AWO experts, as ground truth, two independent coders achieved an average accuracy of 88% and an average F1 score of 0.76. Analysis of cases where both coders disagreed with the previous labeling suggests systematic rather than random errors. Therefore, the cooperatively created and iteratively refined labeled dataset was used as ground truth for the final classifier development.
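For illustration, the reported agreement metrics can be computed roughly as below; the column names and the category-specific definition are assumptions, and the authoritative computation is documented in notebooks/compute_icr.pdf.

```python
# Rough sketch of the inter-coder reliability metrics (toy labels).
import pandas as pd

df = pd.DataFrame({
    "coder_a": [1, 0, 0, 1, 0, 1, 0, 0],
    "coder_b": [1, 0, 1, 1, 0, 0, 0, 0],
})

agree = df["coder_a"] == df["coder_b"]
percent_agreement = agree.mean()  # overall Percent Agreement (McHugh, 2012)

# one common variant of category-specific agreement: agreement restricted to
# cases in which at least one coder assigned the category
pos_agreement = agree[(df["coder_a"] == 1) | (df["coder_b"] == 1)].mean()
neg_agreement = agree[(df["coder_a"] == 0) | (df["coder_b"] == 0)].mean()
print(percent_agreement, pos_agreement, neg_agreement)
```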
We evaluated the following methods:
- Rule-Based (e.g., classifying a program as positive if its description contains the word “democracy”; a minimal sketch follows after this list)
- TF-IDF features as input to a Multilayer Perceptron (MLP)
- In-Context Learning (ICL)
- FastFit
- Fine-tuning a further pretrained BERT model
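As referenced in the list above, the rule-based baseline can be sketched as a simple keyword matcher; the keyword set is an assumption, and the evaluated variant (`description_text_only` in the table below) presumably matches against the description text only.

```python
# Hypothetical sketch of the rule-based baseline: keyword matching on the description.
def rule_based_predict(description: str) -> int:
    keywords = ("demokratie", "democracy")  # assumed keyword set
    text = description.lower()
    return int(any(kw in text for kw in keywords))

print(rule_based_predict("Programm zur Stärkung der Demokratie im ländlichen Raum"))  # -> 1
```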
To address class imbalance and the small dataset size, we tested several techniques, most notably oversampling with SMOTE and data augmentation through paraphrasing with large language models (LLMs).
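A sketch of the SMOTE oversampling step on TF-IDF features, applied to the training split only; the toy data and parameters are illustrative, and the LLM-based paraphrasing step is not shown.

```python
# Sketch: balance the training set by synthesizing minority-class (positive) samples.
from imblearn.over_sampling import SMOTE
from sklearn.feature_extraction.text import TfidfVectorizer

texts_train = ["Demokratieförderung vor Ort", "Export- und Messeförderung"] * 10
y_train = [1, 0] * 10

X_train = TfidfVectorizer().fit_transform(texts_train)

# interpolate new positive samples in TF-IDF space until the classes are balanced
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
```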
Methods were mostly evaluated using a fixed 10-split shuffle split. The rule-based method was evaluated on the entire dataset; ICL was evaluated on 80% of the data, since only 20% is needed by the MIPRO algorithm for optimization (as recommended by dspy). Hyperparameter optimization was only conducted for the TF-IDF + MLP model due to resource constraints with LLM-based methods. For shuffle-split evaluations, we report the mean metric values.
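The shuffle-split protocol, sketched with a stand-in classifier; the number of splits (10) is from the text, while the test size, random seed, and the logistic-regression model are assumptions.

```python
# Sketch of the fixed 10-split shuffle-split evaluation with mean/std reporting.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import ShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                            # toy features
y = (X[:, 0] + rng.normal(size=200) > 1).astype(int)     # toy, imbalanced labels

splitter = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
clf = LogisticRegression()

f1_scores = []
for train_idx, test_idx in splitter.split(X):
    clf.fit(X[train_idx], y[train_idx])
    f1_scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))

print(f"mean F1 = {np.mean(f1_scores):.3f} (std {np.std(f1_scores):.3f})")
```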
For ICL, applying MIPRO with a medium study size worsened performance compared to zero-shot ICL. FastFit and fine-tuning of a further pretrained BERT performed worse than other methods. Among all approaches, the best performance was achieved by the TF-IDF + MLP model.
Experiment | Model/Variant | F1 | Accuracy | Precision | Recall | F1 Std | Acc Std | Prec Std | Rec Std |
---|---|---|---|---|---|---|---|---|---|
TF-IDF + MLP | Optuna tuned | 0.770 | 0.875 | 0.826 | 0.726 | 0.045 | 0.021 | 0.038 | 0.076 |
Fine-tuned BERT | – | 0.760 | 0.814 | 0.708 | 0.863 | 0.120 | 0.187 | 0.163 | 0.092 |
ICL (Zero-Shot) | moonshotai/kimi-k2 | 0.757 | 0.875 | 0.875 | 0.667 | – | – | – | – |
ICL + MIPRO | moonshotai/kimi-k2 | 0.733 | 0.852 | 0.772 | 0.698 | – | – | – | – |
FastFit | – | 0.725 | 0.844 | 0.751 | 0.711 | 0.077 | 0.045 | 0.097 | 0.103 |
Rule-Based | description_text_only | 0.667 | 0.748 | 0.538 | 0.876 | – | – | – | – |
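A minimal sketch of the best-performing setup (TF-IDF features feeding an MLP), here as a scikit-learn pipeline; all hyperparameters and the toy data are placeholders, since the reported model was tuned with Optuna.

```python
# Sketch of the TF-IDF + MLP classifier as a scikit-learn pipeline (placeholder settings).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

clf = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=20_000, ngram_range=(1, 2))),
    ("mlp", MLPClassifier(hidden_layer_sizes=(128,), max_iter=300, random_state=42)),
])

# toy training data; in the project this is the labeled funding-program dataset
train_texts = [
    "Demokratieförderung und zivilgesellschaftliches Engagement",
    "Förderung von Exportmessen",
    "Programm gegen Extremismus und für Vielfalt",
    "Zuschüsse für den Maschinenbau",
]
train_labels = [1, 0, 1, 0]

clf.fit(train_texts, train_labels)
probs = clf.predict_proba(["Stärkung demokratischer Teilhabe"])[:, 1]  # used for threshold tuning below
```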
The best model (TF-IDF + MLP) already performs better than humans on the labeling task when expert labels are set as ground truth. Still, performance is not sufficient for the intended use case, which is to use the classification results directly for further analysis. Accepting that manual work remains necessary, optimizing the model for recall is an option: recall is more important than precision because missed positives would distort the analysis more severely, and it is feasible to manually verify predicted positives but not predicted negatives. Our strategy is to optimize recall until it reaches 0.95 and then prioritize precision. The result is a recall of 0.95 and a precision of 0.57. This model should be used as an assistance tool, and its results should be verified manually.
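A sketch of this operating-point selection on held-out predicted probabilities: among thresholds that keep recall at or above 0.95, pick the one with the highest precision. Variable names and the toy data are illustrative.

```python
# Sketch: choose a decision threshold that reaches the recall target first,
# then maximizes precision among the remaining operating points.
import numpy as np
from sklearn.metrics import precision_recall_curve

def pick_threshold(y_true, probs, min_recall=0.95):
    precision, recall, thresholds = precision_recall_curve(y_true, probs)
    # precision/recall have one more entry than thresholds; drop the final point
    precision, recall = precision[:-1], recall[:-1]
    ok = recall >= min_recall                          # operating points with enough recall
    best = np.argmax(np.where(ok, precision, -1.0))    # among those, maximize precision
    return thresholds[best]

# toy held-out labels and predicted probabilities
y_val = np.array([0, 0, 1, 0, 1, 1, 0, 1])
p_val = np.array([0.10, 0.40, 0.35, 0.20, 0.80, 0.60, 0.55, 0.90])

threshold = pick_threshold(y_val, p_val)
y_pred = (p_val >= threshold).astype(int)  # predicted positives are then verified manually
```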
- Clone this repository
- Run `uv sync`
- Run `uv run pre-commit install`