dbca-splitter

Independent implementation of the Distribution-Based Compositionality Assessment (DBCA) method presented in the ICLR 2020 paper "Measuring Compositional Generalization: A Comprehensive Method on Realistic Data".

The authors' official repo contains the Compositional Freebase Questions (CFQ) dataset generated by the DBCA method, but not the code for creating Maximum Compound Divergence (MCD) splits for arbitrary DAG-structured data, hence this implementation.
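For orientation, the quantity an MCD split maximizes can be sketched in a few lines. The paper defines divergence between train and test distributions as one minus the Chernoff coefficient, with alpha = 0.5 for atoms and alpha = 0.1 for compounds. The snippet below is a minimal sketch of that formula only, not this repo's code: the function names are hypothetical, and it uses plain frequency counts where the paper weights compounds.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def normalize(items: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Turn a multiset of atoms or compounds into a frequency distribution."""
    counts = Counter(items)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def chernoff_divergence(p: Dict[Hashable, float],
                        q: Dict[Hashable, float],
                        alpha: float) -> float:
    """Return 1 - C_alpha(P || Q), with C_alpha = sum_k p_k^alpha * q_k^(1 - alpha)."""
    # Keys missing from either distribution contribute zero to the coefficient.
    shared = set(p) & set(q)
    coefficient = sum(p[k] ** alpha * q[k] ** (1 - alpha) for k in shared)
    return 1.0 - coefficient


# Hypothetical toy data: compounds observed in a train set and a test set.
train_compounds = normalize(["a", "a", "b"])
test_compounds = normalize(["a", "b", "b"])

atom_style = chernoff_divergence(train_compounds, test_compounds, alpha=0.5)
compound_style = chernoff_divergence(train_compounds, test_compounds, alpha=0.1)
print(atom_style, compound_style)
```

Identical distributions give a divergence of (approximately) zero and disjoint ones give exactly one. The small alpha for compounds makes the measure care mostly about whether a test compound appears in train at all, rather than how often.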

See here for a walk-through of the paper that I wrote up as a blog post.

Installation

From repo root, run pip install -r requirements.txt.

Tested with Ubuntu 18.04 and Python 3.7.

Usage

Generating data

To generate a dataset with the desired compositionality settings, you need to provide your own sample set, with each sample represented in the required directed acyclic graph (DAG) format. The current implementation uses only simple toy data for testing and research purposes.
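As a rough illustration of what a DAG-structured sample might look like, the sketch below stores a sample as labeled nodes plus directed edges, checks that it is acyclic, and reads off atoms and compounds. This is an assumption for illustration, not the format this repo requires: the `Sample` type, the helper names, and the choice of single edges as the simplest compounds are all hypothetical (the paper treats compounds as sub-graphs more generally).

```python
from typing import Dict, List, Tuple

# Hypothetical sample layout: node id -> label, plus directed (parent, child) edges.
Sample = Tuple[Dict[int, str], List[Tuple[int, int]]]


def is_acyclic(sample: Sample) -> bool:
    """Check acyclicity with Kahn's algorithm (repeatedly remove in-degree-0 nodes)."""
    nodes, edges = sample
    indegree = {n: 0 for n in nodes}
    children: Dict[int, List[int]] = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    ready = [n for n, degree in indegree.items() if degree == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    # Every node is visited only if no cycle blocked the traversal.
    return visited == len(nodes)


def atoms(sample: Sample) -> List[str]:
    """Atoms, taken here to be the node labels."""
    nodes, _ = sample
    return sorted(nodes.values())


def compounds(sample: Sample) -> List[Tuple[str, str]]:
    """Simplest compounds, taken here to be labeled edges (two-node sub-graphs)."""
    nodes, edges = sample
    return sorted((nodes[parent], nodes[child]) for parent, child in edges)


# Toy sample: one conjunction node with two children.
toy: Sample = ({0: "AND", 1: "directed", 2: "produced"}, [(0, 1), (0, 2)])
print(is_acyclic(toy), atoms(toy), compounds(toy))
```

Collecting these atoms and compounds over all samples in a candidate train or test set gives the distributions that the split procedure compares.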

Demo notebook

See demo notebook for sample usage.

Split generation settings

From repo root, run python run_dbca.py --h to see the available split generation settings (or see source here).

