Synthetic data generation toolkit for civic transparency research and testing.
pip install civic-transparency-sdk
- Synthetic Data Generation: Create realistic transparency data for testing and research without requiring real user data. Generate controlled datasets with reproducible seeds for studying information dynamics.
- Internal Data Structures: Simulation-specific types (WindowAgg, ContentHash, TopHash) for generating and manipulating synthetic transparency data.
- Database Integration: Convert generated data to DuckDB/SQL databases for analysis with ready-to-use schemas and indexing patterns.
- CLI Tools: Command-line utilities for generating worlds, converting formats, and managing synthetic datasets.
Note: This SDK generates synthetic data for research/testing. For implementing the PTag API specification, see civic-transparency-ptag-types, which provides the official API response types.
Generate synthetic data:
# Activate environment
source .venv/bin/activate  # Linux/Mac
# or
.venv\Scripts\activate     # Windows
# Generate baseline world
ct-sdk generate --world A --topic-id aa55ee77 --out world_A.jsonl
# Convert to database
ct-sdk convert --jsonl world_A.jsonl --duck world_A.duckdb --schema schema/schema.sql
The generated DuckDB files are ready for analysis with any SQL-compatible tools or custom analysis scripts.
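Before converting, it can help to sanity-check the generated JSONL file. The sketch below uses only the Python standard library and assumes nothing about the record schema: it counts the records and tallies which top-level keys appear. The helper name `summarize_jsonl` is hypothetical and is not part of the SDK.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, Counter of top-level keys) for a JSONL file.

    Hypothetical helper for illustration; not part of the SDK.
    """
    count = 0
    keys = Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return count, keys


if __name__ == "__main__":
    # File name taken from the generate command above.
    path = Path("world_A.jsonl")
    if path.exists():
        n, keys = summarize_jsonl(path)
        print(f"{n} records")
        for key, seen in keys.most_common():
            print(f"  {key}: present in {seen} records")
```

If a key is missing from some records, the per-key counts make that visible before the data reaches the database.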
- Academic Research: Generate controlled datasets with known parameters for studying information dynamics, coordination patterns, and transparency system behaviors.
- Algorithm Development: Build and test transparency tools using synthetic data that mimics real-world patterns without privacy concerns.
- Testing & Validation: Create reproducible test datasets for developing transparency systems without requiring real user data.
- Education: Provide realistic datasets for teaching transparency concepts, data analysis, and system design.
All generation is deterministic:
- Seed-based randomization: Same seed produces identical datasets
- Version tracking: Metadata includes package versions
- Parameter logging: All generation settings preserved in output
- Schema versioning: Database structures fully documented
Example seeds:
- World A (baseline): 4242
- World B (influenced): 8484
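The seed-based guarantee can be illustrated with a minimal sketch. This is not the SDK's generator: `generate_counts` is a hypothetical stand-in that draws per-window event counts from a locally seeded `random.Random`, which is the general pattern that makes output reproducible for a given seed.

```python
import random


def generate_counts(seed, n_windows=5):
    """Hypothetical generator: per-window event counts from a seeded RNG."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return [rng.randint(0, 100) for _ in range(n_windows)]


# Seeds taken from the examples above.
world_a_first = generate_counts(4242)
world_a_again = generate_counts(4242)
world_b = generate_counts(8484)

print(world_a_first == world_a_again)  # True: same seed, same sequence
print(world_a_first, world_b)          # inspect how the two seeds differ
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the result independent of any other code that touches the global random state.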
ci.transparency.sdk/
├── cli/            # Command-line interface (ct-sdk)
├── digests.py      # Content fingerprinting (SimHash64, MinHashSig)
├── hash_core.py    # Content identification (HashId, ContentHash, TopHash)
├── ids.py          # ID management (WorldId, TopicId)
├── io_schema.py    # JSON serialization utilities
└── window_agg.py   # Window aggregation structure (WindowAgg)
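For readers unfamiliar with the fingerprinting named in digests.py, the following is a generic 64-bit SimHash sketch. It is an assumption-laden illustration of the general technique, not the package's SimHash64 implementation: the tokenization (lowercased whitespace split), the per-token hash (BLAKE2b truncated to 8 bytes), and the function names are all choices made for this example.

```python
import hashlib


def simhash64(text):
    """Generic 64-bit SimHash over whitespace tokens (illustrative only)."""
    weights = [0] * 64
    for token in text.lower().split():
        # Stable 64-bit token hash; Python's built-in hash() is salted per process.
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, weight in enumerate(weights):
        if weight > 0:  # majority of tokens set this bit
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


a = simhash64("city council releases budget report")
b = simhash64("city council releases annual budget report")
print(hamming_distance(a, b))  # near-duplicate texts tend to give a small distance
```

Because each bit is decided by a majority vote over tokens, texts that share most of their tokens tend to produce fingerprints that differ in few bits, which is what makes the Hamming distance useful as a similarity measure.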
- Civic Transparency PTag Spec - Official API specification
- Civic Transparency PTag Types - Python types for PTag API responses (use this for API implementation)
- Civic Transparency Verify - Statistical verification tools (private)
This package provides synthetic data generation for research and testing. It does not include:
- Detection algorithms or thresholds
- Verification workflows or assessment criteria
- Operational patterns or alerting rules
These are maintained separately to prevent adversarial reverse-engineering while enabling legitimate transparency research.
Full documentation at: civic-interconnect.github.io/civic-transparency-py-sdk/
- Usage Guide - Getting started and common workflows
- CLI Reference - Command-line interface details
- SDK Reference - Core Python APIs
- Schema Reference - Database schemas and integration
See CONTRIBUTING.md for development setup and guidelines.
This package follows semantic versioning. See CHANGELOG.md for version history.
MIT © Civic Interconnect