Amur-N/Semi-structured-Dataset-Collection

Semi-structured Dataset Collection

A benchmark for parsing, profiling, and analyzing semi-structured textual data. This repo collects 100+ open-source semi-structured datasets (TXT, LOG, CSV, JSON, XML, PHP, YAML, HMM, FASTQ, etc.), mostly from GitHub. To find more datasets (hundreds in total), visit the source repositories or follow the links in the README.md of each directory.
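Since each directory's README.md points to its source, one way to list those sources in bulk is to scan a local clone for links. The following is a minimal sketch, not part of this repo: the function names `extract_links` and `collect_readme_links` are hypothetical, the clone path is an assumption, and the URL pattern is a simple heuristic that may miss or over-match some links.

```python
import re
from pathlib import Path

# Heuristic pattern for bare http(s) URLs (an assumption, not a full URL parser).
URL_PATTERN = re.compile(r"https?://[^\s)>\]\"']+")


def extract_links(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # trim trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen


def collect_readme_links(repo_root):
    """Map each directory (relative to repo_root) to the links in its README.md."""
    root = Path(repo_root)
    links = {}
    for readme in sorted(root.rglob("README.md")):
        rel_dir = readme.parent.relative_to(root).as_posix()
        found = extract_links(readme.read_text(encoding="utf-8", errors="replace"))
        if found:
            links[rel_dir] = found
    return links


if __name__ == "__main__":
    # Assumed location of a local clone; adjust to wherever the repo was cloned.
    for directory, urls in collect_readme_links("Semi-structured-Dataset-Collection").items():
        print(directory)
        for url in urls:
            print("  ", url)
```

Directories whose README.md contains no links are omitted from the result, so the output only lists places where more data can actually be found.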

This repo contains excerpted and modified versions of some original datasets. Those excerpted and modified versions are freely available for research or academic work. For the original datasets collected in this repo, however, please comply with the corresponding source's license before use.

If you use or distribute the datasets, please cite this repository's URL and our paper, StructVizor: Interactive Profiling of Semi-Structured Textual Data.

About

An open collection of 100+ semi-structured textual datasets (LOG, TXT, CSV, etc.).
