This repository contains data tables used for plotting figures in the article "Cross-platform motif discovery and benchmarking to explore binding specificities of poorly studied human transcription factors".
Ilya E. Vorontsov, Ivan Kozin, Sergey Abramov, Alexandr Boytsov, Arttu Jolma, Mihai Albu, Giovanna Ambrosini, Katerina Faltejskova, Antoni J. Gralak, Nikita Gryzunov, Sachi Inukai, Semyon Kolmykov, Pavel Kravchenko, Judith F. Kribelbauer-Swietek, Kaitlin U. Laverty, Vladimir Nozdrin, Zain M. Patel, Dmitry Penzar, Marie-Luise Plescher, Sara E. Pour, Rozita Razavi, Ally W.H. Yang, Ivan Yevshin, Arsenii Zinkevich, Matthew T. Weirauch, Philipp Bucher, Bart Deplancke, Oriol Fornes, Jan Grau, Ivo Grosse, Fedor A. Kolpakov, The Codebook/GRECO-BIT Consortium, Vsevolod J. Makeev, Timothy R. Hughes, Ivan V. Kulakovskiy. bioRxiv 2024.11.11.619379; doi: https://doi.org/10.1101/2024.11.11.619379
Motif discovery and benchmarking pipeline and the collection of top-ranking motifs
- F1B_experiments.csv, F1B_tools.csv – Contributions of different tools and experimental methods to:
- Top-ranking motif collection (TF count)
- Complete MEX set of benchmarked motifs (percentage)
- F1C.csv – Distributions of auROC values for all TF-dataset pairs, calculated for the top-ranking motifs of each motif discovery tool selected by global benchmarking and tested on ChIP-Seq.
- F1D.csv – Distributions of auROC values for all TF-dataset pairs, calculated for the top-ranking motifs of each motif discovery tool selected by global benchmarking and tested on all variants of genomic HT-SELEX.
- F2A.csv – Highest overall performance of the best motifs (one per TF) when training and testing on the same experiment type.
- F2B.csv – Highest performance in cross-platform evaluation (see the sketch after this list):
- Color scale: Median performance (higher = better)
- Box size: IQR (lower = better)
- Numbers: Number of tested TFs per tool and experiment combination
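As a quick orientation for how the F2B.csv numbers map onto the figure encoding above, here is a minimal sketch of recomputing the three quantities; the column names (`tool`, `experiment`, `auroc`) are assumptions for illustration and may differ from the actual headers.

```python
import pandas as pd

# Load the cross-platform evaluation table.
# NOTE: column names below are assumptions, not verified headers.
df = pd.read_csv("F2B.csv")

# Per tool x experiment combination:
#   median auROC  -> color scale in the figure
#   IQR of auROC  -> box size
#   number of TFs -> printed numbers
summary = df.groupby(["tool", "experiment"])["auroc"].agg(
    median="median",
    iqr=lambda s: s.quantile(0.75) - s.quantile(0.25),
    n_tfs="count",
)
print(summary)
```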
Performance of motifs from artificial sequences in genomic prediction
- F3A.csv, F3B.csv – Difference in best auROC values for genomic data: ChIP-Seq vs artificial platforms (HT-SELEX, SMiLE-Seq, PBM)
- F3C_ChIP-Seq_train.csv, F3C_GHT-SELEX_train.csv – Correlation of training vs test performance: ChIP-Seq, GHT-SELEX
- F3D_ChIP-Seq_test.csv, F3D_GHT-SELEX_test.csv – Correlation of artificial training vs genomic test performance (see the correlation sketch at the end of this list)
- SF1A.csv – Number of experiments processed per motif discovery tool
- SF1B.csv – Number of motifs per experiment type and tool
- SF1C_tools.csv, SF1C_experiments.csv – Composition of the overall top-20 motif collection, in the same representation as Figure 1
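For the F3C/F3D tables, the plotted train/test correlation can be recomputed along these lines; a minimal sketch assuming hypothetical columns `train_auroc` and `test_auroc` (verify against the real headers).

```python
import pandas as pd
from scipy.stats import spearmanr

# NOTE: column names are assumptions for illustration.
df = pd.read_csv("F3C_ChIP-Seq_train.csv")

# Rank correlation between training and test performance
# across TF-dataset pairs.
rho, pval = spearmanr(df["train_auroc"], df["test_auroc"])
print(f"Spearman rho = {rho:.3f}, p = {pval:.2g}")
```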
Top-ranking motif counts from tools in intra-/cross-platform benchmarking
- SF2A.csv – Top-ranking motifs by tool, trained and tested on the same experiment type
- SF2B.csv – Top-ranking motifs by tool, trained on other types, tested on a given experiment type
Intra-/cross-platform benchmarking for GHT-SELEX and HT-SELEX subtypes
- SF3A.csv – Training and testing on the same experiment type
- SF3B.csv – Training and testing across platforms and within GHT/HT-SELEX subtypes (IVT, Lysate, GFPIVT)
- SF4A.csv – Fraction of cases where the 1st/2nd/3rd motif from a tool run ranked highest overall
- SF4B_ChIP-Seq.csv, SF4B_GHT-SELEX.csv – Distributions of pseudo-auROC values for top-ranking motifs across TFs
- SF4C_ChIP-Seq.csv, SF4C_GHT-SELEX.csv – auROC distributions for zinc-finger TFs
- SF4D.csv – Scatterplots of auROC (GHT-SELEX vs ChIP-Seq) for zinc-finger TFs
- SF5.csv – Correlation of motif performance within the same vs across different experiment types
- SF6.csv – Distributions of motif properties (length, GC%, info content)
- A: By discovery tool
- B: By experiment type
- SF7A.csv – Correlation of performance metrics with motif properties (length, GC%, info content) for ChIP-Seq & GHT-SELEX (see the sketch after this list)
- SF7B.csv – Same as SF7A, for HT-SELEX and SMiLE-Seq benchmarks
- SF7C.csv – Violin plots of info content variation (top 10 motifs/TF), split by experiment type
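A property-vs-performance correlation like the one summarized in SF7A.csv/SF7B.csv can be sketched as follows; the property and metric column names are assumptions for illustration.

```python
import pandas as pd

# NOTE: column names ("length", "gc_content", "info_content", "auroc")
# are assumptions; check the actual SF7A.csv headers first.
df = pd.read_csv("SF7A.csv")

# Spearman correlation of each motif property with benchmark performance.
for prop in ["length", "gc_content", "info_content"]:
    r = df[prop].corr(df["auroc"], method="spearman")
    print(f"{prop}: Spearman r = {r:.3f}")
```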
Correlations of different benchmarking metrics considering all pairs of motifs and datasets
- SF11A.csv – ChIP-Seq (see the sketch below)
- SF11B.csv – GHT-SELEX
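The metric-vs-metric correlations in SF11A.csv/SF11B.csv amount to a pairwise correlation matrix over all motif-dataset pairs; here is a minimal sketch with assumed metric column names.

```python
import pandas as pd

# NOTE: metric column names are assumptions for illustration;
# the real SF11A.csv headers may differ.
df = pd.read_csv("SF11A.csv")
metrics = ["auroc", "auprc", "pseudo_auroc"]

# Pairwise Spearman correlations between benchmarking metrics,
# computed over all motif-dataset pairs.
print(df[metrics].corr(method="spearman"))
```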