This repository contains the data, preprocessing pipeline, figure scripts, and supplementary materials for the manuscript "Systematic benchmarking of mass spectrometry-based antibody sequencing reveals methodological biases".
All raw mass spectrometry data are publicly available via ProteomeXchange under the identifier PXD055846.
This repository includes both the raw outputs from proteomics tools and the preprocessed peptide data:
- data/raw/
  Contains the raw output files generated by the following tools:
  - Casanovo (Yilmaz et al., 2024)
  - MaxQuant (Cox & Mann, 2008)
  - MSFragger (Kong et al., 2017)
- data/processed/
  Contains preprocessed peptide dataframes:
  - peptides_preprocessed_TP.tsv – Peptides that map to the monoclonal antibody (mAb) sequence present in the sample (True Positives), after filtering out potential contaminants (used for Figures 3-6).
  - peptides_preprocessed_TP_FP.tsv – All mAb-related peptides (both True Positives and False Positives), after filtering out potential contaminants (used for Figure 2).
  - peptides_all_TP_FP_with_blanks.tsv – All mAb-related peptides before filtering out potential contaminants, including those from blank samples.
See src/data_preprocessing/run_data_preprocessing.R for the full data preprocessing pipeline.
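To check what a preprocessed file contains before using it, a quick inspection can help. The sketch below is illustrative only and is not part of the repository's R pipeline: it uses Python's standard library, the helper name `summarize_tsv` is hypothetical, and it makes no assumption about column names, which it reads from the file's header row.

```python
import csv
from pathlib import Path


def summarize_tsv(path):
    """Return (column_names, row_count) for a tab-separated file with a header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Count data rows; the header row is consumed by DictReader itself.
        row_count = sum(1 for _ in reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        columns = list(reader.fieldnames or [])
    return columns, row_count


if __name__ == "__main__":
    # Path taken from the file list in this README; assumes the script is run
    # from the repository root.
    tsv_path = Path("data/processed/peptides_preprocessed_TP.tsv")
    if tsv_path.exists():
        columns, row_count = summarize_tsv(tsv_path)
        print(f"{tsv_path.name}: {row_count} rows")
        print("columns:", ", ".join(columns))
    else:
        print(f"{tsv_path} not found; run this from the repository root")
```

The same helper can be pointed at the other two files in data/processed/ to compare row counts before and after contaminant filtering.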
The metadata/ directory contains supporting reference files used throughout the analysis, including:
- Monoclonal antibody (mAb) sequences
- Reference protein database
- Sample descriptions
Scripts used to generate each main figure are located in the corresponding subfolders (fig2/, fig3/, etc.).
The supplementary_files folder contains the following supplementary files:
- Supplementary File 1: MaxQuant parameter file example for the tryptic samples from experimental replicate 1
- Supplementary File 2: MSFragger parameter file example for the tryptic samples from experimental replicate 1
- Supplementary File 3: Casanovo config file
- Supplementary File 4: Stitch example config file for h9C12 WT
If you use this work, please cite the following: (to be updated once published)