EasyTutorial

NorthblueM edited this page Mar 30, 2025 · 4 revisions

[TOC]


English

Choosing MS Data Format

  • Explanation of MS Data Format Options

    • RAW (Orbitrap):
      • Earlier Thermo Orbitrap instruments
      • Does not support FAIMS
    • RAW (FAIMS)(Orbitrap New Instrument, Ascend):
      • FAIMS-enabled, including newer and earlier Thermo Orbitrap instruments
      • Newer Thermo Orbitrap instruments (e.g., Ascend), including those with or without FAIMS
    • RAW (Astral)(FAIMS):
      • Thermo Astral instruments, with or without FAIMS
    • d (timsTOF):
      • Bruker timsTOF instruments
    • MGF (Not Recommended):
    • PF2:
      • Internal binary spectral file used by the pFindStudio series software
  • If you encounter any issues, please refer to: Spectrum Extraction Fails

Recommendations for MS Data Format

  • Directly loading RAW files is recommended

    • pLink3 uses its internal plugin called pParse to extract spectra and convert them to binary .pf2 files
    • pParse provides precursor mass correction and monoisotopic peak detection functions
    • Additionally, pParse supports exporting mixed spectra, that is, MS2 spectra in which multiple precursors are co-fragmented
    • The binary .pf2 format is faster to read and write
    • The same recommendation applies to pLink2: load RAW files directly rather than using MGF input
  • MGF is not recommended

    • Although MGF is widely used, it is not a standardized format
      • In MGF files, spectrum header information requirements vary across software; aside from the minimal definitions, there is no universally accepted standard
    • If the MGF file lacks complete information, unexpected errors may occur during pLink searches
    • Note: Under certain settings, pParse can also export .mgf files, but they are different from those extracted by other tools. The .mgf files extracted by third-party tools may lack the additional features provided by pParse mentioned above, and thus do not achieve optimal pLink identification performance.
  • If the instrument type lacks native pLink support

    • For instruments not natively supported by pLink, you may use a third-party tool such as MSConvert to export MGF files as input. This is enough to get pLink running but is suboptimal; it is better than nothing.
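
Because header requirements differ between tools, it can help to check a third-party MGF file before searching. The sketch below scans MGF text and reports spectra that lack some header keys. The key list (TITLE, PEPMASS, CHARGE) is an assumption chosen for illustration from commonly used MGF fields; it is not a documented list of what pLink requires, and the function name is hypothetical.

```python
# Assumed set of header keys to check; NOT a documented pLink requirement.
REQUIRED_KEYS = ("TITLE", "PEPMASS", "CHARGE")


def find_incomplete_spectra(mgf_text, required=REQUIRED_KEYS):
    """Return (spectrum_index, missing_keys) for each spectrum lacking a key."""
    problems = []
    index = -1
    keys = None  # None means we are outside a BEGIN IONS / END IONS block
    for raw_line in mgf_text.splitlines():
        line = raw_line.strip()
        if line == "BEGIN IONS":
            index += 1
            keys = set()
        elif line == "END IONS":
            if keys is not None:
                missing = [k for k in required if k not in keys]
                if missing:
                    problems.append((index, missing))
            keys = None
        elif keys is not None and "=" in line:
            # Header lines look like KEY=value; peak lines contain no "=".
            keys.add(line.split("=", 1)[0].strip().upper())
    return problems


example = """BEGIN IONS
TITLE=scan1
PEPMASS=500.25
CHARGE=2+
100.0 10
END IONS
BEGIN IONS
TITLE=scan2
PEPMASS=600.3
200.0 5
END IONS
"""

for spectrum_index, missing in find_incomplete_spectra(example):
    print(f"spectrum {spectrum_index} is missing: {', '.join(missing)}")
```

In this example the second spectrum (index 1) has no CHARGE line, so it is the only one reported. A clean report does not guarantee that pLink will accept the file; it only rules out the most obvious gaps.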

Choosing Multi-process or Multi-thread

  • Versions later than pLink3.0.16 add a new multi-process mode (Multi-process).

  • Multi-thread mode (Multi-thread)

    • The entire search workflow is handled by a single process, and multiple threads are allocated within that process.
    • RAW files are searched sequentially, meaning the next RAW file will only be searched after the current one is completed.
  • Multi-process mode (Multi-process)

    • Some parts of pLink only support single-thread operation, such as serial file reading and writing. During these parts, CPU resources cannot be fully utilized, which affects the overall search speed.
    • The motivation behind designing Multi-process is to maximize the utilization of CPU resources as configured in the parameters.
    • In Multi-process mode, the number of processes and threads per process are automatically allocated by the program, but the total CPU usage will not exceed the configured CPU Number.
    • Multi-process searches multiple RAW files in parallel, merging the results at the end and performing FDR quality control together.
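
The CPU budget constraint described above can be illustrated with a small sketch. The allocation rule below is hypothetical: pLink assigns processes and threads automatically and its actual rule is not documented here. The sketch only shows one simple scheme in which the number of processes multiplied by the threads per process never exceeds the configured CPU Number.

```python
def allocate(cpu_number, n_raw_files):
    """Hypothetical split of a CPU budget into (processes, threads per process).

    This is an illustration only, not pLink's real allocation logic.
    """
    if cpu_number < 1 or n_raw_files < 1:
        raise ValueError("cpu_number and n_raw_files must be positive")
    # Never start more processes than there are RAW files or CPUs.
    processes = min(cpu_number, n_raw_files)
    # Share the remaining budget evenly; integer division keeps the
    # total at or below cpu_number.
    threads_per_process = cpu_number // processes
    return processes, threads_per_process


for cpus, files in [(8, 1), (8, 3), (8, 20)]:
    p, t = allocate(cpus, files)
    print(f"CPU Number={cpus}, RAW files={files}: "
          f"{p} process(es) x {t} thread(s) = {p * t} in use")
```

Under this assumed rule, a single RAW file degenerates to one process using all threads, which is the same layout as Multi-thread mode, while many RAW files are spread across several processes.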

Recommendations for Multi-process or Multi-thread

  • The choice between Multi-process and Multi-thread has almost no effect on the identification results; it only affects how CPU resources are allocated.
  • We strongly recommend using Multi-process when there are many RAW files, as it will be faster.
  • When there is only one RAW file, there is no difference between Multi-process and Multi-thread.
  • If a Multi-process run stalls at the following log line and you never see spectra being loaded, try switching to Multi-thread to see whether it runs normally.
    [MultiProcess] Start...
    

Add New Crosslinker

  • Supported Crosslinker Types

    • Chemical crosslinking, mass spectrometry non-cleavable (MS-non-Cleavable), e.g., DSS
    • Chemical crosslinking, mass spectrometry cleavable (MS-Cleavable), e.g., DSSO
    • Endogenous crosslinking, e.g., disulfide bonds (SS), zero-length crosslinking between amino acids (e.g., Tyr-Tyr)
  • Method: Open pConfig.exe and select the Linkers tab

  • To configure a new crosslinker, mimic existing ones

    • For example, the screenshot above shows the parameters for the MS-Cleavable DSSO crosslinker
  • Parameter Definitions

    • Name: Crosslinker name
      • Note: The name must not contain spaces or special characters
      • Special characters include Chinese characters, =, and other non-English letters
      • Recommendation: Use only digits, English letters, -, _, and similar characters
    • AlphaSites/BetaSites: Reactive sites of the crosslinker
      • [ denotes the protein N-terminus, ] denotes the protein C-terminus
      • For asymmetric crosslinkers, AlphaSites and BetaSites can be different
      • For multiple reactive sites, list the amino acid letters (e.g., DE)
    • LinkerMass: The extra mass added after the crosslinker reaction
    • MonoMass: The extra mass when the crosslinker reacts at only one end
      • If multiple monolink forms exist, fill in only the most significant one here; the others can be set as a variable modification in the search parameters
    • LinkerComposition: The chemical composition added after the crosslinker reaction
    • MonoComposition: The chemical composition added when the crosslinker reacts at only one end
    • MSCleavable: Whether the crosslinker arm is cleavable during MS fragmentation
    • LongMass/ShortMass: The residual modification mass after crosslinker arm cleavage
      • Long is the larger residual mass, Short is the smaller one
      • If more than two cleavage types exist, fill in only the two most significant ones
  • Important: LinkerComposition and MonoComposition must be filled in correctly, because the software uses them to recalculate the crosslinker-related masses.

  • If you are unsure how to set the chemical composition or mass of a crosslinker, consult the official documentation or the literature

  • Isotope Labeled Crosslinkers

    • H or 1H denotes hydrogen, 2H denotes deuterium.
    • N or 14N denotes nitrogen, 15N denotes its isotope.
    • C denotes carbon, 13C denotes its isotope.
    • For example: Configuration for DSS-D12
      • DSS-D12 means 12 hydrogen atoms in DSS are replaced by 12 deuterium atoms
      • Reference documentation: DSS-H12/D12
      • LinkerMass: 150.1434042
      • MonoMass: 168.1539675
      • LinkerComposition: C(8)1H(-2)2H(12)O(2)
      • MonoComposition: C(8)2H(12)O(3)
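
Since the software recalculates masses from the compositions, it is worth checking that a composition string and its mass agree before saving a new crosslinker. The sketch below recomputes the monoisotopic mass of the DSS-D12 compositions given above. The parsing rule (element symbol with an optional isotope prefix, followed by a signed count in parentheses) is inferred from the example strings and is an assumption; the function name is hypothetical; and the isotope masses are rounded values from standard tables, so pLink's internal constants may differ in the last digits.

```python
import re

# Monoisotopic masses in Da (rounded values from standard isotope tables).
# pLink may use slightly different constants internally.
MONO_MASS = {
    "C": 12.0, "13C": 13.0033548,
    "H": 1.0078250, "2H": 2.0141018,
    "N": 14.0030740, "15N": 15.0001089,
    "O": 15.9949146, "S": 31.9720710,
}
# "1H" and "14N" are alternative spellings of the default isotopes.
ALIASES = {"1H": "H", "14N": "N"}

# One token: optional isotope number, element symbol, signed count in parentheses.
TOKEN = re.compile(r"(\d*[A-Z][a-z]?)\((-?\d+)\)")


def composition_mass(composition):
    """Sum the monoisotopic mass of a string such as C(8)1H(-2)2H(12)O(2)."""
    tokens = TOKEN.findall(composition)
    # Reject strings that are empty or contain anything outside the tokens.
    if not tokens or "".join(f"{el}({n})" for el, n in tokens) != composition:
        raise ValueError(f"cannot parse composition: {composition!r}")
    total = 0.0
    for element, count in tokens:
        element = ALIASES.get(element, element)
        if element not in MONO_MASS:
            raise ValueError(f"unknown element or isotope: {element!r}")
        total += MONO_MASS[element] * int(count)
    return total


# Values copied from the DSS-D12 example.
checks = {
    "LinkerMass": ("C(8)1H(-2)2H(12)O(2)", 150.1434042),
    "MonoMass": ("C(8)2H(12)O(3)", 168.1539675),
}
for name, (composition, stated) in checks.items():
    computed = composition_mass(composition)
    print(f"{name}: computed {computed:.4f}, stated {stated:.4f}, "
          f"difference {abs(computed - stated):.1e}")
```

Both DSS-D12 values agree with the stated masses to well within 1e-4 Da; the small residual comes from rounding of the isotope masses. A larger difference for a new crosslinker would suggest a typo in either the composition or the mass.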


中文

选择MS Data Format

  • MS Data Format各项的解释
    • RAW (Orbitrap):
      • 较早的Thermo的Orbitrap仪器
      • 不支持FAIMS
    • RAW (FAIMS)(Orbitrap New Instrument, Ascend):
      • 使用了FAIMS,包括较早和较新的Orbitrap仪器
      • Thermo较新的Orbitrap仪器,例如Ascend,包括使用或没有使用FAIMS
    • RAW (Astral)(FAIMS):
      • Thermo的Astral仪器,包括使用或没有使用FAIMS
    • d (timsTOF):
      • Bruker的timsTOF仪器
    • MGF (Not Recommended):
    • PF2:
      • pFindStudio系列软件内部的二进制谱图文件
  • 遇到问题请参考: 若没能成功提取谱图

MS Data Format选择建议

  • 推荐直接载入RAW文件

    • pLink3调用内部插件pParse提取谱图,并转化为二进制格式的.pf2文件
    • pParse有precursor mass校正和monoisotopic peak检测功能
    • pParse支持混合谱导出,混合谱意为多个precursors共碎裂在一张MS2
    • 二进制格式pf2的读写相对更快
    • 同样,pLink2也是不推荐MGF输入,建议直接载入RAW
  • 不推荐选择MGF

    • 虽然MGF使用广泛,但MGF不是标准格式
      • 不同软件要求的MGF的谱图头信息存在差别,除最低定义外,没有公认的规则
    • MGF所含信息不全,pLink搜索时可能会出现奇怪的bug
    • 注意pParse部分设置下也导出.mgf文件,但和其它工具提取的是不同的。第三方工具提取的.mgf可能没有以上说的pParse附加的特性,达不到最佳的pLink鉴定性能
  • 若有仪器类型pLink没有原生支持

    • 若某些仪器类型pLink没有原生支持,也可以使用第三方工具导出MGF格式作为输入,例如MSConvert。(仅限跑通pLink软件,有总比没有强)

选择Multi-process或Multi-thread

  • 大于pLink3.0.16的版本,新增多进程模式(Multi-process)。

  • 多线程模式(Multi-thread)

    • 搜索的全流程,仅1个process在工作。在这1个process的基础上,分配多线程。
    • RAW文件是一个接一个顺序搜索,即仅上一个RAW搜索完毕,才会进行下一个RAW的搜索。
  • 多进程模式(Multi-process)

    • pLink的部分环节仅支持单线程工作,例如串行的文件读写。在这些环节时,无法充分利用CPU资源,影响整体的搜索速度。
    • 设计Multi-process的动机,即是尽可能充分利用设定的CPU资源。
    • Multi-process模式,进程数和每个进程的线程数由程序自动分配,但总的CPU利用数不会超过设定的CPU Number
    • Multi-process会并行地同时搜索多个RAW,在最后合并结果,并一起进行FDR质控。

Multi-process或Multi-thread选择建议

  • 不管选择Multi-process, 还是Multi-thread,对鉴定结果几乎没有影响,仅与CPU资源的分配有关系。
  • 我们强烈建议在RAW文件比较多时,使用Multi-process速度会更快
  • 一个RAW文件时,Multi-process和Multi-thread没有区别。
  • 如果选择Multi-process一直卡在如下日志,没有看到载入谱图的过程,可以尝试切换Multi-thread看是否能正常运行
    [MultiProcess] Start...
    

新增交联剂

  • 支持交联剂类型

    • 化学交联,质谱不可断裂(MS-non-Cleavable),例如DSS
    • 化学交联,质谱可断裂(MS-Cleavable),例如DSSO
    • 内源性交联,例如二硫键(SS)、氨基酸零距离交联(例如Tyr-Tyr)
  • 方法:打开pConfig.exe,选择Linkers

  • 配置新交联剂,可模仿已有交联剂

    • 例如上图示例,MS-Cleavable的DSSO交联剂参数
  • 参数含义

    • Name:交联剂名称
      • 注意:交联剂名称不能含空格或特殊字符
      • 不要含特殊字符,包括:中文、=、小语种等
      • 建议:最好仅由数字、英文字母、-、_等组成
    • AlphaSites/BetaSites:交联反应位点
      • [表示蛋白质N端,]表示蛋白质C端
      • 非对称交联剂,AlphaSites和BetaSites可不同
      • 多反应位点,可填多个氨基酸字母,例如DE
    • LinkerMass:交联剂反应后,多出的质量
    • MonoMass:交联剂仅一端反应,多出的质量
      • 如存在多种monolink形式,此处可仅填其中最显著的一种,其它的可以当做可变修饰设置为搜索参数
    • LinkerComposition:交联剂反应后,多出的化学组成
    • MonoComposition:交联剂仅一端反应,多出的化学组成
    • MSCleavable:交联臂在质谱碎裂过程中是否可断裂
    • LongMass/ShortMass:交联臂断裂后,残留的修饰质量
      • Long为残留质量大者,Short为残留质量小者
      • 若存在超过2种断裂类型,此处可仅填最显著的2种
  • 注意:LinkerComposition和MonoComposition需要填写正确,软件内部会利用化学组成,再计算一遍交联剂的相关质量。

  • 在不知道怎么设置交联剂化学组成或质量时,建议查阅官方文档或文献

  • 同位素标记的交联剂

    • H或1H表示氢原子,2H表示氘原子
    • N或14N表示氮原子,15N表示其同位素
    • C表示碳原子,13C表示其同位素
    • 例如:DSS-D12的参数配置
      • DSS-D12意为DSS的12个氢原子被12个氘原子取代
      • 参考文档:DSS-H12/D12
      • LinkerMass: 150.1434042
      • MonoMass: 168.1539675
      • LinkerComposition: C(8)1H(-2)2H(12)O(2)
      • MonoComposition: C(8)2H(12)O(3)