Insertions context issue #209

@diegogarcialopez

Hi AlexandrovLab,
I was working on some samples when I got some unexpected results. One of these samples seems to have T/A insertions in homopolymer regions; however, when I used SigProfilerMatrixGenerator to compute the mutational matrix, no mutation was assigned to the 1:Ins:T:5 class. I looked more closely in IGV, and these indels definitely look like they should be classified as 1:Ins:T:5.
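To make the expected class concrete, here is a minimal sketch of how a 1 bp insertion could be mapped to a label such as 1:Ins:T:5 by counting the homopolymer run that flanks the insertion point. This is not SigProfilerMatrixGenerator's code; the function name, the 0-based coordinate convention and the toy sequences are assumptions made for illustration only.

```python
# Illustrative only: NOT the SigProfilerMatrixGenerator implementation.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_1bp_insertion(ref_seq, pos, inserted_base):
    """Label a 1 bp insertion placed between ref_seq[pos - 1] and ref_seq[pos].

    `pos` is a 0-based index into `ref_seq` (a hypothetical convention
    chosen for this sketch).
    """
    base = inserted_base.upper()

    # Count copies of the inserted base immediately left of the insertion point.
    left = 0
    i = pos - 1
    while i >= 0 and ref_seq[i] == base:
        left += 1
        i -= 1

    # Count copies of the inserted base immediately right of the insertion point.
    right = 0
    j = pos
    while j < len(ref_seq) and ref_seq[j] == base:
        right += 1
        j += 1

    run_length = left + right
    # Report purine insertions (A/G) as their pyrimidine complement (T/C).
    label_base = base if base in "CT" else COMPLEMENT[base]
    # Runs of five or more share the last bin.
    return f"1:Ins:{label_base}:{min(run_length, 5)}"


# Toy reference with a run of five A's at indices 2..6.
toy_ref = "GCAAAAAGC"
at_run = classify_1bp_insertion(toy_ref, 2, "A")      # adjacent to the run
off_by_one = classify_1bp_insertion(toy_ref, 1, "A")  # one base to the left
```

In this toy example, moving the insertion point by a single base takes it out of the homopolymer, so the same insertion lands in a different class. Whether a coordinate convention is involved in the behaviour reported here is not established by this sketch.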

Therefore, I went to TCGA to check whether this happens with other samples, and the same issue appeared. I computed the mutational matrix for sample TCGA-DM-A1D8, which contains 2 insertions that look like they should be classified as 1:Ins:T:5 according to the cBioPortal data shown in IGV.

[Two images attached]

However, neither of them is assigned to that class when computing the mutational matrix. I tried different versions of SigProfilerMatrixGenerator (v1.1, v1.2 and v1.2.31) as well as the SigProfilerAssignment web tool (https://cancer.sanger.ac.uk/signatures/assignment/app/), and all give the same result.

Mutational_Profile_ID.pdf

I find this odd, because around 2 years ago I computed the mutational matrices for this exact TCGA sample and these 2 indels were classified as 1:Ins:T:5 mutations. Moreover, according to the literature this should be one of the most prevalent indel types in cohorts, yet it is completely absent from some mutational matrices that I have recently computed.

As a side note, I manually added 1 bp to the start and end positions of these insertions, and then they were called as 1:Ins:T:5 mutations. I am worried this could affect other insertions, so please let me know whether you think this would be a viable solution.
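For reference, the manual shift described above could be scripted roughly as follows. This is only a sketch of the workaround, not a validated fix: the column names follow the common MAF layout (`Variant_Type`, `Reference_Allele`, `Tumor_Seq_Allele2`, `Start_Position`, `End_Position`) and are an assumption about the input file, and the coordinates in the example rows are made up.

```python
def shift_1bp_insertions(rows, offset=1):
    """Return copies of MAF-like rows with 1 bp insertions moved by `offset`.

    Column names are assumed to follow the common MAF layout; adapt them
    to the actual input file. All other rows are returned unchanged.
    """
    shifted = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is not modified
        is_1bp_insertion = (
            row["Variant_Type"] == "INS"
            and row["Reference_Allele"] == "-"
            and len(row["Tumor_Seq_Allele2"]) == 1
        )
        if is_1bp_insertion:
            new_row["Start_Position"] = int(row["Start_Position"]) + offset
            new_row["End_Position"] = int(row["End_Position"]) + offset
        shifted.append(new_row)
    return shifted


# Made-up rows for illustration; these are not taken from TCGA-DM-A1D8.
example_rows = [
    {"Variant_Type": "INS", "Reference_Allele": "-", "Tumor_Seq_Allele2": "T",
     "Start_Position": "1000", "End_Position": "1001"},
    {"Variant_Type": "SNP", "Reference_Allele": "C", "Tumor_Seq_Allele2": "T",
     "Start_Position": "2000", "End_Position": "2000"},
]
shifted_rows = shift_1bp_insertions(example_rows)
```

Because the filter touches every 1 bp insertion, not only those in homopolymers, comparing the matrices computed before and after the shift would show whether other insertion classes change as well.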

Please let me know if I am doing something wrong. If not, I would appreciate it if you could let me know whether there is a quick solution, or a specific version of the package without this potential issue that I could use in the meantime.

Thank you very much in advance.

For reproducibility, this is the code I used to compute the mutational matrix (tested in Google Colab and on a Linux-based HPC):

pip install SigProfilerMatrixGenerator

from SigProfilerMatrixGenerator.scripts import SigProfilerMatrixGeneratorFunc as matGen
from SigProfilerMatrixGenerator import install as genInstall
genInstall.install('GRCh37', bash=True)

matrices = matGen.SigProfilerMatrixGeneratorFunc("test", "GRCh37", "/content/test/", plot=False, exome=False, bed_file=None, chrom_based=False, tsb_stat=False, seqInfo=False, cushion=100)
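To check whether a given class is populated after the run, the indel matrix can be inspected with a small helper like the one below. It is a sketch that assumes the matrix is a tab-separated table whose first column holds the mutation type and whose remaining columns hold per-sample counts; the helper name and the example table are hypothetical, and the location of the output file is not shown here.

```python
import csv
import io


def counts_for_class(matrix_text, mutation_type):
    """Return {sample: count} for one mutation type, or None if it is absent.

    Assumes a tab-separated table: first column = mutation type,
    remaining columns = integer counts per sample.
    """
    reader = csv.reader(io.StringIO(matrix_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    for row in reader:
        if row and row[0] == mutation_type:
            return dict(zip(samples, (int(value) for value in row[1:])))
    return None


# Hypothetical two-row matrix used only to exercise the helper.
example_matrix = "MutationType\tSampleA\n1:Ins:T:4\t3\n1:Ins:T:5\t0\n"
result = counts_for_class(example_matrix, "1:Ins:T:5")
```

With a real run, the text of the generated indel matrix file would be read and passed in place of `example_matrix`.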

Here is the input file (I added a .txt extension because otherwise I could not upload it):

TCGA-DM-A1D8_SigProfiler_input.maf.txt

Labels: bug
