Description
I am trying to simulate signal data by creating a k-mer model for a modified base. My data comes from an R10.4.1 flow cell and thus needs a 9-mer model. I have a sequence with a site-specific modification that causes a large decrease in the current. All I want to do is simulate the signal by varying the parameters only for the 9-mers that overlap the modification.
It would have been very straightforward if I could take the ONT R10.4.1 400 bps two-column file and add 9 lines at the end for just the 9 modified 9-mers in my sequence. It seems that I cannot do that, or that I am not doing it correctly. The manual says that I have to use a file formatted in the same way as for f5c (r9.4_450bps.nucleotide.6mer.template.model). When I look at that file, it is a 6-mer library with a 6-column structure. The file r9.4_450bps.cpg.6mer.template.model has a 5-column structure. There are no examples of 9-mer libraries.
Is the problem that I would need a 5^9-line file for A, C, G, T, X, which would be unwieldy? Is that why I have to use a 6-mer library, which translates to 5^6 lines? Strangely, the errors I get seem to indicate that it could handle 9-mer libraries, but I don't know how to get it to do so.
For example, I get this error:
[read_model::INFO] k-mer size in file /home/jst/models/9mer_XX_CPD_f5c.txt is 9
[read_model::ERROR] File /home/jst/models/9mer_XX_CPD_f5c.txt has too many entries. Expected 262144 kmers in the model, but file had more than that At src/model.c:114
This is what I had in the header:
#model_name R10.4.1_400bps_9mer_custom
#kit R10.4.1_400bps
#strand template
#k 9
#alphabet dnaX
If it had recognized the file as a 9-mer library with a modified base X, it should have expected 5^9 = 1953125 9-mers, not 4^9 = 262144.
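As a quick sanity check on the counts in the error message, the number of distinct k-mers over an alphabet of size a is a^k. The snippet below is only that arithmetic. The function name is made up for illustration and says nothing about how the tool itself parses the header.

```python
def expected_kmers(alphabet: str, k: int) -> int:
    """Number of distinct k-mers over the given alphabet: (alphabet size) ** k."""
    return len(set(alphabet)) ** k


print(expected_kmers("ACGT", 9))   # 262144, the count reported in the error
print(expected_kmers("ACGTX", 9))  # 1953125, the count for a 5-letter alphabet
print(expected_kmers("ACGTX", 6))  # 15625, a 6-mer model with one extra base
```

The value in the error matches the 4-letter count, which suggests the file was read as a plain A, C, G, T model despite the `#alphabet dnaX` header line.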
Is there any way to simply use the two-column ONT 9-mer library with a few added modified 9-mers? My case is a bit nonstandard because my modifications run in tandem, i.e., XX.
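To make concrete which entries would need custom values, here is a minimal sketch that lists the 9-mers overlapping a tandem XX site. The sequence is a made-up placeholder and the helper name is hypothetical. It only enumerates windows and does not write a model file in any particular format.

```python
def kmers_overlapping(seq: str, k: int = 9, mod: str = "X") -> list[str]:
    """Return every k-mer window of seq that contains at least one modified base."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1) if mod in seq[i:i + k]]


# Hypothetical sequence with a tandem modification (XX) in the middle.
seq = "ACGTACGTAC" + "XX" + "GTACGTACGT"
hits = kmers_overlapping(seq)
for kmer in hits:
    print(kmer)
print(len(hits))  # 10
```

A single modified base away from the sequence ends is covered by k = 9 windows, while two adjacent modified bases are covered by k + 1 = 10, so a tandem XX site needs 10 custom entries rather than 9.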