-
Notifications
You must be signed in to change notification settings - Fork 23
Open
Description
Hi Marcus,
I've got a couple of questions regarding the api usage and raw signal analysis. Sorry in advance if some of the questions are super basic as I'm not too familiar with the remora api yet.
- The
region_basecalls = io_read.extract_ref_reg(ref_reg = ref_region, signal_type="norm")
returns an object, sufficient for mapping raw signal data as it contains the seq_to_sig array. I'm wondering how these values are transformed as I've analzyed the pA values from sample to sample and my modified sample has roughly ~2pA higher values on average, comapred to non-modified sample (in non-modified sites, such as explicitly selected large kmers of 50-60 bases in length without any CG sites). Is it possible for such difference is caused by the samples being run at different times or even cells or would this strictly be caused by the DNA sample being modified with bulky mods at every CG site? I also wonder what's the best properly transform the norm signal values to represent something like the printout of remora analyze command. - I've encountered some reads, where the seq_to_sig array doesn't make any sense to me. The only logical conclusion I've come to is that somehow the strand "skips" and goes through way faster than intended? I'm posting a single example that gets returned by <io_read.extract_ref_reg>. In this particular case, I'm looking at 31-mers centered at CG sites. Would you perhaps help in understanding how such a read happens? Also, what exactly does sig_start denote?
read = io_read.extract_ref_reg()
ReadRefReg(
read_id='784882a1-b632-4512-8322-6a7227f6183e',
norm_signal=array([
55827.85090193, 55757.59090376, 55944.95089888, 55804.43090254,
55921.53089949, 55687.33090559, 55593.65090803, 55289.19091596,
55289.19091596, 55476.55091108, 55804.43090254, 56272.83089034,
55476.55091108, 54446.07093792, 55429.7109123 , 55218.93091779,
55687.33090559, 55640.49090681, 55195.5109184 , 55008.15092328,
55781.01090315, 55382.87091352, 55851.27090132, 56249.41089095,
56296.25088973, 56553.87088302, 59504.79080615, 60160.55078907,
60137.13078968, 59715.57080066, 59434.53080798, 59668.73080188,
59738.99080005, 59715.57080066, 59879.51079639, 60020.03079273,
59668.73080188, 58614.83082933, 57279.89086411, 56577.29088241,
56577.29088241, 56670.97087997, 56319.67088912, 59434.53080798,
59200.33081408, 59668.73080188, 59738.99080005, 59762.41079944,
59481.37080676, 59668.73080188, 59106.65081652, 59528.21080554,
59387.6908092 , 60160.55078907, 59457.95080737, 59973.19079395,
59598.47080371, 59598.47080371, 59317.43081103, 59340.85081042,
59247.17081286, 59926.35079517, 59856.090797 , 60020.03079273,
59645.31080249, 59598.47080371, 59130.07081591, 59645.31080249,
59996.61079334, 59949.77079456, 60301.07078541, 59926.35079517,
59856.090797 , 60066.87079151, 59598.47080371, 60020.03079273,
59996.61079334, 59528.21080554, 59575.05080432, 59692.15080127,
59387.6908092 , 59645.31080249, 59528.21080554, 59902.93079578,
59364.27080981, 59270.59081225, 57537.5108574 , 55429.7109123 ,
55336.03091474, 55593.65090803, 56108.89089461, 56670.97087997,
54422.65093853, 53579.53096049, 54399.23093914, 55172.09091901,
55289.19091596, 55031.57092267, 55195.5109184 , 55218.93091779,
55499.97091047, 54797.37092877, 55078.41092145, 55265.77091657,
54961.3109245 , 54820.79092816, 55054.99092206, 55617.07090742,
55593.65090803, 55734.17090437, 55921.53089949, 55663.9109062 ,
55734.17090437, 55804.43090254, 55546.81090925, 55781.01090315,
55382.87091352, 55406.29091291, 55359.45091413, 55944.95089888,
55382.87091352, 55663.9109062 , 58029.33084459, 63579.87070001,
63533.03070123
]),
seq='AGTTAATGAGTGTGTCGAAAAACCTGAAATC',
seq_to_sig_map=array([
0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6,
6, 7, 7, 8, 8, 9, 9, 10, 11, 23, 29, 41, 53,
83, 89, 95, 113, 119, 125
]),
ref_reg=RefRegion(ctg='JB0001.0', strand='+', start=17711, end=17742),
sig_start=22687
)
- We noticed some modified reads have 20-50% more <.norm_signal> entries. I assume this correlates to dwell time? In any case, is dwell time information stored in the <io_read> object?
- Is the <ReadRefReg.seq> the basecalled sequence or the ref? I assume it's the basecalled seq.
As always, I appreciate any help on this
Metadata
Metadata
Assignees
Labels
No labels