Replies: 6 comments
-
There is a step missing in the provided solution: symmetrising the final interaction matrix (saddle).
# transpose the 2nd and 3rd dimensions, leaving the 1st dimension alone
_sX = _sumX + _sumX.transpose(0,2,1)
_cX = _countX + _countX.transpose(0,2,1)
It is unclear, though, why that symmetrising is there in the first place and when it comes into play. It is potentially useful for a future fully sparse implementation of saddle.
-
Related PR: #484
-
To me, this seems like a very useful extension of the saddle functionality! A simple back-of-the-envelope calculation:
Are these estimates correct? I can think of two solutions to mitigate these issues:
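As a rough sketch of such an estimate: the function name and every number below except the 25 kb bin size are assumptions for illustration, not the figures from this comment. It only counts the dense sum and count stacks of shape [diagonals, saddle_bins, saddle_bins] for a single region.

```python
def saddle_stack_nbytes(n_diags, n_saddle_bins, itemsize=8, n_arrays=2):
    """Bytes needed for dense [n_diags, n_saddle_bins, n_saddle_bins] stacks.

    n_arrays=2 accounts for one stack of sums and one of counts;
    itemsize=8 assumes float64 storage.
    """
    return n_arrays * n_diags * n_saddle_bins * n_saddle_bins * itemsize


region_bp = 100_000_000  # assumed 100 Mb region (e.g. a long chromosome arm)
binsize = 25_000         # 25 kb bins
n_diags = region_bp // binsize  # one stack layer per diagonal

nbytes = saddle_stack_nbytes(n_diags, n_saddle_bins=50)  # 50 saddle bins is assumed
print(f"{n_diags} diagonals, {nbytes / 1e6:.0f} MB for one region")
```

The footprint grows linearly with the number of diagonals (so with finer bins or longer regions) and quadratically with the number of saddle bins, and it multiplies again if a separate stack is kept per region.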
-
100%, yes! This would definitely explode in memory very quickly. And yes, this could become another way of shooting yourself in the foot, though not the first one:
But enough of that. Practical solutions:
-
Such an ultimately flexible saddle_by_dist is also a flashback to the existing (in sandbox now)
-
Some examples, just in case: here is what a saddle by distance looks like for WT-like and deltaRad21-like samples. Biology aside, the short range does look quite different (saddles are done using each sample's own eigenvectors at 25 kb bin size, intra-arm only).
So yes, we don't need all of those individual diagonals per se, but they are nice to have, so that they can be aggregated into distance bins after the fact.
I personally haven't even checked different chromosomes separately; that was a purely theoretical exercise. I still think it would be useful.
-
Intro: saddle plots (discrete/continuous) by distance are useful and informative.
"By distance" means taking into account only interactions within a certain distance range, e.g. 1MB<D<5MB, when calculating "the saddle".
Currently, the "by distance" part is done by filling anything that does not belong to the selected distance band, i.e. anything outside of min_diag and max_diag, with NaNs:
Instead, one can generate saddles for every diagonal without a significant performance penalty:
So, instead of the existing _accumulate, one could use something like this:
This way one would accumulate saddles (sums and counts first) into 3D-stacked arrays, à la pileups/snipping: S[diagonals, saddle_bins, saddle_bins], such that it would be easy to do by-distance saddles like so:
Full prototype solution below:
https://gist.github.com/sergpolly/ee39a452c1e30f12d5100b28f35f4ee0
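The idea can be sketched in a few lines of NumPy. This is not the code from the gist or the existing _accumulate; the function names accumulate_by_diagonal and saddle_for_band, the toy matrix, and the convention that a bin label of -1 means "excluded" are all assumptions made for illustration. The sketch walks the upper triangle one diagonal at a time, stores sums and counts as [diagonals, saddle_bins, saddle_bins], and then builds a saddle for any distance band by slicing the first axis.

```python
import numpy as np


def accumulate_by_diagonal(matrix, digitized, n_bins):
    """Per-diagonal saddle sums and counts, shape [n_diags, n_bins, n_bins].

    matrix    : square 2D array (e.g. observed/expected); NaN pixels are skipped
    digitized : 1D int array with a saddle-bin label per row, -1 = excluded
    """
    n = matrix.shape[0]
    sums = np.zeros((n, n_bins, n_bins))
    counts = np.zeros((n, n_bins, n_bins))
    for d in range(n):
        vals = np.diagonal(matrix, offset=d)  # pixels (i, i + d)
        row_bins = digitized[: n - d]
        col_bins = digitized[d:]
        ok = np.isfinite(vals) & (row_bins >= 0) & (col_bins >= 0)
        # np.add.at handles repeated (row_bin, col_bin) pairs correctly
        np.add.at(sums[d], (row_bins[ok], col_bins[ok]), vals[ok])
        np.add.at(counts[d], (row_bins[ok], col_bins[ok]), 1)
    return sums, counts


def saddle_for_band(sums, counts, min_diag, max_diag):
    """Average saddle for diagonals min_diag <= d < max_diag.

    Only the upper triangle was visited, so sums and counts are symmetrised.
    Note: with min_diag=0 the main diagonal would be counted twice.
    """
    s = sums[min_diag:max_diag].sum(axis=0)
    c = counts[min_diag:max_diag].sum(axis=0)
    s, c = s + s.T, c + c.T
    with np.errstate(invalid="ignore", divide="ignore"):
        return s / c  # NaN where a bin pair has no pixels


# Toy usage: a symmetric 6x6 matrix and three saddle bins
rng = np.random.default_rng(1)
m = rng.random((6, 6))
m = (m + m.T) / 2
dig = np.array([0, 1, 0, 1, 2, 2])
sums, counts = accumulate_by_diagonal(m, dig, n_bins=3)
band = saddle_for_band(sums, counts, min_diag=1, max_diag=3)
print(band.shape)
```

With the stacks in hand, changing the distance band is just a different slice of the first axis, with no second pass over the contact matrix; the price is the dense per-diagonal storage.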