Unclear Cooler.pixels().fetch behavior

Just wanted to report a behavior which I assume is intentional but might be misleading (or at least it was for me). 

When using ```Cooler.pixels().fetch("<some_genomic_region>")```, an instance of the class ```RangeSelector1D``` is returned. This selector returns all the pixels where the first bin id belongs to the provided interval. As an example, using the `yeast.10kb.cool` file in the ```tests/data``` folder:

```python
from cooler import Cooler

handle = Cooler("yeast.10kb.cool")
print(handle.pixels().fetch("chrII"))
```

returns:

```
        bin1_id  bin2_id  count
36311        33       33   2433
36312        33       34   7410
36313        33       35   3444
36314        33       36    475
36315        33       37   1741
...         ...      ...    ...
128531      114     1209      1
128532      114     1221      2
128533      114     1223      2
128534      114     1224      2
128535      114     1225     12

[92225 rows x 3 columns]
```

which checks out since:

```python
print(handle.extent("chrII"))  # results in (33, 115)
```

The issue (in my opinion) is that it does **not** return all pixels with at least one bin in the region. There can be pixels where the second bin belongs to the region and yet they are not returned.

```python
print(handle.pixels()[:].query("bin2_id >= 33 and bin2_id < 115 and bin1_id < 33"))
```

returns:

```
       bin1_id  bin2_id  count
28           0       33      3
29           0       34     41
30           0       35     48
31           0       36      9
32           0       37     61
...        ...      ...    ...
36110       32       65      1
36111       32       75      1
36112       32       90      2
36113       32      101      1
36114       32      113      1

[2460 rows x 3 columns]
```

I understand why this might be the case. There is no easy way of indexing the `bin2_id` column, therefore finding these pixels would require iterating and filtering all pixels prior to the retrieved slice. This would clash with the lazy fetching and most likely be slow. 

Understanding why this behavior might not change, I believe it should be clearly stated in the documentation (I did not find anything about it but I might have missed it), since one could get misleading results. I would be happy to open a PR to either change or document the behavior if needed.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Unclear Cooler.pixels().fetch behavior #455

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Unclear Cooler.pixels().fetch behavior #455

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions