@hanslovsky
Contributor

Power users may want to manage the arrays behind a CellArrDataset more directly, either by passing tiledb.Array objects or by using URIs that do not share a common prefix. This PR provides an example implementation: it extracts a common base class, which is then extended to reproduce the existing class with its safeguards intact:

  1. class _CellArrDatasetBase: base class constructed from tiledb.Array objects. It does not manage (open or close) any arrays. The constructor gains an additional read-only check.
  2. class _CellArrDatasetUri(_CellArrDatasetBase): constructed from URIs pointing to existing TileDB arrays. It opens an array from each URI (taken as is, with no prefix prepended), passes the arrays to super().__init__, and closes them via __del__.
  3. class CellArrDataset(_CellArrDatasetUri): same interface as before. Its constructor only concatenates strings to build the URIs that it passes to super().__init__.

Most users should keep using the existing class (3). Power users can fall back to (1) or (2) when they need extra flexibility. The _ prefix on (1) and (2) signals that these classes are unsupported: anyone who uses them incorrectly will need to debug on their own.

Note: This is a draft PR meant to start a discussion around such power-user classes. My use case is a processing pipeline in which each step maps one CellArrDataset to a new one. The input is considered immutable, so I currently have to copy the metadata at every step, even when only the matrix data changes. Using (1) or (2) allows much greater flexibility in such scenarios and avoids unnecessary data replication.
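
To make the pipeline scenario concrete, here is a minimal sketch of how a step could reuse the input's metadata arrays and point only the matrix at new data. The helper `next_step_uris`, the dictionary keys, and the paths are hypothetical and not part of this PR; the sketch only assumes that a dataset can be described by a mapping from array role to URI, as a URI-based class like (2) would accept.

```python
from typing import Dict


def next_step_uris(input_uris: Dict[str, str], new_matrix_uri: str,
                   matrix_key: str = "matrix") -> Dict[str, str]:
    """Return the URI mapping for a step's output dataset.

    Metadata URIs are carried over unchanged, so nothing is copied;
    only the matrix entry points at the newly written array.
    """
    if matrix_key not in input_uris:
        raise KeyError(f"input has no '{matrix_key}' entry")
    out = dict(input_uris)  # shallow copy, the input mapping stays untouched
    out[matrix_key] = new_matrix_uri
    return out


# Hypothetical layout: metadata lives with the original dataset,
# each step writes only a new matrix somewhere else.
step0 = {
    "matrix": "/data/raw/assays/counts",
    "cell_metadata": "/data/raw/cell_metadata",
    "gene_annotation": "/data/raw/gene_annotation",
}
step1 = next_step_uris(step0, "/scratch/step1/normalized")
```

Because the URIs no longer need a common prefix, `step1` can mix the original metadata locations with a matrix stored under a different root.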

…wer users

This allows operating on tiledb arrays directly
@hanslovsky hanslovsky force-pushed the propose-dataset-relaxation branch from b379dcd to d4ae610 Compare January 23, 2025 15:59
@jkanche
Member

jkanche commented Feb 25, 2025

Let me know if cellarr_array works for the use cases you have in mind. If it does, we can either close this PR or migrate these changes to that repository.

2 participants