File-Augmented Schema #1151

Open
@ethho

Description

Feature Request

Problem Statement

While the database provides data structure, efficient queries, and transaction support, files are still preferred for storing large objects such as images, numerical arrays, and movies. Users like to have direct read-only access to the files without mediation by the database. Storing large objects in MySQL tables also degrades the performance of data queries.
DataJoint has previously implemented several approaches to address some aspects of this problem:

  1. Storing file paths as varchar strings with the user responsible for the file management.
  2. The attach and attach@store datatypes to store files, preserving the filename but not the folder structure.
  3. The blob@store datatype for storing serialized data structures in external files.
  4. The filepath@store datatype to allow organizing files and folders under users' control.
  5. The AdaptedType datatype that allows defining custom logic to apply when reading and writing.
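
For orientation, here is a minimal sketch of how approaches 1 through 4 might appear together in a single DataJoint table definition. The attribute names and the store name `store` are hypothetical, and the external datatypes assume a store of that name has been configured; this only illustrates the declaration syntax, not a recommended design.

```python
# Hypothetical table definition string; attribute names and the store name
# "store" are illustrative assumptions, not taken from any real pipeline.
definition = """
# one row per recording (illustrative only)
recording_id : int
---
raw_path  : varchar(255)    # 1. file path managed by the user
notes     : attach          # 2. attachment kept in the database
video     : attach@store    # 2. attachment kept in an external store
waveform  : blob@store      # 3. serialized data structure in an external store
nwb_file  : filepath@store  # 4. user-organized file tracked in an external store
"""
```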

In particular, the SpyGlass pipeline from Loren Frank's lab relied on the filepath and AdaptedType features to implement NWB file management.
None of these methods simultaneously address the following desiderata:

  1. A logical, consistent file folder structure that is prescribed by DataJoint, based on the schema design and primary key values.
  2. Keeping files in their original form and extension so that they can be read and used outside DataJoint. Files should be accessible for reading without datajoint-python or DB access, and files should maintain their native file extensions and MIME types (as opposed to being serialized into another format).
  3. Files are copied into their location and referenced in a single step as part of the insert and fetch operations.
  4. Files are deleted when the table entries referencing them are deleted.
  5. Data consistency through transaction processing: inserts and deletes are executed as atomic transactions that roll back when the transaction fails, and concurrent transactions do not lead to inconsistencies.

We need a solution for file management that simultaneously addresses all of these desiderata.
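
To make desiderata 1 through 3 and 5 more concrete, here is a rough sketch of one possible behavior. It is not an existing DataJoint API: `managed_path`, `insert_with_file`, the `<schema>/<table>/<key>=<value>/...` layout, and the `record_row` callback (a stand-in for the database insert) are all hypothetical names chosen for illustration.

```python
import shutil
from pathlib import Path, PurePosixPath


def managed_path(schema, table, primary_key, attribute, source_name):
    """Derive a store-relative path from the schema, table, and primary key.

    Hypothetical layout: <schema>/<table>/<key>=<value>/.../<attribute><ext>
    """
    key_parts = [f"{name}={value}" for name, value in primary_key.items()]
    # Keep the native extension so the file stays readable outside DataJoint.
    extension = "".join(PurePosixPath(source_name).suffixes)
    return PurePosixPath(schema, table, *key_parts, attribute + extension)


def insert_with_file(store_root, rel_path, source, record_row):
    """Copy the file into place and record the row; undo the copy on failure."""
    target = Path(store_root) / rel_path
    if target.exists():
        raise FileExistsError(target)  # never overwrite a file another row owns
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    try:
        record_row(str(rel_path))  # stand-in for the database insert
    except Exception:
        target.unlink()  # roll back the file copy if the insert fails
        raise


rel = managed_path(
    "ephys", "Recording", {"subject_id": 12, "session": 3}, "raw", "scan_001.nwb"
)
print(rel)  # ephys/Recording/subject_id=12/session=3/raw.nwb
```

A real implementation would also need to coordinate the file copy with the database transaction under concurrency and remove files on delete (desideratum 4), which this sketch does not attempt.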

Metadata

Labels

enhancement: Indicates new improvements
needs-discussion: Issues requiring further development review and verify impact.
stale: Indicates issues, pull requests, or discussions are inactive
