Commit 088093d

Merge pull request #35 from kabilar/main
Implement `find_full_path` within `ephys` modules
2 parents 1fdbcf1 + 6f9507c commit 088093d

8 files changed, +104 -136 lines changed

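The change replaces the path helpers formerly defined in `element_array_ephys/__init__.py` with the shared implementations in `element-interface`, and `ephys.py` now resolves session directories through `find_full_path` instead of treating `get_session_directory()` as an absolute path. A minimal sketch of the call pattern introduced by this commit, with hypothetical root and session paths:

```python
from element_interface.utils import find_full_path

# Hypothetical values; inside ephys.py these come from the user-provided
# get_ephys_root_data_dir() and get_session_directory(session_key)
root_data_dirs = ['/mnt/server/ephys_data', '/home/user/ephys_data']
relative_session_dir = 'subject1/session1'

# Searches the roots in order and returns the first existing full path as a
# pathlib.Path; raises FileNotFoundError if no root contains the relative path
session_dir = find_full_path(root_data_dirs, relative_session_dir)
```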

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -1,3 +1,6 @@
+# User data
+.DS_Store
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]

README.md

Lines changed: 28 additions & 15 deletions
@@ -1,5 +1,4 @@
 # DataJoint Element - Array Electrophysiology Element
-DataJoint Element for array electrophysiology.
 
 This repository features DataJoint pipeline design for extracellular array electrophysiology,
 with ***Neuropixels*** probe and ***kilosort*** spike sorting method.
@@ -13,12 +12,16 @@ ephys pipeline.
 
 See [Background](Background.md) for the background information and development timeline.
 
-## The Pipeline Architecture
+## Element architecture
 
 ![element-array-ephys diagram](images/attached_array_ephys_element.svg)
 
 As the diagram depicts, the array ephys element starts immediately downstream from ***Session***,
-and also requires some notion of ***Location*** as a dependency for ***InsertionLocation***.
+and also requires some notion of ***Location*** as a dependency for ***InsertionLocation***. We
+provide an [example workflow](https://github.com/datajoint/workflow-array-ephys/) with a
+[pipeline script](https://github.com/datajoint/workflow-array-ephys/blob/main/workflow_array_ephys/pipeline.py)
+that models (a) combining this Element with the corresponding [Element-Session](https://github.com/datajoint/element-session)
+, and (b) declaring a ***SkullReference*** table to provide Location.
 
 ### The design of probe
 
@@ -45,14 +48,24 @@ This ephys element features automatic ingestion for spike sorting results from t
 + ***WaveformSet*** - A set of spike waveforms for units from a given CuratedClustering
 
 ## Installation
-```
-pip install element-array-ephys
-```
 
-If you already have an older version of ***element-array-ephys*** installed using `pip`, upgrade with
-```
-pip install --upgrade element-array-ephys
-```
++ Install `element-array-ephys`
+```
+pip install element-array-ephys
+```
+
++ Upgrade `element-array-ephys` previously installed with `pip`
+```
+pip install --upgrade element-array-ephys
+```
+
++ Install `element-interface`
+
++ `element-interface` is a dependency of `element-array-ephys`, however it is not contained within `requirements.txt`.
+
+```
+pip install "element-interface @ git+https://github.com/datajoint/element-interface"
+```
 
 ## Usage
 
@@ -65,12 +78,12 @@ To activate the `element-array-ephys`, ones need to provide:
 + schema name for the ephys module
 
 2. Upstream tables
-+ Session table
-+ SkullReference table (Reference table for InsertionLocation, specifying the skull reference)
++ Session table: A set of keys identifying a recording session (see [Element-Session](https://github.com/datajoint/element-session)).
++ SkullReference table: A reference table for InsertionLocation, specifying the skull reference (see [example pipeline](https://github.com/datajoint/workflow-array-ephys/blob/main/workflow_array_ephys/pipeline.py)).
 
-3. Utility functions
-+ get_ephys_root_data_dir()
-+ get_session_directory()
+3. Utility functions. See [example definitions here](https://github.com/datajoint/workflow-array-ephys/blob/main/workflow_array_ephys/paths.py)
++ get_ephys_root_data_dir(): Returns your root data directory.
++ get_session_directory(): Returns the path of the session data relative to the root.
 
 For more detail, check the docstring of the `element-array-ephys`:
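As a reference for the utility functions required above, here is a minimal sketch in the spirit of the example workflow's `paths.py`; the `dj.config['custom']['ephys_root_data_dir']` key and the `subject/session_datetime` directory layout are assumptions for illustration only:

```python
import datajoint as dj


def get_ephys_root_data_dir() -> list:
    # Root directory (or directories) under which the raw ephys data live;
    # the 'custom' / 'ephys_root_data_dir' config keys are an assumed convention
    root_dirs = dj.config['custom']['ephys_root_data_dir']
    return root_dirs if isinstance(root_dirs, list) else [root_dirs]


def get_session_directory(session_key: dict) -> str:
    # Path of the session data relative to the root; this particular
    # layout is hypothetical
    return '{subject}/{session_datetime}'.format(**session_key)
```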

element_array_ephys/__init__.py

Lines changed: 0 additions & 69 deletions
@@ -1,69 +0,0 @@
-import datajoint as dj
-import pathlib
-import uuid
-import hashlib
-
-
-dj.config['enable_python_native_blobs'] = True
-
-
-def find_full_path(root_directories, relative_path):
-    """
-    Given a relative path, search and return the full-path
-    from provided potential root directories (in the given order)
-    :param root_directories: potential root directories
-    :param relative_path: the relative path to find the valid root directory
-    :return: root_directory (pathlib.Path object)
-    """
-    relative_path = pathlib.Path(relative_path)
-
-    if relative_path.exists():
-        return relative_path
-
-    # turn to list if only a single root directory is provided
-    if isinstance(root_directories, (str, pathlib.Path)):
-        root_directories = [root_directories]
-
-    for root_dir in root_directories:
-        if (pathlib.Path(root_dir) / relative_path).exists():
-            return pathlib.Path(root_dir) / relative_path
-
-    raise FileNotFoundError('No valid full-path found (from {})'
-                            ' for {}'.format(root_directories, relative_path))
-
-
-def find_root_directory(root_directories, full_path):
-    """
-    Given multiple potential root directories and a full-path,
-    search and return one directory that is the parent of the given path
-    :param root_directories: potential root directories
-    :param full_path: the relative path to search the root directory
-    :return: full-path (pathlib.Path object)
-    """
-    full_path = pathlib.Path(full_path)
-
-    if not full_path.exists():
-        raise FileNotFoundError(f'{full_path} does not exist!')
-
-    # turn to list if only a single root directory is provided
-    if isinstance(root_directories, (str, pathlib.Path)):
-        root_directories = [root_directories]
-
-    try:
-        return next(pathlib.Path(root_dir) for root_dir in root_directories
-                    if pathlib.Path(root_dir) in set(full_path.parents))
-
-    except StopIteration:
-        raise FileNotFoundError('No valid root directory found (from {})'
-                                ' for {}'.format(root_directories, full_path))
-
-
-def dict_to_uuid(key):
-    """
-    Given a dictionary `key`, returns a hash string as UUID
-    """
-    hashed = hashlib.md5()
-    for k, v in sorted(key.items()):
-        hashed.update(str(k).encode())
-        hashed.update(str(v).encode())
-    return uuid.UUID(hex=hashed.hexdigest())
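These helpers are deleted here because they now live in `element-interface` and are imported in `ephys.py` below from `element_interface.utils`. Their behavior is unchanged; a short usage sketch with hypothetical roots and paths (with real data on disk both lookups succeed, otherwise they raise `FileNotFoundError`):

```python
from element_interface.utils import find_full_path, find_root_directory, dict_to_uuid

roots = ['/mnt/server/ephys_data', '/home/user/ephys_data']  # hypothetical roots

# Relative path -> full path under whichever root actually contains it
meta_file = find_full_path(roots, 'subject1/session1/probe0.ap.meta')

# Full path -> the root directory that is a parent of that path
root_dir = find_root_directory(roots, meta_file)
relative_path = meta_file.relative_to(root_dir).as_posix()

# Deterministic UUID built from a dictionary's sorted key/value strings
params_uuid = dict_to_uuid({'num_channels': 384, 'fs_hz': 30000})
```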

element_array_ephys/ephys.py

Lines changed: 39 additions & 26 deletions
@@ -4,9 +4,10 @@
 import numpy as np
 import inspect
 import importlib
+from element_interface.utils import find_root_directory, find_full_path, dict_to_uuid
 
 from .readers import spikeglx, kilosort, openephys
-from . import probe, find_full_path, find_root_directory, dict_to_uuid
+from . import probe
 
 schema = dj.schema()
 
@@ -46,7 +47,6 @@ def activate(ephys_schema_name, probe_schema_name=None, *, create_schema=True,
     global _linking_module
     _linking_module = linking_module
 
-    # activate
     probe.activate(probe_schema_name, create_schema=create_schema,
                    create_tables=create_tables)
     schema.activate(ephys_schema_name, create_schema=create_schema,
@@ -57,9 +57,10 @@ def activate(ephys_schema_name, probe_schema_name=None, *, create_schema=True,
 
 def get_ephys_root_data_dir() -> list:
     """
-    All data paths, directories in DataJoint Elements are recommended to be stored as
-    relative paths, with respect to some user-configured "root" directory,
-    which varies from machine to machine (e.g. different mounted drive locations)
+    All data paths, directories in DataJoint Elements are recommended to be
+    stored as relative paths, with respect to some user-configured "root"
+    directory, which varies from machine to machine (e.g. different mounted
+    drive locations)
 
     get_ephys_root_data_dir() -> list
     This user-provided function retrieves the possible root data directories
@@ -78,7 +79,7 @@ def get_session_directory(session_key: dict) -> str:
     Retrieve the session directory containing the
     recorded Neuropixels data for a given Session
     :param session_key: a dictionary of one Session `key`
-    :return: a string for full path to the session directory
+    :return: a string for relative or full path to the session directory
     """
     return _linking_module.get_session_directory(session_key)
 
@@ -140,21 +141,24 @@ class EphysFile(dj.Part):
         """
 
     def make(self, key):
-        sess_dir = pathlib.Path(get_session_directory(key))
+
+        session_dir = find_full_path(get_ephys_root_data_dir(),
+                                     get_session_directory(key))
 
         inserted_probe_serial_number = (ProbeInsertion * probe.Probe & key).fetch1('probe')
 
         # search session dir and determine acquisition software
         for ephys_pattern, ephys_acq_type in zip(['*.ap.meta', '*.oebin'],
                                                  ['SpikeGLX', 'Open Ephys']):
-            ephys_meta_filepaths = [fp for fp in sess_dir.rglob(ephys_pattern)]
+            ephys_meta_filepaths = [fp for fp in session_dir.rglob(ephys_pattern)]
            if ephys_meta_filepaths:
                acq_software = ephys_acq_type
                break
        else:
            raise FileNotFoundError(
                f'Ephys recording data not found!'
-                f' Neither SpikeGLX nor Open Ephys recording files found')
+                f' Neither SpikeGLX nor Open Ephys recording files found'
+                f' in {session_dir}')
 
        if acq_software == 'SpikeGLX':
            for meta_filepath in ephys_meta_filepaths:
@@ -187,12 +191,13 @@ def make(self, key):
                 'acq_software': acq_software,
                 'sampling_rate': spikeglx_meta.meta['imSampRate']})
 
-            root_dir = find_root_directory(get_ephys_root_data_dir(), meta_filepath)
+            root_dir = find_root_directory(get_ephys_root_data_dir(),
+                                           meta_filepath)
             self.EphysFile.insert1({
                 **key,
                 'file_path': meta_filepath.relative_to(root_dir).as_posix()})
         elif acq_software == 'Open Ephys':
-            dataset = openephys.OpenEphys(sess_dir)
+            dataset = openephys.OpenEphys(session_dir)
             for serial_number, probe_data in dataset.probes.items():
                 if str(serial_number) == inserted_probe_serial_number:
                     break
@@ -220,8 +225,7 @@ def make(self, key):
                 'acq_software': acq_software,
                 'sampling_rate': probe_data.ap_meta['sample_rate']})
 
-            root_dir = find_root_directory(
-                get_ephys_root_data_dir(),
+            root_dir = find_root_directory(get_ephys_root_data_dir(),
                 probe_data.recording_info['recording_files'][0])
             self.EphysFile.insert([{**key,
                                     'file_path': fp.relative_to(root_dir).as_posix()}
@@ -290,8 +294,11 @@ def make(self, key):
             shank, shank_col, shank_row, _ = spikeglx_recording.apmeta.shankmap['data'][recorded_site]
             electrode_keys.append(probe_electrodes[(shank, shank_col, shank_row)])
         elif acq_software == 'Open Ephys':
-            sess_dir = pathlib.Path(get_session_directory(key))
-            loaded_oe = openephys.OpenEphys(sess_dir)
+
+            session_dir = find_full_path(get_ephys_root_data_dir(),
+                                         get_session_directory(key))
+
+            loaded_oe = openephys.OpenEphys(session_dir)
             oe_probe = loaded_oe.probes[probe_sn]
 
             lfp_channel_ind = np.arange(
@@ -442,16 +449,16 @@ class Curation(dj.Manual):
     curation_id: int
     ---
     curation_time: datetime # time of generation of this set of curated clustering results
-    curation_output_dir: varchar(255) # output directory of the curated results, relative to clustering root data directory
+    curation_output_dir: varchar(255) # output directory of the curated results, relative to root data directory
     quality_control: bool # has this clustering result undergone quality control?
     manual_curation: bool # has manual curation been performed on this clustering result?
     curation_note='': varchar(2000)
     """
 
     def create1_from_clustering_task(self, key, curation_note=''):
         """
-        A convenient function to create a new corresponding "Curation"
-        for a particular "ClusteringTask"
+        A function to create a new corresponding "Curation" for a particular
+        "ClusteringTask"
         """
         if key not in Clustering():
             raise ValueError(f'No corresponding entry in Clustering available'
@@ -465,8 +472,10 @@ def create1_from_clustering_task(self, key, curation_note=''):
         # Synthesize curation_id
         curation_id = dj.U().aggr(self & key, n='ifnull(max(curation_id)+1,1)').fetch1('n')
         self.insert1({**key, 'curation_id': curation_id,
-                      'curation_time': creation_time, 'curation_output_dir': output_dir,
-                      'quality_control': is_qc, 'manual_curation': is_curated,
+                      'curation_time': creation_time,
+                      'curation_output_dir': output_dir,
+                      'quality_control': is_qc,
+                      'manual_curation': is_curated,
                       'curation_note': curation_note})
 
 
@@ -613,8 +622,9 @@ def yield_unit_waveforms():
             spikeglx_meta_filepath = get_spikeglx_meta_filepath(key)
             neuropixels_recording = spikeglx.SpikeGLX(spikeglx_meta_filepath.parent)
         elif acq_software == 'Open Ephys':
-            sess_dir = pathlib.Path(get_session_directory(key))
-            openephys_dataset = openephys.OpenEphys(sess_dir)
+            session_dir = find_full_path(get_ephys_root_data_dir(),
+                                         get_session_directory(key))
+            openephys_dataset = openephys.OpenEphys(session_dir)
             neuropixels_recording = openephys_dataset.probes[probe_serial_number]
 
         def yield_unit_waveforms():
@@ -659,11 +669,13 @@ def get_spikeglx_meta_filepath(ephys_recording_key):
     except FileNotFoundError:
         # if not found, search in session_dir again
         if not spikeglx_meta_filepath.exists():
-            sess_dir = pathlib.Path(get_session_directory(ephys_recording_key))
+            session_dir = find_full_path(get_ephys_root_data_dir(),
+                                         get_session_directory(
+                                             ephys_recording_key))
             inserted_probe_serial_number = (ProbeInsertion * probe.Probe
                                             & ephys_recording_key).fetch1('probe')
 
-            spikeglx_meta_filepaths = [fp for fp in sess_dir.rglob('*.ap.meta')]
+            spikeglx_meta_filepaths = [fp for fp in session_dir.rglob('*.ap.meta')]
             for meta_filepath in spikeglx_meta_filepaths:
                 spikeglx_meta = spikeglx.SpikeGLXMeta(meta_filepath)
                 if str(spikeglx_meta.probe_SN) == inserted_probe_serial_number:
@@ -696,8 +708,9 @@ def get_neuropixels_channel2electrode_map(ephys_recording_key, acq_software):
             for recorded_site, (shank, shank_col, shank_row, _) in enumerate(
                 spikeglx_meta.shankmap['data'])}
     elif acq_software == 'Open Ephys':
-        sess_dir = pathlib.Path(get_session_directory(ephys_recording_key))
-        openephys_dataset = openephys.OpenEphys(sess_dir)
+        session_dir = find_full_path(get_ephys_root_data_dir(),
+                                     get_session_directory(ephys_recording_key))
+        openephys_dataset = openephys.OpenEphys(session_dir)
         probe_serial_number = (ProbeInsertion & ephys_recording_key).fetch1('probe')
         probe_dataset = openephys_dataset.probes[probe_serial_number]
 