Read Zarr Connectome Files #8717
base: master
Conversation
Actionable comments posted: 3
🔭 Outside diff range comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1)
`80-86`: **Path-traversal risk after `.normalize` – ensure path stays within dataset dir**

`normalize` resolves `..` segments, so a user-supplied relative path like `../../etc/passwd` will now escape `localDatasetDir`, yet still pass the `exists` check and be returned.

```scala
val pathRelativeToDataset = localDatasetDir.resolve(localPath).normalize
...
if (pathRelativeToDataset.toFile.exists) {
  pathRelativeToDataset.toUri
}
```

Validate that the normalised path is still a descendant of the intended base (same for `pathRelativeToLayer`) before accepting it.

```diff
- val pathRelativeToDataset = localDatasetDir.resolve(localPath).normalize
+ val pathRelativeToDataset = localDatasetDir.resolve(localPath).normalize
+ if (!pathRelativeToDataset.startsWith(localDatasetDir))
+   throw new Exception(s"Relative path $localPath resolves outside dataset directory")
- val pathRelativeToLayer = localDatasetDir.resolve(layerName).resolve(localPath).normalize
+ val pathRelativeToLayer = localDatasetDir
+   .resolve(layerName)
+   .resolve(localPath)
+   .normalize
+ if (!pathRelativeToLayer.startsWith(localDatasetDir.resolve(layerName)))
+   throw new Exception(s"Relative path $localPath resolves outside layer directory")
```

Without this guard an attacker (or mis-configured attachment) could read arbitrary files.
🧹 Nitpick comments (11)
unreleased_changes/8717.md (1)
`1-3`: **Changelog entry is too terse; missing migration & config notes**

Consider adding:
- Impact on deployments (datastore update required, caches, etc.)
- Migration or backward-compatibility caveats for clients that still write HDF5-only connectomes.
- Reference to the two fixed issues (#8618, #8567) for traceability.

A fuller entry makes release notes self-contained and searchable.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/SynapticPartnerDirection.scala (1)
`1-9`: **Consider adding a short Scaladoc for public enum**

A one-liner explaining "`src` = presynaptic, `dst` = postsynaptic" aids IDE tooltips and cross-module usage.

```diff
-object SynapticPartnerDirection extends ExtendedEnumeration {
+/** Direction of a synaptic partner: `src` – presynaptic, `dst` – postsynaptic. */
+object SynapticPartnerDirection extends ExtendedEnumeration {
```

util/src/main/scala/com/scalableminds/util/collections/SequenceUtils.scala (1)
`57-58`: **Fix the comment to match the actual return type**

The comment mentions "returns Box of index" but the method actually returns `Int`.

```diff
- // Search in a sorted array, returns Box of index where element is found or, if missing, where element would be inserted
+ // Search in a sorted array, returns index where element is found or, if missing, where element would be inserted
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
`116-117`: **Consider extracting hardcoded synapse type names**

As noted in the PR description, string constants should be extracted. While this is documented as intentional for legacy files, consider moving these to a constants object for better maintainability.
Would you like me to help extract these constants to a dedicated object?
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala (2)
`93-93`: **Address TODO: Verify offset and shape order for multi-dimensional arrays**

The TODO comment questions whether offset and shape should be transposed. This should be verified against the Zarr specification for multi-dimensional array access.
Would you like me to help verify the correct order by checking the Zarr3 specification or testing with sample data?
`193-200`: **Fix variable naming inconsistency**

The variables are named `meshFileKey` but should be `connectomeFileKey` to match the actual type and improve code clarity.

```diff
-    attributesCache.clear { meshFileKey =>
-      meshFileKey.dataSourceId == dataSourceId && layerNameOpt.forall(meshFileKey.layerName == _)
+    attributesCache.clear { connectomeFileKey =>
+      connectomeFileKey.dataSourceId == dataSourceId && layerNameOpt.forall(connectomeFileKey.layerName == _)
     }
     openArraysCache.clear {
-      case (meshFileKey, _) =>
-        meshFileKey.dataSourceId == dataSourceId && layerNameOpt.forall(meshFileKey.layerName == _)
+      case (connectomeFileKey, _) =>
+        connectomeFileKey.dataSourceId == dataSourceId && layerNameOpt.forall(connectomeFileKey.layerName == _)
     }
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileService.scala (5)
`92-95`: **Extract string constants to a shared location.**

As mentioned in the PR objectives, these string constants should be extracted similarly to other attachments for better maintainability and consistency.

Consider creating a constants object:

```scala
object ConnectomeConstants {
  val LocalConnectomesDir = "connectomes"
  val Hdf5ConnectomeFileExtension = "hdf5"
}
```
`114-118`: **Consider more specific error handling instead of generic `tryo`.**

The `tryo` wrapper could hide specific exceptions that might be useful for debugging. Consider handling specific exceptions that `uriFromPathLiteral` might throw.

Replace the generic exception handling:

```diff
-      registeredAttachmentNormalized <- tryo(registeredAttachment.map { attachment =>
+      registeredAttachmentNormalized <- Box(registeredAttachment.map { attachment =>
         attachment.copy(
           path = remoteSourceDescriptorService.uriFromPathLiteral(attachment.path.toString, localDatasetDir, dataLayer.name))
-      })
+      }) ?~! "Failed to normalize attachment path"
```
`158-172`: **Consider logging failures for better observability.**

While the comment explains that failures are intentionally ignored to prevent breaking the list request, it would be helpful to log these failures for debugging purposes.

Add logging before flattening:

```diff
       }
+      .map { results =>
+        results.foreach {
+          case Full(_) => // success
+          case failure => logger.warn(s"Failed to process connectome file: $failure")
+        }
+        results.flatten
+      }
-      // Only return successes, we don't want a malformed file breaking the list request.
-      .map(_.flatten))
```
`187-187`: **Simplify agglomerate ID extraction.**

Since you've already verified that `agglomerateIds.length == 1`, using `headOption.toFox` is redundant.

```diff
-      agglomerateId <- agglomerateIds.headOption.toFox ?~> "Failed to extract the single agglomerate ID from request"
+      agglomerateId = agglomerateIds.head
```
`201-202`: **Consider memory usage for large agglomerate sets.**

The `directedPairs` method creates n*(n-1) pairs, which could consume significant memory for large inputs. For example, 1000 agglomerates would create 999,000 pairs.

Consider implementing pagination or streaming for large agglomerate sets, or document the expected maximum size in the API documentation.
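To make the quadratic growth concrete, here is a hypothetical `directedPairs` along these lines (illustrative only, not the actual implementation in the service):

```scala
// All ordered (src, dst) pairs of distinct agglomerate ids: n * (n - 1) tuples in total.
def directedPairs(agglomerateIds: List[Long]): List[(Long, Long)] =
  for {
    src <- agglomerateIds
    dst <- agglomerateIds
    if src != dst
  } yield (src, dst)

// directedPairs((1L to 1000L).toList).length == 999000
```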
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- unreleased_changes/8717.md (1 hunks)
- util/src/main/scala/com/scalableminds/util/collections/SequenceUtils.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (6 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ConnectomeFileService.scala (0 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/SynapticPartnerDirection.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1 hunks)
💤 Files with no reviewable changes (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ConnectomeFileService.scala
🧰 Additional context used
🧠 Learnings (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
Learnt from: MichaelBuessemeyer
PR: scalableminds/webknossos#8352
File: app/models/organization/CreditTransactionService.scala:0-0
Timestamp: 2025-01-27T12:06:42.865Z
Learning: In Scala's for-comprehension with Fox (Future-like type), the `<-` operator ensures sequential execution. If any step fails, the entire chain short-circuits and returns early, preventing subsequent operations from executing. This makes it safe to perform validation checks before database operations.
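A minimal stand-in using `Either` (not the actual Fox API; names invented) illustrates the same short-circuiting behaviour:

```scala
// Either short-circuits on the first Left, just like the Fox chain described above:
// if validation fails, the "database write" step is never evaluated.
def validate(amount: Long): Either[String, Long] =
  if (amount > 0) Right(amount) else Left("credits.amountNonPositive")

def writeTransaction(amount: Long): Either[String, Unit] = Right(())

def buyCredits(amount: Long): Either[String, Unit] =
  for {
    validated <- validate(amount)    // Left(...) stops the chain here
    _ <- writeTransaction(validated) // only runs if validation succeeded
  } yield ()

// buyCredits(-5) == Left("credits.amountNonPositive"); writeTransaction never runs.
```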
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
🧬 Code Graph Analysis (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/SynapticPartnerDirection.scala (1)
- SynapticPartnerDirection (5-9)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (6)

util/src/main/scala/com/scalableminds/util/collections/SequenceUtils.scala (2)
- SequenceUtils (5-63)
- searchSorted (58-63)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/SynapticPartnerDirection.scala (1)
- SynapticPartnerDirection (5-9)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/Hdf5FileCache.scala (2)
- Hdf5FileCache (60-111)
- fromPath (54-57)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)
- DataStoreConfig (10-72)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileService.scala (9)
- mappingNameForConnectomeFile (174-182)
- ingoingSynapsesForAgglomerate (220-231)
- empty (62-62)
- outgoingSynapsesForAgglomerate (231-242)
- synapticPartnerForSynapses (254-266)
- positionsForSynapses (266-277)
- typesForSynapses (277-286)
- synapseIdsForDirectedPair (242-254)
- clearCache (286-297)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1)
- synapticPartnerKey (20-25)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
🔇 Additional comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1)
`7-11`: **Connectome services correctly bound**

The additional Guice bindings and import block look good and follow the established pattern for eager singletons.
Also applies to: 60-63
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (1)
`82-89`: **No stray references to removed `lookUpMeshFileImpl` found**

A recursive search for the legacy method name returned zero matches outside MeshFileService.scala, so the rename is safe.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1)

`1-25`: **LGTM!**

The trait provides a clean abstraction for connectome file constants and utilities. Good use of pattern matching for the synaptic partner key selection.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)
`513-526`: **Well-structured refactoring to use the new connectome file service**

The consistent pattern of retrieving dataSource/dataLayer and using connectome file keys improves abstraction and maintainability.

`554-556`: **Good validation of the direction parameter**

Properly validates the string parameter into the `SynapticPartnerDirection` enum with an appropriate error message.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
`155-160`: **Good error handling pattern with finishAccessOnFailure**

The helper method ensures proper cleanup of file handles even when operations fail, preventing resource leaks.
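The general shape of such a helper could look like the following sketch (stand-in handle type and plain `Try` instead of the actual Box/Fox types in the datastore):

```scala
import scala.util.{Failure, Success, Try}

// Stand-in for the cached HDF5 file handle.
trait FileHandle { def finishAccess(): Unit }

// Run `block`; on failure, release the handle before surfacing the error,
// so a throwing read does not leak the cached file handle.
def finishAccessOnFailure[T](handle: FileHandle)(block: => T): Try[T] =
  Try(block) match {
    case success @ Success(_) => success
    case failure @ Failure(_) =>
      handle.finishAccess()
      failure
  }
```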
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: backend-tests
- GitHub Check: frontend-tests
- GitHub Check: build-smoketest-push
🔇 Additional comments (9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (9)
`1-19`: **LGTM: Well-structured service class with proper dependency injection**

The class structure, imports, and initialization look good. The use of dependency injection, Fox monads, and caching is appropriate for this service.

`21-30`: **LGTM: Proper resource management in mappingNameForConnectomeFile**

The method correctly uses the cached file handle, implements error handling with Fox, and ensures proper resource cleanup.

`32-62`: **LGTM: Correct CSC structure usage for incoming synapses**

The method correctly uses `keyCscIndptr` for incoming synapses (CSC - Compressed Sparse Column structure) and properly handles the multi-step process of reading agglomerate pairs and then synapse ranges.

`83-95`: **LGTM: Clean implementation of synaptic partner retrieval**

The method correctly retrieves synaptic partners for given synapses and direction, with proper error handling and resource management.

`97-108`: **LGTM: Proper matrix data reading for synapse positions**

The method correctly reads 3D position data using `readMatrixBlockWithOffset` and handles the conversion to the expected format.

`110-123`: **LGTM: Appropriate handling of legacy synapse types**

The hard-coded type names list is reasonable for legacy HDF5 files as mentioned in the PR objectives. The implementation correctly reads synapse types and combines them with the legend.

`125-153`: **LGTM: Sophisticated directed pair synapse lookup**

The method implements a complex but correct algorithm for finding synapses between specific agglomerate pairs using CSR structure traversal and binary search. The logic handles edge cases appropriately.

`155-160`: **LGTM: Proper resource cleanup utility**

The `finishAccessOnFailure` helper method correctly ensures file handle cleanup even when operations fail, using the `tryo` utility for exception handling.

`162-166`: **LGTM: Efficient cache clearing implementation**

The cache clearing method correctly constructs the relevant path and clears cache entries based on the provided criteria.
Actionable comments posted: 0
♻️ Duplicate comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
`64-82`: **Critical: Incomplete CSR structure handling in outgoingSynapsesForAgglomerate**

The method reads the CSR indptr but skips reading the intermediate CSR indices array, unlike the correct CSC handling in `ingoingSynapsesForAgglomerate`. This creates an inconsistency in the data structure access pattern.

The CSR format requires (see the sketch below):
- Read `keyCsrIndptr` to get the pointer range
- Read `keyCsrAgglomeratePair` (CSR indices) using the pointer range
- Use the agglomerate pair IDs to read from `keyAgglomeratePairOffsets`
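For orientation, a minimal in-memory sketch of that three-step CSR lookup (plain arrays and illustrative names, not the Fox-based HDF5 reader):

```scala
// indptr:      per-agglomerate pointer range into `indices` (keyCsrIndptr)
// indices:     agglomerate-pair ids in CSR order (keyCsrAgglomeratePair)
// pairOffsets: synapse-id range per agglomerate pair (keyAgglomeratePairOffsets)
def outgoingSynapseIds(agglomerateId: Int,
                       indptr: Array[Long],
                       indices: Array[Long],
                       pairOffsets: Array[Long]): Seq[Long] = {
  val fromPtr = indptr(agglomerateId).toInt
  val toPtr   = indptr(agglomerateId + 1).toInt
  (fromPtr until toPtr).flatMap { i =>
    val pairId = indices(i).toInt
    pairOffsets(pairId) until pairOffsets(pairId + 1) // synapse ids of this pair
  }
}
```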
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (2)
`116-117`: **Document the hardcoded synapse type names**

The hardcoded type names are mentioned as expected for legacy files, but consider adding a comment referencing the related issues (#8618, #8567) or internal discussions mentioned in the PR description for better maintainability.

```diff
-      // Hard coded type name list, as all legacy files have this value.
+      // Hard coded type name list for legacy HDF5 files (see #8618, #8567)
+      // All legacy files use these three synapse types consistently
       typeNames = List("dendritic-shaft-synapse", "spine-head-synapse", "soma-synapse")
```
`90-94`: **Consider batch reading for better performance**

The current implementation reads synaptic partners one by one using `serialCombined`. For large synapse ID lists, this could be inefficient. Consider implementing batch reading if the HDF5 library supports it.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
- GitHub Check: frontend-tests
🔇 Additional comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (3)
`125-153`: **LGTM: Proper CSR handling in synapseIdsForDirectedPair**

This method correctly implements the CSR data structure access pattern by reading the CSR indices array (line 139) before using the values to locate the target agglomerate pair. The binary search optimization using `SequenceUtils.searchSorted` is also well-implemented.
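For reference, one common way to implement such a `searchSorted` helper (a sketch; the actual code in SequenceUtils.scala may differ):

```scala
// Binary search in a sorted Array[Long]: returns the index of `element` if present,
// otherwise the index at which it would be inserted to keep the array sorted.
// java.util.Arrays.binarySearch encodes the insertion point as -(insertionPoint) - 1.
def searchSorted(sorted: Array[Long], element: Long): Int = {
  val raw = java.util.Arrays.binarySearch(sorted, element)
  if (raw >= 0) raw else -(raw + 1)
}
```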
`155-160`: **LGTM: Proper resource cleanup pattern**

The `finishAccessOnFailure` helper method correctly ensures that file handles are properly closed even when exceptions occur, preventing resource leaks in the file cache.

`162-166`: **LGTM: Efficient cache clearing with path-based filtering**

The cache clearing implementation correctly filters cache entries by path prefix, allowing for both dataset-wide and layer-specific cache invalidation.
Here is my mini code feedback :D
```scala
      directionValidated <- SynapticPartnerDirection
        .fromString(direction)
        .toFox ?~> "could not parse synaptic partner direction"
      (dataSource, dataLayer) <- dataSourceRepository.getDataSourceAndDataLayer(organizationId,
```
A high level thing I just noticed: Why do you always "load" the `dataLayer` and pass it to the `lookUp....FileKey` functions? At least for `lookUpConnectomeFileKey`, only the `dataLayer.name` prop is being used. But isn't `dataLayer.name` equal to `dataLayerName`?
True, that’s the same value. However, the `lookUp*` functions also use `dataLayer.attachments`, which needs the full layer. Also, `getDataSourceAndDataLayer` is cheap as it only uses the in-memory `dataSourceRepository`. So we can use it to also validate that the dataSource really exists, even if we then only use `dataSource.id`.
Ok here are my testing results:

- When I tried to load segment id 4 in the connectome tab using the zarr connectome, I get an error. IMO an error message explaining that the agglomerate does not exist in the connectome or so would be preferable.
  - Ok, after I tested some other values, somehow only segment 4 triggers the bug.
- I can also produce a similar bug in the hdf5 variant.

Maybe the test connectome data is wrong for segment 4 🤔?
Actionable comments posted: 5
♻️ Duplicate comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (2)
`21-24`: **Consider extracting hardcoded synapse type names to a more appropriate location**

The hardcoded `legacySynapseTypeNames` list creates a "magic" feeling in the service class. Consider moving this to a companion object, configuration file, or dedicated constants class to improve maintainability and discoverability.
`69-86`: **Critical: Fix inconsistent CSR structure handling in outgoingSynapsesForAgglomerate**

The method incorrectly skips reading the CSR indices array (`keyCsrAgglomeratePair`) before accessing `keyAgglomeratePairOffsets`. This creates an inconsistency with the CSC handling in `ingoingSynapsesForAgglomerate` and may lead to incorrect results.

The method should follow the same pattern as `ingoingSynapsesForAgglomerate`:

- Read CSR indptr to get `fromPtr`/`toPtr` (already correct)
- Read the agglomerate pairs from the CSR indices array (`keyCsrAgglomeratePair`)
- For each agglomerate pair, read synapse ranges from `keyAgglomeratePairOffsets`

```diff
       fromPtr <- fromAndToPtr.lift(0).toFox ?~> "Could not read start offset from connectome file"
       toPtr <- fromAndToPtr.lift(1).toFox ?~> "Could not read end offset from connectome file"
-      from <- finishAccessOnFailure(cachedConnectomeFile) {
-        cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyAgglomeratePairOffsets, 1, fromPtr)
-      }.flatMap(_.headOption.toFox) ?~> "Could not read synapses from connectome file"
-      to <- finishAccessOnFailure(cachedConnectomeFile) {
-        cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyAgglomeratePairOffsets, 1, toPtr)
-      }.flatMap(_.headOption.toFox) ?~> "Could not read synapses from connectome file"
+      agglomeratePairs: Array[Long] <- if (toPtr - fromPtr == 0L) Fox.successful(Array.empty[Long])
+      else
+        finishAccessOnFailure(cachedConnectomeFile) {
+          cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyCsrAgglomeratePair, (toPtr - fromPtr).toInt, fromPtr)
+        } ?~> "Could not read agglomerate pairs from connectome file"
+      synapseIdsNested <- Fox.serialCombined(agglomeratePairs.toList) { agglomeratePair: Long =>
+        for {
+          from <- finishAccessOnFailure(cachedConnectomeFile) {
+            cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyAgglomeratePairOffsets, 1, agglomeratePair)
+          }.flatMap(_.headOption.toFox)
+          to <- finishAccessOnFailure(cachedConnectomeFile) {
+            cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyAgglomeratePairOffsets, 1, agglomeratePair + 1)
+          }.flatMap(_.headOption.toFox)
+        } yield List.range(from, to)
+      } ?~> "Could not read outgoing synapses from connectome file"
+      _ = cachedConnectomeFile.finishAccess()
-    } yield Seq.range(from, to)
+    } yield synapseIdsNested.flatten
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/ZarrSegmentIndexFileService.scala (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/ZarrSegmentIndexFileService.scala
🚧 Files skipped from review as they are similar to previous changes (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
- GitHub Check: backend-tests
🔇 Additional comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (2)
`159-164`: **Good implementation of error handling utility**

The `finishAccessOnFailure` helper method provides a clean way to ensure resources are properly cleaned up even when exceptions occur. This is a solid pattern for resource management.

`162-166`: **LGTM: Efficient cache clearing implementation**

The cache clearing method correctly constructs the relevant path and uses a predicate-based clearing strategy, which is efficient and targeted.
Actionable comments posted: 0
🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (1)
`196-197`: **Excellent defensive programming - validates negative shape values.**

This validation check correctly prevents processing of invalid shape arrays containing negative dimensions. The implementation follows good practices (a stand-alone sketch follows the test list below):

- Uses idiomatic Scala with `shape.exists(_ < 0)` for efficient short-circuiting
- Provides clear error messaging with the problematic shape values
- Maintains consistency with existing error handling patterns using `Fox.failure`
- Positioned appropriately after the zero-check but before main processing logic
Consider adding unit tests to verify this validation handles edge cases like:
- Single negative dimension
- Multiple negative dimensions
- Mixed positive/negative dimensions
- Empty shape arrays
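A stand-alone illustration of the guard (plain `Either` instead of the actual `Fox.failure`-based code in DatasetArray):

```scala
// Reject shape arrays containing negative dimensions; exists(_ < 0) short-circuits
// on the first offending value and the message echoes the problematic shape.
def validateShape(shape: Array[Int]): Either[String, Array[Int]] =
  if (shape.exists(_ < 0))
    Left(s"Requested shape contains negative values: ${shape.mkString(", ")}")
  else
    Right(shape)

// validateShape(Array(4, -1, 8)) == Left("Requested shape contains negative values: 4, -1, 8")
```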
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (3)
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8229
File: frontend/javascripts/oxalis/model/accessors/dataset_accessor.ts:348-354
Timestamp: 2024-11-25T14:38:49.345Z
Learning: For the `getDatasetExtentAsProduct` function in `dataset_accessor.ts`, input validation for negative or zero dimensions is not necessary.
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Learnt from: MichaelBuessemeyer
PR: scalableminds/webknossos#8485
File: frontend/javascripts/oxalis/model/accessors/dataset_layer_transformation_accessor.ts:384-392
Timestamp: 2025-04-01T09:45:17.527Z
Learning: The function `isRotationAndMirrorMaybeOnly` in the dataset_layer_transformation_accessor.ts is intentionally designed to allow mirroring transformations (negative scale values). It uses the length comparison (`scale.length() === NON_SCALED_VECTOR.length()`) rather than component equality to permit mirrored axes while ensuring the overall scale magnitude remains the same.
🧬 Code Graph Analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (1)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (5)
- Fox (30-223)
- Fox (225-289)
- failure (58-62)
- s (229-233)
- s (233-243)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a little nitpick in the newest changes.
Moreover during re-testing I noticed that the connectome files are not refreshed when the dataset is reloaded manually via the frontend UI. I had to restart wk to re-register the zarr-connectome file. Steps:
- delete zarr connectome folder
- start wk
- view ds -> no zarr connectome file
- restore zarr connectome folder on hard drive
- go to dashboard and reload the ds
- view the ds and no zarr connectome file
- restart wk
- zarr connectome file is present
Maybe some cache needs to be cleared?
Your fix seems to work. When I now request agglomerate 4 for the zarr connectome data, I get an error. But IMO the error message would be nicer if the inner error about the agglomerate not being present were the top-level error. It is more direct / easier to understand for a user IMO.
Moreover, I noticed that the hdf5 connectome yields a different error when loading segment id 4.
...p/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala
Outdated
Show resolved
Hide resolved
...p/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala
Outdated
Show resolved
Hide resolved
Good catch on the missing cache clear! I added this now to DataSourceController. I also flipped the >= as per your suggestion. I think further optimizing the legibility of the error messages is not really worth the effort at this time. I’d say we can do that if we start seeing these errors in the wild frequently.
Ok awesome, looks good 👍
thanks for fixing the caching problem
> I think further optimizing the legibility of the error messages is not really worth the effort at this time. I’d say we can do that if we start seeing these errors in the wild frequently
Jep, definitely understandable 👍
Steps to test:
TODOs:
Issues:
(create a `$PR_NUMBER.md` file in `unreleased_changes` or use `./tools/create-changelog-entry.py`)