Read Zarr Connectome Files #8717


Open · wants to merge 140 commits into master

Conversation

@fm3 fm3 (Member) commented Jun 25, 2025

Steps to test:

  • In the connectome tab, load synapses from both the hdf5 and the zarr connectome file; this should work as expected.

TODOs:

  • list + look up connectome file keys
  • delegate to correct service
  • re-connect hdf5 connectome file service
  • How to deal with synapse type names for legacy files? Compare slack thread
  • implement zarr connectome file service
  • enable cache clear
  • test
  • extract string constants like for the other attachments

Issues:


  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment

fm3 added 30 commits May 19, 2025 10:20
@fm3 fm3 removed the refactoring label Jul 1, 2025
Base automatically changed from zarr-segment-index to master July 3, 2025 09:18
@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1)

80-86: Path-traversal risk after .normalize – ensure path stays within dataset dir

normalize resolves .. segments, so a user-supplied relative path like ../../etc/passwd will now escape localDatasetDir, yet still pass the exists check and be returned.

val pathRelativeToDataset = localDatasetDir.resolve(localPath).normalize
...
if (pathRelativeToDataset.toFile.exists) {
  pathRelativeToDataset.toUri
}

Validate that the normalised path is still a descendant of the intended base (same for pathRelativeToLayer) before accepting it.

-        val pathRelativeToDataset = localDatasetDir.resolve(localPath).normalize
+        val pathRelativeToDataset = localDatasetDir.resolve(localPath).normalize
+        if (!pathRelativeToDataset.startsWith(localDatasetDir))
+          throw new Exception(s"Relative path $localPath resolves outside dataset directory")

-        val pathRelativeToLayer = localDatasetDir.resolve(layerName).resolve(localPath).normalize
+        val pathRelativeToLayer = localDatasetDir
+          .resolve(layerName)
+          .resolve(localPath)
+          .normalize
+        if (!pathRelativeToLayer.startsWith(localDatasetDir.resolve(layerName)))
+          throw new Exception(s"Relative path $localPath resolves outside layer directory")

Without this guard an attacker (or mis-configured attachment) could read arbitrary files.
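
For illustration, a standalone sketch (hypothetical paths, plain java.nio, nothing from the PR itself) of why normalize alone does not confine the path and how the suggested startsWith guard catches the escape:

import java.nio.file.Paths

val localDatasetDir = Paths.get("/data/datasets/myDataset")

// User-supplied relative path that climbs out of the dataset directory:
val escaped = localDatasetDir.resolve("../../etc/passwd").normalize
// escaped is now /data/etc/passwd, i.e. outside localDatasetDir
escaped.startsWith(localDatasetDir) // false -> guard rejects it

// A legitimate attachment path stays inside and passes:
val inside = localDatasetDir.resolve("connectomes/connectome.zarr").normalize
inside.startsWith(localDatasetDir) // true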

🧹 Nitpick comments (11)
unreleased_changes/8717.md (1)

1-3: Changelog entry is too terse; missing migration & config notes

Consider adding:

  • Impact on deployments (datastore update required, caches, etc.)
  • Migration or backward-compatibility caveats for clients that still write HDF5-only connectomes.
  • Reference to the two fixed issues (#8618, #8567) for traceability.

A fuller entry makes release notes self-contained and searchable.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/SynapticPartnerDirection.scala (1)

1-9: Consider adding a short Scaladoc for public enum

A one-liner explaining “src = presynaptic, dst = postsynaptic” aids IDE tooltips and cross-module usage.

-object SynapticPartnerDirection extends ExtendedEnumeration {
+/** Direction of a synaptic partner: `src` – presynaptic, `dst` – postsynaptic. */
+object SynapticPartnerDirection extends ExtendedEnumeration {
util/src/main/scala/com/scalableminds/util/collections/SequenceUtils.scala (1)

57-58: Fix the comment to match the actual return type

The comment mentions "returns Box of index" but the method actually returns Int.

-  // Search in a sorted array, returns Box of index where element is found or, if missing, where element would be inserted
+  // Search in a sorted array, returns index where element is found or, if missing, where element would be inserted
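
For reference, the documented behaviour in use (a sketch; the parameter and return types of searchSorted are assumed here to be a sorted Array[Long], a Long element, and an Int index):

SequenceUtils.searchSorted(Array(1L, 3L, 5L), 3L) // 1 – element found at index 1
SequenceUtils.searchSorted(Array(1L, 3L, 5L), 4L) // 2 – missing, would be inserted before 5
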
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)

116-117: Consider extracting hardcoded synapse type names

As noted in the PR description, string constants should be extracted. While this is documented as intentional for legacy files, consider moving these to a constants object for better maintainability.

Would you like me to help extract these constants to a dedicated object?
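
One possible shape for such an extraction (object name and location are placeholders; the three values are the legacy type names used further down in this file):

object ConnectomeFileConstants {
  // All legacy HDF5 connectome files use these three synapse types.
  val legacySynapseTypeNames: List[String] =
    List("dendritic-shaft-synapse", "spine-head-synapse", "soma-synapse")
}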

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala (2)

93-93: Address TODO: Verify offset and shape order for multi-dimensional arrays

The TODO comment questions whether offset and shape should be transposed. This should be verified against the Zarr specification for multi-dimensional array access.

Would you like me to help verify the correct order by checking the Zarr3 specification or testing with sample data?


193-200: Fix variable naming inconsistency

The variables are named meshFileKey but should be connectomeFileKey to match the actual type and improve code clarity.

-    attributesCache.clear { meshFileKey =>
-      meshFileKey.dataSourceId == dataSourceId && layerNameOpt.forall(meshFileKey.layerName == _)
+    attributesCache.clear { connectomeFileKey =>
+      connectomeFileKey.dataSourceId == dataSourceId && layerNameOpt.forall(connectomeFileKey.layerName == _)
     }

     openArraysCache.clear {
-      case (meshFileKey, _) =>
-        meshFileKey.dataSourceId == dataSourceId && layerNameOpt.forall(meshFileKey.layerName == _)
+      case (connectomeFileKey, _) =>
+        connectomeFileKey.dataSourceId == dataSourceId && layerNameOpt.forall(connectomeFileKey.layerName == _)
     }
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileService.scala (5)

92-95: Extract string constants to a shared location.

As mentioned in the PR objectives, these string constants should be extracted similarly to other attachments for better maintainability and consistency.

Consider creating a constants object:

object ConnectomeConstants {
  val LocalConnectomesDir = "connectomes"
  val Hdf5ConnectomeFileExtension = "hdf5"
}

114-118: Consider more specific error handling instead of generic tryo.

The tryo wrapper could hide specific exceptions that might be useful for debugging. Consider handling specific exceptions that uriFromPathLiteral might throw.

Replace the generic exception handling:

-      registeredAttachmentNormalized <- tryo(registeredAttachment.map { attachment =>
+      registeredAttachmentNormalized <- Box(registeredAttachment.map { attachment =>
         attachment.copy(
           path =
             remoteSourceDescriptorService.uriFromPathLiteral(attachment.path.toString, localDatasetDir, dataLayer.name))
-      })
+      }) ?~! "Failed to normalize attachment path"

158-172: Consider logging failures for better observability.

While the comment explains that failures are intentionally ignored to prevent breaking the list request, it would be helpful to log these failures for debugging purposes.

Add logging before flattening:

         }
+        .map { results =>
+          results.foreach {
+            case Full(_) => // success
+            case failure => logger.warn(s"Failed to process connectome file: $failure")
+          }
+          results.flatten
+        }
-        // Only return successes, we don't want a malformed file breaking the list request.
-        .map(_.flatten))

187-187: Simplify agglomerate ID extraction.

Since you've already verified that agglomerateIds.length == 1, using headOption.toFox is redundant.

-        agglomerateId <- agglomerateIds.headOption.toFox ?~> "Failed to extract the single agglomerate ID from request"
+        agglomerateId = agglomerateIds.head

201-202: Consider memory usage for large agglomerate sets.

The directedPairs method creates n*(n-1) pairs, which could consume significant memory for large inputs. For example, 1000 agglomerates would create 999,000 pairs.

Consider implementing pagination or streaming for large agglomerate sets, or document the expected maximum size in the API documentation.
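
To make the growth concrete, here is a standalone sketch of eager pair generation and a lazier alternative (names and signatures are illustrative, not the PR's actual directedPairs):

// Eager version: materialises all n*(n-1) ordered pairs at once.
def directedPairs(ids: Seq[Long]): Seq[(Long, Long)] =
  for { a <- ids; b <- ids if a != b } yield (a, b)

directedPairs(Seq(1L, 2L, 3L)).size // 6; for 1000 ids this is 999000 tuples in memory

// Lazier variant: yields one pair at a time so callers can stream or paginate.
def directedPairsIterator(ids: Seq[Long]): Iterator[(Long, Long)] =
  for { a <- ids.iterator; b <- ids.iterator if a != b } yield (a, b)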

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c6b2b2 and 1687300.

📒 Files selected for processing (12)
  • unreleased_changes/8717.md (1 hunks)
  • util/src/main/scala/com/scalableminds/util/collections/SequenceUtils.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (6 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ConnectomeFileService.scala (0 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/SynapticPartnerDirection.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1 hunks)
💤 Files with no reviewable changes (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ConnectomeFileService.scala
🧰 Additional context used
🧠 Learnings (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
Learnt from: MichaelBuessemeyer
PR: scalableminds/webknossos#8352
File: app/models/organization/CreditTransactionService.scala:0-0
Timestamp: 2025-01-27T12:06:42.865Z
Learning: In Scala's for-comprehension with Fox (Future-like type), the `<-` operator ensures sequential execution. If any step fails, the entire chain short-circuits and returns early, preventing subsequent operations from executing. This makes it safe to perform validation checks before database operations.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
🧬 Code Graph Analysis (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/SynapticPartnerDirection.scala (1)
  • SynapticPartnerDirection (5-9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (6)
util/src/main/scala/com/scalableminds/util/collections/SequenceUtils.scala (2)
  • SequenceUtils (5-63)
  • searchSorted (58-63)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/SynapticPartnerDirection.scala (1)
  • SynapticPartnerDirection (5-9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/Hdf5FileCache.scala (2)
  • Hdf5FileCache (60-111)
  • fromPath (54-57)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)
  • DataStoreConfig (10-72)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileService.scala (9)
  • mappingNameForConnectomeFile (174-182)
  • ingoingSynapsesForAgglomerate (220-231)
  • empty (62-62)
  • outgoingSynapsesForAgglomerate (231-242)
  • synapticPartnerForSynapses (254-266)
  • positionsForSynapses (266-277)
  • typesForSynapses (277-286)
  • synapseIdsForDirectedPair (242-254)
  • clearCache (286-297)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1)
  • synapticPartnerKey (20-25)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1)

7-11: Connectome services correctly bound

The additional Guice bindings and import block look good and follow the established pattern for eager singletons.

Also applies to: 60-63
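
For readers unfamiliar with the pattern, eager-singleton bindings in a Guice module typically look like this (a sketch; the exact set of classes bound in DataStoreModule is whatever the PR declares):

bind(classOf[ConnectomeFileService]).asEagerSingleton()
bind(classOf[ZarrConnectomeFileService]).asEagerSingleton()
bind(classOf[Hdf5ConnectomeFileService]).asEagerSingleton()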

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (1)

82-89: No stray references to removed lookUpMeshFileImpl found
A recursive search for the legacy method name returned zero matches outside MeshFileService.scala, so the rename is safe.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1)

1-25: LGTM!

The trait provides a clean abstraction for connectome file constants and utilities. Good use of pattern matching for the synaptic partner key selection.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)

513-526: Well-structured refactoring to use the new connectome file service

The consistent pattern of retrieving dataSource/dataLayer and using connectome file keys improves abstraction and maintainability.


554-556: Good validation of the direction parameter

Properly validates the string parameter into the SynapticPartnerDirection enum with appropriate error message.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)

155-160: Good error handling pattern with finishAccessOnFailure

The helper method ensures proper cleanup of file handles even when operations fail, preventing resource leaks.
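
The general shape of such a helper, expressed with plain scala.util.Try instead of the project's Box/Fox types (purely illustrative):

import scala.util.{Failure, Try}

// Run a block against a cached file handle; if it throws, release the handle first
// so the cache does not leak it, then propagate the failure.
def finishAccessOnFailure[T](releaseHandle: () => Unit)(block: => T): Try[T] =
  Try(block) match {
    case f @ Failure(_) =>
      releaseHandle()
      f
    case success => success
  }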

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1687300 and b18eb66.

📒 Files selected for processing (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
🔇 Additional comments (9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (9)

1-19: LGTM: Well-structured service class with proper dependency injection

The class structure, imports, and initialization look good. The use of dependency injection, Fox monads, and caching is appropriate for this service.


21-30: LGTM: Proper resource management in mappingNameForConnectomeFile

The method correctly uses the cached file handle, implements error handling with Fox, and ensures proper resource cleanup.


32-62: LGTM: Correct CSC structure usage for incoming synapses

The method correctly uses keyCscIndptr for incoming synapses (CSC - Compressed Sparse Column structure) and properly handles the multi-step process of reading agglomerate pairs and then synapse ranges.
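
In toy form, the three-step lookup described here (plain in-memory arrays standing in for the HDF5 datasets; names follow the keys mentioned in this review, everything else is illustrative):

def incomingSynapseIds(cscIndptr: Array[Long],
                       cscAgglomeratePair: Array[Long],
                       agglomeratePairOffsets: Array[Long],
                       agglomerateId: Int): Seq[Long] = {
  // 1. indptr delimits which entries of the pair index belong to this agglomerate
  val fromPtr = cscIndptr(agglomerateId)
  val toPtr = cscIndptr(agglomerateId + 1)
  // 2. those entries reference (src, dst) agglomerate pairs
  val pairIds = (fromPtr until toPtr).map(i => cscAgglomeratePair(i.toInt))
  // 3. each pair id maps to a contiguous synapse-id range via the offsets table
  pairIds.flatMap { pairId =>
    agglomeratePairOffsets(pairId.toInt) until agglomeratePairOffsets(pairId.toInt + 1)
  }
}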


83-95: LGTM: Clean implementation of synaptic partner retrieval

The method correctly retrieves synaptic partners for given synapses and direction, with proper error handling and resource management.


97-108: LGTM: Proper matrix data reading for synapse positions

The method correctly reads 3D position data using readMatrixBlockWithOffset and handles the conversion to the expected format.


110-123: LGTM: Appropriate handling of legacy synapse types

The hard-coded type names list is reasonable for legacy HDF5 files as mentioned in the PR objectives. The implementation correctly reads synapse types and combines them with the legend.


125-153: LGTM: Sophisticated directed pair synapse lookup

The method implements a complex but correct algorithm for finding synapses between specific agglomerate pairs using CSR structure traversal and binary search. The logic handles edge cases appropriately.
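
The binary-search step can be pictured as a plain lower-bound search over the sorted slice of destination agglomerate ids (a stand-in for SequenceUtils.searchSorted; names are illustrative):

def findPairIndex(sortedDstAgglomerates: Array[Long], target: Long): Option[Int] = {
  var lo = 0
  var hi = sortedDstAgglomerates.length
  while (lo < hi) {
    val mid = (lo + hi) / 2
    if (sortedDstAgglomerates(mid) < target) lo = mid + 1 else hi = mid
  }
  // lo is now the insertion point; it is a hit only if the element there equals target
  if (lo < sortedDstAgglomerates.length && sortedDstAgglomerates(lo) == target) Some(lo) else None
}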


155-160: LGTM: Proper resource cleanup utility

The finishAccessOnFailure helper method correctly ensures file handle cleanup even when operations fail, using the tryo utility for exception handling.


162-166: LGTM: Efficient cache clearing implementation

The cache clearing method correctly constructs the relevant path and clears cache entries based on the provided criteria.

@fm3 fm3 requested a review from MichaelBuessemeyer July 3, 2025 12:22
@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)

64-82: Critical: Incomplete CSR structure handling in outgoingSynapsesForAgglomerate

The method reads CSR indptr but skips reading the intermediate CSR indices array, unlike the correct CSC handling in ingoingSynapsesForAgglomerate. This creates an inconsistency in the data structure access pattern.

The CSR format requires:

  1. Read keyCsrIndptr to get pointer range
  2. Read keyCsrAgglomeratePair (CSR indices) using the pointer range
  3. Use the agglomerate pair IDs to read from keyAgglomeratePairOffsets
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (2)

116-117: Document the hardcoded synapse type names

The hardcoded type names are mentioned as expected for legacy files, but consider adding a comment referencing the related issues (#8618, #8567) or internal discussions mentioned in the PR description for better maintainability.

-      // Hard coded type name list, as all legacy files have this value.
+      // Hard coded type name list for legacy HDF5 files (see #8618, #8567)
+      // All legacy files use these three synapse types consistently
       typeNames = List("dendritic-shaft-synapse", "spine-head-synapse", "soma-synapse")

90-94: Consider batch reading for better performance

The current implementation reads synaptic partners one by one using serialCombined. For large synapse ID lists, this could be inefficient. Consider implementing batch reading if the HDF5 library supports it.

This is a performance optimization that could be addressed in a future iteration if needed based on actual usage patterns.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b18eb66 and d065282.

📒 Files selected for processing (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
🔇 Additional comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (3)

125-153: LGTM: Proper CSR handling in synapseIdsForDirectedPair

This method correctly implements the CSR data structure access pattern by reading the CSR indices array (line 139) before using the values to locate the target agglomerate pair. The binary search optimization using SequenceUtils.searchSorted is also well-implemented.


155-160: LGTM: Proper resource cleanup pattern

The finishAccessOnFailure helper method correctly ensures that file handles are properly closed even when exceptions occur, preventing resource leaks in the file cache.


162-166: LGTM: Efficient cache clearing with path-based filtering

The cache clearing implementation correctly filters cache entries by path prefix, allowing for both dataset-wide and layer-specific cache invalidation.

@MichaelBuessemeyer MichaelBuessemeyer (Contributor) left a comment

Here is my mini code feedback :D

directionValidated <- SynapticPartnerDirection
  .fromString(direction)
  .toFox ?~> "could not parse synaptic partner direction"
(dataSource, dataLayer) <- dataSourceRepository.getDataSourceAndDataLayer(organizationId,
Contributor

A high level thing I just noticed: Why do you always "load" the dataLayer and pass it to the lookUp....FileKey functions? At least for lookUpConnectomeFileKey only the dataLayer.name prop is being used. But isn't dataLayer.name equal to dataLayerName?

Member Author

True, that’s the same value. However, the lookUp* functions also use dataLayer.attachments, which needs the full layer. Also, getDataSourceAndDataLayer is cheap as it only uses the in-memory dataSourceRepository. So we can use it to also validate that the dataSource really exists, even if we then only use dataSource.id

@MichaelBuessemeyer MichaelBuessemeyer (Contributor) left a comment

Ok here are my testing results:

  • When I tried to load segment id 4 in the connectome tab using the zarr connectome, I get
    [screenshot]
    IMO an error message explaining that the agglomerate does not exist in the connectome would be preferable.

    • ok, after I tested some other values, somehow only segment 4 triggers the bug.
  • I can also reproduce a similar bug in the hdf5 variant
    [screenshot]
    Maybe the test connectome data is wrong for segment 4 🤔?

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (2)

21-24: Consider extracting hardcoded synapse type names to a more appropriate location

The hardcoded legacySynapseTypeNames list creates a "magic" feeling in the service class. Consider moving this to a companion object, configuration file, or dedicated constants class to improve maintainability and discoverability.


69-86: Critical: Fix inconsistent CSR structure handling in outgoingSynapsesForAgglomerate

The method incorrectly skips reading the CSR indices array (keyCsrAgglomeratePair) before accessing keyAgglomeratePairOffsets. This creates an inconsistency with the CSC handling in ingoingSynapsesForAgglomerate and may lead to incorrect results.

The method should follow the same pattern as ingoingSynapsesForAgglomerate:

  1. Read CSR indptr to get fromPtr/toPtr (already correct)
  2. Read the agglomerate pairs from the CSR indices array (keyCsrAgglomeratePair)
  3. For each agglomerate pair, read synapse ranges from keyAgglomeratePairOffsets
   fromPtr <- fromAndToPtr.lift(0).toFox ?~> "Could not read start offset from connectome file"
   toPtr <- fromAndToPtr.lift(1).toFox ?~> "Could not read end offset from connectome file"
-  from <- finishAccessOnFailure(cachedConnectomeFile) {
-    cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyAgglomeratePairOffsets, 1, fromPtr)
-  }.flatMap(_.headOption.toFox) ?~> "Could not read synapses from connectome file"
-  to <- finishAccessOnFailure(cachedConnectomeFile) {
-    cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyAgglomeratePairOffsets, 1, toPtr)
-  }.flatMap(_.headOption.toFox) ?~> "Could not read synapses from connectome file"
+  agglomeratePairs: Array[Long] <- if (toPtr - fromPtr == 0L) Fox.successful(Array.empty[Long])
+  else
+    finishAccessOnFailure(cachedConnectomeFile) {
+      cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyCsrAgglomeratePair, (toPtr - fromPtr).toInt, fromPtr)
+    } ?~> "Could not read agglomerate pairs from connectome file"
+  synapseIdsNested <- Fox.serialCombined(agglomeratePairs.toList) { agglomeratePair: Long =>
+    for {
+      from <- finishAccessOnFailure(cachedConnectomeFile) {
+        cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyAgglomeratePairOffsets, 1, agglomeratePair)
+      }.flatMap(_.headOption.toFox)
+      to <- finishAccessOnFailure(cachedConnectomeFile) {
+        cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyAgglomeratePairOffsets, 1, agglomeratePair + 1)
+      }.flatMap(_.headOption.toFox)
+    } yield List.range(from, to)
+  } ?~> "Could not read outgoing synapses from connectome file"
+  _ = cachedConnectomeFile.finishAccess()
-} yield Seq.range(from, to)
+} yield synapseIdsNested.flatten
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d065282 and 09ef7a5.

📒 Files selected for processing (4)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/ZarrSegmentIndexFileService.scala (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/ZarrSegmentIndexFileService.scala
🚧 Files skipped from review as they are similar to previous changes (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
  • GitHub Check: backend-tests
🔇 Additional comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (2)

159-164: Good implementation of error handling utility

The finishAccessOnFailure helper method provides a clean way to ensure resources are properly cleaned up even when exceptions occur. This is a solid pattern for resource management.


166-170: Efficient cache clearing implementation

The cache clearing method correctly constructs the relevant path and uses a predicate-based clearing strategy, which is efficient and targeted.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (1)

196-197: Excellent defensive programming - validates negative shape values.

This validation check correctly prevents processing of invalid shape arrays containing negative dimensions. The implementation follows good practices:

  • Uses idiomatic Scala with shape.exists(_ < 0) for efficient short-circuiting
  • Provides clear error messaging with the problematic shape values
  • Maintains consistency with existing error handling patterns using Fox.failure
  • Positioned appropriately after the zero-check but before main processing logic

Consider adding unit tests to verify this validation handles edge cases like:

  • Single negative dimension
  • Multiple negative dimensions
  • Mixed positive/negative dimensions
  • Empty shape arrays
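
A standalone sketch of the guard and the edge cases listed above (the real check lives in DatasetArray and returns a Fox.failure; Either is used here only for illustration):

def validateShape(shape: Array[Int]): Either[String, Array[Int]] =
  if (shape.exists(_ < 0)) Left(s"Negative dimension in shape ${shape.mkString("(", ",", ")")}")
  else Right(shape)

validateShape(Array(10, -1, 5))  // Left(...)  – single negative dimension
validateShape(Array(-2, -3, -4)) // Left(...)  – multiple negative dimensions
validateShape(Array(7, 0, 3))    // Right(...) – non-negative values pass this check
validateShape(Array.empty[Int])  // Right(...) – empty shape passes (exists on empty is false)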
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09ef7a5 and 6beaf69.

📒 Files selected for processing (3)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (3)
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8229
File: frontend/javascripts/oxalis/model/accessors/dataset_accessor.ts:348-354
Timestamp: 2024-11-25T14:38:49.345Z
Learning: For the `getDatasetExtentAsProduct` function in `dataset_accessor.ts`, input validation for negative or zero dimensions is not necessary.
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Learnt from: MichaelBuessemeyer
PR: scalableminds/webknossos#8485
File: frontend/javascripts/oxalis/model/accessors/dataset_layer_transformation_accessor.ts:384-392
Timestamp: 2025-04-01T09:45:17.527Z
Learning: The function `isRotationAndMirrorMaybeOnly` in the dataset_layer_transformation_accessor.ts is intentionally designed to allow mirroring transformations (negative scale values). It uses the length comparison (`scale.length() === NON_SCALED_VECTOR.length()`) rather than component equality to permit mirrored axes while ensuring the overall scale magnitude remains the same.
🧬 Code Graph Analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (1)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (5)
  • Fox (30-223)
  • Fox (225-289)
  • failure (58-62)
  • s (229-233)
  • s (233-243)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests

@fm3 fm3 requested a review from MichaelBuessemeyer July 7, 2025 08:05
@MichaelBuessemeyer MichaelBuessemeyer (Contributor) left a comment

I found a little nitpick in the newest changes.

Moreover during re-testing I noticed that the connectome files are not refreshed when the dataset is reloaded manually via the frontend UI. I had to restart wk to re-register the zarr-connectome file. Steps:

  • delete zarr connectome folder
  • start wk
  • view ds -> no zarr connectome file
  • restore zarr connectome folder on hard drive
  • go to dashboard and reload the ds
  • view the ds and no zarr connectome file
  • restart wk
  • zarr connectome file is present

Maybe some cache needs to be cleared?

Your fix seems to work. When I now request agglomerate 4 for the zarr connectome data, I get
[screenshot]
But IMO the error message would be nicer if the inner error about the agglomerate not being present were the top-level error. It is more direct / easier to understand for a user IMO.

Moreover, I noticed that the hdf5 connectome yields a different error when loading segment id 4.
[screenshot]

@fm3 fm3 (Member Author) commented Jul 7, 2025

Good catch on the missing cache clear! I added this now to DataSourceController. I also flipped the >= as per your suggestion.

I think further optimizing the legibility of the error messages is not really worth the effort at this time. I’d say we can do that if we start seeing these errors in the wild frequently

@MichaelBuessemeyer MichaelBuessemeyer (Contributor) left a comment

Ok awesome, looks good 👍

thanks for fixing the caching problem

I think further optimizing the legibility of the error messages is not really worth the effort at this time. I’d say we can do that if we start seeing these errors in the wild frequently

Jep, definitely understandable 👍
