Read Zarr Connectome Files #8717


Open · wants to merge 140 commits into master

Conversation

@fm3 fm3 (Member) commented Jun 25, 2025

Steps to test:

  • In the connectome tab, load synapses from both the hdf5 and the zarr connectome file; this should work as expected.

TODOs:

  • list + look up connectome file keys
  • delegate to correct service
  • re-connect hdf5 connectome file service
  • How to deal with synapse type names for legacy files? Compare slack thread
  • implement zarr connectome file service
  • enable cache clear
  • test
  • extract string constants like for the other attachments

Issues:


  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment

fm3 added 30 commits May 19, 2025 10:20
@fm3 fm3 removed the refactoring label Jul 1, 2025
Base automatically changed from zarr-segment-index to master July 3, 2025 09:18
@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1)

80-86: Path-traversal risk after .normalize – ensure path stays within dataset dir

normalize resolves .. segments, so a user-supplied relative path like ../../etc/passwd will now escape localDatasetDir, yet still pass the exists check and be returned.

val pathRelativeToDataset = localDatasetDir.resolve(localPath).normalize
...
if (pathRelativeToDataset.toFile.exists) {
  pathRelativeToDataset.toUri
}

Validate that the normalised path is still a descendant of the intended base (same for pathRelativeToLayer) before accepting it.

-        val pathRelativeToDataset = localDatasetDir.resolve(localPath).normalize
+        val pathRelativeToDataset = localDatasetDir.resolve(localPath).normalize
+        if (!pathRelativeToDataset.startsWith(localDatasetDir))
+          throw new Exception(s"Relative path $localPath resolves outside dataset directory")

-        val pathRelativeToLayer = localDatasetDir.resolve(layerName).resolve(localPath).normalize
+        val pathRelativeToLayer = localDatasetDir
+          .resolve(layerName)
+          .resolve(localPath)
+          .normalize
+        if (!pathRelativeToLayer.startsWith(localDatasetDir.resolve(layerName)))
+          throw new Exception(s"Relative path $localPath resolves outside layer directory")

Without this guard an attacker (or mis-configured attachment) could read arbitrary files.
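
For illustration, a standalone sketch (hypothetical paths, plain java.nio, nothing from the PR itself) of why normalize alone does not confine the path and how the suggested startsWith guard catches the escape:

import java.nio.file.Paths

val localDatasetDir = Paths.get("/data/datasets/myDataset")

// User-supplied relative path that climbs out of the dataset directory:
val escaped = localDatasetDir.resolve("../../etc/passwd").normalize
// escaped is now /data/etc/passwd, i.e. outside localDatasetDir
escaped.startsWith(localDatasetDir) // false -> guard rejects it

// A legitimate attachment path stays inside and passes:
val inside = localDatasetDir.resolve("connectomes/connectome.zarr").normalize
inside.startsWith(localDatasetDir) // true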

🧹 Nitpick comments (11)
unreleased_changes/8717.md (1)

1-3: Changelog entry is too terse; missing migration & config notes

Consider adding:

  • Impact on deployments (datastore update required, caches, etc.)
  • Migration or backward-compatibility caveats for clients that still write HDF5-only connectomes.
  • Reference to the two fixed issues (#8618, #8567) for traceability.

A fuller entry makes release notes self-contained and searchable.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/SynapticPartnerDirection.scala (1)

1-9: Consider adding a short Scaladoc for public enum

A one-liner explaining “src = presynaptic, dst = postsynaptic” aids IDE tooltips and cross-module usage.

-object SynapticPartnerDirection extends ExtendedEnumeration {
+/** Direction of a synaptic partner: `src` – presynaptic, `dst` – postsynaptic. */
+object SynapticPartnerDirection extends ExtendedEnumeration {
util/src/main/scala/com/scalableminds/util/collections/SequenceUtils.scala (1)

57-58: Fix the comment to match the actual return type

The comment mentions "returns Box of index" but the method actually returns Int.

-  // Search in a sorted array, returns Box of index where element is found or, if missing, where element would be inserted
+  // Search in a sorted array, returns index where element is found or, if missing, where element would be inserted
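
For reference, the documented behaviour in use (a sketch; the parameter and return types of searchSorted are assumed here to be a sorted Array[Long], a Long element, and an Int index):

SequenceUtils.searchSorted(Array(1L, 3L, 5L), 3L) // 1 – element found at index 1
SequenceUtils.searchSorted(Array(1L, 3L, 5L), 4L) // 2 – missing, would be inserted before 5
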
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)

116-117: Consider extracting hardcoded synapse type names

As noted in the PR description, string constants should be extracted. While this is documented as intentional for legacy files, consider moving these to a constants object for better maintainability.

Would you like me to help extract these constants to a dedicated object?
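
One possible shape for such an extraction (object name and location are placeholders; the three values are the legacy type names used further down in this file):

object ConnectomeFileConstants {
  // All legacy HDF5 connectome files use these three synapse types.
  val legacySynapseTypeNames: List[String] =
    List("dendritic-shaft-synapse", "spine-head-synapse", "soma-synapse")
}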

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala (2)

93-93: Address TODO: Verify offset and shape order for multi-dimensional arrays

The TODO comment questions whether offset and shape should be transposed. This should be verified against the Zarr specification for multi-dimensional array access.

Would you like me to help verify the correct order by checking the Zarr3 specification or testing with sample data?


193-200: Fix variable naming inconsistency

The variables are named meshFileKey but should be connectomeFileKey to match the actual type and improve code clarity.

-    attributesCache.clear { meshFileKey =>
-      meshFileKey.dataSourceId == dataSourceId && layerNameOpt.forall(meshFileKey.layerName == _)
+    attributesCache.clear { connectomeFileKey =>
+      connectomeFileKey.dataSourceId == dataSourceId && layerNameOpt.forall(connectomeFileKey.layerName == _)
     }

     openArraysCache.clear {
-      case (meshFileKey, _) =>
-        meshFileKey.dataSourceId == dataSourceId && layerNameOpt.forall(meshFileKey.layerName == _)
+      case (connectomeFileKey, _) =>
+        connectomeFileKey.dataSourceId == dataSourceId && layerNameOpt.forall(connectomeFileKey.layerName == _)
     }
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileService.scala (5)

92-95: Extract string constants to a shared location.

As mentioned in the PR objectives, these string constants should be extracted similarly to other attachments for better maintainability and consistency.

Consider creating a constants object:

object ConnectomeConstants {
  val LocalConnectomesDir = "connectomes"
  val Hdf5ConnectomeFileExtension = "hdf5"
}

114-118: Consider more specific error handling instead of generic tryo.

The tryo wrapper could hide specific exceptions that might be useful for debugging. Consider handling specific exceptions that uriFromPathLiteral might throw.

Replace the generic exception handling:

-      registeredAttachmentNormalized <- tryo(registeredAttachment.map { attachment =>
+      registeredAttachmentNormalized <- Box(registeredAttachment.map { attachment =>
         attachment.copy(
           path =
             remoteSourceDescriptorService.uriFromPathLiteral(attachment.path.toString, localDatasetDir, dataLayer.name))
-      })
+      }) ?~! "Failed to normalize attachment path"

158-172: Consider logging failures for better observability.

While the comment explains that failures are intentionally ignored to prevent breaking the list request, it would be helpful to log these failures for debugging purposes.

Add logging before flattening:

         }
+        .map { results =>
+          results.foreach {
+            case Full(_) => // success
+            case failure => logger.warn(s"Failed to process connectome file: $failure")
+          }
+          results.flatten
+        }
-        // Only return successes, we don't want a malformed file breaking the list request.
-        .map(_.flatten))

187-187: Simplify agglomerate ID extraction.

Since you've already verified that agglomerateIds.length == 1, using headOption.toFox is redundant.

-        agglomerateId <- agglomerateIds.headOption.toFox ?~> "Failed to extract the single agglomerate ID from request"
+        agglomerateId = agglomerateIds.head

201-202: Consider memory usage for large agglomerate sets.

The directedPairs method creates n*(n-1) pairs, which could consume significant memory for large inputs. For example, 1000 agglomerates would create 999,000 pairs.

Consider implementing pagination or streaming for large agglomerate sets, or document the expected maximum size in the API documentation.
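
To make the growth concrete, here is a standalone sketch of eager pair generation and a lazier alternative (names and signatures are illustrative, not the PR's actual directedPairs):

// Eager version: materialises all n*(n-1) ordered pairs at once.
def directedPairs(ids: Seq[Long]): Seq[(Long, Long)] =
  for { a <- ids; b <- ids if a != b } yield (a, b)

directedPairs(Seq(1L, 2L, 3L)).size // 6; for 1000 ids this is 999000 tuples in memory

// Lazier variant: yields one pair at a time so callers can stream or paginate.
def directedPairsIterator(ids: Seq[Long]): Iterator[(Long, Long)] =
  for { a <- ids.iterator; b <- ids.iterator if a != b } yield (a, b)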

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c6b2b2 and 1687300.

📒 Files selected for processing (12)
  • unreleased_changes/8717.md (1 hunks)
  • util/src/main/scala/com/scalableminds/util/collections/SequenceUtils.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (6 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ConnectomeFileService.scala (0 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/SynapticPartnerDirection.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1 hunks)
💤 Files with no reviewable changes (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ConnectomeFileService.scala
🧰 Additional context used
🧠 Learnings (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
Learnt from: MichaelBuessemeyer
PR: scalableminds/webknossos#8352
File: app/models/organization/CreditTransactionService.scala:0-0
Timestamp: 2025-01-27T12:06:42.865Z
Learning: In Scala's for-comprehension with Fox (Future-like type), the `<-` operator ensures sequential execution. If any step fails, the entire chain short-circuits and returns early, preventing subsequent operations from executing. This makes it safe to perform validation checks before database operations.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
🧬 Code Graph Analysis (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/SynapticPartnerDirection.scala (1)
  • SynapticPartnerDirection (5-9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (6)
util/src/main/scala/com/scalableminds/util/collections/SequenceUtils.scala (2)
  • SequenceUtils (5-63)
  • searchSorted (58-63)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/SynapticPartnerDirection.scala (1)
  • SynapticPartnerDirection (5-9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/Hdf5FileCache.scala (2)
  • Hdf5FileCache (60-111)
  • fromPath (54-57)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)
  • DataStoreConfig (10-72)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileService.scala (9)
  • mappingNameForConnectomeFile (174-182)
  • ingoingSynapsesForAgglomerate (220-231)
  • empty (62-62)
  • outgoingSynapsesForAgglomerate (231-242)
  • synapticPartnerForSynapses (254-266)
  • positionsForSynapses (266-277)
  • typesForSynapses (277-286)
  • synapseIdsForDirectedPair (242-254)
  • clearCache (286-297)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1)
  • synapticPartnerKey (20-25)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1)

7-11: Connectome services correctly bound

The additional Guice bindings and import block look good and follow the established pattern for eager singletons.

Also applies to: 60-63
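
For readers unfamiliar with the pattern, eager-singleton bindings in a Guice module typically look like this (a sketch; the exact set of classes bound in DataStoreModule is whatever the PR declares):

bind(classOf[ConnectomeFileService]).asEagerSingleton()
bind(classOf[ZarrConnectomeFileService]).asEagerSingleton()
bind(classOf[Hdf5ConnectomeFileService]).asEagerSingleton()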

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala (1)

82-89: No stray references to removed lookUpMeshFileImpl found
A recursive search for the legacy method name returned zero matches outside MeshFileService.scala, so the rename is safe.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ConnectomeFileUtils.scala (1)

1-25: LGTM!

The trait provides a clean abstraction for connectome file constants and utilities. Good use of pattern matching for the synaptic partner key selection.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)

513-526: Well-structured refactoring to use the new connectome file service

The consistent pattern of retrieving dataSource/dataLayer and using connectome file keys improves abstraction and maintainability.


554-556: Good validation of the direction parameter

Properly validates the string parameter into the SynapticPartnerDirection enum with appropriate error message.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)

155-160: Good error handling pattern with finishAccessOnFailure

The helper method ensures proper cleanup of file handles even when operations fail, preventing resource leaks.
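
The general shape of such a helper, expressed with plain scala.util.Try instead of the project's Box/Fox types (purely illustrative):

import scala.util.{Failure, Try}

// Run a block against a cached file handle; if it throws, release the handle first
// so the cache does not leak it, then propagate the failure.
def finishAccessOnFailure[T](releaseHandle: () => Unit)(block: => T): Try[T] =
  Try(block) match {
    case f @ Failure(_) =>
      releaseHandle()
      f
    case success => success
  }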

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1687300 and b18eb66.

📒 Files selected for processing (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
🔇 Additional comments (9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (9)

1-19: LGTM: Well-structured service class with proper dependency injection

The class structure, imports, and initialization look good. The use of dependency injection, Fox monads, and caching is appropriate for this service.


21-30: LGTM: Proper resource management in mappingNameForConnectomeFile

The method correctly uses the cached file handle, implements error handling with Fox, and ensures proper resource cleanup.


32-62: LGTM: Correct CSC structure usage for incoming synapses

The method correctly uses keyCscIndptr for incoming synapses (CSC - Compressed Sparse Column structure) and properly handles the multi-step process of reading agglomerate pairs and then synapse ranges.
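
In toy form, the three-step lookup described here (plain in-memory arrays standing in for the HDF5 datasets; names follow the keys mentioned in this review, everything else is illustrative):

def incomingSynapseIds(cscIndptr: Array[Long],
                       cscAgglomeratePair: Array[Long],
                       agglomeratePairOffsets: Array[Long],
                       agglomerateId: Int): Seq[Long] = {
  // 1. indptr delimits which entries of the pair index belong to this agglomerate
  val fromPtr = cscIndptr(agglomerateId)
  val toPtr = cscIndptr(agglomerateId + 1)
  // 2. those entries reference (src, dst) agglomerate pairs
  val pairIds = (fromPtr until toPtr).map(i => cscAgglomeratePair(i.toInt))
  // 3. each pair id maps to a contiguous synapse-id range via the offsets table
  pairIds.flatMap { pairId =>
    agglomeratePairOffsets(pairId.toInt) until agglomeratePairOffsets(pairId.toInt + 1)
  }
}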


83-95: LGTM: Clean implementation of synaptic partner retrieval

The method correctly retrieves synaptic partners for given synapses and direction, with proper error handling and resource management.


97-108: LGTM: Proper matrix data reading for synapse positions

The method correctly reads 3D position data using readMatrixBlockWithOffset and handles the conversion to the expected format.


110-123: LGTM: Appropriate handling of legacy synapse types

The hard-coded type names list is reasonable for legacy HDF5 files as mentioned in the PR objectives. The implementation correctly reads synapse types and combines them with the legend.


125-153: LGTM: Sophisticated directed pair synapse lookup

The method implements a complex but correct algorithm for finding synapses between specific agglomerate pairs using CSR structure traversal and binary search. The logic handles edge cases appropriately.
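
The binary-search step can be pictured as a plain lower-bound search over the sorted slice of destination agglomerate ids (a stand-in for SequenceUtils.searchSorted; names are illustrative):

def findPairIndex(sortedDstAgglomerates: Array[Long], target: Long): Option[Int] = {
  var lo = 0
  var hi = sortedDstAgglomerates.length
  while (lo < hi) {
    val mid = (lo + hi) / 2
    if (sortedDstAgglomerates(mid) < target) lo = mid + 1 else hi = mid
  }
  // lo is now the insertion point; it is a hit only if the element there equals target
  if (lo < sortedDstAgglomerates.length && sortedDstAgglomerates(lo) == target) Some(lo) else None
}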


155-160: LGTM: Proper resource cleanup utility

The finishAccessOnFailure helper method correctly ensures file handle cleanup even when operations fail, using the tryo utility for exception handling.


162-166: LGTM: Efficient cache clearing implementation

The cache clearing method correctly constructs the relevant path and clears cache entries based on the provided criteria.

@fm3 fm3 requested a review from MichaelBuessemeyer July 3, 2025 12:22
@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)

64-82: Critical: Incomplete CSR structure handling in outgoingSynapsesForAgglomerate

The method reads CSR indptr but skips reading the intermediate CSR indices array, unlike the correct CSC handling in ingoingSynapsesForAgglomerate. This creates an inconsistency in the data structure access pattern.

The CSR format requires:

  1. Read keyCsrIndptr to get pointer range
  2. Read keyCsrAgglomeratePair (CSR indices) using the pointer range
  3. Use the agglomerate pair IDs to read from keyAgglomeratePairOffsets
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (2)

116-117: Document the hardcoded synapse type names

The hardcoded type names are mentioned as expected for legacy files, but consider adding a comment referencing the related issues (#8618, #8567) or internal discussions mentioned in the PR description for better maintainability.

-      // Hard coded type name list, as all legacy files have this value.
+      // Hard coded type name list for legacy HDF5 files (see #8618, #8567)
+      // All legacy files use these three synapse types consistently
       typeNames = List("dendritic-shaft-synapse", "spine-head-synapse", "soma-synapse")

90-94: Consider batch reading for better performance

The current implementation reads synaptic partners one by one using serialCombined. For large synapse ID lists, this could be inefficient. Consider implementing batch reading if the HDF5 library supports it.

This is a performance optimization that could be addressed in a future iteration if needed based on actual usage patterns.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b18eb66 and d065282.

📒 Files selected for processing (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
🔇 Additional comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (3)

125-153: LGTM: Proper CSR handling in synapseIdsForDirectedPair

This method correctly implements the CSR data structure access pattern by reading the CSR indices array (line 139) before using the values to locate the target agglomerate pair. The binary search optimization using SequenceUtils.searchSorted is also well-implemented.


155-160: LGTM: Proper resource cleanup pattern

The finishAccessOnFailure helper method correctly ensures that file handles are properly closed even when exceptions occur, preventing resource leaks in the file cache.


162-166: LGTM: Efficient cache clearing with path-based filtering

The cache clearing implementation correctly filters cache entries by path prefix, allowing for both dataset-wide and layer-specific cache invalidation.

@MichaelBuessemeyer MichaelBuessemeyer (Contributor) left a comment

Here is my mini code feedback :D

directionValidated <- SynapticPartnerDirection
  .fromString(direction)
  .toFox ?~> "could not parse synaptic partner direction"
(dataSource, dataLayer) <- dataSourceRepository.getDataSourceAndDataLayer(organizationId,
Contributor

A high level thing I just noticed: Why do you always "load" the dataLayer and pass it to the lookUp....FileKey functions? At least for lookUpConnectomeFileKey only the dataLayer.name prop is being used. But isn't dataLayer.name equal to dataLayerName?

Member Author

True, that’s the same value. However, the lookUp* functions also use dataLayer.attachments, which needs the full layer. Also, getDataSourceAndDataLayer is cheap as it only uses the in-memory dataSourceRepository. So we can use it to also validate that the dataSource really exists, even if we then only use dataSource.id

@MichaelBuessemeyer MichaelBuessemeyer (Contributor) left a comment

Ok here are my testing results:

  • When I tried to load segment id 4 in the connectome tab using the zarr connectome, I get
    [screenshot]
    IMO an error message explaining that the agglomerate does not exist in the connectome would be preferable.

    • ok, after I tested some other values, somehow only segment 4 triggers the bug.
  • I can also reproduce a similar bug in the hdf5 variant
    [screenshot]
    Maybe the test connectome data is wrong for segment 4 🤔?

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (2)

21-24: Consider extracting hardcoded synapse type names to a more appropriate location

The hardcoded legacySynapseTypeNames list creates a "magic" feeling in the service class. Consider moving this to a companion object, configuration file, or dedicated constants class to improve maintainability and discoverability.


69-86: Critical: Fix inconsistent CSR structure handling in outgoingSynapsesForAgglomerate

The method incorrectly skips reading the CSR indices array (keyCsrAgglomeratePair) before accessing keyAgglomeratePairOffsets. This creates an inconsistency with the CSC handling in ingoingSynapsesForAgglomerate and may lead to incorrect results.

The method should follow the same pattern as ingoingSynapsesForAgglomerate:

  1. Read CSR indptr to get fromPtr/toPtr (already correct)
  2. Read the agglomerate pairs from the CSR indices array (keyCsrAgglomeratePair)
  3. For each agglomerate pair, read synapse ranges from keyAgglomeratePairOffsets
   fromPtr <- fromAndToPtr.lift(0).toFox ?~> "Could not read start offset from connectome file"
   toPtr <- fromAndToPtr.lift(1).toFox ?~> "Could not read end offset from connectome file"
-  from <- finishAccessOnFailure(cachedConnectomeFile) {
-    cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyAgglomeratePairOffsets, 1, fromPtr)
-  }.flatMap(_.headOption.toFox) ?~> "Could not read synapses from connectome file"
-  to <- finishAccessOnFailure(cachedConnectomeFile) {
-    cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyAgglomeratePairOffsets, 1, toPtr)
-  }.flatMap(_.headOption.toFox) ?~> "Could not read synapses from connectome file"
+  agglomeratePairs: Array[Long] <- if (toPtr - fromPtr == 0L) Fox.successful(Array.empty[Long])
+  else
+    finishAccessOnFailure(cachedConnectomeFile) {
+      cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyCsrAgglomeratePair, (toPtr - fromPtr).toInt, fromPtr)
+    } ?~> "Could not read agglomerate pairs from connectome file"
+  synapseIdsNested <- Fox.serialCombined(agglomeratePairs.toList) { agglomeratePair: Long =>
+    for {
+      from <- finishAccessOnFailure(cachedConnectomeFile) {
+        cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyAgglomeratePairOffsets, 1, agglomeratePair)
+      }.flatMap(_.headOption.toFox)
+      to <- finishAccessOnFailure(cachedConnectomeFile) {
+        cachedConnectomeFile.uint64Reader.readArrayBlockWithOffset(keyAgglomeratePairOffsets, 1, agglomeratePair + 1)
+      }.flatMap(_.headOption.toFox)
+    } yield List.range(from, to)
+  } ?~> "Could not read outgoing synapses from connectome file"
+  _ = cachedConnectomeFile.finishAccess()
-} yield Seq.range(from, to)
+} yield synapseIdsNested.flatten
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d065282 and 09ef7a5.

📒 Files selected for processing (4)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/ZarrSegmentIndexFileService.scala (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/ZarrMeshFileService.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/segmentindex/ZarrSegmentIndexFileService.scala
🚧 Files skipped from review as they are similar to previous changes (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1)
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
  • GitHub Check: backend-tests
🔇 Additional comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (2)

159-164: Good implementation of error handling utility

The finishAccessOnFailure helper method provides a clean way to ensure resources are properly cleaned up even when exceptions occur. This is a solid pattern for resource management.


166-170: Efficient cache clearing implementation

The cache clearing method correctly constructs the relevant path and uses a predicate-based clearing strategy, which is efficient and targeted.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (1)

196-197: Excellent defensive programming - validates negative shape values.

This validation check correctly prevents processing of invalid shape arrays containing negative dimensions. The implementation follows good practices:

  • Uses idiomatic Scala with shape.exists(_ < 0) for efficient short-circuiting
  • Provides clear error messaging with the problematic shape values
  • Maintains consistency with existing error handling patterns using Fox.failure
  • Positioned appropriately after the zero-check but before main processing logic

Consider adding unit tests to verify this validation handles edge cases like:

  • Single negative dimension
  • Multiple negative dimensions
  • Mixed positive/negative dimensions
  • Empty shape arrays
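
A standalone sketch of the guard and the edge cases listed above (the real check lives in DatasetArray and returns a Fox.failure; Either is used here only for illustration):

def validateShape(shape: Array[Int]): Either[String, Array[Int]] =
  if (shape.exists(_ < 0)) Left(s"Negative dimension in shape ${shape.mkString("(", ",", ")")}")
  else Right(shape)

validateShape(Array(10, -1, 5))  // Left(...)  – single negative dimension
validateShape(Array(-2, -3, -4)) // Left(...)  – multiple negative dimensions
validateShape(Array(7, 0, 3))    // Right(...) – non-negative values pass this check
validateShape(Array.empty[Int])  // Right(...) – empty shape passes (exists on empty is false)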
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09ef7a5 and 6beaf69.

📒 Files selected for processing (3)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/Hdf5ConnectomeFileService.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/connectome/ZarrConnectomeFileService.scala
🧰 Additional context used
🧠 Learnings (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (3)
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8229
File: frontend/javascripts/oxalis/model/accessors/dataset_accessor.ts:348-354
Timestamp: 2024-11-25T14:38:49.345Z
Learning: For the `getDatasetExtentAsProduct` function in `dataset_accessor.ts`, input validation for negative or zero dimensions is not necessary.
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Learnt from: MichaelBuessemeyer
PR: scalableminds/webknossos#8485
File: frontend/javascripts/oxalis/model/accessors/dataset_layer_transformation_accessor.ts:384-392
Timestamp: 2025-04-01T09:45:17.527Z
Learning: The function `isRotationAndMirrorMaybeOnly` in the dataset_layer_transformation_accessor.ts is intentionally designed to allow mirroring transformations (negative scale values). It uses the length comparison (`scale.length() === NON_SCALED_VECTOR.length()`) rather than component equality to permit mirrored axes while ensuring the overall scale magnitude remains the same.
🧬 Code Graph Analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/DatasetArray.scala (1)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (5)
  • Fox (30-223)
  • Fox (225-289)
  • failure (58-62)
  • s (229-233)
  • s (233-243)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests

@fm3 fm3 requested a review from MichaelBuessemeyer July 7, 2025 08:05
@MichaelBuessemeyer MichaelBuessemeyer (Contributor) left a comment

I found a little nitpick in the newest changes.

Moreover during re-testing I noticed that the connectome files are not refreshed when the dataset is reloaded manually via the frontend UI. I had to restart wk to re-register the zarr-connectome file. Steps:

  • delete zarr connectome folder
  • start wk
  • view ds -> no zarr connectome file
  • restore zarr connectome folder on hard drive
  • go to dashboard and reload the ds
  • view the ds and no zarr connectome file
  • restart wk
  • zarr connectome file is present

Maybe some cache needs to be cleared?

Your fix seems to work. When I now request agglomerate 4 for the zarr connectome data, I get
[screenshot]
But IMO the error message would be nicer if the inner error about the agglomerate not being present were the top-level error. It is more direct / easier to understand for a user IMO.

Moreover, I noticed that the hdf5 connectome yields a different error when loading segment id 4.
[screenshot]

@fm3 fm3 (Member Author) commented Jul 7, 2025

Good catch on the missing cache clear! I added this now to DataSourceController. I also flipped the >= as per your suggestion.

I think further optimizing the legibility of the error messages is not really worth the effort at this time. I’d say we can do that if we start seeing these errors in the wild frequently

@MichaelBuessemeyer MichaelBuessemeyer (Contributor) left a comment

Ok awesome, looks good 👍

thanks for fixing the caching problem

I think further optimizing the legibility of the error messages is not really worth the effort at this time. I’d say we can do that if we start seeing these errors in the wild frequently

Jep, definitely understandable 👍
