
Commit 3d93c5c

Read Zarr Agglomerate Files (#8633)
Agglomerate Mappings can now also be read from the new zarr3-based format, and from remote object storage.

- AgglomerateFileKey, which identifies an agglomerate file, now holds a LayerAttachment (which can specify a remote URI), so we can use VaultPaths for accessing remote agglomerate files
- The interface of the AgglomerateService methods changed (they take the new AgglomerateFileKey)
- AgglomerateService is no longer injected; instead it is explicitly created in BinaryDataServiceHolder so we can pass it the sharedChunkContentsCache
- AgglomerateService now delegates to either Hdf5AgglomerateService (basically the old code) or ZarrAgglomerateService (a lot of duplication from the other one, but unifying them at this time would sacrifice performance for hdf5); a minimal sketch of this dispatch follows the file stats below
- DatasetArray has a new public method readAsMultiArray, which does not apply any axis order but instead yields a MultiArray in the native order of the DatasetArray
- Removed the unused route agglomerateIdsForAllSegmentIds

### URL of deployed dev instance (used for testing):

- https://zarragglomerates.webknossos.xyz

### Steps to test:

- With the test dataset from https://www.notion.so/scalableminds/Test-Datasets-c0563be9c4a4499dae4e16d9b2497cfb?source=copy_link#209b51644c6380ac85e0f6b0c7e339cf, select the agglomerate mapping; it should be displayed correctly
- Import an agglomerate skeleton; it should look plausible
- Do some proofreading (splits + merges); it should work
- Also test that the old format still works (e.g. use the older test-agglomerate-file dataset with hdf5 agglomerate files)

### TODOs:

<details>
<summary>Backend</summary>

- [x] Open zarr agglomerate as zarr array and read contents
- [x] Read MultiArray without caring about AxisOrder
- [x] Test with 2D
- [x] Read agglomerate arrays with correct types
- [x] Re-implement public functions of agglomerate service
  - [x] applyAgglomerate
  - [x] generate agglomerate skeleton
  - [x] largest agglomerate id
  - [x] generate agglomerate graph
  - [x] segmentIdsForAgglomerateId
  - [x] agglomerateIdsForSegmentIds
  - [x] positionForSegmentId
- [x] What's up with zarr streaming in the tests? Reproduced with test-dataset: ids were wrong, also in normal data loading
- [x] Create indirection for selecting the zarr agglomerates OR the hdf5 agglomerates
- [x] Reduce code duplication between hdf5 and zarr
- [x] Error handling (index lookups always in tryo; abstraction?)
- [x] Read remote (build VaultPath for URI)
- [x] Discover files?
- [x] Adapt requests to specify which agglomerate file should be read from (type? full path?)
- [x] Caching / speedup (added some caching but did not test on a large-scale DS; will be a follow-up)
- [x] Clear caches on layer/DS reload
- [x] Make sure the agglomerate zarr directories don't blow up dataset exploring
- [x] Code cleanup

</details>

<details>
<summary>Frontend</summary>

- [x] Starting proofreading with a split action doesn't properly flush updateMappingName before calling the minCut route → fixed by #8676

</details>

### Issues:

- contributes to #8618
- contributes to #8567

------

- [x] Updated [changelog](../blob/master/CHANGELOG.unreleased.md#unreleased)
- [x] Removed dev-only changes like prints and application.conf edits
- [x] Considered [common edge cases](../blob/master/.github/common_edge_cases.md)
- [x] Needs datastore update after deployment
1 parent ee43e60 commit 3d93c5c

39 files changed: +1066 −586 lines
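Before the per-file diffs, here is a minimal, self-contained sketch of the hdf5/zarr indirection described in the commit message. The service and backend class names appear in this commit; the trait, the format enum, and all constructor and method shapes here are simplified stand-ins, not the real signatures:

```scala
// Sketch only: one backend per storage format, routed per AgglomerateFileKey.
sealed trait AttachmentFormat
case object Hdf5 extends AttachmentFormat
case object Zarr3 extends AttachmentFormat

// Stand-in for the real AgglomerateFileKey, which in this PR wraps a
// LayerAttachment (and can therefore point at a remote URI).
final case class AgglomerateFileKey(layerName: String, mappingName: String, format: AttachmentFormat)

trait AgglomerateBackend {
  def largestAgglomerateId(key: AgglomerateFileKey): Long
}

// Callers stay agnostic to where and how the mapping is stored.
class AgglomerateService(hdf5: AgglomerateBackend, zarr: AgglomerateBackend) {
  def largestAgglomerateId(key: AgglomerateFileKey): Long =
    key.format match {
      case Hdf5  => hdf5.largestAgglomerateId(key)
      case Zarr3 => zarr.largestAgglomerateId(key)
    }
}
```

The same routing applies to the other public operations of the service (generateSkeleton, generateAgglomerateGraph, positionForSegmentId, agglomerateIdsForSegmentIds, and so on).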

conf/messages

Lines changed: 1 addition & 0 deletions
```diff
@@ -149,6 +149,7 @@ zarr.invalidFirstChunkCoord=First Channel must be 0
 zarr.chunkNotFound=Could not find the requested chunk
 zarr.notEnoughCoordinates=Invalid number of chunk coordinates. Expected to get at least 3 dimensions and channel 0.
 zarr.invalidAdditionalCoordinates=Invalid additional coordinates for this data layer.
+zarr.readShardIndex.failed=Failed to read shard information for zarr data. This may indicate missing data.
 
 nml.file.uploadSuccess=Successfully uploaded file
 nml.file.notFound=Could not extract NML file
```

unreleased_changes/8633.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -0,0 +1,2 @@
+### Added
+- Agglomerate Mappings can now also be read from the new zarr3-based format, and from remote object storage.
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala

Lines changed: 0 additions & 1 deletion
```diff
@@ -27,7 +27,6 @@ class DataStoreModule extends AbstractModule {
     bind(classOf[DSRemoteWebknossosClient]).asEagerSingleton()
     bind(classOf[BinaryDataServiceHolder]).asEagerSingleton()
     bind(classOf[MappingService]).asEagerSingleton()
-    bind(classOf[AgglomerateService]).asEagerSingleton()
     bind(classOf[AdHocMeshServiceHolder]).asEagerSingleton()
     bind(classOf[ApplicationHealthService]).asEagerSingleton()
     bind(classOf[DSDatasetErrorLoggingService]).asEagerSingleton()
```
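The removed binding matches the note in the commit message: AgglomerateService is now constructed explicitly in BinaryDataServiceHolder so that the shared chunk-contents cache can be handed to the zarr-reading code. A hedged, simplified sketch of that wiring (all types and constructor shapes here are assumptions, not the real API):

```scala
// Stand-in types; the real classes take more parameters.
class SharedChunkContentsCache
class Hdf5AgglomerateService
class ZarrAgglomerateService(cache: SharedChunkContentsCache)
class AgglomerateService(hdf5: Hdf5AgglomerateService, zarr: ZarrAgglomerateService)

class BinaryDataServiceHolder {
  private val sharedChunkContentsCache = new SharedChunkContentsCache
  // Constructed explicitly instead of via Guice, so the cache reaches the zarr reader.
  val agglomerateService: AgglomerateService =
    new AgglomerateService(new Hdf5AgglomerateService, new ZarrAgglomerateService(sharedChunkContentsCache))
}
```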

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSMeshController.scala

Lines changed: 7 additions & 4 deletions
```diff
@@ -23,6 +23,7 @@ class DSMeshController @Inject()(
     meshFileService: MeshFileService,
     neuroglancerPrecomputedMeshService: NeuroglancerPrecomputedMeshFileService,
     fullMeshService: DSFullMeshService,
+    dataSourceRepository: DataSourceRepository,
     val dsRemoteWebknossosClient: DSRemoteWebknossosClient,
     val dsRemoteTracingstoreClient: DSRemoteTracingstoreClient,
     val binaryDataServiceHolder: BinaryDataServiceHolder
@@ -66,10 +67,12 @@
           datasetDirectoryName,
           dataLayerName,
           request.body.meshFile.name)
-        segmentIds: List[Long] <- segmentIdsForAgglomerateIdIfNeeded(
-          organizationId,
-          datasetDirectoryName,
-          dataLayerName,
+        (dataSource, dataLayer) <- dataSourceRepository.getDataSourceAndDataLayer(organizationId,
+                                                                                  datasetDirectoryName,
+                                                                                  dataLayerName)
+        segmentIds: Seq[Long] <- segmentIdsForAgglomerateIdIfNeeded(
+          dataSource.id,
+          dataLayer,
           targetMappingName,
           editableMappingTracingId,
           request.body.segmentId,
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala

Lines changed: 43 additions & 62 deletions
```diff
@@ -21,7 +21,7 @@ import com.scalableminds.webknossos.datastore.models.datasource.{DataLayer, Data
 import com.scalableminds.webknossos.datastore.services._
 import com.scalableminds.webknossos.datastore.services.mesh.{MeshFileService, MeshMappingHelper}
 import com.scalableminds.webknossos.datastore.services.uploading._
-import com.scalableminds.webknossos.datastore.storage.{AgglomerateFileKey, DataVaultService}
+import com.scalableminds.webknossos.datastore.storage.DataVaultService
 import net.liftweb.common.Box.tryo
 import net.liftweb.common.{Box, Empty, Failure, Full}
 import play.api.data.Form
@@ -262,7 +262,10 @@ class DataSourceController @Inject()(
       UserAccessRequest.readDataSources(DataSourceId(datasetDirectoryName, organizationId))) {
       for {
         agglomerateService <- binaryDataServiceHolder.binaryDataService.agglomerateServiceOpt.toFox
-        agglomerateList = agglomerateService.exploreAgglomerates(organizationId, datasetDirectoryName, dataLayerName)
+        (dataSource, dataLayer) <- dataSourceRepository.getDataSourceAndDataLayer(organizationId,
+                                                                                  datasetDirectoryName,
+                                                                                  dataLayerName)
+        agglomerateList = agglomerateService.listAgglomerates(dataSource.id, dataLayer)
       } yield Ok(Json.toJson(agglomerateList))
     }
   }
@@ -278,9 +281,12 @@
       UserAccessRequest.readDataSources(DataSourceId(datasetDirectoryName, organizationId))) {
       for {
         agglomerateService <- binaryDataServiceHolder.binaryDataService.agglomerateServiceOpt.toFox
+        (dataSource, dataLayer) <- dataSourceRepository.getDataSourceAndDataLayer(organizationId,
+                                                                                  datasetDirectoryName,
+                                                                                  dataLayerName)
+        agglomerateFileKey <- agglomerateService.lookUpAgglomerateFileKey(dataSource.id, dataLayer, mappingName)
         skeleton <- agglomerateService
-          .generateSkeleton(organizationId, datasetDirectoryName, dataLayerName, mappingName, agglomerateId)
-          .toFox ?~> "agglomerateSkeleton.failed"
+          .generateSkeleton(agglomerateFileKey, agglomerateId) ?~> "agglomerateSkeleton.failed"
       } yield Ok(skeleton.toByteArray).as(protobufMimeType)
     }
   }
@@ -296,11 +302,12 @@
       UserAccessRequest.readDataSources(DataSourceId(datasetDirectoryName, organizationId))) {
       for {
         agglomerateService <- binaryDataServiceHolder.binaryDataService.agglomerateServiceOpt.toFox
+        (dataSource, dataLayer) <- dataSourceRepository.getDataSourceAndDataLayer(organizationId,
+                                                                                  datasetDirectoryName,
+                                                                                  dataLayerName)
+        agglomerateFileKey <- agglomerateService.lookUpAgglomerateFileKey(dataSource.id, dataLayer, mappingName)
         agglomerateGraph <- agglomerateService
-          .generateAgglomerateGraph(
-            AgglomerateFileKey(organizationId, datasetDirectoryName, dataLayerName, mappingName),
-            agglomerateId)
-          .toFox ?~> "agglomerateGraph.failed"
+          .generateAgglomerateGraph(agglomerateFileKey, agglomerateId) ?~> "agglomerateGraph.failed"
       } yield Ok(agglomerateGraph.toByteArray).as(protobufMimeType)
     }
   }
@@ -316,10 +323,12 @@
       UserAccessRequest.readDataSources(DataSourceId(datasetDirectoryName, organizationId))) {
       for {
         agglomerateService <- binaryDataServiceHolder.binaryDataService.agglomerateServiceOpt.toFox
+        (dataSource, dataLayer) <- dataSourceRepository.getDataSourceAndDataLayer(organizationId,
+                                                                                  datasetDirectoryName,
+                                                                                  dataLayerName)
+        agglomerateFileKey <- agglomerateService.lookUpAgglomerateFileKey(dataSource.id, dataLayer, mappingName)
         position <- agglomerateService
-          .positionForSegmentId(AgglomerateFileKey(organizationId, datasetDirectoryName, dataLayerName, mappingName),
-                                segmentId)
-          .toFox ?~> "getSegmentPositionFromAgglomerateFile.failed"
+          .positionForSegmentId(agglomerateFileKey, segmentId) ?~> "getSegmentPositionFromAgglomerateFile.failed"
       } yield Ok(Json.toJson(position))
     }
   }
@@ -334,16 +343,11 @@
       UserAccessRequest.readDataSources(DataSourceId(datasetDirectoryName, organizationId))) {
       for {
         agglomerateService <- binaryDataServiceHolder.binaryDataService.agglomerateServiceOpt.toFox
-        largestAgglomerateId: Long <- agglomerateService
-          .largestAgglomerateId(
-            AgglomerateFileKey(
-              organizationId,
-              datasetDirectoryName,
-              dataLayerName,
-              mappingName
-            )
-          )
-          .toFox
+        (dataSource, dataLayer) <- dataSourceRepository.getDataSourceAndDataLayer(organizationId,
+                                                                                  datasetDirectoryName,
+                                                                                  dataLayerName)
+        agglomerateFileKey <- agglomerateService.lookUpAgglomerateFileKey(dataSource.id, dataLayer, mappingName)
+        largestAgglomerateId: Long <- agglomerateService.largestAgglomerateId(agglomerateFileKey)
       } yield Ok(Json.toJson(largestAgglomerateId))
     }
   }
@@ -358,45 +362,18 @@
       UserAccessRequest.readDataSources(DataSourceId(datasetDirectoryName, organizationId))) {
       for {
         agglomerateService <- binaryDataServiceHolder.binaryDataService.agglomerateServiceOpt.toFox
-        agglomerateIds: Seq[Long] <- agglomerateService
-          .agglomerateIdsForSegmentIds(
-            AgglomerateFileKey(
-              organizationId,
-              datasetDirectoryName,
-              dataLayerName,
-              mappingName
-            ),
-            request.body.items
-          )
-          .toFox
+        (dataSource, dataLayer) <- dataSourceRepository.getDataSourceAndDataLayer(organizationId,
+                                                                                  datasetDirectoryName,
+                                                                                  dataLayerName)
+        agglomerateFileKey <- agglomerateService.lookUpAgglomerateFileKey(dataSource.id, dataLayer, mappingName)
+        agglomerateIds: Seq[Long] <- agglomerateService.agglomerateIdsForSegmentIds(
+          agglomerateFileKey,
+          request.body.items
+        )
       } yield Ok(ListOfLong(agglomerateIds).toByteArray)
     }
   }
 
-  def agglomerateIdsForAllSegmentIds(
-      organizationId: String,
-      datasetDirectoryName: String,
-      dataLayerName: String,
-      mappingName: String
-  ): Action[ListOfLong] = Action.async(validateProto[ListOfLong]) { implicit request =>
-    accessTokenService.validateAccessFromTokenContext(
-      UserAccessRequest.readDataSources(DataSourceId(datasetDirectoryName, organizationId))) {
-      for {
-        agglomerateService <- binaryDataServiceHolder.binaryDataService.agglomerateServiceOpt.toFox
-        agglomerateIds: Array[Long] <- agglomerateService
-          .agglomerateIdsForAllSegmentIds(
-            AgglomerateFileKey(
-              organizationId,
-              datasetDirectoryName,
-              dataLayerName,
-              mappingName
-            )
-          )
-          .toFox
-      } yield Ok(Json.toJson(agglomerateIds))
-    }
-  }
-
   def update(organizationId: String, datasetDirectoryName: String): Action[DataSource] =
     Action.async(validateJson[DataSource]) { implicit request =>
       accessTokenService.validateAccessFromTokenContext(
@@ -637,10 +614,12 @@
     accessTokenService.validateAccessFromTokenContext(
       UserAccessRequest.readDataSources(DataSourceId(datasetDirectoryName, organizationId))) {
       for {
+        (dataSource, dataLayer) <- dataSourceRepository.getDataSourceAndDataLayer(organizationId,
+                                                                                  datasetDirectoryName,
+                                                                                  dataLayerName)
         segmentIds <- segmentIdsForAgglomerateIdIfNeeded(
-          organizationId,
-          datasetDirectoryName,
-          dataLayerName,
+          dataSource.id,
+          dataLayer,
           request.body.mappingName,
           request.body.editableMappingTracingId,
           segmentId.toLong,
@@ -674,12 +653,14 @@
     accessTokenService.validateAccessFromTokenContext(
       UserAccessRequest.readDataSources(DataSourceId(datasetDirectoryName, organizationId))) {
       for {
+        (dataSource, dataLayer) <- dataSourceRepository.getDataSourceAndDataLayer(organizationId,
+                                                                                  datasetDirectoryName,
+                                                                                  dataLayerName)
         segmentIdsAndBucketPositions <- Fox.serialCombined(request.body.segmentIds) { segmentOrAgglomerateId =>
           for {
             segmentIds <- segmentIdsForAgglomerateIdIfNeeded(
-              organizationId,
-              datasetDirectoryName,
-              dataLayerName,
+              dataSource.id,
+              dataLayer,
               request.body.mappingName,
               request.body.editableMappingTracingId,
               segmentOrAgglomerateId,
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/DatasetArrayBucketProvider.scala

Lines changed: 3 additions & 3 deletions
```diff
@@ -38,10 +38,10 @@ class DatasetArrayBucketProvider(dataLayer: DataLayer,
         datasetArray <- datasetArrayCache.getOrLoad(readInstruction.bucket.mag,
                                                     _ => openDatasetArrayWithTimeLogging(readInstruction))
         bucket = readInstruction.bucket
-        shape = Vec3Int.full(bucket.bucketLength)
         offset = Vec3Int(bucket.topLeft.voxelXInMag, bucket.topLeft.voxelYInMag, bucket.topLeft.voxelZInMag)
-        bucketData <- datasetArray.readBytesWithAdditionalCoordinates(shape,
-                                                                      offset,
+        shape = Vec3Int.full(bucket.bucketLength)
+        bucketData <- datasetArray.readBytesWithAdditionalCoordinates(offset,
+                                                                      shape,
                                                                       bucket.additionalCoordinates,
                                                                       dataLayer.elementClass == ElementClass.uint24)
       } yield bucketData
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/wkw/WKWHeader.scala

Lines changed: 1 addition & 1 deletion
```diff
@@ -78,7 +78,7 @@ case class WKWHeader(
     }
   }
 
-  override def datasetShape: Option[Array[Int]] = None
+  override def datasetShape: Option[Array[Long]] = None
 
   override def chunkShape: Array[Int] =
     Array(numChannels, numVoxelsPerChunkDimension, numVoxelsPerChunkDimension, numVoxelsPerChunkDimension)
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/AxisOrder.scala

Lines changed: 3 additions & 0 deletions
```diff
@@ -96,6 +96,9 @@ case class FullAxisOrder(axes: Seq[Axis]) {
   def permuteIndicesArrayToWk(indices: Array[Int]): Array[Int] =
     arrayToWkPermutation.map(indices(_))
 
+  def permuteIndicesArrayToWkLong(indices: Array[Long]): Array[Long] =
+    arrayToWkPermutation.map(indices(_))
+
   def toWkLibsJson: JsValue =
     Json.toJson(axes.zipWithIndex.collect {
       case (axis, index) if axis.name == "x" || axis.name == "y" || axis.name == "z" =>
```
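A short worked example of the permutation semantics shared by both methods (the axis order and permutation value here are illustrative, not taken from a real dataset):

```scala
// For an array stored in z,y,x order mapped to wk's x,y,z order,
// arrayToWkPermutation would be Array(2, 1, 0): position i of the result
// takes the element at indices(permutation(i)).
val arrayToWkPermutation = Array(2, 1, 0)
val indicesArrayOrder = Array(5L, 10L, 15L) // z = 5, y = 10, x = 15
val indicesWkOrder = arrayToWkPermutation.map(indicesArrayOrder(_)) // Array(15L, 10L, 5L)
```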

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/ChunkUtils.scala

Lines changed: 6 additions & 6 deletions
```diff
@@ -1,21 +1,21 @@
 package com.scalableminds.webknossos.datastore.datareaders
 
 object ChunkUtils {
-  def computeChunkIndices(arrayShapeOpt: Option[Array[Int]],
+  def computeChunkIndices(arrayShapeOpt: Option[Array[Long]],
                           arrayChunkShape: Array[Int],
                           selectedShape: Array[Int],
-                          selectedOffset: Array[Int]): List[Array[Int]] = {
+                          selectedOffset: Array[Long]): Seq[Array[Int]] = {
     val nDims = arrayChunkShape.length
     val start = new Array[Int](nDims)
     val end = new Array[Int](nDims)
     var numChunks = 1
     for (dim <- 0 until nDims) {
-      val largestPossibleIndex = arrayShapeOpt.map(arrayShape => (arrayShape(dim) - 1) / arrayChunkShape(dim))
+      val largestPossibleIndex = arrayShapeOpt.map(arrayShape => ((arrayShape(dim) - 1) / arrayChunkShape(dim)).toInt)
       val smallestPossibleIndex = 0
-      val startIndexRaw = selectedOffset(dim) / arrayChunkShape(dim)
+      val startIndexRaw = (selectedOffset(dim) / arrayChunkShape(dim)).toInt
       val startIndexClamped =
         Math.max(smallestPossibleIndex, Math.min(largestPossibleIndex.getOrElse(startIndexRaw), startIndexRaw))
-      val endIndexRaw = (selectedOffset(dim) + selectedShape(dim) - 1) / arrayChunkShape(dim)
+      val endIndexRaw = ((selectedOffset(dim) + selectedShape(dim) - 1) / arrayChunkShape(dim)).toInt
       val endIndexClampedToBbox =
         Math.max(smallestPossibleIndex, Math.min(largestPossibleIndex.getOrElse(endIndexRaw), endIndexRaw))
       val endIndexClamped = Math.max(startIndexClamped, endIndexClampedToBbox) // end index must be greater or equal to start index
@@ -38,6 +38,6 @@ object ChunkUtils {
         dimIndex = -1
       }
     }
-    chunkIndices.toList
+    chunkIndices.toSeq
   }
 }
```
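A small usage sketch for the widened signature (values are illustrative): array shape and selection offset are now Long, so selections in very large arrays no longer risk Int overflow, while the returned chunk indices stay Int:

```scala
// Which 32×32 chunks intersect a 40×40 window at offset (50, 50) in a 100×100 array?
val chunkIndices: Seq[Array[Int]] = ChunkUtils.computeChunkIndices(
  arrayShapeOpt = Some(Array(100L, 100L)),
  arrayChunkShape = Array(32, 32),
  selectedShape = Array(40, 40),
  selectedOffset = Array(50L, 50L)
)
// Four chunks intersect: indices (1, 1), (1, 2), (2, 1) and (2, 2).
```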
