Description
Feature Request
We would like to add an option to retrieve datasets to Merlin. SciCat currently offers a 'PSI-ra' option for retrieval. We would like to support similar functionality for Merlin and other central archiving locations.
Ra implementation
(Please edit if any of this information is incorrect)
The current PSI-ra retrieval workflow is as follows:
- Each ra pgroup has a 'retrieve' directory owned by the retrieval service user
- SciCat creates a retrieval job:
{
  "id": "c0a7cab3-acd7-4474-be75-b81024c775c8",
  "emailJobInitiator": "spencer.bliven@psi.ch",
  "type": "retrieve",
  "jobParams": {
    "username": "oidc.bliven_s",
    "destinationPath": "/archive/retrieve",
    "option": "PSI-RA"
  },
  "jobStatusMessage": "finishedSuccessful",
  "datasetList": [
    {
      "pid": "20.500.11935/a1704aba-285b-4f95-b48d-36a10930694f",
      "files": []
    }
  ],
  "jobResultObject": {
    "result": {
      "rc": "0",
      "jobid": "76033"
    }
  }
}
- Arima fetches the data from tape, places it in /das/work/<pgroup>/retrieve/<user>/<pid>, and reports success
- Users copy/move the data to the desired destination
Permissions rely on ACLs to allow both the service user and the pgroup members to access the directory.
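As a rough illustration of the ACL setup described above, the sketch below builds the `setfacl` command lines that would grant both a service user and a group access to a retrieve directory. It only constructs the commands and does not run them. The function name, the service user name `svc_retrieve`, and the group name `p12345` are hypothetical; the actual ra setup may differ.

```python
from typing import List


def build_acl_commands(directory: str, service_user: str, group: str) -> List[List[str]]:
    """Return setfacl invocations granting rwx to a service user and a group.

    All names passed in are placeholders chosen by the caller; nothing here
    reflects the real ra configuration.
    """
    entries = f"u:{service_user}:rwx,g:{group}:rwx"
    return [
        # Access ACL on the directory itself.
        ["setfacl", "-m", entries, directory],
        # Default ACL, so newly created files and subdirectories inherit it.
        ["setfacl", "-d", "-m", entries, directory],
    ]


if __name__ == "__main__":
    for cmd in build_acl_commands("/das/work/p12345/retrieve", "svc_retrieve", "p12345"):
        print(" ".join(cmd))
```

Returning argument lists rather than running the commands keeps the sketch free of side effects; a real tool could pass each list to `subprocess.run`.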
Differences to Merlin
Merlin does not use DUO or pgroups. Most users use a-groups and may archive from user directories or project directories, which do not correspond 1:1 with a-groups. This means that a mechanism must be added to allow users to select a path when retrieving a dataset.
Implementation steps
The minimal implementation in the backend would require:
- A way to grant the service user write access to the destination folder.
  - At first this could be a fixed retrieve directory for each project, as on ra
  - Better would be a script that sets the appropriate permissions/ACLs on whatever directory the user specifies. This could be incorporated into the datasetRetriever tool, and could validate some permissions at run time (e.g. that the user has permission to read the dataset and permission to write to the destination folder to clean up).
- Modify the Job model in the REST API to capture the destination server and path
- Modify Arima to write to the correct server and path
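To make the Job model change more concrete, here is a minimal sketch of how a retrieval job payload could carry a destination server and path. It mirrors the shape of the existing PSI-RA job shown earlier. The `destinationServer` field, the `"MERLIN"` option value, and the function name are assumptions made for illustration; they are not part of the current SciCat API.

```python
import posixpath
from typing import Any, Dict, List


def build_retrieve_job(
    email: str,
    username: str,
    pids: List[str],
    destination_server: str,
    destination_path: str,
) -> Dict[str, Any]:
    """Build a retrieve job dict with a user-selected destination (hypothetical fields)."""
    if not posixpath.isabs(destination_path):
        raise ValueError("destinationPath must be absolute")
    if ".." in destination_path.split("/"):
        raise ValueError("destinationPath must not contain '..'")
    return {
        "emailJobInitiator": email,
        "type": "retrieve",
        "jobParams": {
            "username": username,
            # Normalised to drop trailing or duplicate slashes.
            "destinationPath": posixpath.normpath(destination_path),
            # Hypothetical new field naming the target system.
            "destinationServer": destination_server,
            # Hypothetical option value, by analogy with "PSI-RA".
            "option": "MERLIN",
        },
        "datasetList": [{"pid": pid, "files": []} for pid in pids],
    }
```

Rejecting relative paths and `..` components up front is only a first line of defence; the backend would still need to check that the user may actually write to the chosen location.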
Front-end changes:
- datasetRetriever modifications to set up the directory, validate settings, and pass the correct paths to SciCat
- New SciCat retrieval option with a field for the destination
- (Optional) File browser on SciCat to select the files. This would probably require a microservice running somewhere with access to all the central filesystems which would validate user permissions and return file lists.
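The run-time validation mentioned for datasetRetriever could start with something like the sketch below, which checks that a destination is an absolute path to an existing directory that the calling user can write to. The function name and messages are hypothetical, and `os.access` only tests the user running the script, so the service user's access would need a separate check.

```python
import os
from typing import List


def check_destination(path: str) -> List[str]:
    """Return a list of problems with the destination; an empty list means it looks usable."""
    problems: List[str] = []
    if not os.path.isabs(path):
        problems.append("destination must be an absolute path")
    elif not os.path.isdir(path):
        problems.append("destination is not an existing directory")
    elif not os.access(path, os.W_OK | os.X_OK):
        # Write plus search permission is needed to create entries in a directory.
        problems.append("current user cannot write to destination")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a command-line tool report them to the user before any job is submitted to SciCat.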