Add support for viewing data flow maps #26

@anjackson

As part of the work to understand how digital preservation gets done in real institutions, I'm trying to understand the overall flow of information in these cases, i.e. tracing the path of the bitstreams from system to system at the overall organisational level.

I've come up with an approach based on space-time diagramming combined with Metro map styling. It takes a sequence of data event information and visualises it in quite a nice way. Some more features are needed still:

  • More diagrams, including very simple OAIS patterns, UKWA etc. And a way to flick between institutions and workflows.
  • When you click on one, update a panel that gives a place for lots of lovely gnarly detail and links. Or possibly copy these popovers
  • Station names all have to be unique rather than just being labels. That should be changed.
  • Multiple source-target pairs at the same timestamp would be useful sometimes.
  • Allow places to include 'boundaries', and use the underlying library to snake a 'river' through the diagram.
  • Add 'domain' to places and switch the river there, maybe adding a vertical "pseudo line" to make that clear? Adding a domain adds a line along the side; adding a further domain adds a river and switches sides. So this adds boundaries, in effect. Use community as bounds.
  • The shiftCoords are currently forced to be the same for both ends of a move/copy, which makes some alignments difficult. Need some way to let the start be the same as the parent, but then let the final shift be different.
  • Start/end labels are not working very well. End labels work fine if there's no start. Maybe use a different approach to laying out the locations on the 'in' side (on the suggested domain bracket instead).
  • Support a simple text markup (see below) and mapping it from YAML.
    • Allow space/time spacing to be changed in the workflow config.
    • Nice error if a target of e.g. a delete does not exist.
    • Fix it so renders still work if places/data etc. are not declared first.
    • markerAt /fraction of line/ is quite useful; use `@N+0.5`.
    • markerShiftCoords could maybe be a reinterpretation of `@N+0.5I`?
    • Interchange is automatic, for 'transform' events only. Maybe `@N+I` as above?
  • Write up as somewhere between the anything-goes of CoW and the highly detailed https://educopia.org/research-project/ossarcflow/ noting relationships with https://en.m.wikipedia.org/wiki/Data-flow_diagram https://en.m.wikipedia.org/wiki/Swimlane https://epithumia.github.io/pyrailroad/
# Data types and descriptions:
data sip "Submission Information Package" color="#ff0000"

# Locations where data can be stored:
location producer "Producer"
location ingest "Ingest Storage"
location archive "Archival Storage"
location access "Access Storage"

# Domains where locations are maintained:
domain dc "Designated Community"
domain ar "The Archive"
domain man "Management"

# Then the sequence of events in this dataflow...

# We start by transferring a package from an external party:
start sip@producer.dc
move sip@producer.dc sip@ingest.ar "Transfer to the archive"
space

# We then prepare the item for ingest to the archival storage system:
derive sip@ingest aip@ingest "Generate AIP from SIP"
copy aip@ingest aip@archive "Copy to archival storage"
# And delete the temporary files:
delete sip@ingest, aip@ingest
space

# When access is requested, we generate an access copy:
copy aip@archive aip@access "Retrieve the AIP"
derive aip@access dip@access "Generate the DIP"
copy dip@access dip@consumer.dc "Send the DIP"
delete aip@access, dip@access

# And we're done:
end
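
To make the markup idea more concrete, here is a rough Python sketch of how the example above might be parsed. It is only an illustration: the grammar is inferred from the example, the names `parse_dataflow`, `parse_ref`, `Ref`, `Event` and `DataflowError` are hypothetical, and it assumes a piece of data is identified by its (data, location) pair with the `.domain` suffix treated as optional decoration. It also shows one way to give a "nice error" when the target of a `delete` (or the source of a `move`/`copy`/`derive`) does not exist, without requiring places or data to be declared first.

```python
import shlex
from dataclasses import dataclass


class DataflowError(ValueError):
    """Markup error that carries the offending line number in its message."""


@dataclass(frozen=True)
class Ref:
    data: str
    location: str
    domain: str = ""  # optional ".domain" suffix, empty when absent


@dataclass
class Event:
    kind: str
    refs: tuple = ()
    label: str = ""
    line: int = 0


def parse_ref(token, line_no):
    """Split 'data@location' or 'data@location.domain' into a Ref."""
    data, sep, place = token.partition("@")
    location, _, domain = place.partition(".")
    if not (sep and data and location):
        raise DataflowError(f"line {line_no}: expected data@location, got {token!r}")
    return Ref(data, location, domain)


def parse_dataflow(text):
    """Return (declarations, events); raise DataflowError on bad input."""
    declarations = {"data": {}, "location": {}, "domain": {}}
    events = []
    live = set()  # (data, location) pairs that exist at this point in the flow

    def require_live(ref, line_no, keyword):
        if (ref.data, ref.location) not in live:
            raise DataflowError(
                f"line {line_no}: cannot {keyword} {ref.data}@{ref.location}: "
                "no such data at that location"
            )

    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        try:
            keyword, *args = shlex.split(line)  # honours "quoted labels"
        except ValueError as exc:
            raise DataflowError(f"line {line_no}: {exc}") from exc

        def need(count):
            if len(args) < count:
                raise DataflowError(
                    f"line {line_no}: '{keyword}' needs at least {count} argument(s)"
                )

        if keyword in declarations:
            need(2)
            ident, label, *extras = args
            # Extra tokens look like key=value, e.g. color=#ff0000.
            attrs = dict(item.partition("=")[::2] for item in extras)
            declarations[keyword][ident] = {"label": label, **attrs}
        elif keyword in ("space", "end"):
            events.append(Event(keyword, line=line_no))
        elif keyword == "start":
            need(1)
            ref = parse_ref(args[0], line_no)
            live.add((ref.data, ref.location))
            events.append(Event(keyword, (ref,), line=line_no))
        elif keyword in ("move", "copy", "derive"):
            need(2)
            source, target = (parse_ref(a, line_no) for a in args[:2])
            require_live(source, line_no, keyword)
            if keyword == "move":
                live.discard((source.data, source.location))
            live.add((target.data, target.location))
            label = args[2] if len(args) > 2 else ""
            events.append(Event(keyword, (source, target), label, line_no))
        elif keyword == "delete":
            need(1)
            refs = tuple(parse_ref(a.strip(","), line_no) for a in args if a.strip(","))
            for ref in refs:
                require_live(ref, line_no, keyword)
                live.discard((ref.data, ref.location))
            events.append(Event(keyword, refs, line=line_no))
        else:
            raise DataflowError(f"line {line_no}: unknown keyword {keyword!r}")
    return declarations, events
```

Calling `parse_dataflow(text)` on the example should give a dictionary of declarations plus an ordered list of events, which could then be mapped onto the same structure the YAML config uses. Tracking the `live` set while parsing is just one possible design; the check could equally be done in a separate validation pass.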

Labels: enhancement

Status: In Progress