
Recommendation for storing and versioning AIPs without the use of BagIt #83

@shsdev

For the versioning of AIPs, the plan is to recommend the use of the Oxford Common File Layout (OCFL).

Assume the following structure, in which the original submission information package (SIP) example.sip.001.tar is stored as version v00000 and the AIP urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a.tar is stored as version v00001:

├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.sha512
├── v00000
│   └── example.sip.001.tar
└── v00001
    └── urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a.tar

The inventory.json could look as follows:

{
    "digestAlgorithm": "sha512",
    "fixity": {
        "md5": {
            "f97f90b429a84bdd0bfb88b6d037b351": [
                "v00000/example.sip.001.tar"
            ],
            "ed7f48df08c5c1f134c02dcfd9ff6098": [
                "v00001/urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a.tar"
            ]
        },
        "sha256": {
            "47782fd210d3933bda9045d923c4370b2632c39826096bfa86b2860d07397742": [
                "v00000/example.sip.001.tar"
            ],
            "ac1d70378c2be8cf818b490292b51ccabba55b2192004eda67a7822f60072612": [
                "v00001/urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a.tar"
            ]
        }
    },
    "head": "v00001",
    "id": "urn:uuid:81bd3aa2-7350-44f6-ad54-d8181858605a",
    "manifest": {
        "c676d28b0d0a5aa345aea7995fbdf36b06981923af85d2234de5157c11173c032435fc3da4f513c717bba4bb912d0f9c7165750c46ed821bffdc22def79606c7": [
            "v00000/example.sip.001.tar"
        ],
        "1aef284e408d991ba6abf9973c4bcb02c1a2a94c951cd119cd040249878ac2d2ce790d78c0416327fd92efaab8f0536be1b0fb0a2f17a8cbe0069b72d16f7988": [
            "v00001/urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a.tar"
        ]
    },
    "type": "https://ocfl.io/1.0/spec/#inventory",
    "versions": {
        "v00000": {
            "created": "2024-04-09T21:08:54Z",
            "message": "Original SIP",
            "state": {
                "c676d28b0d0a5aa345aea7995fbdf36b06981923af85d2234de5157c11173c032435fc3da4f513c717bba4bb912d0f9c7165750c46ed821bffdc22def79606c7": [
                    "v00000/example.sip.001.tar"
                ]
            }
        },
        "v00001": {
            "created": "2024-04-09T21:08:55Z",
            "message": "AIP (ingest)",
            "state": {
                "1aef284e408d991ba6abf9973c4bcb02c1a2a94c951cd119cd040249878ac2d2ce790d78c0416327fd92efaab8f0536be1b0fb0a2f17a8cbe0069b72d16f7988": [
                    "v00001/urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a.tar"
                ]
            }
        }
    }
}
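To illustrate how the inventory above relates to the stored files, here is a minimal Python sketch that computes the sha512 manifest digests and the md5/sha256 fixity digests in a single pass over each container file, then writes inventory.json together with its inventory.json.sha512 sidecar. It simply mirrors the structure of the example (one file per version, version names such as v00000); the function names file_digests, build_inventory and write_inventory are hypothetical, and this is not a complete OCFL implementation or validator.

```python
import hashlib
import json
from pathlib import Path


def file_digests(path, algorithms=("sha512", "md5", "sha256"), chunk_size=1 << 20):
    """Compute several digests of a (possibly large) file in a single read pass."""
    hashers = {alg: hashlib.new(alg) for alg in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {alg: h.hexdigest() for alg, h in hashers.items()}


def build_inventory(object_root, object_id, versions):
    """Build an inventory dict shaped like the example above (hypothetical helper).

    versions: ordered list of (version_name, created, message, content_path),
    where content_path is relative to object_root, e.g. "v00000/example.sip.001.tar".
    """
    root = Path(object_root)
    inventory = {
        "digestAlgorithm": "sha512",
        "fixity": {"md5": {}, "sha256": {}},
        "head": versions[-1][0],
        "id": object_id,
        "manifest": {},
        "type": "https://ocfl.io/1.0/spec/#inventory",
        "versions": {},
    }
    for name, created, message, content_path in versions:
        digests = file_digests(root / content_path)
        inventory["manifest"].setdefault(digests["sha512"], []).append(content_path)
        for alg in ("md5", "sha256"):
            inventory["fixity"][alg].setdefault(digests[alg], []).append(content_path)
        inventory["versions"][name] = {
            "created": created,
            "message": message,
            # Mirrors the example: each version's state lists the file stored in it.
            "state": {digests["sha512"]: [content_path]},
        }
    return inventory


def write_inventory(object_root, inventory):
    """Write inventory.json, then its sidecar in the form "<digest> inventory.json"."""
    root = Path(object_root)
    data = json.dumps(inventory, indent=4, sort_keys=True).encode("utf-8")
    (root / "inventory.json").write_bytes(data)
    digest = hashlib.sha512(data).hexdigest()
    (root / "inventory.json.sha512").write_text(f"{digest} inventory.json\n", encoding="utf-8")
    return digest
```

Computing all three digests in one pass means each (potentially large) TAR container is read only once; the sidecar is written last because it depends on the final inventory bytes.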

Note that this fixity information overlaps with the fixity information that is already provided in the METS.
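As a rough illustration of where the METS fixity lives: mets:file elements can carry CHECKSUM and CHECKSUMTYPE attributes, with the file location given in mets:FLocat/@xlink:href. The following sketch (hypothetical function mets_checksums, Python standard library only) collects these values so they could be compared with the OCFL fixity block. Note that the METS checksums refer to the individual files inside the package, whereas the OCFL digests above refer to the TAR container as a whole.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by METS and XLink.
METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def mets_checksums(mets_xml: str) -> dict:
    """Map each file location (xlink:href) to its (CHECKSUMTYPE, CHECKSUM) pair."""
    root = ET.fromstring(mets_xml)
    result = {}
    for file_el in root.iter(f"{{{METS_NS}}}file"):
        checksum = file_el.get("CHECKSUM")
        if checksum is None:
            continue  # file element without fixity information
        flocat = file_el.find(f"{{{METS_NS}}}FLocat")
        href = flocat.get(f"{{{XLINK_NS}}}href") if flocat is not None else None
        result[href] = (file_el.get("CHECKSUMTYPE"), checksum)
    return result
```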

The question for the vote is whether the container files example.sip.001.tar (original SIP) and urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a.tar (AIP) should be wrapped in a BagIt container, for example:

├── bag-info.txt
├── bagit.txt
├── data
│   ├── metadata
│   │   ├── descriptive
│   │   │   └── ead.xml
│   │   ├── metadata.json
│   │   └── preservation
│   │       └── premis.xml
│   ├── METS.xml
│   ├── processing.log
│   ├── representations
│   │   └── 1710641a-bfa1-48cc-b41f-4220606679ae
│   │       ├── data
│   │       │   └── example.pdf
│   │       ├── metadata
│   │       │   └── preservation
│   │       │       └── premis.xml
│   │       └── METS.xml
│   └── state.json
├── manifest-sha256.txt
├── manifest-sha512.txt
├── tagmanifest-sha256.txt
└── tagmanifest-sha512.txt
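For comparison, the additional BagIt layer mainly consists of manifest files such as manifest-sha256.txt, in which each line holds a checksum followed by the path of a payload file relative to the bag root (e.g. data/METS.xml). A minimal sketch of generating that text, assuming the bag layout above; payload_manifest is a hypothetical name, and the sketch ignores details of RFC 8493 such as percent-encoding of line breaks in file names and the tag manifests.

```python
import hashlib
from pathlib import Path


def payload_manifest(bag_dir, algorithm="sha256"):
    """Return the text of manifest-<algorithm>.txt for all files under data/."""
    bag = Path(bag_dir)
    files = [p for p in (bag / "data").rglob("*") if p.is_file()]
    lines = []
    # Sort by the bag-relative POSIX path so the output is stable across platforms.
    for path in sorted(files, key=lambda p: p.relative_to(bag).as_posix()):
        h = hashlib.new(algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        # One "<checksum>  <path relative to the bag root>" line per payload file.
        lines.append(f"{h.hexdigest()}  {path.relative_to(bag).as_posix()}")
    return "\n".join(lines) + "\n" if lines else ""
```

Every payload file thus gets a second checksum on top of the ones recorded in METS and PREMIS, which is the redundancy discussed in this issue.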

Note that in this case fixity information could be provided in up to four layers:

  • OCFL
  • BagIt
  • METS
  • PREMIS

To reduce complexity and redundancy, the proposal is to store the E-ARK information packages as TAR files instead of wrapping them in BagIt containers as shown in the example above.

The E-ARK AIP container file urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a.tar would then have the following form, for example:

urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a
├── metadata
│   ├── descriptive
│   │   └── ead.xml
│   ├── metadata.json
│   ├── other
│   │   ├── processing.log
│   │   └── state.json
│   └── preservation
│       ├── premis_20240409-230854Z_event_sipcreation.xml
│       └── premis_20240409-230854Z_event_ingest.xml
├── METS.xml
├── representations
│   └── 09502a26-f822-407c-ad0a-4d7e64052a91
│       ├── data
│       │   └── example.pdf
│       ├── metadata
│       │   └── preservation
│       │       └── premis.xml
│       └── METS.xml
└── schemas
    ├── csip.xsd
    ├── ead3.xsd
    ├── IP.xsd
    ├── mets_1_11.xsd
    ├── premis-v2-2.xsd
    └── xlink.xsd
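The packaging step itself needs nothing more than a plain TAR writer. The following sketch uses Python's tarfile module to write the AIP directory under a root folder named after the package identifier, and shows that a single member such as the root METS.xml can be read without unpacking the whole container. The identifier-to-name mapping (replacing ":" with "+", so that urn:uuid:81bd3aa2-7350-44f6-ad54-d8181858605a becomes urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a) is inferred from the example file names and is an assumption, as are the helper names id_to_name, package_aip and read_member.

```python
import tarfile
from pathlib import Path


def id_to_name(object_id: str) -> str:
    # Hypothetical mapping inferred from the example: ":" is replaced by "+".
    return object_id.replace(":", "+")


def package_aip(aip_dir, object_id, out_dir):
    """Write <name>.tar with the AIP directory stored under a root folder <name>."""
    name = id_to_name(object_id)
    tar_path = Path(out_dir) / f"{name}.tar"
    # Plain, uncompressed tar ("w"), matching the .tar container files in the example.
    with tarfile.open(tar_path, "w") as tar:
        tar.add(aip_dir, arcname=name)  # adds the directory recursively
    return tar_path


def read_member(tar_path, member_name):
    """Read one regular file (e.g. the root METS.xml) without extracting the archive."""
    with tarfile.open(tar_path, "r") as tar:
        f = tar.extractfile(member_name)  # raises KeyError if the member is missing
        return f.read() if f is not None else None
```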

The suggestion is:

As part of the general AIP recommendations, the proposal is to store E-ARK information packages as TAR files instead of wrapping them in BagIt containers.
