Difference between revisions of "Development/Archivematica integration"

From AtoM wiki
Access to Memory (AtoM) and [https://www.archivematica.org Archivematica] are closely connected: they are developed by the same vendor, and Archivematica uses AtoM as its display frontend. Despite that, communication is not currently bidirectional, and once an AtoM site has data loaded there is no mechanism to synchronise the two platforms.

This page summarises findings from [https://groups.google.com/forum/#!topic/ica-atom-users/TEkIGSUTEvo/overview the discussion started on AtoM-users] and will, in time, start to form a description of what integration might look like.

There is also a similar project for [https://github.com/eprintsug/EPrintsArchivematica integrating EPrints with Archivematica], whose standards document includes useful information, including the layout and other legwork required for an export like this.
== Integration should cover ==
* An AtoM instance that already contains records, to which Archivematica is added as a backend
* An AtoM instance that is installed in parallel with Archivematica but has content uploaded to it directly

This means a variety of factors need to be considered (and questions asked), including:
* Data uploaded to AtoM should be exportable to Archivematica for ingest, and when the processed data is returned to AtoM no duplicate entries should be created.
* Optional support for metadata-only updates.
* Should thumbnails generated by AtoM be exported and preserved?
* Does AtoM store revision history? Should we archive revisions if it does?
* How will metadata-only updates be handled?
* What should be done with external digital objects? Should the export process skip them with a warning, or export their thumbnails and metadata? Detection code is available [https://github.com/artefactual/atom/blob/stable/2.4.x/lib/model/QubitDigitalObject.php#L1569 in the repository].
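The "skip with a warning" behaviour for external digital objects could look something like the sketch below. It is not AtoM's detection code (that lives in QubitDigitalObject.php, linked above); it assumes, purely for illustration, that an external object can be recognised because its stored path is a URL. The function names <code>is_external</code> and <code>partition_for_export</code> are hypothetical.

```python
import logging
from urllib.parse import urlparse

log = logging.getLogger("atom_export")

# Assumption: external objects are stored as URLs with one of these schemes.
EXTERNAL_SCHEMES = {"http", "https", "ftp"}


def is_external(asset_path: str) -> bool:
    """Return True when the path looks like a remote URL rather than a local file."""
    return urlparse(asset_path).scheme.lower() in EXTERNAL_SCHEMES


def partition_for_export(asset_paths):
    """Split paths into (local, skipped); log a warning for each skipped one."""
    local, skipped = [], []
    for path in asset_paths:
        if is_external(path):
            log.warning("Skipping external digital object: %s", path)
            skipped.append(path)
        else:
            local.append(path)
    return local, skipped
```

Returning the skipped list as well as logging it would let an export report tell the user exactly which objects were left out.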
== Options for integration ==

=== Option one: Data exported to directory ===

Export the required data and metadata into a single directory, which Archivematica imports. AtoM then imports the export package (DIP, Dissemination Information Package) created by Archivematica.
==== Export stage ====
The following data could be exported, provided the Archivematica [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/transfer/transfer/#the-transfer-tab transfer] process can use it. It would help populate fields in Archivematica and could then be used to re-match records with AtoM after processing.

* Transfer type (file type of the item)
* Transfer name (the current item's name or description)
* Accession number (accession number of the item being exported)
* Access system ID (the AtoM install performing the export)
* Approve automatically (make this optional; some will want it and some will not)
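As a rough illustration of the list above, the exported values could be carried in one small structure so that the same data is available both when the transfer is created and when the returning DIP is matched back to AtoM. This is only a sketch: the class name and field names are hypothetical and are not Archivematica's API, and which fields Archivematica can actually accept is still an open question. Unset optional fields are dropped from the output, and automatic approval defaults to off so that it stays opt-in.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TransferRequest:
    """Values AtoM would hand over with an export (hypothetical field names)."""

    transfer_type: str
    name: str
    accession_number: Optional[str] = None
    access_system_id: Optional[str] = None
    auto_approve: bool = False  # opt-in, per the "make this optional" note

    def to_payload(self) -> dict:
        """Return a plain dict, leaving out optional fields that were never set."""
        return {key: value for key, value in asdict(self).items() if value is not None}
```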
[https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/transfer/transfer/#preparing-digital-objects-for-transfer Export structure is important]. As with the [https://github.com/eprintsug/EPrintsArchivematica EPrints with Archivematica integration], I suggest this project supplies various pieces of data (metadata/, checksums, documentation, etc.) as described on [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/transfer/transfer/#preparing-digital-objects-for-transfer Archivematica's transfer page].

Working draft of the filesystem layout:
* object_slug-unique_identifier directory (the object slug is the slug stored in the DB; the unique identifier is however AtoM identifies the object internally)
** Metadata directory: files in here are formatted per the [https://www.archivematica.org/en/docs/archivematica-1.8/user-manual/transfer/import-metadata/#import-metadata Archivematica metadata import documentation] and should be populated with data from AtoM
*** metadata.csv (mandatory)
*** rights.csv (mandatory if rights are specified)
*** checksum.md5, checksum.sha1, or checksum.sha256. TBC: check which format(s) AtoM uses and restrict output to those
*** All other available metadata: optional; populate if supplied by AtoM and configured to export
** Objects directory, where the actual files will go
*** Structure TODO
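The draft layout above can be sketched as a small export routine. This is a minimal illustration rather than a proposed implementation: the function name <code>build_transfer_dir</code> and its parameters are hypothetical, the objects directory is kept flat because its structure is still TODO, and only a single Dublin Core column (<code>dc.title</code>) is written to metadata.csv. rights.csv and the checksum file are left out because their details are still to be confirmed.

```python
import csv
import shutil
from pathlib import Path


def build_transfer_dir(root, slug, identifier, files, titles):
    """Create <slug>-<identifier>/ containing objects/ and metadata/metadata.csv.

    `files` is a list of source file paths; `titles` maps file names to titles.
    """
    transfer = Path(root) / f"{slug}-{identifier}"
    objects = transfer / "objects"
    metadata = transfer / "metadata"
    objects.mkdir(parents=True)  # raises if this export directory already exists
    metadata.mkdir()

    rows = []
    for src in map(Path, files):
        shutil.copy2(src, objects / src.name)
        # metadata.csv refers to each file by its path inside the transfer.
        rows.append({"filename": f"objects/{src.name}", "dc.title": titles.get(src.name, "")})

    with open(metadata / "metadata.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["filename", "dc.title"])
        writer.writeheader()
        writer.writerows(rows)
    return transfer
```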
AtoM uses [https://github.com/artefactual/atom/blob/stable/2.4.x/lib/model/QubitDigitalObject.php#L1796 getAssetPath] to locate its digital objects, so we can lean on that to find the files we want to export.
==== Reimport ====
AtoM already handles DIP import, but the import will need to be augmented to deal with new issues:

* After conversion, AtoM will hold both the original-format files and the normalised versions. A way to remove the original files will need to be included.
* AtoM will need to ensure it doesn't enter a loop of: export files -> import from Archivematica -> update files with those from Archivematica -> trigger export of the newly imported files.
* Duplicates in AtoM. Archivematica's DIP upload creates new stub information objects when it deposits the DIP objects; it won't replace existing objects. How manageable this problem is will depend on how much identifying metadata we can feed through Archivematica.
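One possible way to break the export/import loop described above is to remember a fingerprint of every file that arrived from Archivematica and refuse to export anything whose content matches. The sketch below only illustrates that idea; the class <code>ExportGuard</code> is hypothetical, it keeps its state in memory (a real implementation would need to persist it, for example in the AtoM database), and it says nothing about how AtoM would actually trigger exports.

```python
import hashlib


class ExportGuard:
    """Tracks content that came back from Archivematica so it is not re-exported."""

    def __init__(self):
        self._imported_digests = set()

    @staticmethod
    def _digest(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def record_import(self, content: bytes) -> None:
        """Call this for every file written to AtoM by a DIP import."""
        self._imported_digests.add(self._digest(content))

    def should_export(self, content: bytes) -> bool:
        """Export only content that did not originate from a DIP import."""
        return self._digest(content) not in self._imported_digests
```

Matching on content rather than on file name means a normalised file is still recognised if the import renames it, though any later edit to the file would make it eligible for export again.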
=== Option two: Data exported to SIP/bag ===

Export packages (SIPs, BagIt bags, or similar) that can be imported into Archivematica. AtoM then imports the export package (DIP) created by Archivematica.

This is the same as option one, but includes a packaging step at the end.
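If BagIt were chosen, the extra packaging step could be as small as the sketch below, which wraps an already exported directory in a minimal bag: a <code>data/</code> payload directory, a <code>bagit.txt</code> declaration and a SHA-256 payload manifest. The function name <code>make_bag</code> is hypothetical, the choice of BagIt version 1.0 and SHA-256 is an assumption (which versions and algorithms Archivematica accepts would need checking), and optional tag files such as bag-info.txt are omitted.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write bagit.txt plus a payload manifest."""
    source, bag = Path(source_dir), Path(bag_dir)
    payload = bag / "data"
    shutil.copytree(source, payload)  # creates bag_dir; fails if the payload already exists

    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # One manifest line per payload file: "<checksum>  <path relative to the bag root>".
    lines = []
    for path in sorted(p for p in payload.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag).as_posix()}")
    (bag / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return bag
```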
=== Option three: Bidirectional communication ===

AtoM and Archivematica currently use SWORD1 (a unidirectional protocol) to push data from Archivematica to AtoM. Under this option they would use an existing protocol with bidirectional support (e.g. SWORD2 or the work-in-progress SWORD3; see http://swordapp.org/).

* Both applications would need changes in this case
** Archivematica doesn't support plugins, so it's not as simple as adding a plugin at each end.
* TBD: what would they communicate, and how? Would a SIP/bag be sent over SWORD, or would individual files and associated data be sent for Archivematica to package up?
== See also ==

* Archivematica's [https://www.archivematica.org/en/docs/archivematica-1.8/getting-started/overview/technical/#format-policies Format Policy] Registry, used to manage derivative creation in a more intelligent way. It is currently in the process of becoming its own project for use by multiple digital preservation systems, and is now known as the Preservation Action Registry.
* TODO: describe https://mediaarea.net/Events/2018-10-26_NoTimeToWait3/presentations/07.%20Martin%20Wrigley%20-%20Preservation%20Action%20Registry/OPF%20PAR%20Presentation%20notime%20to%20wait%20event.pdf

Revision as of 16:08, 28 November 2018

