BCAUL pilot project - Qubit export EAD

From AtoM wiki
Revision as of 13:31, 29 July 2015 by Dan (talk | contribs) (fix page title)

Note

This is historical development documentation, migrated from the now-defunct Artefactual wiki. The content was first added there on September 26, 2008, and last updated on October 30, 2008. For more information, see the landing page for this development project: BCAUL Pilot Project. The content was moved to the AtoM wiki on July 29, 2015.


Notes

  • This table identifies the EAD elements output by Qubit's EAD export, mapping EAD tags to Qubit fields and methods.
  • In the Output column, red text indicates database content (table_name::field_name).
  • Qubit generally adheres to the Research Libraries Group's RLG Best Practice Guidelines for Encoded Archival Description (August 2002). Exceptions are indicated below in the Notes column.
  • When exporting EAD, Qubit sets the relatedencoding attribute of <eadheader> to "MARC21" and of <archdesc> to "ISAD(G)".
  • When exporting EAD, Qubit does not use the <frontmatter> element; when importing, Qubit ignores any data wrapped in <frontmatter> tags.
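
The import-side behaviour described in the last note can be sketched as follows. This is only an illustration in Python, not Qubit's import code: the function name strip_frontmatter is hypothetical, and the sketch assumes a DTD-based EAD 2002 document without XML namespaces, where <frontmatter> is a direct child of <ead>.

```python
import xml.etree.ElementTree as ET


def strip_frontmatter(ead_root: ET.Element) -> ET.Element:
    """Drop any <frontmatter> child of <ead> so its content is never imported."""
    # findall() returns a list, so removing elements while looping is safe.
    for frontmatter in ead_root.findall("frontmatter"):
        ead_root.remove(frontmatter)
    return ead_root


if __name__ == "__main__":
    sample = (
        "<ead><eadheader/>"
        "<frontmatter><titlepage/></frontmatter>"
        "<archdesc/></ead>"
    )
    root = strip_frontmatter(ET.fromstring(sample))
    print([child.tag for child in root])
```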

Tags

EAD header tags

<eadheader> | <eadid> | <filedesc> | <titlestmt> | <titleproper> | <author> | <editionstmt> | <edition> | <publicationstmt> | <publisher> | <date> | <address> | <seriesstmt> | <notestmt> | <profiledesc> | <creation> | <langusage> | <desrules> | <revisiondesc>

ARCHDES tags

<archdesc>

<did> tags: <did> | <origination> | <unittitle> | <unitdate> | <physdesc> | <phystech> | <abstract> | <physloc> | <originalsloc> | <repository> | <unitid> | <langmaterial> | <materialspec>

Other tags: <bioghist> | <scopecontent> | <arrangement> | <controlaccess> | <accessrestrict> | <accruals> | <acqinfo> | <altformavail> | <appraisal> | <custodhist> | <prefercite> | <processinfo> | <userestrict> | <relatedmaterial> | <separatedmaterial> | <otherfindaid> | <bibliography> | <odd>

Lower level tags: <dsc> | <c> | <daogrp> | <daoloc> | <daodesc>

Mapping

EAD header tags

EAD element Output Repeatable? Notes

<eadheader>   No Do not include relatedencoding attribute here.

Use different relatedencoding values for <eadheader> (MARC21) and <archdesc> (ISAD(G)).

  langencoding= "iso639-1" No Qubit uses language codes from Symfony framework.
  • ISO 639-1 = 2-letter alpha codes.
  • ISO 639-2 = 3-letter alpha codes.
  • Symfony appears to use 639-1 codes where possible and 639-2 codes only when no two-letter code is available.
  • Qubit therefore uses 639-2 in some cases, 639-1 in others.
  scriptencoding= "iso15924" No Qubit uses ISO-compliant 4-letter alpha codes for scripts from Symfony framework.
  relatedencoding= "MARC21" No Data in the <eadheader> section primarily relates to finding aid as a quasi-published work.
  • Therefore map to MARC21 as bibliographic standard.
  repositoryencoding= "iso15511" No Alpha-numeric code (maximum 16 characters) uniquely identifying any library or related institution in the world.
  • Typically formed by 2-letter country code of authority that issues identifier, followed by dash, followed by identifier.
  • Qubit forms by concatenating the repository's Country code and Identifier.
  countryencoding= "iso3166-1" No Qubit uses ISO-compliant 2-letter alpha codes for countries from Symfony framework.
  dateencoding= "iso8601" No Qubit normalizes dates without dashes, e.g. September 29 2008 = 20080929.

  <eadid> information_object::identifier [get_current_date_timestamp] No Identifies document as unique instance of an EAD document.
  • Concatenate database id number with time-stamp.
  • Database id points to description.
  • Time-stamp differentiates different EAD outputs of same description.
    countrycode= contact_information::country_code No Get Country code value from primary contact of repository.

Assumes repository is the same as the agency responsible for maintaining description (not always the case).

  • In future iterations, should link to information_object::institution_responsible_identifier.
    mainagencycode= repository::identifier No Assumes repository is the same as the agency responsible for maintaining description (not always the case).
  • In future iterations, should link to information_object::institution_responsible_identifier.
    encodinganalog= "856$u" No MARC21 856$u = Electronic location and access / Uniform Resource Identifier
    url= [get_url_of_server] + "/information/show/id/information_object::id" No Qubit permanent url.

RLG Guidelines mandate using at least one of publicid, identifier, or url attributes.

  <filedesc>   No  

    <titlestmt>   No  

      <titleproper> "Finding aid: " information_object::title No "Finding aid" tag needs to be translatable.

In future iterations, administrator should have interface to set default titles.

        encodinganalog= "245$a" No MARC21 245$a = Title statement
      <author> information_object::revision_history No In a future iteration, the finding aid may be a separate Qubit object with its own metadata.
        encodinganalog= "245$c" No MARC21 245$c = Title statement / Statement of responsibility
      <sponsor>   No Not currently supported by Qubit; data may be captured in future iteration.
    <editionstmt>   No  
      <edition> information_object::edition No  
        encodinganalog= "250$a" No MARC21 250$a = Edition statement

    <publicationstmt>   No  

      <publisher> actor::authorized_form_of_name No Get name of repository from actor table.

Assumes repository is the same as the agency responsible for publishing description (not always the case).

  • In future iterations, should link to information_object::institution_responsible_identifier.
        encodinganalog= "260$b" No MARC21 260$b = Publication, distribution, etc / Name of publisher, distributor, etc.

      <date> [Get current date] No Use current date as date of publication, formatted as text "MonthName Day, Year".
  • Dates of revision are registered in the <revisiondesc> tag.
        encodinganalog= "260$c" No MARC21 260$c = Publication, distribution, etc / Date of publication, distribution, etc
        normal= [get_current_date] No Normalize current date as "YYYYMMDD"

      <address>   No Get address info from repository's primary contact record.
  • In future iterations, should link to information_object::institution_responsible_identifier.
        encodinganalog= "260$a" No MARC21 260$a = Publication, distribution, etc / Place of publication, distribution, etc.

        <addressline>   <addressline>actor::authorized_form_of_name</addressline>

  <addressline>contact_information::street</addressline>

  <addressline>contact_information::city contact_information::region contact_information::country_code contact_information::postal_code</addressline>

  <addressline>Telephone: contact_information::telephone</addressline>

  <addressline>Fax: contact_information::fax</addressline>

  <addressline>Email: contact_information::email</addressline>

  <addressline>URL: contact_information::url</addressline>

No Return info in separate <addressline> tags.

    <seriesstmt>   No Not currently supported. In future iterations, Qubit may include a separate finding_aid object with its own metadata, which may include a field for information relating to the published monographic series to which the finding aid belongs.

    <notestmt>   Yes Not currently supported. In future iterations, Qubit may include a separate finding_aid object with its own metadata, which may include notes relating to its publication.

  <profiledesc>   No  

    <creation> "EAD finding aid output from ICA-AtoM by " [get_user_name] " on " <date normal="YYYYMMDD">[get_current_date]</date> No Indicates that EAD was machine-generated rather than manually coded and uses current date and user name.
      encodinganalog= "500" No MARC21 500 = General note

    <langusage>   No  

      <language> property::name="language_of_information_object_description" value=" " Yes Get values from related records in property table.

Code each language in separate <language></language> tags.

        encodinganalog= "041" No MARC21 041 = Language code
        langcode= property::name="language_of_information_object_description" value=" " No  
        scriptcode= property::name="script_of_information_object_description" value=" " No  

    <desrules> information_object::rules No Get value from text field. There is no MARC encoding analog.

  <revisiondesc>   No Not currently supported.
  • Qubit stores information relating to revisions in a single text field, Revision history, but this cannot be easily normalized into separate <change><item> entries as required by EAD (one <item></item> tag for each revision).

Future iteration:

  • Get data for these tags from Qubit versioning module.
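
The <eadid>, url= and <date> rows above describe a few small string-building steps that can be sketched in Python. This is an illustration under stated assumptions, not Qubit's code: the function names are hypothetical, the underscore between id and time-stamp is an assumption (the table only says the two are concatenated), and the time-stamp layout is likewise assumed. Month names come from the default C locale.

```python
from datetime import datetime


def build_eadid(object_id: int, now: datetime) -> str:
    """Database id plus time-stamp, so repeated exports of one description differ."""
    # Assumption: underscore separator and YYYYMMDDHHMMSS time-stamp layout.
    return f"{object_id}_{now:%Y%m%d%H%M%S}"


def build_url(server_url: str, object_id: int) -> str:
    """Permanent URL: server URL + /information/show/id/ + database id."""
    return f"{server_url.rstrip('/')}/information/show/id/{object_id}"


def publication_date(now: datetime) -> tuple:
    """Return (display text, normal attribute) for the <date> element."""
    display = f"{now:%B} {now.day}, {now.year}"  # "MonthName Day, Year"
    normal = f"{now:%Y%m%d}"                     # "YYYYMMDD", no dashes
    return display, normal


if __name__ == "__main__":
    moment = datetime(2008, 9, 29, 13, 31, 5)
    print(build_eadid(42, moment))
    print(build_url("http://example.org", 42))
    print(publication_date(moment))
```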

ARCHDES tags

EAD element Output Repeatable? Notes

<archdesc>   No  

  level= term::name No Get term::name (name of Level of description) via information_object::level_of_description_id

  relatedencoding= "ISAD(G) 2nd edition, 2000" No Map <archdesc> tags to ISAD(G) elements, as ISAD(G) is the standard for archival description on which ICA-AtoM is built.

Problem: what if ICA-AtoM takes in a description originally based on another standard (e.g. RAD, DACS), but now exports EAD as if ISAD(G) were the original source standard?

Future iteration:

  • Take value from information_object::rules field.
  • Use different encodinganalog values through <archdesc> depending on the standard.
  • Assumes that values in rules are controlled through taxonomy.
DID tags

EAD element Output Repeatable? Notes

  <did>   No  

    <origination>   Yes Get <origination> values from event table (creation events).
      <corpname>

      <famname>

      <persname>

actor::authorized_form_of_name Yes Get actor name via related creation event.

Use <corpname>, <famname> or <persname> as appropriate (from actor::entity_type_id).

Where multiple creators are registered, return each in its own <origination> tags.

        encodinganalog= "3.2.1" No ISAD(G) 3.2.1 = Name of creator(s)
        role= term::name No Get term::name via creation event::actor_role_id

    <unittitle> information_object::title

information_object::alternate_title

Yes Return Title and Alternate title in separate <unittitle> tags.
      encodinganalog= "3.1.2" No ISAD(G) 3.1.2 = Title
      type= "alternate" No Use only for Alternate title values.

    <unitdate> event::date_display Yes Get date information from related events.

Return multiple dates each in its own <unitdate></unitdate> tags.

      type=   No Not currently supported. While users can enter either inclusive or bulk dates in Date display field, there is no way to easily extract the Type from the data.
      normal= event::start_date

event::end_date

No Normalize as YYYYMMDD/YYYYMMDD
      encodinganalog= "3.1.3" No ISAD(G) 3.1.3 = Date(s)
      datechar= term::name No Get term::name via event::type_id

    <physdesc>   No  

      <extent> information_object::extent_and_medium No While EAD can accommodate multiple <extent> tags, Qubit stores all extent-related data in one field as a single string and cannot easily normalize it into multiple extent statements.
        encodinganalog= "3.1.5" No ISAD(G) 3.1.5 = Extent and medium.

    <phystech> information_object::physical_characteristics No  
      encodinganalog= "3.4.4" No ISAD(G) 3.4.4 = Physical characteristics.

    <abstract>   No Not currently supported.

Future iterations:

  • May include a field for brief summary of description.

    <container> physical_object::name No Get from related physical_object record.

Problem: EAD distinguishes <container> (storage device, e.g. cartons, boxes, reels, folders) and <physloc> (place where storage devices are located - building, room, stack, shelf).

  • Qubit does not make this distinction; it treats all physical locations as containers within containers within containers.

    <physloc> physical_object::location No Get info from the related physical_object record; include this element only if such a record exists.

Include audience attribute = "internal" to make non-public?

    <originalsloc> information_object::location_of_originals No  
      encodinganalog= "3.5.1" No ISAD(G) 3.5.1 = Existence and location of originals.

    <repository>   No Get repository's data from related actor and repository records via repository_id value.

No ISAD(G) analog for <repository> element or its sub-elements.

      <corpname> actor::authorized_form_of_name No  

      <address>   No Get address info from repository's primary contact record.
  • In future iterations, should link to information_object::institution_responsible_identifier.

        <addressline>   <addressline>actor::authorized_form_of_name</addressline>

  <addressline>contact_information::street</addressline>

  <addressline>contact_information::city contact_information::region contact_information::country_code contact_information::postal_code</addressline>

  <addressline>Telephone: contact_information::telephone</addressline>

  <addressline>Fax: contact_information::fax</addressline>

  <addressline>Email: contact_information::email</addressline>

  <addressline>URL: contact_information::url</addressline>

No Return info in separate <addressline> tags.

    <unitid> information_object::identifier No  
      countrycode= contact_information::country_code No Get repository's country code from primary contact's related contact_information record.
      repositorycode= repository::identifier   Get repository code from related repository record.
      encodinganalog= "3.1.1"   ISAD(G) 3.1.1 = Reference code.

    <langmaterial>   No  
      encodinganalog= "3.4.3" No ISAD(G) 3.4.3 = Language / scripts of material.

      <language> property::information_object_language Yes Qubit stores code of language; transform to full string.
        langcode= property::information_object_language No  
        scriptcode= property::information_object_script No Problem: how to connect scripts and languages?
  • Qubit stores as unrelated properties.

    <materialspec>     Not currently supported by Qubit. But note that RAD version requires this EAD element for Class of material specific details (handled as properties).
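
The <unitdate> rows above combine a free-text display date with a normalized start/end range. A minimal Python sketch of that step follows. The function names are hypothetical, and emitting a single YYYYMMDD value when there is no distinct end date is an assumption; the table only specifies the YYYYMMDD/YYYYMMDD form.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional


def normal_range(start: date, end: Optional[date] = None) -> str:
    """Normalize event start/end dates as YYYYMMDD/YYYYMMDD."""
    start_text = f"{start:%Y%m%d}"
    # Assumption: a missing or identical end date yields a single value.
    if end is None or end == start:
        return start_text
    return f"{start_text}/{end:%Y%m%d}"


def build_unitdate(display: str, start: date, end: Optional[date],
                   event_type: str) -> ET.Element:
    """Build one <unitdate>; call once per related event."""
    element = ET.Element("unitdate", {
        "normal": normal_range(start, end),
        "encodinganalog": "3.1.3",  # ISAD(G) 3.1.3 = Date(s)
        "datechar": event_type,     # term::name via event::type_id
    })
    element.text = display          # event::date_display
    return element


if __name__ == "__main__":
    unitdate = build_unitdate("1950-1960", date(1950, 1, 1),
                              date(1960, 12, 31), "creation")
    print(ET.tostring(unitdate, encoding="unicode"))
```
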
Other tags

EAD element Output Repeatable? Notes

  <bioghist> actor::history Yes Can include multiple admin / bio histories (any actor registered as creator in creation event).
    encodinganalog= "3.2.2" No ISAD(G) 3.2.2 = Administrative / biographical history.

  <scopecontent> information_object::scope_and_content No  
    encodinganalog= "3.3.1" No ISAD(G) 3.3.1 = Scope and content.

  <arrangement> information_object::arrangement No  
    encodinganalog= "3.3.4" No ISAD(G) 3.3.4 = System of arrangement.

  <controlaccess>   No Return access points.
    <corpname> actor::authorized_form_of_name Yes Get actor via event.
    <persname> actor::authorized_form_of_name Yes Get actor via event.
    <famname> actor::authorized_form_of_name Yes Get actor via event.
    <geogname> term::name Yes Get term via object_term_relation.
    <subject> term::name Yes Get term via object_term_relation.
    <genreform>   Yes Not currently supported.
    <occupation>   Yes Not currently supported.
    <function>   Yes Not currently supported.
    <title>   Yes Not currently supported.

  <accessrestrict> information_object::access_condition No  
    encodinganalog= "3.4.1" No ISAD(G) 3.4.1 = Conditions governing access.

  <accruals> information_object::accruals No  
    encodinganalog= "3.3.3" No ISAD(G) 3.3.3 = Accruals.

  <acqinfo> information_object::acquisition No  
    encodinganalog= "3.2.4" No ISAD(G) 3.2.4 = Immediate source of acquisition or transfer.

  <altformavail> information_object::location_of_copies No  
    encodinganalog= "3.5.2" No ISAD(G) 3.5.2 = Existence and location of copies.

  <appraisal> information_object::appraisal No  
    encodinganalog= "3.3.2" No ISAD(G) 3.3.2 = Appraisal, destruction and scheduling information.

  <custodhist> information_object::archival_history No  
    encodinganalog= "3.2.3" No ISAD(G) 3.2.3 = Archival history.

  <prefercite>     Qubit does not currently support this element.

  <processinfo> information_object::revision_history No  
    encodinganalog= "3.7.3"   ISAD(G) 3.7.3 = Date(s) of descriptions.

  <userestrict> information_object::reproduction_conditions No  
    encodinganalog= "3.4.2" No ISAD(G) 3.4.2 = Conditions governing reproduction.

  <relatedmaterial> information_object::related_units_of_description No Neither ISAD(G) nor Qubit makes the distinction available in EAD between <relatedmaterial> (may be of use to researcher, but not related by provenance, accumulation or use) and <separatedmaterial> (related by provenance but physically dispersed, e.g. to different repositories).
  • Need to choose between EAD elements, but this will sometimes result in poor EAD data.
  • E.g. Qubit Related units of description field will be output as <relatedmaterial> but may in fact contain only data that is properly <separatedmaterial>.
    encodinganalog= "3.5.3" No ISAD(G) 3.5.3 = Related units of description.

  <separatedmaterial>   No Not currently supported by Qubit.
  • Qubit does not distinguish between "related" and "separated" material (all contained in single Related units of description field).

  <otherfindaid> information_object::finding_aids No  
    encodinganalog= "3.4.5" No ISAD(G) 3.4.5 = Finding aids.

  <bibliography> information_object::publication_note No  
    encodinganalog= "3.5.4" No ISAD(G) 3.5.4 = Publication note.

  <odd> note::content Yes  
    encodinganalog= "3.6.1" No ISAD(G) 3.6.1 = Notes.
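
Most rows in the table above are one-to-one mappings from an information_object field to an EAD element with a fixed encodinganalog, which lends itself to a table-driven export. The Python sketch below covers only a subset of the rows. The names SIMPLE_MAPPING and append_simple_elements are hypothetical, and both skipping empty fields and wrapping the text in <p> (EAD 2002 does not allow bare text in these elements) are assumptions rather than documented Qubit behaviour.

```python
import xml.etree.ElementTree as ET

# (EAD element, information_object field, ISAD(G) encodinganalog)
SIMPLE_MAPPING = [
    ("scopecontent", "scope_and_content", "3.3.1"),
    ("arrangement", "arrangement", "3.3.4"),
    ("accessrestrict", "access_condition", "3.4.1"),
    ("accruals", "accruals", "3.3.3"),
    ("acqinfo", "acquisition", "3.2.4"),
    ("custodhist", "archival_history", "3.2.3"),
]


def append_simple_elements(archdesc: ET.Element, record: dict) -> ET.Element:
    """Append one EAD element per non-empty mapped field of the record."""
    for tag, field, analog in SIMPLE_MAPPING:
        value = record.get(field)
        if not value:
            continue  # assumption: empty fields produce no element
        element = ET.SubElement(archdesc, tag, {"encodinganalog": analog})
        ET.SubElement(element, "p").text = value  # assumption: <p> wrapper
    return archdesc


if __name__ == "__main__":
    root = append_simple_elements(
        ET.Element("archdesc"),
        {"scope_and_content": "Letters and diaries.", "accruals": ""},
    )
    print(ET.tostring(root, encoding="unicode"))
```
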
Lower level tags

EAD element Output Repeatable? Notes

  <dsc>   No  
    type= "combined" No "combined" indicates that a given series is always followed immediately by listing of its lower-level contents (any sub-series, files, items).

    <c01>

    <c02>

    <c03> ...

[Return <c01>, <c02> etc according to level of description] No Use numbered components <c01> rather than unnumbered <c>.

The number should equal the number of parents above the description in the hierarchy.

      level= term::name No Get name of level of description from term table via information_object::level_of_description_id field.

Note that the relation between the component number and the level attribute is not constant, but depends on the number of levels in the hierarchy of description.

  • E.g. an item registered directly to a series = <c02 level="item">.
  • E.g. a file registered to a sub-series included in a sub-fonds within a fonds = <c04 level="file">.

      <daogrp>   No Get related digital objects in separate <daoloc></daoloc> tags.
  • Return all three digital objects (master, reference, thumbnail) or just one?

      <daoloc>   Yes  
        role= digital_object::mime_type No  
        label=   No Returns "master", "reference" or "thumbnail".
        href=   No  

        <daodesc>   No Not currently supported by Qubit.
  • Provides description of individual digital object (child of <daoloc>) or group (child of <daogrp>).
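
The numbered-component rule above (the number equals the number of parents above the description) can be sketched in Python as follows. The function names are hypothetical; the upper bound of 12 reflects the numbered components EAD 2002 defines (<c01> to <c12>), and raising an error beyond that depth is an assumption about how an exporter might react.

```python
def component_tag(parent_count: int) -> str:
    """Return the numbered component name for a description with N parents."""
    if not 1 <= parent_count <= 12:
        raise ValueError("EAD 2002 defines numbered components c01 to c12 only")
    return f"c{parent_count:02d}"


def component_start_tag(ancestor_levels: list, level: str) -> str:
    """Build the opening tag from the ancestors' levels and the own level."""
    return f'<{component_tag(len(ancestor_levels))} level="{level}">'


if __name__ == "__main__":
    # An item registered directly to a series within a fonds has two parents.
    print(component_start_tag(["fonds", "series"], "item"))
```

The component number depends only on the depth in the hierarchy, while the level attribute comes from the description's own level of description, which is why the two are not tied together.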