
SAFE 2.x Basic Information

Introduction

The SAFE (Standard Archive Format for Europe) format was developed for use in an Open Archival Information System (OAIS) as an Archival Information Package (AIP) whose original target of preservation is the data associated with a specific Earth Observation space mission or instrument.

The Content Information of this AIP is made up of a set of Data Objects specified according to the Preserved Data Set Composition recommended by the LTDP Guidelines (i.e. primary/secondary data, metadata, and browse images if they are generated during the processing), together with the associated Representation Information needed to make the Content Data Objects understandable in the long term.

The following figure represents a conceptual description of a SAFE AIP:

SAFE AIP description

 

However, SAFE has not been designed to be used as a single physical AIP because this would have undesirable consequences in the archive:

  • EO Auxiliary data redundancy
  • Representation Information Redundancy
  • Lack of flexibility in case of Representation Information language updates
  • Collection Metadata redundancy

These undesirable consequences are addressed in SAFE by externalising several types of information from the AIP:

  • Externalisation of the EO Auxiliary data
  • Externalisation of the representation information
  • Externalisation of the EO Collection metadata

Consequently, a logical SAFE AIP is actually made up of a set of physical SAFE Information Packages that together contain all the data needed to build the complete (logical) SAFE AIP.

The next figure provides a conceptual breakdown of the different types of SAFE package:

Logical SAFE AIP

XFDU conformance and SAFE conformance classes

SAFE packages have been designed to be instances of XFDU where the data stored in a package are linked to their associated metadata expressed according to a well-controlled semantics used to interpret the target of preservation in the long term.

The files are bundled into a single container called "Package Interchange File" along with a file called "manifest". The XFDU manifest describes the relations among the data and metadata files included in the package.
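As a rough illustration of how a manifest ties Data Objects to files and to their Representation Information, the following sketch reads a heavily simplified manifest with the Python standard library. The manifest content, the identifiers ("measurementData", "measurementSchema") and the helper name list_data_objects are hypothetical; the element and attribute names follow the general XFDU layout, but a real SAFE manifest carries many more elements than shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified manifest used only for illustration.
MANIFEST = """<xfdu:XFDU xmlns:xfdu="urn:ccsds:schema:xfdu:1">
  <metadataSection>
    <metadataObject ID="acquisitionPeriod" classification="DESCRIPTION" category="DMD"/>
  </metadataSection>
  <dataObjectSection>
    <dataObject ID="measurementData" repID="measurementSchema">
      <byteStream mimeType="application/octet-stream" size="1024">
        <fileLocation locatorType="URL" href="./measurement.dat"/>
      </byteStream>
    </dataObject>
  </dataObjectSection>
</xfdu:XFDU>
"""


def list_data_objects(manifest_xml):
    """Return (ID, repID, file location) for every dataObject in the manifest."""
    root = ET.fromstring(manifest_xml)
    entries = []
    for data_object in root.iter("dataObject"):
        location = data_object.find("byteStream/fileLocation")
        href = location.get("href") if location is not None else None
        entries.append((data_object.get("ID"), data_object.get("repID"), href))
    return entries


for object_id, rep_id, href in list_data_objects(MANIFEST):
    print(object_id, "described by", rep_id, "stored at", href)
```

The repID attribute is what links a Data Object to the description of its format, which is the relation the rest of this page builds on.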

The XFDU standard is fully compliant with the OAIS Reference Model and provides an abstract mechanism for wrapping data and metadata into a single "Package Interchange File" to facilitate archiving. However, XFDU is a generic system that needs to be restricted for use in some environments requiring a more specific control of the package content. In that sense, SAFE restricts the generic areas of XFDU according to the specific needs of Earth Observation and provides semantics for the Earth Observation (EO) domain in order to improve the interoperability between the ground segment facilities.

Although the main objective of the SAFE format is the long term preservation of the Content Information, it can be used as well in certain environments where long term preservation is not a priority (operational uses), or in those cases where the Representation Information of the products to preserve cannot be described by using the methods specified for SAFE (i.e. XML/DFDL schemas).

This is the case for EO Auxiliary files or EO Products in a self-describing encoding format like HDF or netCDF, in which the length of some fields may vary among files of the same type. It can also be the case for file formats that use a compression method that does not fulfil the conditions for preservation set out in SAFE for compressed formats (i.e. the decompression algorithm must be well-known or preserved as part of a SAFE package, and the decompressed EO Product must be representable using the mechanisms specified for SAFE).

In order to accommodate these cases in SAFE, two conformance classes have been specified for the generation of SAFE packages:

  • LTDP Conformance Class: Fully compliant with OAIS-RM and fully verifying the conditions for data preservation. To be used for the generation of SAFE packages where long-term data preservation needs to be ensured to the maximum extent possible with SAFE. This class specifically requires the Representation Information to be provided for all Data Objects.
     
  • Operational Conformance Class: Some of the OAIS conditions for preservation are relaxed for this class. To be used for the generation of SAFE packages in those environments that want to use SAFE but where long-term data preservation is not the ultimate goal and/or cannot be properly ensured due to limitations in the ability to generate adequate Representation Information. This class specifically assumes that the Representation Information may be omitted for some or all Data Objects.
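The practical difference between the two classes can be sketched as a simple check on whether each Data Object references Representation Information. This is only an illustrative sketch: the function name check_conformance, the class labels "LTDP" and "Operational" as string values, and the mapping from Data Object identifier to Representation Information reference are assumptions made for the example, not part of the SAFE specification.

```python
def check_conformance(data_objects, conformance_class):
    """Check Representation Information coverage for a package.

    data_objects maps a Data Object ID to the ID of its Representation
    Information, or to None when no Representation Information is given.
    Returns (is_acceptable, ids_without_representation_information).
    """
    missing = sorted(oid for oid, rep_id in data_objects.items() if not rep_id)
    if conformance_class == "LTDP":
        # LTDP requires Representation Information for all Data Objects.
        return (not missing, missing)
    if conformance_class == "Operational":
        # Operational tolerates omissions; they are reported, not rejected.
        return (True, missing)
    raise ValueError(f"unknown conformance class: {conformance_class!r}")


package = {"measurementData": "measurementSchema", "auxiliaryFile": None}
print(check_conformance(package, "LTDP"))
print(check_conformance(package, "Operational"))
```

The same package is therefore rejected under the LTDP class and accepted under the Operational class, with the undescribed Data Objects listed in both cases.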

Representation Information

According to OAIS-RM definitions, the Representation Information maps a Data Object into more meaningful concepts and has to be preserved to make the Content Data Object understandable to the Designated Community in the long term.

As previously pointed out, the Representation Information is externalised in SAFE. This means that the formal description of a Data Object is stored in a dedicated SAFE package different from the one containing the described Data Object. Consequently, each SAFE package includes external references (in the "manifest" file) to the SAFE packages containing the Representation Information applicable to the enclosed Data Object. The Representation Information of a Data Object is specified at bit level by another Data Object included in a separate SAFE package.

Representation Information

 

A consequence of this is that Representation Information is itself defined recursively, because it is made up of its own data (Representation Information for other data) and of further Representation Information. To break this potentially endless loop, a minimum set of information ("root element") has to be considered part of a common Knowledge Base for which it is not necessary to provide Representation Information (because it is assumed that such information is well known to the designated community and therefore its preservation is ensured for the long term by definition). The whole set of Representation Information objects up to that minimum set of information is considered the Representation Information Network.
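The recursion and its stopping condition can be pictured as a graph traversal that follows "is described by" links until it reaches the Knowledge Base. The sketch below is a minimal illustration under stated assumptions: the object names, the described_by mapping and the function representation_network are hypothetical, and the Knowledge Base is reduced to a single entry for brevity.

```python
def representation_network(start, described_by, knowledge_base):
    """Collect every Representation Information object reachable from start.

    described_by maps an object to the objects that describe it.
    Traversal stops at members of knowledge_base, which need no description.
    """
    network = set()
    pending = [start]
    while pending:
        current = pending.pop()
        if current in knowledge_base:
            continue  # root element: assumed known to the designated community
        if current not in described_by:
            raise LookupError(f"no Representation Information for {current!r}")
        for rep in described_by[current]:
            if rep not in network:  # also guards against cycles
                network.add(rep)
                pending.append(rep)
    return network


# Hypothetical chain of descriptions ending in the Knowledge Base.
described_by = {
    "measurement.dat": ["measurement-schema"],
    "measurement-schema": ["DFDL", "W3C XML Schema"],
    "DFDL": ["W3C XML Schema"],
    "W3C XML Schema": ["W3C XML"],
}
knowledge_base = {"W3C XML"}

print(sorted(representation_network("measurement.dat", described_by, knowledge_base)))
```

An object that is neither described nor part of the Knowledge Base raises an error, which corresponds to a gap in the Representation Information Network.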

SAFE is an XFDU instance and therefore relies on the XML standard. It is assumed that this standard is broadly known by the designated community, and thus the W3C XML standard is adopted as one of the components of the minimum set of information to consider as part of the SAFE Knowledge Base described in the previous paragraph.

However, the W3C XML standard, and particularly the XML Schema standard, are limited when it comes to representing binary and, more generally, non-XML data. To address this limitation, SAFE adopts the Data Format Description Language (DFDL) as the standard language for the Representation Information of non-XML files. DFDL is a standard modelling language supported by the Open Grid Forum (OGF) that supplements the syntax of an XML Schema file with enough descriptive information to describe binary and other non-XML files. DFDL has been designed for describing general text and binary data: it allows arbitrary data structures to be represented and enables multiple programs to interchange data directly, relying on a common description to interpret the data.

Representation Information

 

This language has been adopted for SAFE because it is an open standard with free, publicly accessible documentation and it is capable of describing the structure of a Data Object down to the bit level. DFDL is based on W3C XML Schema, which is part of the SAFE Knowledge Base, so the whole Representation Information Network defined for SAFE remains interpretable in the long term.

Preservation Description Information

OAIS describes the Preservation Description Information (PDI) as the information that is necessary for adequate preservation of the Content Information. OAIS subdivides PDI into several categories, including Provenance, Reference and Fixity information. The SAFE manifest provides predefined metadata categories and classifications, expressed as enumerated attributes, that follow this OAIS information model.

SAFE redefines XFDU to ensure that the Provenance information is always available in a SAFE package. Provenance Information is provided in the Manifest of SAFE EO Product and EO Auxiliary packages to convey the origin or source of the Content Information, any changes that may have taken place since it was originated, and who has had its custody since it was originated.

As for Fixity information, SAFE manifest files provide a "checksum" complex type that can be used to convey information on the data integrity of a Data Object within a SAFE package. The mechanism used to compute the checksum value can also be specified.
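A consumer of a package could use such a checksum entry to confirm that a Data Object has not been altered. The following is a minimal sketch using Python's hashlib; the function name verify_checksum and the idea of passing the algorithm as a name such as "SHA-256" are assumptions for the example, and the set of algorithms actually permitted in a SAFE manifest is not stated here.

```python
import hashlib


def verify_checksum(data, checksum_name, expected_hex):
    """Return True if the digest of data matches the declared checksum value.

    checksum_name is the declared algorithm (e.g. "MD5" or "SHA-256");
    it is normalised to the spelling hashlib expects.
    """
    algorithm = checksum_name.replace("-", "").lower()
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == expected_hex.strip().lower()


# Hypothetical Data Object content and its declared checksum.
payload = b"example data object content"
declared = hashlib.sha256(payload).hexdigest()

print(verify_checksum(payload, "SHA-256", declared))         # intact content
print(verify_checksum(payload + b"!", "SHA-256", declared))  # altered content
```

For large Data Objects, the file would normally be read in chunks and fed to the hash object incrementally rather than loaded into memory at once.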

SAFE Package Organisation and contents

The SAFE design encompasses six different types of Package, as detailed in the figure below. Gray packages are Representation Information Packages. The figure also summarises the content expected for each type of package:

SAFE package organisation


 
