CSW at a Glance

The purpose of this document is to provide a quick overview of the OGC metadata Catalogue Service, so that participants in the GeoNode project can easily digest its concepts and we all get on the same page with regard to this rather complex spec.

What is it

CSW stands for Catalogue Service for the Web and is an OGC service specification, just like WMS, WFS, etc. (The acronym was presumably chosen because WCS was already taken by the Web Coverage Service; otherwise OGC could have named it Web Catalogue Service.)

The CSW spec, at its current version (2.0.2), is split across several documents, which reflects its complexity :(

The spec itself, formally the OpenGIS® Catalogue Services Specification, defines the DCP (Distributed Computing Platform) interfaces and the operations to publish, query, and operate upon a catalogue of metadata for geospatial data and service instances.

So, while the spec defines a client-server architecture for geospatial metadata handling, it is deliberately abstract, so that more specific message structures can be defined for particular combinations of metadata schemas and protocol bindings.

The protocol bindings can be HTTP, CORBA, or Z39.50, and the information model either Dublin Core, ISO 19115/19139, or ebRIM (the ebXML Registry Information Model, onto which ISO 19115 metadata can be mapped). Moreover, specific Application Profiles can be defined on any of the information models for particular domains. Finally, the information models are themselves abstract: ISO 19115 is quite extensive and convoluted, and its XML realization, ISO 19139, defines the XML schema for it.

With that in mind, we're going to focus on the most widespread combination, which is also the one best supported by GeoNetwork: an HTTP Catalogue Service for ISO 19115 metadata encoded according to the ISO 19139 schema.

The granularity of the ISO19115 information model permits a CSW instance to manage and publish metadata Records for:

  • A Dataset Collection: a collection of datasets and dataset collections, such as a series of datasets sharing the same product specification
  • A Dataset: an identifiable collection of data, such as a digital set of Features or a hardcopy map
  • A FeatureType
  • A Feature instance
  • An Attribute Type

(The list may not be exhaustive; ISO 19115 defines further hierarchy levels, such as service, in its MD_ScopeCode code list.)

How does it work

The abstract spec defines a set of operations that is not mirrored one-to-one in the ISO Metadata Application Profile, so we're going to focus on the concrete set of operations of the typical OWS-like HTTP service we're interested in.

These operations are:

  • GetCapabilities: Mandatory operation; as in any OWS, returns the service metadata document describing the service and the operations it supports;
  • GetRecords: Mandatory operation; performs a constrained search over the metadata records and returns either the matching record count or the record values, as specified in the request. For record values, the request may also specify the level of detail of the returned records, which can be one of full, summary, or brief;
  • GetRecordById: Mandatory operation; retrieves a representation (the default one unless otherwise requested) of one or more catalogue records by their identifier(s), which map to the fileIdentifier element of an ISO 19139 document;
  • GetDomain: Optional operation; obtains runtime information about the range of values of a metadata record element or request parameter. This kind of runtime information is useful for generating user interfaces with meaningful pick lists, or for generating query predicates that have a higher chance of actually identifying a result set;
  • DescribeRecord: discovers the elements of the information model supported by the target catalogue service, that is, returns the XML schema for a given record type;
  • Transaction: the usual Insert, Update, and Delete operations of a transactional OWS;
  • Harvest: aggregates the catalogue's metadata records with those from an external resource, such as another CSW instance, a set of files in the file system, or the capabilities documents of other OWS instances.

Of interest to the project are the GetRecords, GetRecordById, and Transaction operations. Transaction is available only through the HTTP POST method, just as in WFS. The other two can be performed through either GET or POST, using KVP (key-value pair, CGI-like) or XML encodings respectively.
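As a rough illustration of the KVP (GET) encoding, the sketch below builds request URLs with Python's standard library. The endpoint URL is a hypothetical GeoNetwork address and the helper name csw_get_url is made up for this example; only the service, request, and version parameter names come from the CSW 2.0.2 KVP encoding.

```python
from urllib.parse import urlencode

# Hypothetical GeoNetwork CSW endpoint; replace with your own deployment's URL.
CSW_ENDPOINT = "http://localhost:8080/geonetwork/srv/en/csw"


def csw_get_url(endpoint, request, **params):
    """Build a CSW 2.0.2 KVP (HTTP GET) request URL (hypothetical helper)."""
    query = {"service": "CSW", "request": request}
    # GetCapabilities negotiates the version through acceptVersions,
    # so only the other operations get an explicit version parameter.
    if request != "GetCapabilities":
        query["version"] = "2.0.2"
    query.update(params)  # operation-specific parameters, e.g. id=...
    return endpoint + "?" + urlencode(query)


print(csw_get_url(CSW_ENDPOINT, "GetCapabilities"))
# → http://localhost:8080/geonetwork/srv/en/csw?service=CSW&request=GetCapabilities
```

The same request sent via POST would instead carry an XML document in the request body.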

GetRecords

The GetRecords operation is pretty powerful. Not only does it support requesting metadata records at different levels of detail (full, summary, and brief) and in different output schemas, it also allows for paging and sorting, and for stating the filter criteria in different query languages. The two official query languages are Filter (as in the OGC Filter Encoding specification 1.1) and CQL (OGC's Common Query Language). To support both query languages, GeoNetwork relies on GeoTools for parsing and encoding.

Both languages offer pretty much the same expressive power. A predicate is usually made of an attribute-based filter with a logical, spatial, or comparison operator. The attribute in a predicate must be encoded according to the language rules, and attributes are often nested (following the record's XML schema). With Filter 1.1 as the query language, a nested attribute is addressed as an XPath location path (e.g. gmd:MD_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:individualName). With CQL, the attribute is addressed using dots as separators: gmd:MD_Metadata.gmd:contact.gmd:CI_ResponsibleParty.gmd:individualName.
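Since, for simple paths like the one above, the two notations differ only in the step separator, converting between them is almost trivial. The helper below is hypothetical (not part of GeoNetwork or GeoTools) and only handles plain child steps; XPath predicates or axes are out of its scope.

```python
def xpath_to_cql(path):
    """Turn a simple Filter 1.1 XPath property name into dotted CQL form."""
    # Drop leading/trailing '/' and any empty steps, then join with dots.
    steps = [step for step in path.strip("/").split("/") if step]
    return ".".join(steps)


print(xpath_to_cql(
    "gmd:MD_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:individualName"))
# → gmd:MD_Metadata.gmd:contact.gmd:CI_ResponsibleParty.gmd:individualName
```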

But since this is a metadata search, and much of the most useful information in a metadata record is plain text, it makes sense to have a "search them all" facility. To perform a full-text search, a special attribute name must be used: AnyText. For example, the following CQL expression instructs the CSW server to find all metadata records containing the word Africa anywhere: AnyText LIKE '%Africa%'.
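For comparison, the same AnyText search can be written in Filter 1.1, the other official query language, using a PropertyIsLike operator. This sketch builds it with Python's ElementTree; the function name is hypothetical, and escaping literal % and _ characters in the search word assumes the server honours the escapeChar attribute.

```python
import xml.etree.ElementTree as ET

OGC_NS = "http://www.opengis.net/ogc"
ET.register_namespace("ogc", OGC_NS)


def ogc(tag):
    """Qualified ElementTree name for an OGC Filter element."""
    return f"{{{OGC_NS}}}{tag}"


def anytext_like_filter(word):
    """Filter 1.1 equivalent of the CQL expression AnyText LIKE '%word%'."""
    # Escape the escape character first, then the two wildcard characters,
    # so a literal '%' or '_' in the search word is not treated as a pattern.
    escaped = word.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    flt = ET.Element(ogc("Filter"))
    like = ET.SubElement(flt, ogc("PropertyIsLike"),
                         {"wildCard": "%", "singleChar": "_", "escapeChar": "\\"})
    ET.SubElement(like, ogc("PropertyName")).text = "AnyText"
    ET.SubElement(like, ogc("Literal")).text = f"%{escaped}%"
    return flt


print(ET.tostring(anytext_like_filter("Africa"), encoding="unicode"))
```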

GeoNetwork relies on the Lucene text search engine to perform full-text searches, so a GetRecords request containing a predicate on AnyText like the one above is the simplest way to expose Lucene search to GeoNode.

A typical client-server interaction through the GetRecords operation goes as follows. The client first performs a GetRecords request with the desired query predicate and RESULTTYPE=hits. This returns a minimal XML document stating how many records match the criteria. The client then issues the same request with RESULTTYPE=results, using paging (STARTPOSITION and MAXRECORDS) and either the brief or summary element set (ELEMENTSETNAME) to limit the number and size of the records returned, and presents them to the user. When the user wants a full record to be shown, the GetRecordById operation comes into play.
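The hits-then-results workflow can be sketched with KVP requests. This is a minimal sketch under stated assumptions: the endpoint is hypothetical, the helpers getrecords_url, matched_count, and page_starts are made up for illustration, and the constraint_language_version value of 1.1.0 for CQL_TEXT is an assumption to check against your server's capabilities. The match count is read from the numberOfRecordsMatched attribute of csw:SearchResults, and startPosition is 1-based.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical GeoNetwork CSW endpoint; replace with your own deployment's URL.
CSW_ENDPOINT = "http://localhost:8080/geonetwork/srv/en/csw"


def getrecords_url(endpoint, cql, result_type, start=1, max_records=10,
                   element_set="brief"):
    """KVP GetRecords URL; result_type is "hits" or "results"."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "namespace": "xmlns(gmd=http://www.isotc211.org/2005/gmd)",
        "typeNames": "gmd:MD_Metadata",
        "outputSchema": "http://www.isotc211.org/2005/gmd",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",  # assumption, see lead-in
        "constraint": cql,
        "resultType": result_type,
        "elementSetName": element_set,  # brief or summary keeps pages small
        "startPosition": start,         # 1-based index of the first record
        "maxRecords": max_records,      # page size
    }
    return endpoint + "?" + urlencode(params)


def matched_count(response_xml):
    """Read numberOfRecordsMatched from a GetRecords response."""
    for elem in ET.fromstring(response_xml).iter():
        if elem.tag.rsplit("}", 1)[-1] == "SearchResults":
            return int(elem.get("numberOfRecordsMatched"))
    raise ValueError("no csw:SearchResults element in the response")


def page_starts(hits, page_size):
    """startPosition values needed to page through `hits` matching records."""
    return list(range(1, hits + 1, page_size))


cql = "AnyText LIKE '%Africa%'"
print(getrecords_url(CSW_ENDPOINT, cql, "hits"))
# Suppose the hits response reported 45 matches (a made-up figure);
# the brief records would then be fetched 20 at a time:
for start in page_starts(45, 20):
    print(getrecords_url(CSW_ENDPOINT, cql, "results", start, 20))
```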

GetRecordById

The GetRecordById operation is quite self-explanatory. The thing to keep in mind is that the identifier it searches for is the fileIdentifier of an MD_Metadata record, and this identifier can be assigned by the user when inserting a record into GeoNetwork. It is common (and recommended) practice for these identifiers to be UUIDs. They're going to be important to the project because they'll be the most direct link between a GeoNetwork metadata record and a GeoServer dataset.
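A minimal sketch of both sides of this: generating a UUID to use as fileIdentifier and then fetching the record by that identifier through a KVP GetRecordById request. The endpoint and helper names are hypothetical; requesting outputSchema=http://www.isotc211.org/2005/gmd asks for the ISO 19139 representation instead of the default csw:Record.

```python
import uuid
from urllib.parse import urlencode

# Hypothetical GeoNetwork CSW endpoint; replace with your own deployment's URL.
CSW_ENDPOINT = "http://localhost:8080/geonetwork/srv/en/csw"


def new_file_identifier():
    """A random (version 4) UUID to use as gmd:fileIdentifier."""
    return str(uuid.uuid4())


def getrecordbyid_url(endpoint, *file_identifiers):
    """KVP URL fetching full ISO 19139 records by their fileIdentifier."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecordById",
        # Several identifiers may be requested at once, comma separated.
        "id": ",".join(file_identifiers),
        "elementSetName": "full",
        # Ask for ISO 19139 rather than the default Dublin Core based csw:Record.
        "outputSchema": "http://www.isotc211.org/2005/gmd",
    }
    return endpoint + "?" + urlencode(params)


file_id = new_file_identifier()
print(file_id)
print(getrecordbyid_url(CSW_ENDPOINT, file_id))
```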

Transaction

The Transaction operation, unlike GetRecords and GetRecordById, can only be performed using the HTTP POST method, and its request is a single XML document defining the transaction. Whether it is an Insert, Update, or Delete, the document is made of a Transaction root element with a nested Insert, Update, or Delete element. The response to a successful request is always an XML document stating the number of records affected by the operation.

A transaction Insert shall contain the full metadata record (in ISO 19139 form) to be inserted.
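Programmatically, an Insert amounts to wrapping an existing ISO 19139 document. This sketch uses Python's ElementTree and assumes the CSW 2.0.2 namespace URI http://www.opengis.net/cat/csw/2.0.2; the function name is hypothetical, and the sample record is deliberately truncated.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
GMD_NS = "http://www.isotc211.org/2005/gmd"
GCO_NS = "http://www.isotc211.org/2005/gco"
# Register prefixes so the serialized request keeps readable csw:/gmd:/gco: names.
for prefix, uri in (("csw", CSW_NS), ("gmd", GMD_NS), ("gco", GCO_NS)):
    ET.register_namespace(prefix, uri)


def insert_transaction(md_metadata_xml):
    """Wrap one ISO 19139 gmd:MD_Metadata document in csw:Transaction/csw:Insert."""
    record = ET.fromstring(md_metadata_xml)
    if record.tag != f"{{{GMD_NS}}}MD_Metadata":
        raise ValueError("expected a gmd:MD_Metadata root element")
    transaction = ET.Element(f"{{{CSW_NS}}}Transaction",
                             {"service": "CSW", "version": "2.0.2"})
    ET.SubElement(transaction, f"{{{CSW_NS}}}Insert").append(record)
    return ET.tostring(transaction, encoding="unicode")


# A deliberately truncated record: a real ISO 19139 document needs more
# mandatory elements (contact, dateStamp, identificationInfo, ...).
RECORD = (
    f'<gmd:MD_Metadata xmlns:gmd="{GMD_NS}" xmlns:gco="{GCO_NS}">'
    "<gmd:fileIdentifier><gco:CharacterString>"
    "1dde15b5-f497-416e-8e5e-744542658bd7"
    "</gco:CharacterString></gmd:fileIdentifier>"
    "</gmd:MD_Metadata>"
)
print(insert_transaction(RECORD))
```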

Sample request:

<csw:Transaction service="CSW" version="2.0.2" xmlns:csw="http://www.opengis.net/cat/csw/2.0.2">
  <csw:Insert>
     <gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"  {... a lot more ns declarations...} >
     <gmd:fileIdentifier>
         <gco:CharacterString>1dde15b5-f497-416e-8e5e-744542658bd7</gco:CharacterString>
     </gmd:fileIdentifier>
     <gmd:language><gco:CharacterString>eng</gco:CharacterString></gmd:language>
     <gmd:characterSet>
         <gmd:MD_CharacterSetCode codeListValue="utf8" 
                              codeList="http://www.isotc211.org/2005/resources/codeList.xml#MD_CharacterSetCode"/>
     </gmd:characterSet>
     <gmd:contact>
     .... a lot more elements ....
     </gmd:MD_Metadata>
  </csw:Insert>
</csw:Transaction>

Sample response:

<csw:TransactionResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" .... >
   <csw:TransactionSummary>
      <csw:totalInserted>1</csw:totalInserted>
   </csw:TransactionSummary>
</csw:TransactionResponse>
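On the client side, reading these counts back is straightforward. The sketch below is hypothetical helper code using ElementTree; it matches elements by local name so it does not depend on which CSW namespace URI the server uses, and it raises an error if the server answered with an OWS ExceptionReport instead.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' part ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def transaction_summary(response_xml):
    """Return the counts from a csw:TransactionSummary as a dict."""
    root = ET.fromstring(response_xml)
    if local_name(root.tag) == "ExceptionReport":
        raise RuntimeError("the server returned an OWS exception report")
    counts = {"totalInserted": 0, "totalUpdated": 0, "totalDeleted": 0}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in counts and elem.text:
            counts[name] = int(elem.text.strip())
    return counts


SAMPLE = """<csw:TransactionResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2">
   <csw:TransactionSummary>
      <csw:totalInserted>1</csw:totalInserted>
   </csw:TransactionSummary>
</csw:TransactionResponse>"""

print(transaction_summary(SAMPLE))
# → {'totalInserted': 1, 'totalUpdated': 0, 'totalDeleted': 0}
```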

A transaction Update contains the set of attribute names to modify along with their new values. The records to modify are selected through filter criteria.

Sample request:

<csw:Transaction service="CSW" version="2.0.2" xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" ... >
  <csw:Update>
    <csw:RecordProperty>
      <csw:Name>/gmd:MD_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:abstract</csw:Name>
      <csw:Value>Major hydrological basins and their sub-basins.</csw:Value>
    </csw:RecordProperty>
    
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:PropertyIsEqualTo>
          <ogc:PropertyName>/gmd:MD_Metadata/gmd:fileIdentifier</ogc:PropertyName>
          <ogc:Literal>4e4390ff-ab16-44f1-bb4d-812d1da6b7e3</ogc:Literal>
        </ogc:PropertyIsEqualTo>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Update>
</csw:Transaction>

Sample response:

<csw:TransactionResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" ... >
   <csw:TransactionSummary>
      <csw:totalUpdated>1</csw:totalUpdated>
   </csw:TransactionSummary>
</csw:TransactionResponse>
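The Update request shown in this section can be generated with a small builder. This is a sketch, not GeoNetwork code: the function names are hypothetical, it assumes the CSW 2.0.2 namespace URI, and since the gmd: prefix only appears inside text values (the XPaths), it declares that prefix explicitly on the root element.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
OGC_NS = "http://www.opengis.net/ogc"
GMD_NS = "http://www.isotc211.org/2005/gmd"
ET.register_namespace("csw", CSW_NS)
ET.register_namespace("ogc", OGC_NS)


def q(ns, tag):
    """ElementTree's {namespace}tag notation."""
    return f"{{{ns}}}{tag}"


def update_transaction(file_id, properties):
    """Build a csw:Transaction/csw:Update for the record with `file_id`.

    `properties` maps XPath property names to their new text values.
    """
    transaction = ET.Element(q(CSW_NS, "Transaction"),
                             {"service": "CSW", "version": "2.0.2"})
    # ElementTree never emits xmlns:gmd by itself here, so add it by hand.
    transaction.set("xmlns:gmd", GMD_NS)
    update = ET.SubElement(transaction, q(CSW_NS, "Update"))
    for xpath, value in properties.items():
        prop = ET.SubElement(update, q(CSW_NS, "RecordProperty"))
        ET.SubElement(prop, q(CSW_NS, "Name")).text = xpath
        ET.SubElement(prop, q(CSW_NS, "Value")).text = value
    # Select the record to modify by its fileIdentifier (Filter 1.1).
    constraint = ET.SubElement(update, q(CSW_NS, "Constraint"), {"version": "1.1.0"})
    equal = ET.SubElement(ET.SubElement(constraint, q(OGC_NS, "Filter")),
                          q(OGC_NS, "PropertyIsEqualTo"))
    ET.SubElement(equal, q(OGC_NS, "PropertyName")).text = "/gmd:MD_Metadata/gmd:fileIdentifier"
    ET.SubElement(equal, q(OGC_NS, "Literal")).text = file_id
    return ET.tostring(transaction, encoding="unicode")


print(update_transaction(
    "4e4390ff-ab16-44f1-bb4d-812d1da6b7e3",
    {"/gmd:MD_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:abstract":
     "Major hydrological basins and their sub-basins."}))
```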

A transaction Delete works similarly, selecting the records to delete through filter criteria.

Sample request:

<csw:Transaction service="CSW" version="2.0.2" xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" ... >
  <csw:Delete typeName="gmd:MD_Metadata">
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:PropertyIsEqualTo>
          <ogc:PropertyName>/gmd:MD_Metadata/gmd:fileIdentifier</ogc:PropertyName>
          <ogc:Literal>4e4390ff-ab16-44f1-bb4d-812d1da6b7e3</ogc:Literal>
        </ogc:PropertyIsEqualTo>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Delete>
</csw:Transaction>

Sample response:

<csw:TransactionResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" ... >
   <csw:TransactionSummary>
      <csw:totalDeleted>1</csw:totalDeleted>
   </csw:TransactionSummary>
</csw:TransactionResponse>
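A Delete builder follows the same pattern as the request above. As a hedged extension for illustration, this hypothetical sketch also accepts several fileIdentifiers and combines their comparisons with a Filter 1.1 ogc:Or; it assumes the CSW 2.0.2 namespace URI and declares the gmd: prefix by hand because it only appears inside attribute and text values.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
OGC_NS = "http://www.opengis.net/ogc"
GMD_NS = "http://www.isotc211.org/2005/gmd"
ET.register_namespace("csw", CSW_NS)
ET.register_namespace("ogc", OGC_NS)


def q(ns, tag):
    """ElementTree's {namespace}tag notation."""
    return f"{{{ns}}}{tag}"


def delete_transaction(file_ids):
    """Build a csw:Transaction/csw:Delete removing records by fileIdentifier."""
    if not file_ids:
        raise ValueError("at least one fileIdentifier is required")
    transaction = ET.Element(q(CSW_NS, "Transaction"),
                             {"service": "CSW", "version": "2.0.2"})
    # gmd: is only used inside attribute/text values, so declare it by hand.
    transaction.set("xmlns:gmd", GMD_NS)
    delete = ET.SubElement(transaction, q(CSW_NS, "Delete"),
                           {"typeName": "gmd:MD_Metadata"})
    constraint = ET.SubElement(delete, q(CSW_NS, "Constraint"), {"version": "1.1.0"})
    parent = ET.SubElement(constraint, q(OGC_NS, "Filter"))
    if len(file_ids) > 1:
        # Several comparisons are combined with a logical ogc:Or.
        parent = ET.SubElement(parent, q(OGC_NS, "Or"))
    for file_id in file_ids:
        equal = ET.SubElement(parent, q(OGC_NS, "PropertyIsEqualTo"))
        ET.SubElement(equal, q(OGC_NS, "PropertyName")).text = "/gmd:MD_Metadata/gmd:fileIdentifier"
        ET.SubElement(equal, q(OGC_NS, "Literal")).text = file_id
    return ET.tostring(transaction, encoding="unicode")


print(delete_transaction(["4e4390ff-ab16-44f1-bb4d-812d1da6b7e3"]))
```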

Requests and responses

For a detailed list of example requests and setup instructions, see the CSW_client_implementation_guidelines page.

To get a sense of what some of them look like, try the following requests: