GetRecords operation
The GetRecords operation is quite powerful. It supports requesting metadata records at different levels of detail (full, summary, and brief) and in different output schemas, and it also allows paging, sorting, and stating the filter criteria in different query languages. The two official query languages are Filter (as in the OGC Filter specification 1.1) and CQL (OGC's Common Query Language). To support both query languages, GeoNetwork relies on GeoTools for parsing and encoding.
Both languages offer much the same expressive power. A predicate is usually composed of an attribute-based filter with a logical, spatial or comparison operator. The attribute in a predicate shall be encoded according to the language rules, and attributes are often nested (according to the record's XML schema). With Filter 1.1 as the query language, a nested attribute is addressed as an XPath location path (e.g. gmd:MD_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:individualName). With CQL, the attribute is addressed using dots as separators: gmd:MD_Metadata.gmd:contact.gmd:CI_ResponsibleParty.gmd:individualName.
Since this is a metadata search, and much of the most useful information in a metadata record is plain text, it makes sense for a "search them all" functionality to exist. To perform a full text search, a special attribute name shall be used: AnyText. For example, the following CQL expression instructs the CSW server to search for all metadata records containing the word Africa anywhere: AnyText LIKE '%Africa%'.
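The two addressing styles differ only in the separator, so converting between them is mechanical. The helper names below are made up for illustration, and the sketch assumes element names contain no dots or slashes of their own:

```python
def xpath_to_cql(path: str) -> str:
    """Turn a Filter-style location path into the dotted form used in CQL."""
    # Empty steps (e.g. from a leading "/") are dropped.
    return ".".join(step for step in path.split("/") if step)


def cql_to_xpath(name: str) -> str:
    """Turn a dotted CQL attribute name into a location path."""
    return "/".join(step for step in name.split(".") if step)


location_path = "gmd:MD_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:individualName"
dotted_name = xpath_to_cql(location_path)
```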
GeoNetwork relies on the Lucene text search engine to perform full text searches. So a GetRecords request containing a query predicate like the above (on AnyText) is the simplest way to expose the Lucene search to GeoNode.
Typical use case
A typical client-server interaction through the GetRecords operation starts with the client performing a GetRecords request with the desired query predicate and the RESULTTYPE=hits parameter. This request returns a minimal XML document stating how many records match the criteria. The client then issues the same request with RESULTTYPE=results, using paging and either the brief or summary element set, in order to limit the number and size of the records returned, and presents them to the user. When the user wants a full record to be shown, the GetRecordById operation comes into play.
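The two-step interaction can be sketched as a pair of GET URLs. This is only an illustration: the endpoint is the sample localhost one used on this page, the helper name getrecords_url is made up, and the CONSTRAINT_LANGUAGE_VERSION value of 1.1.0 is an assumption borrowed from the version attribute of the POST examples. No request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint, copied from the sample URL on this page.
CSW_ENDPOINT = "http://localhost:8080/geonetwork/srv/en/csw"


def getrecords_url(endpoint, cql, result_type, **extra):
    """Build a GetRecords GET URL for a CQL predicate (hypothetical helper)."""
    params = {
        "SERVICE": "CSW",
        "VERSION": "2.0.2",
        "REQUEST": "GetRecords",
        "TYPENAMES": "csw:Record",
        "CONSTRAINTLANGUAGE": "CQL_TEXT",
        "CONSTRAINT_LANGUAGE_VERSION": "1.1.0",  # assumed value
        "CONSTRAINT": cql,
        "RESULTTYPE": result_type,
    }
    params.update(extra)
    # urlencode takes care of URL encoding every parameter value.
    return endpoint + "?" + urlencode(params)


predicate = "AnyText LIKE '%Africa%'"

# Step 1: ask only for the number of matching records.
count_url = getrecords_url(CSW_ENDPOINT, predicate, "hits")

# Step 2: fetch the first page of brief records.
page_url = getrecords_url(
    CSW_ENDPOINT, predicate, "results",
    ELEMENTSETNAME="brief", STARTPOSITION=1, MAXRECORDS=10,
)
```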
Anatomy of the requests
GetRecords supports both the HTTP GET and POST methods in the usual OWS way.
A minimal request for HTTP GET assumes the form: http://<csw endpoint>?SERVICE=CSW&VERSION=2.0.2&REQUEST=GetRecords&TYPENAMES=<TYPE_NAME>&CONSTRAINTLANGUAGE=<constraint language>&CONSTRAINT_LANGUAGE_VERSION=<clang version>.
Where <TYPE_NAME> assumes either the value csw:Record or gmd:MD_Metadata. The parameter is meant to specify which entities from the information model to query, but in GeoNetwork it seems to have no effect. So csw:Record seems like a reasonable default for all requests.
The CONSTRAINTLANGUAGE and CONSTRAINT_LANGUAGE_VERSION parameters specify whether the query predicate is expressed in OGC CQL or Filter format, and in which version of that language. The value of CONSTRAINTLANGUAGE shall be either CQL_TEXT or FILTER.
Note CONSTRAINTLANGUAGE and CONSTRAINT_LANGUAGE_VERSION are mandatory by spec only if a query constraint is specified (through the CONSTRAINT parameter), but GeoNetwork fails if they're not present even if no constraint is specified.
A minimal working GetRecords request then would be like http://localhost:8080/geonetwork/srv/en/csw?REQUEST=GetRecords&SERVICE=CSW&VERSION=2.0.2&OUTPUTSCHEMA=http://www.isotc211.org/2005/gmd&CONSTRAINTLANGUAGE=CQL_TEXT&RESULTTYPE=results&TYPENAMES=csw:Record
And its XML counterpart:
<?xml version="1.0" encoding="UTF-8"?>
<csw:GetRecords
xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
service="CSW"
version="2.0.2"
resultType="results"
outputSchema="http://www.isotc211.org/2005/gmd">
<csw:Query typeNames="csw:Record">
</csw:Query>
</csw:GetRecords>
Optional but important request parameters
There are a number of optional GetRecords parameters that we need to keep in mind:
OUTPUTSCHEMA
The OUTPUTSCHEMA parameter specifies in which form CSW shall return the metadata records. The ISO19139 profile (which we're interested in) might not be the server's default; the default could instead be Dublin Core or an older form of ISO19115. Hence, to keep things simple we'll always work with a single representation, and to do so we should always use the OUTPUTSCHEMA request parameter with the value http://www.isotc211.org/2005/gmd.
ELEMENTSETNAME
The ELEMENTSETNAME parameter takes one of the values [brief|summary|full]. Default value is "summary". It aims at limiting the number of properties returned for a metadata record, supposedly going from the most fundamental ones to the full set of (available) properties for a given record.
NAMESPACE
The NAMESPACE parameter (only for the HTTP GET method) specifies the namespace-to-prefix mapping for any element present in a query constraint (such as, for example, csw:Record or gmd:MD_Metadata). Its form is NAMESPACE=(xmlns:csw=http://www.opengis.net/cat/csw/2.0.2),(xmlns:gmd=http://www.isotc211.org/2005/gmd)
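As a rough sketch, the NAMESPACE value can be assembled from a prefix-to-URI mapping. The function name namespace_param is hypothetical; the output format simply mirrors the form shown above, and the value still has to be URL encoded before it goes into a request.

```python
from urllib.parse import quote


def namespace_param(prefixes):
    """Format a {prefix: uri} mapping as a NAMESPACE parameter value."""
    return ",".join(f"(xmlns:{prefix}={uri})" for prefix, uri in prefixes.items())


namespace_value = namespace_param({
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "gmd": "http://www.isotc211.org/2005/gmd",
})

# URL-encoded form, ready to be appended as NAMESPACE=<value>.
encoded_namespace = quote(namespace_value, safe="")
```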
RESULTTYPE
Value shall be either results or hits. The former returns the matching records; the latter returns a GetRecordsResponse document with no records, whose csw:SearchResults element carries a numberOfRecordsMatched="<record count>" attribute.
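A client can read the record count from a hits response with a few lines of standard-library XML parsing. The sketch below assumes the count sits on the csw:SearchResults element, as in the sample response shown on this page; the function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def records_matched(response_xml):
    """Return numberOfRecordsMatched from a GetRecordsResponse document."""
    root = ET.fromstring(response_xml)
    results = root.find(f"{{{CSW_NS}}}SearchResults")
    if results is None:
        raise ValueError("no csw:SearchResults element in response")
    return int(results.attrib["numberOfRecordsMatched"])


# Hypothetical hits response, trimmed to the elements that matter here.
sample_hits = (
    '<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2">'
    '<csw:SearchStatus timestamp="2010-02-18T12:34:02" />'
    '<csw:SearchResults numberOfRecordsMatched="5" numberOfRecordsReturned="0" />'
    '</csw:GetRecordsResponse>'
)
matched = records_matched(sample_hits)
```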
STARTPOSITION
Used to implement paged requests, generally in conjunction with the MAXRECORDS parameter. Indicates at which record index to start returning the request's matching records. Defaults to 1.
MAXRECORDS
Used to limit the number of records returned by a request. Defaults to 10.
CONSTRAINT
The predicate expression specified in the language indicated by the CONSTRAINTLANGUAGE parameter.
For example: CONSTRAINT=AnyText LIKE '%Africa%' if CONSTRAINTLANGUAGE=CQL_TEXT, or CONSTRAINT=<Filter xmlns="http://www.opengis.net/ogc" xmlns:gml="http://www.opengis.net/gml"><PropertyIsLike wildCard="%" singleChar="_" escape="\"><PropertyName>AnyText</PropertyName><Literal>%Africa%</Literal></PropertyIsLike></Filter> if CONSTRAINTLANGUAGE=FILTER.
(Note that in both cases the parameter values shall be properly URL encoded.)
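URL encoding matters here because both constraint forms contain characters such as %, spaces, quotes and angle brackets. A minimal sketch using Python's standard library, with variable names chosen only for illustration:

```python
from urllib.parse import quote, unquote

cql_constraint = "AnyText LIKE '%Africa%'"
filter_constraint = (
    '<Filter xmlns="http://www.opengis.net/ogc" xmlns:gml="http://www.opengis.net/gml">'
    '<PropertyIsLike wildCard="%" singleChar="_" escape="\\">'
    '<PropertyName>AnyText</PropertyName><Literal>%Africa%</Literal>'
    '</PropertyIsLike></Filter>'
)

# safe="" makes quote() escape every reserved character, including "/" and ":".
encoded_cql = quote(cql_constraint, safe="")
encoded_filter = quote(filter_constraint, safe="")

# Decoding gives back the original predicate unchanged.
assert unquote(encoded_cql) == cql_constraint
```

Here encoded_cql is AnyText%20LIKE%20%27%25Africa%25%27, which is the value to place after CONSTRAINT= in the request URL.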
SORTBY
An ordered, comma-separated list of character strings naming the metadata elements to use for sorting the response.
The format of each list item is metadata_element_name:A for an ascending sort or metadata_element_name:D for a descending sort. For metadata_element_name, use only the plain name (not case sensitive) without any prefixes, because these names are uniquely defined. Example: Denominator instead of SpatialResolution.Denominator
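The SORTBY value can be assembled from (name, ascending) pairs as in the sketch below. The function name is hypothetical, and Title is used purely as a placeholder element name alongside the Denominator example above.

```python
def sortby_param(*keys):
    """Format (element_name, ascending) pairs as a SORTBY parameter value."""
    items = []
    for name, ascending in keys:
        # Only the plain element name is expected, so reject dotted names.
        if "." in name or ":" in name:
            raise ValueError(f"use the plain element name, got {name!r}")
        items.append(f"{name}:{'A' if ascending else 'D'}")
    return ",".join(items)


sort_value = sortby_param(("Title", True), ("Denominator", False))
```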
Sample requests
The simplest "get them all" request
POST method version:
<?xml version="1.0" encoding="UTF-8"?>
<csw:GetRecords
xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
service="CSW"
version="2.0.2"
resultType="results"
outputSchema="http://www.isotc211.org/2005/gmd">
<csw:Query typeNames="csw:Record">
<csw:ElementSetName>brief</csw:ElementSetName>
</csw:Query>
</csw:GetRecords>
The response:
<?xml version="1.0" encoding="UTF-8"?>
<csw:GetRecordsResponse
xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">
<csw:SearchStatus timestamp="2010-02-18T12:34:02" />
<csw:SearchResults numberOfRecordsMatched="5" numberOfRecordsReturned="5" elementSet="brief" nextRecord="0">
<gmd:MD_Metadata
xmlns:gmd="http://www.isotc211.org/2005/gmd"
xmlns:gml="http://www.opengis.net/gml"
xmlns:gts="http://www.isotc211.org/2005/gts"
xmlns:gco="http://www.isotc211.org/2005/gco"
xmlns:geonet="http://www.fao.org/geonetwork">
<gmd:fileIdentifier>
<gco:CharacterString>a669c913-1d64-47a4-8467-f15dab218400</gco:CharacterString>
</gmd:fileIdentifier>
<gmd:identificationInfo>
<gmd:MD_DataIdentification>
<gmd:citation>
<gmd:CI_Citation>
<gmd:title>
<gco:CharacterString>Hydrological Basins in Africa (Sample record, please remove!)</gco:CharacterString>
</gmd:title>
</gmd:CI_Citation>
</gmd:citation>
<gmd:graphicOverview>
<gmd:MD_BrowseGraphic>
<gmd:fileName>
<gco:CharacterString>thumbnail_s.gif</gco:CharacterString>
</gmd:fileName>
</gmd:MD_BrowseGraphic>
</gmd:graphicOverview>
<gmd:graphicOverview>
<gmd:MD_BrowseGraphic>
<gmd:fileName>
<gco:CharacterString>thumbnail.gif</gco:CharacterString>
</gmd:fileName>
</gmd:MD_BrowseGraphic>
</gmd:graphicOverview>
<gmd:extent>
<gmd:EX_Extent>
<gmd:geographicElement>
<gmd:EX_GeographicBoundingBox>
<gmd:westBoundLongitude>
<gco:Decimal>-17.3</gco:Decimal>
</gmd:westBoundLongitude>
<gmd:southBoundLatitude>
<gco:Decimal>-34.6</gco:Decimal>
</gmd:southBoundLatitude>
<gmd:eastBoundLongitude>
<gco:Decimal>51.1</gco:Decimal>
</gmd:eastBoundLongitude>
<gmd:northBoundLatitude>
<gco:Decimal>38.2</gco:Decimal>
</gmd:northBoundLatitude>
</gmd:EX_GeographicBoundingBox>
</gmd:geographicElement>
</gmd:EX_Extent>
</gmd:extent>
</gmd:MD_DataIdentification>
</gmd:identificationInfo>
</gmd:MD_Metadata>
.... four more records here ....
</csw:SearchResults>
</csw:GetRecordsResponse>
A full text search query
Note that when this query is sent with the GET method, some parameter values are (and shall be) properly URL encoded to avoid request ambiguity, even where they are displayed unencoded for easy reading. For instance, the values of the NAMESPACE, CONSTRAINT and TYPENAMES parameters are URL encoded.
POST method version:
<?xml version="1.0" encoding="UTF-8"?>
<csw:GetRecords
xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
service="CSW"
version="2.0.2"
resultType="results"
outputSchema="http://www.isotc211.org/2005/gmd">
<csw:Query typeNames="csw:Record">
<csw:ElementSetName>brief</csw:ElementSetName>
<csw:Constraint version="1.1.0">
<csw:CqlText>AnyText like '%Africa%'</csw:CqlText>
</csw:Constraint>
</csw:Query>
</csw:GetRecords>
Spatial queries
Metadata records may contain bounds information for the resource they describe. Something like:
<gmd:MD_Metadata>
<gmd:fileIdentifier>
<gco:CharacterString>a669c913-1d64-47a4-8467-f15dab218400</gco:CharacterString>
</gmd:fileIdentifier>
<gmd:identificationInfo>
<gmd:MD_DataIdentification>
....
<gmd:extent>
<gmd:EX_Extent>
<gmd:geographicElement>
<gmd:EX_GeographicBoundingBox>
<gmd:westBoundLongitude>
<gco:Decimal>-17.3</gco:Decimal>
</gmd:westBoundLongitude>
<gmd:southBoundLatitude>
<gco:Decimal>-34.6</gco:Decimal>
</gmd:southBoundLatitude>
<gmd:eastBoundLongitude>
<gco:Decimal>51.1</gco:Decimal>
</gmd:eastBoundLongitude>
<gmd:northBoundLatitude>
<gco:Decimal>38.2</gco:Decimal>
</gmd:northBoundLatitude>
</gmd:EX_GeographicBoundingBox>
</gmd:geographicElement>
</gmd:EX_Extent>
</gmd:extent>
</gmd:MD_DataIdentification>
</gmd:identificationInfo>
</gmd:MD_Metadata>
You can therefore perform a simple bounding box query using the CQL predicate BBOX(BoundingBox, <minx>, <miny>, <maxx>, <maxy>).
The equivalent POST request, expressed as an OGC Filter, is:
<?xml version="1.0" encoding="UTF-8"?>
<csw:GetRecords
xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
xmlns:ogc="http://www.opengis.net/ogc"
xmlns:gml="http://www.opengis.net/gml"
service="CSW"
version="2.0.2"
resultType="results"
outputSchema="http://www.isotc211.org/2005/gmd">
<csw:Query typeNames="gmd:MD_Metadata">
<csw:ElementSetName>brief</csw:ElementSetName>
<csw:Constraint version="1.1.0">
<ogc:Filter>
<ogc:BBOX>
<ogc:PropertyName>BoundingBox</ogc:PropertyName>
<gml:Envelope>
<gml:lowerCorner>-17 -34</gml:lowerCorner>
<gml:upperCorner>-16.5 -33.5</gml:upperCorner>
</gml:Envelope>
</ogc:BBOX>
</ogc:Filter>
</csw:Constraint>
</csw:Query>
</csw:GetRecords>
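For the GET method, the same bounding box can be expressed as a CQL predicate and URL encoded. The sketch below reuses the corner values from the POST request above; the helper name bbox_cql is made up, and only the constraint-related parameters are shown.

```python
from urllib.parse import urlencode


def bbox_cql(minx, miny, maxx, maxy, attribute="BoundingBox"):
    """Format a CQL BBOX predicate in the order minx, miny, maxx, maxy."""
    if minx > maxx or miny > maxy:
        raise ValueError("lower corner must not exceed upper corner")
    return f"BBOX({attribute}, {minx}, {miny}, {maxx}, {maxy})"


bbox_predicate = bbox_cql(-17, -34, -16.5, -33.5)

# Constraint part of the query string, with values URL encoded.
bbox_query = urlencode({
    "CONSTRAINTLANGUAGE": "CQL_TEXT",
    "CONSTRAINT": bbox_predicate,
})
```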
Sorting and paging
Combine the query predicate you need with the SORTBY, MAXRECORDS, and STARTPOSITION parameters as needed (remembering that the lowest STARTPOSITION index is 1, not 0).
A request with, say, MAXRECORDS=2 and STARTPOSITION=2 would result in something like:
<?xml version="1.0" encoding="UTF-8"?>
<csw:GetRecordsResponse ...>
<csw:SearchStatus timestamp="2010-02-18T16:34:00" />
<csw:SearchResults numberOfRecordsMatched="5" numberOfRecordsReturned="2" elementSet="brief" nextRecord="4">
<MD_Metadata>
....
</MD_Metadata>
<MD_Metadata>
....
</MD_Metadata>
</csw:SearchResults>
</csw:GetRecordsResponse>
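Once the total number of matching records is known, the STARTPOSITION values for each page follow directly from the page size. A minimal sketch, with hypothetical function names and Title used as a placeholder sort key:

```python
def page_starts(total_matched, page_size):
    """Return the 1-based STARTPOSITION of every page."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return list(range(1, total_matched + 1, page_size))


def paging_params(total_matched, page_size, sort_by=None):
    """Return one dict of paging parameters per page."""
    pages = []
    for start in page_starts(total_matched, page_size):
        params = {"STARTPOSITION": start, "MAXRECORDS": page_size}
        if sort_by:
            params["SORTBY"] = sort_by
        pages.append(params)
    return pages


# Five matching records, two per page: pages start at records 1, 3 and 5.
pages = paging_params(5, 2, sort_by="Title:A")
```

Each dict can be merged into the other request parameters before URL encoding.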
