CSW Client Implementation Guidelines

This document aims to provide a smooth introduction to integrating GeoNode and GeoNetwork by giving example requests based on the GeoNode/GN integration approach and use cases. For a brief introduction to CSW, see CSW_at_a_glance.

If you would rather skip the background, go straight to CSW_sample_requests_and_responses.

Relevant documents

The approach

GeoNode will need to store and query information in both GeoNetwork and GeoServer, and will act as the integration layer between GeoServer datasets and their corresponding metadata records.

To do so, every dataset upload in GeoNode will need to:

  • Create a unique identifier for the metadata record
  • Upload the dataset to GeoServer, with a link back to the metadata record in the GeoNetwork instance
  • Get the dataset's geographic bounding box, either by asking GeoServer or by other means
  • Grab a prototype metadata record in XML form and fill in the metadata information, including both the user-provided information and introspected values such as the bounding box
  • Insert the metadata record into GeoNetwork

To the client, all of these steps should appear as a single atomic transaction, so care must be taken to recover at every possible failure point and minimize the chance of leaving the system in an inconsistent state.
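The steps above, together with the recovery requirement, can be sketched as a sequence of operations that each register an "undo" action. This is only an illustration under stated assumptions: `geoserver`, `geonetwork` and `fill_record` are hypothetical objects, and the method names `publish`, `unpublish`, `bounding_box` and `insert` are placeholders, not the API of any existing client.

```python
import uuid


class UploadError(Exception):
    """Raised when a dataset upload could not be completed."""


def upload_dataset(dataset, geoserver, geonetwork, fill_record):
    """Run the upload steps in order and undo the completed ones on failure.

    `geoserver` and `geonetwork` are hypothetical client objects; only the
    methods called below are assumed to exist on them.
    """
    record_id = str(uuid.uuid4())      # step 1: unique metadata identifier
    undo = []                          # compensating actions, newest last
    try:
        # step 2: upload the dataset with a link back to the metadata record
        geoserver.publish(dataset, metadata_id=record_id)
        undo.append(lambda: geoserver.unpublish(dataset))
        # step 3: ask GeoServer for the bounding box
        bbox = geoserver.bounding_box(dataset)
        # step 4: fill in the prototype metadata record
        record = fill_record(record_id, dataset, bbox)
        # step 5: insert the record into GeoNetwork
        geonetwork.insert(record)
        return record_id
    except Exception as exc:
        failed_undo = []
        for action in reversed(undo):
            try:
                action()
            except Exception as undo_exc:
                failed_undo.append(undo_exc)
        message = f"upload of {dataset!r} failed: {exc}"
        if failed_undo:
            message += f" ({len(failed_undo)} rollback step(s) also failed)"
        raise UploadError(message) from exc
```

If inserting the record fails, the dataset published in step 2 is removed again, so GeoServer does not end up with a layer that points at a metadata record that does not exist.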

What metadata?

ISO 19115 defines a wide set of metadata properties for a geographic resource.

For instance, it provides for storing information about:

  • language
  • contact information
  • spatial representation info
  • reference system info
  • citation info such as title, abstract, creation and edition dates, representation forms such as digital or paper maps, purpose the dataset was created for, completeness status, maintenance frequency, thumbnails, descriptive keywords, spatial resolution, topic categories, geographical extent, and more
  • data quality info

NOTE: At some point we will need to come up with a prototype metadata record based on the ISO 19139 encoding of ISO 19115 metadata, thereby selecting which metadata properties we want to store when GeoNode uploads a new dataset. The ISO 19115 standard is not freely available; it can be purchased at the ISO site. The UML models can be browsed online at the TC211 site, but that is usually not enough information to make an informed decision about the actual structure of a metadata record for a given domain. We should therefore not underestimate the effort required to reach an agreed prototypical metadata record; the process will require several round trips with the customer.
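To give an idea of what filling in such a prototype could look like, here is a minimal sketch using Python's standard library. The prototype below is an assumption made for illustration only: it is deliberately tiny, it is not schema-valid ISO 19139, and the real selection of properties is exactly what still has to be agreed. The `fill_record` helper is hypothetical as well.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
NS = {"gmd": GMD, "gco": GCO}
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)

# Illustrative prototype only: a handful of ISO 19139 elements with empty values.
PROTOTYPE = f"""
<gmd:MD_Metadata xmlns:gmd="{GMD}" xmlns:gco="{GCO}">
  <gmd:fileIdentifier><gco:CharacterString/></gmd:fileIdentifier>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title><gco:CharacterString/></gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
      <gmd:abstract><gco:CharacterString/></gmd:abstract>
      <gmd:extent>
        <gmd:EX_Extent>
          <gmd:geographicElement>
            <gmd:EX_GeographicBoundingBox>
              <gmd:westBoundLongitude><gco:Decimal/></gmd:westBoundLongitude>
              <gmd:eastBoundLongitude><gco:Decimal/></gmd:eastBoundLongitude>
              <gmd:southBoundLatitude><gco:Decimal/></gmd:southBoundLatitude>
              <gmd:northBoundLatitude><gco:Decimal/></gmd:northBoundLatitude>
            </gmd:EX_GeographicBoundingBox>
          </gmd:geographicElement>
        </gmd:EX_Extent>
      </gmd:extent>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
"""


def fill_record(record_id, title, abstract, bbox):
    """Return the prototype as XML text with the given values filled in.

    `bbox` is (west, south, east, north) in decimal degrees.
    """
    west, south, east, north = bbox
    root = ET.fromstring(PROTOTYPE)

    def set_text(path, value):
        node = root.find(path, NS)
        if node is None:
            raise ValueError(f"prototype has no element at {path}")
        node.text = str(value)

    set_text("gmd:fileIdentifier/gco:CharacterString", record_id)
    set_text(".//gmd:CI_Citation/gmd:title/gco:CharacterString", title)
    set_text(".//gmd:abstract/gco:CharacterString", abstract)
    for name, value in (
        ("westBoundLongitude", west),
        ("eastBoundLongitude", east),
        ("southBoundLatitude", south),
        ("northBoundLatitude", north),
    ):
        set_text(f".//gmd:{name}/gco:Decimal", value)
    return ET.tostring(root, encoding="unicode")
```

Keeping the prototype as a plain XML document means the agreed set of properties can change without touching the code, as long as the element paths used by the helper stay in place.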

GeoNetwork considerations

GeoNetwork does not work internally with a structured object model to represent metadata object trees (such as, for example, the GeoAPI Metadata interfaces in the org.opengis.metadata package and subpackages).

Instead, GeoNetwork stores the XML document representation of each metadata record in its native form (the form it was submitted in). The XML schema of a metadata record must, however, be one of those GeoNetwork supports. If you upload a metadata record in Dublin Core format and then request it in ISO 19139 format, GeoNetwork applies an XSL Transformation to serve it. In fact, GeoNetwork works extensively with XSLT and DOM representations of metadata records.

One odd implication of this approach is that it is hard, if not impossible, for different metadata records to actually share common information, such as identification info. The abstract model defines both Classes and Data Types and the relationships between them, but in GeoNetwork there is no notion of a relationship between two metadata entities other than containment. So you can't, for example, update a Responsible Party's information and have every metadata record that refers to it updated automatically. You have to update the metadata records themselves, changing each record's own copy of the citation info.
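In practice this means a client has to walk over every affected record and patch its embedded copy. The sketch below illustrates the idea on in-memory XML strings; how the records are fetched from and written back to GeoNetwork is left out, and the helper name `update_party_everywhere` is hypothetical.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
NS = {"gmd": GMD, "gco": GCO}
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def update_party_everywhere(records, individual, new_organisation):
    """Set the organisation of `individual` in every record that embeds it.

    `records` is a list of ISO 19139 documents as XML text. A new list of
    XML text is returned; records without a matching party are unchanged
    apart from re-serialization.
    """
    updated = []
    for xml_text in records:
        root = ET.fromstring(xml_text)
        # Each record carries its own copy of the party, so each copy
        # has to be located and edited separately.
        for party in root.iter(f"{{{GMD}}}CI_ResponsibleParty"):
            name = party.find("gmd:individualName/gco:CharacterString", NS)
            org = party.find("gmd:organisationName/gco:CharacterString", NS)
            if name is not None and org is not None and name.text == individual:
                org.text = new_organisation
        updated.append(ET.tostring(root, encoding="unicode"))
    return updated
```

Matching on the individual's name is a simplification chosen for the example; since the records share no identifier for the party, any real matching rule would have to be agreed on separately.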

GeoNetwork set up

We need to use trunk because of a couple of bug fixes that would otherwise be missing, and without which full-text searches do not work properly.

Relevant links:

Development

The following are the condensed instructions I had to follow (collected from various places) to get a working development environment.

GeoNetwork is an OSGeo project. Like most OSGeo projects, it uses some OSGeo infrastructure and some of its own.

  • Requirements: Java 6, ant, maven, eclipse
  • Check out the source:
    $ mkdir <your projects dir>/geonetwork && cd <your projects dir>/geonetwork
    $ svn co https://geonetwork.svn.sourceforge.net/svnroot/geonetwork/trunk
    $ cd trunk
    $ ls -F
    bin/		cachingxslt/	data/		gast/		jeeves/		jsbuild/	src/		test/
    build.xml	csw/		docs/		installer/	jetty/		schematrons/	target/		web/
    
  • Build: GN has not yet migrated to Maven, though that is planned. You need ant to build the project:
    $ ant
    $ ls web/
    geonetwork	geonetwork.war	geoserver	intermap	intermap.war
    

The integrated GeoServer is rather old, but we can get rid of it.

TODO: verify and document how to do so and what it is exactly used for. Whether it is necessary or not, etc.

  • Import project into eclipse:

It would be easier if they had already switched to Maven, especially because I like to keep the project sources separate from the Eclipse workspace. As that's not the case yet, let's just go with the instructions at http://trac.osgeo.org/geonetwork/wiki/HowToRunUnderJettyInEclipse; they're good enough.

  • Prepare the database:

GeoNetwork comes with an embedded McKoi database. It can also connect to PostgreSQL and MySQL. For now we can stick with the embedded one, but it needs some setup before you can use GeoNetwork for the first time, even for development:

  1. Start up GAST (use either gast.bat or gast.sh in the bin directory as appropriate for you).
  2. In GAST, configure your settings:
    1. Configuration -> DBMS: change the port number if you like and press Save
    2. Database -> Setup: press the Setup button! You may get a warning about circular references; this is "normal".
    3. Quit GAST.
  3. You can now start GeoNetwork using the start-geonetwork.(sh,bat) script. It will be available at http://localhost:8080/geonetwork/
  • Load sample data

If you want to load the sample data, start GAST again once GeoNetwork is running. Select Options from the Config menu and enter your database settings: for the default McKoi database, select "Use this account" and use admin as both username and password. Then go to Database -> Sample data and press the Import button.
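Once GeoNetwork is running at the address given above, a quick way to check that its CSW interface answers is to request the capabilities document. The sketch below assumes the CSW endpoint lives at `/srv/en/csw` under the GeoNetwork web application; that path is an assumption, so adjust `CSW_ENDPOINT` to match your installation. The function names are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed location of the CSW endpoint; adjust for your installation.
CSW_ENDPOINT = "http://localhost:8080/geonetwork/srv/en/csw"


def capabilities_url(endpoint=CSW_ENDPOINT):
    """Build a KVP-encoded CSW GetCapabilities URL for `endpoint`."""
    query = urlencode({"service": "CSW", "request": "GetCapabilities"})
    return f"{endpoint}?{query}"


def csw_is_up(endpoint=CSW_ENDPOINT, timeout=5):
    """Return True if the endpoint answers with something that looks like
    a capabilities document, False on any network or HTTP error."""
    try:
        with urlopen(capabilities_url(endpoint), timeout=timeout) as response:
            return response.status == 200 and b"Capabilities" in response.read()
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False
```

Calling `csw_is_up()` right after start-up is a cheap smoke test before trying any of the transaction requests.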

PostgreSQL

On Mac, I use the binary distribution from http://www.postgresql.org/download/macosx. Read the README file in the .dmg image before installing, though, to set the shared memory OS parameters.

Production

Sample requests

Check the CSW_sample_requests_and_responses page for a detailed list of sample requests and responses, or use the script attached at GN_Transactions_in_python to try some for yourself.
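As a rough idea of the shape of such a request, a CSW 2.0.2 Transaction that inserts a record wraps the metadata document in `csw:Transaction` and `csw:Insert` elements. The sketch below only builds the request body; it does not send it, and it makes no claim to match the exact requests listed on the pages referenced above. The function name `build_insert_transaction` is hypothetical.

```python
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"
GMD = "http://www.isotc211.org/2005/gmd"
ET.register_namespace("csw", CSW)
ET.register_namespace("gmd", GMD)


def build_insert_transaction(record_xml):
    """Wrap one metadata record (XML text) in a CSW Transaction/Insert."""
    transaction = ET.Element(
        f"{{{CSW}}}Transaction", {"service": "CSW", "version": "2.0.2"}
    )
    insert = ET.SubElement(transaction, f"{{{CSW}}}Insert")
    # Parsing the record first ensures the request is at least well-formed XML.
    insert.append(ET.fromstring(record_xml))
    return ET.tostring(transaction, encoding="unicode")
```

The resulting text would be sent as the body of an HTTP POST to the catalog's CSW endpoint, typically with an XML content type and after authenticating, since transactions modify the catalog.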

GeoNetwork Transactions in Python

See GN_Transactions_in_python