Metadata

Introduction

Metadata is data about data, and can be used to define, structure, manage and discover information. In the context of a specimen dataset, metadata includes, for example, the address of the collection, the number of specimens, the taxonomic scope, and the names and definitions of the dataset fields.

Metadata is no different from ‘regular data’: one person’s data is often another person’s metadata. For example, the address of a collection is metadata for a specimen dataset, but data for a registry of collections. Good metadata enables data users to discover data and assess whether the data are appropriate for their particular needs.

Metadata standards

Metadata standards are used to exchange metadata, primarily for machine-to-machine interaction. In the biodiversity informatics community, the standards used are:

All of these standards are expressed as XML. Datasets published via the Integrated Publishing Toolkit (IPT) automatically express their metadata as Ecological Metadata Language (EML).
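Because the metadata is expressed as XML, it can be read with any standard XML parser. The following is a minimal sketch in Python, assuming a heavily simplified EML-like document: the sample content, the `summarize_eml` helper and the selection of fields are illustrative assumptions, and the fragment is not meant to be a complete or schema-valid EML record.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EML-like fragment (not schema-valid).
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
         packageId="example-dataset" system="example">
  <dataset>
    <title>Example Herbarium Specimens</title>
    <creator>
      <organizationName>Example Herbarium</organizationName>
    </creator>
    <abstract>
      <para>Vascular plant specimens held by a hypothetical collection.</para>
    </abstract>
  </dataset>
</eml:eml>"""


def summarize_eml(xml_text):
    """Return a few descriptive fields from an EML-like XML string."""
    root = ET.fromstring(xml_text)
    # In this sample only the root element is namespaced; <dataset> and
    # its children are unqualified, so plain paths are enough.
    dataset = root.find("dataset")
    if dataset is None:
        raise ValueError("no <dataset> element found")
    return {
        "title": dataset.findtext("title"),
        "organization": dataset.findtext("creator/organizationName"),
        "abstract": dataset.findtext("abstract/para"),
    }


summary = summarize_eml(SAMPLE_EML)
print(summary["title"])
```

A real EML record published by an IPT contains many more elements (contacts, geographic and taxonomic coverage, and so on); the same approach extends to those by adding further paths.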

Registries

In order to allow the discovery of data, a dataset/collection not only needs metadata, but also needs to be registered somewhere. For collections, such indexes/registries include:

Registered collections can choose a unique code (e.g. MT), which can be referenced in literature. Unfortunately, some codes are not unique across disciplines or continents, which is one of the reasons why the Global Biodiversity Information Facility (GBIF), the Biodiversity Information Standards (TDWG) and the Royal Botanic Garden Edinburgh developed the Biodiversity Collection Index (BCI). With the aim of centralizing information in a single registry, the BCI has since been merged with Index Herbariorum and the Registry of Biological Repositories to form the Global Registry of Biodiversity Repositories (GRBio).

Global Registry of Biodiversity Repositories (GRBio)

The Global Registry of Biodiversity Repositories (GRBio) is a worldwide index of biological collections. Information (metadata) about each collection has been harvested from existing registries (BCI, IH), and users can update or add information on the GRBio website. The globally unique LSID identifiers assigned by the BCI are deprecated, and the Darwin Core triplet institutionCode:collectionCode:catalogNumber is now used to identify each voucher specimen.
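The triplet is simply the three Darwin Core terms joined by colons. As a minimal sketch of how it could be built and parsed in Python (the `Triplet` type, the helper functions and the example values other than the code MT are hypothetical):

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """The three Darwin Core terms that make up a specimen identifier."""
    institution_code: str
    collection_code: str
    catalog_number: str


def format_triplet(triplet):
    """Join the three parts as institutionCode:collectionCode:catalogNumber."""
    return ":".join(triplet)


def parse_triplet(text):
    """Split a triplet string into its three parts.

    Only the first two colons act as separators, so a catalog number
    that itself contains a colon is kept intact.
    """
    parts = [part.strip() for part in text.split(":", 2)]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a valid triplet: {text!r}")
    return Triplet(*parts)


# Hypothetical specimen: the collection code and catalog number are made up.
specimen = Triplet("MT", "MT", "00012345")
identifier = format_triplet(specimen)
print(identifier)
```

Note that a triplet is only as unique as its institution and collection codes, which is why keeping those codes current in the registry matters.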

GBIF Registry

In 2009, GBIF developed a web interface, the Global Biodiversity Resources Discovery System (GBRDS), which enables the automatic registration of published datasets. This interface has since evolved into a web API that allows metadata to be searched during the publication process. Datasets published on the Canadensys IPT are registered automatically.
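As a rough illustration of searching registered dataset metadata through a web API, the sketch below only builds a search URL and does not send a request. The base URL, the `dataset/search` path and the `q` and `limit` parameters are assumptions based on GBIF's public API and are not stated in this document, so they should be checked against the current GBIF API documentation; `dataset_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: base URL of GBIF's public API (not given in this document).
GBIF_API_BASE = "https://api.gbif.org/v1"


def dataset_search_url(query, limit=5):
    """Build a URL for a full-text search over registered dataset metadata."""
    params = urlencode({"q": query, "limit": limit})
    return f"{GBIF_API_BASE}/dataset/search?{params}"


url = dataset_search_url("Canadensys", limit=3)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client.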

Metadata and Canadensys

Since most Canadensys collections are already registered in the Global Registry of Biodiversity Repositories (GRBio) (via Index Herbariorum and the Insect and Spider Collections of the World), we will use GRBio’s services as a unified way to harvest collection metadata. Curators should review their collection information in GRBio to ensure it is up to date.