Georeferencing

Introduction

Georeferencing is the process of translating textual descriptions of places into spatial descriptions (i.e. geographic coordinates). For example, “Bear Island, Lake Temagami, Nipissing District, Ontario, Canada” → decimal latitude: 46.9831216, decimal longitude: -80.0681018, uncertainty: 1410 m. Georeferencing biological specimens – which often have only a textual description of their locality – allows the information to be displayed on a map and used in spatial analyses.

Although extremely useful, georeferencing is also time-consuming and should be tackled as a community, with the right tools and guidelines. From Chapman & Wieczorek, 2006. Guide to best practices for georeferencing, p. 21:

By far the most difficult issue in georeferencing primary species occurrence data is the massive amount of legacy data held in the world’s museums, herbaria, universities, etc. Most modern collectors are now using GPSs or large scale maps to locate their collection events, and thus most of the new data entering institutions already include georeferences. Most museums beginning to database their collections, however, are faced with the massive task of georeferencing the huge backlog of data in their collections, much of it with very little or vague location information. This document aims to assist these institutions with georeferencing their legacy data.

Tools

Resources for Canada

Documents

API

An API (Application Programming Interface) is a set of protocols and methods that allows software applications to communicate and exchange services. In the context of georeferencing, web-based APIs make it possible to retrieve geographic coordinates from addresses and localities, or the reverse (retrieving an address or locality from coordinates).

API requests can be automated in data cleaning tools, such as OpenRefine or R.
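
As a rough illustration of what such an automated request involves, the Python sketch below builds a request URL for a locality string and extracts coordinates from a JSON reply. The endpoint `geocoder.example.org`, the `q` and `format` parameters, and the response layout are all assumptions made for this example; they do not describe any particular web service, so they should be replaced with the details documented by the service actually used.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; substitute a real service.
BASE_URL = "https://geocoder.example.org/search"


def build_request_url(locality, base_url=BASE_URL):
    """Return the GET URL asking the (hypothetical) service to geocode a locality."""
    # urlencode escapes spaces, commas and accents so the text is URL-safe.
    return base_url + "?" + urlencode({"q": locality, "format": "json"})


def parse_response(body):
    """Extract (latitude, longitude) from a JSON reply.

    Assumed response layout: {"results": [{"lat": ..., "lon": ...}, ...]}
    Returns None when the service found no match.
    """
    results = json.loads(body).get("results", [])
    if not results:
        return None
    first = results[0]
    return float(first["lat"]), float(first["lon"])


# A canned reply stands in for the network call in this sketch.
sample_body = '{"results": [{"lat": 46.9831216, "lon": -80.0681018}]}'
url = build_request_url("Bear Island, Lake Temagami, Ontario, Canada")
coords = parse_response(sample_body)
```

In practice the URL would be fetched once per distinct locality, and the result reviewed by a person before being accepted, since an automated match carries no uncertainty estimate of its own.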

Web services available:

Georeferencing and Canadensys

One of the goals of Canadensys is to georeference a large part of the published specimen information as accurately as possible and to provide uncertainties for the coordinates, since the uncertainty is key to determining the data’s fitness for use and thus their quality. It has not yet been decided how we will georeference as a network, but it is clear that we should look for ways to do so collaboratively. Collaboration greatly reduces duplication of effort and can generate more accurate results, since specimens can be grouped per province/locality across the whole network (instead of per collection) and thus be georeferenced by the people with the most knowledge of these provinces/localities.
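
The grouping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`id`, `collection`, `province`, `locality`), the specimen identifiers and the collection codes are invented for the example, and the normalization is deliberately naive (case and whitespace only). No decision on an actual Canadensys workflow is implied.

```python
from collections import defaultdict


def normalize(locality):
    """Collapse case and whitespace so trivially different spellings match."""
    return " ".join(locality.lower().split())


def group_by_locality(records):
    """Map (province, normalized locality) to the specimen ids sharing it,
    regardless of which collection each specimen belongs to."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["province"], normalize(rec["locality"]))
        groups[key].append(rec["id"])
    return dict(groups)


# Hypothetical records from three collections.
records = [
    {"id": "A-001", "collection": "A", "province": "Ontario",
     "locality": "Bear Island, Lake Temagami"},
    {"id": "B-042", "collection": "B", "province": "Ontario",
     "locality": "bear island,  lake temagami"},
    {"id": "C-007", "collection": "C", "province": "British Columbia",
     "locality": "Stanley Park, Vancouver"},
]
groups = group_by_locality(records)
```

Here the two Ontario specimens fall into one group, so the locality only needs to be georeferenced once, and the work can be handed to someone who knows that province.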

Canadensys participants received an introduction to georeferencing techniques and best practices at the Canadensys & UC Berkeley Georeferencing workshop (Ottawa 2010).