10-step guide to managing images with your biodiversity data

Introduction

The management of digital images is fast becoming a significant part of a collections manager’s duties and requires a diverse yet specialized set of skills.
The following 10-step guide is meant to provide an introduction to basic ways to organize, store, and distribute digital images. A more thorough guide has been produced by our friends at iDigBio.

1. Establish a storage protocol for safe and secure backups of your images

We recommend that you keep at least two copies of your images in different locations: one on-site for ease of retrieval and one off-site in case of catastrophic loss. Verify your backups periodically, because a backup is only useful if you are confident that it can actually be restored when needed.
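One way to verify a backup without restoring it is to compare checksums of every file in the primary copy and in the backup. The following is a minimal Python sketch, assuming both copies are reachable as ordinary folders. The function names are hypothetical. A checksum comparison shows that the copied files are intact, but it does not replace an occasional real test restore.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative path) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_backup(primary: Path, backup: Path) -> dict[str, list[str]]:
    """Compare a backup copy of an image folder against the primary copy."""
    expected = build_manifest(primary)
    actual = build_manifest(backup)
    return {
        # in the primary copy but absent from the backup
        "missing": sorted(expected.keys() - actual.keys()),
        # present in both, but the bytes differ (damaged or stale copy)
        "corrupted": sorted(k for k in expected.keys() & actual.keys() if expected[k] != actual[k]),
        # only in the backup (e.g. deleted from the primary copy since the last backup)
        "extra": sorted(actual.keys() - expected.keys()),
    }
```

Run `verify_backup` with your on-site folder and your mounted off-site copy; any non-empty list in the report deserves a closer look.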

2. Establish a file (re)naming convention

Ensure that every file name is unique throughout your collection, even if the files are distributed among a variety of nested folders: an IT technician may one day decide to collapse your folder structure for ease of maintenance. Most file systems allow file names of up to 255 characters, and names made only of letters, numbers, and underscores are safe everywhere. Because not all file systems are case-sensitive, we recommend that all letters in a file name be uppercase (making it harder to confuse a ‘1’ (one) with the letter ‘l’) OR that all letters be lowercase (making it harder to confuse a ‘0’ (zero) with the letter ‘O’). Avoid non-ASCII text, slashes, brackets, and similar characters, because some of them have special meaning to certain operating systems.
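These rules can be checked automatically before images enter your archive. The sketch below is one possible Python check, assuming a convention of ASCII letters, digits, and underscores followed by a single extension. Both function names are hypothetical. The duplicate search ignores case, since two names that differ only in case collide on a case-insensitive file system once folders are collapsed.

```python
from __future__ import annotations

import re
from collections import defaultdict
from pathlib import Path

# Assumed convention: ASCII letters, digits and underscores, then one extension such as ".jpg".
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+\.[A-Za-z0-9]+")
MAX_NAME_LENGTH = 255


def name_problems(name: str) -> list[str]:
    """Return the reasons why a file name breaks the convention (empty list if none)."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("longer than 255 characters")
    if not SAFE_NAME.fullmatch(name):
        problems.append("uses characters other than ASCII letters, digits, underscores and one extension")
    stem = name.rsplit(".", 1)[0]  # the case rule is applied to the name without its extension
    if stem not in (stem.upper(), stem.lower()):
        problems.append("mixes uppercase and lowercase letters")
    return problems


def duplicate_names(root: Path) -> dict[str, list[str]]:
    """Group files anywhere under root whose names collide when case is ignored."""
    seen: dict[str, list[str]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            seen[path.name.lower()].append(path.relative_to(root).as_posix())
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}
```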

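Example 1: You can name each image with a unique ID generated by a person or by a database, such as a database auto-increment value. Although this technique is easy, it carries some risk of duplication (for example, when images numbered by two different databases or counters end up in the same collection).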

Example 2: You can name each image with a unique identifier made of random numbers and letters generated by a computer, such as a UUID (Universally Unique Identifier). This technique requires software to generate the identifiers, but it practically eliminates the risk of duplication.

Example 3: You can name each image with the museum accession code, followed by numbers, letters, or a machine-generated code (e.g. MT0011212_ABVG1BH6). This ensures both retrievability and uniqueness.

Think ahead about the file naming convention you will adopt if you have different categories of images and wish to encode this metadata in the file name. For example, you may want to indicate that an image shows a dry specimen, the habitus of the organism photographed in the field prior to collection, or field notes.
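As an illustration of Example 3 combined with an image category, here is a minimal Python sketch that builds and parses names of the form ACCESSION_CATEGORY_SUFFIX.ext. The one-letter category codes, the 8-character suffix, and the function names are all hypothetical choices, not a Canadensys standard. A short random suffix is unique only with high probability, so a duplicate check on the whole collection is still worthwhile.

```python
from __future__ import annotations

import re
import secrets
import string

# Hypothetical category codes; replace them with the image types your collection uses.
CATEGORIES = {"SPECIMEN": "S", "HABITUS": "H", "FIELDNOTES": "F"}
CODE_TO_CATEGORY = {code: category for category, code in CATEGORIES.items()}
SUFFIX_ALPHABET = string.ascii_uppercase + string.digits
NAME_PATTERN = re.compile(
    r"(?P<accession>[A-Z0-9]+)_(?P<code>[A-Z])_(?P<suffix>[A-Z0-9]+)\.(?P<extension>[a-z0-9]+)"
)


def make_image_name(accession: str, category: str, extension: str = "jpg", suffix_length: int = 8) -> str:
    """Build ACCESSION_CODE_SUFFIX.ext in uppercase, with a lowercase extension."""
    code = CATEGORIES[category]
    # Random suffix drawn from a cryptographically strong generator.
    suffix = "".join(secrets.choice(SUFFIX_ALPHABET) for _ in range(suffix_length))
    return f"{accession.upper()}_{code}_{suffix}.{extension.lower()}"


def parse_image_name(name: str) -> dict[str, str]:
    """Recover the accession code and image category encoded in a file name."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None or match["code"] not in CODE_TO_CATEGORY:
        raise ValueError(f"{name!r} does not follow the naming convention")
    return {
        "accession": match["accession"],
        "category": CODE_TO_CATEGORY[match["code"]],
        "suffix": match["suffix"],
    }
```

For names in the style of Example 2, Python's standard `uuid.uuid4().hex.upper()` returns a full 32-character UUID that could serve as the suffix (or as the whole name) instead.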

3. Choose an appropriate file format for your images

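For your backup or archive, keep images at their native resolution in a non-proprietary format such as DNG or TIFF. Camera RAW files are generally vendor-specific, so if you keep them, consider also converting them to DNG.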

For a comparison of the most common image file formats (e.g. JPEG, JPEG 2000, DNG, TIFF, RAW …), see ScanTips.

4. Convert images to an expected format ready for exchange

If your archival or source files are too large to share with your colleagues, you will need a converter, ideally driven by automated batch conversion scripts. Several converters are available for free. Here is a non-exhaustive list: ImageMagick, which works with over 200 formats; FreePhotoConverter, for JPG, BMP, GIF, PNG, and TIFF; FastStone, which supports JPEG, BMP, GIF, PNG, TIFF, and JPEG 2000; and ImageBatch, for JPEG, PNG, GIF, and BMP.
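A batch conversion script can be as simple as a loop that calls one of these converters on every archival file. The sketch below drives ImageMagick from Python, assuming ImageMagick 7 (on ImageMagick 6 the command is `convert` rather than `magick`), TIFF archival masters, and a 2000-pixel maximum size with JPEG quality 85. These values and the function names are hypothetical. By default it only returns the commands it would run (`dry_run=True`), so you can review them first.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Assumption: archival masters are TIFF files; add other extensions (e.g. ".dng")
# only if your ImageMagick build can read them.
ARCHIVE_EXTENSIONS = {".tif", ".tiff"}


def conversion_command(source: Path, target: Path, max_pixels: int = 2000, quality: int = 85) -> list[str]:
    """Build an ImageMagick 7 command that writes a reduced JPEG copy."""
    return [
        "magick",
        f"{source}[0]",  # first frame only, in case the TIFF has several pages
        "-resize", f"{max_pixels}x{max_pixels}>",  # ">" shrinks larger images, never enlarges
        "-quality", str(quality),  # JPEG compression quality (1-100)
        str(target),
    ]


def batch_convert(source_dir: Path, target_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Convert every archival image under source_dir into one flat folder of JPEGs."""
    commands = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file() or source.suffix.lower() not in ARCHIVE_EXTENSIONS:
            continue
        # Flattening into one folder is safe only if file names are unique collection-wide.
        target = target_dir / f"{source.stem}.jpg"
        if target.exists():
            continue  # already converted in a previous run
        command = conversion_command(source, target)
        commands.append(command)
        if not dry_run:
            target_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(command, check=True)
    return commands
```

Skipping targets that already exist lets the script run repeatedly (for example, nightly) and convert only newly added images.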

5. Store your images in a publicly accessible resource

Your images will likely need to be publicly accessible via a URL, either through a public file hosting service or on a publicly accessible server. The service or server needs to be stable and online at all times, because aggregators such as the Global Biodiversity Information Facility (GBIF) and Canadensys, as well as search engines, may fetch your images in real time to display them on web pages.

Public file hosting services

By way of example, the Cercle des mycologues de Montréal Fungarium (CMMF) stores its images in a Dropbox account.

Example : https://dl.dropboxusercontent.com/u/10639207/cmmf-photos/3091.jpg

Dropbox Pro stores up to 1 TB (1,000 GB) for CA$11/month, and Dropbox Business offers unlimited storage for CA$17/user/month with a minimum of 5 users (prices at the time this post was published). Other cloud file hosting services may work just as well, such as Google Drive, Microsoft OneDrive, Carbonite, and Oracle Cloud.

Institutional server

For example, volunteers at the Marie-Victorin Herbarium (MT) take photographs of specimens with a camera linked to Lightroom software. The images are saved to disk in RAW format for archival purposes, then batch-converted to JPEG and placed on a publicly accessible server. The server is maintained by Canadensys staff on the Université de Montréal computing infrastructure.

Example : http://media.canadensys.net/mt-specimens/large/MT00163567.jpg
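Because aggregators may request your images at any time, it helps to build URLs consistently and to check them periodically. The following Python sketch assumes a URL layout of base/collection/size/file name, modelled on the MT example URL; other servers will differ, and the function names are hypothetical. The availability check sends an HTTP HEAD request, so a server that rejects HEAD requests will be reported as unreachable.

```python
import urllib.request

# Layout modelled on the MT example URL; adapt it to your own server.
BASE_URL = "http://media.canadensys.net"


def image_url(collection: str, size: str, file_name: str, base_url: str = BASE_URL) -> str:
    """Build a public URL of the form <base>/<collection>/<size>/<file name>."""
    return f"{base_url.rstrip('/')}/{collection}/{size}/{file_name}"


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send an HTTP HEAD request and report whether the image answers with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError, HTTPError and timeouts all derive from OSError
        return False
```

For example, `image_url("mt-specimens", "large", "MT00163567.jpg")` rebuilds the MT example URL above, which can then be passed to `is_reachable`.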

6. Decide what license(s) will apply to your images and who holds the rights (rightsHolder), if applicable

You may wish to license raw biodiversity data and multimedia files separately. However, both objects should have machine-readable license(s).

While we recommend CC0 for biodiversity data (read this to learn why), Canadensys recommends applying one of three Creative Commons licenses to multimedia files: CC0, CC-BY, or CC-BY-NC.

CC0 waives all rights and places the resource in the public domain so that others may freely build upon, enhance and reuse the work for any purposes without restriction under copyright or database law.

BY (Attribution) requires that the creator be credited. This license lets others distribute and build upon your work, even commercially, but users are legally obligated to credit you.

NC means NonCommercial, which is not the same as “non-profit”. Images licensed with an NC clause cannot legally be used in a commercially published biology textbook or in a conference advertisement without the explicit consent of the creator.

Canadensys advises against the application of SA or ND clauses to Creative Commons licenses:

SA means ShareAlike, a copyleft restriction. It is sometimes referred to as a parasitic license because it requires that a compilation use the same license if any of its elements carries an SA clause. This limits reuse because of possible license interoperability issues.

ND means NoDerivatives. Canadensys cannot use material under this license, because resizing an image counts as creating a derivative. Only the URL of the image will be presented; the image itself will not appear in the Canadensys Explorer.
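To keep licenses machine-readable, each image record can carry a license URI rather than free text. The Python sketch below maps the three recommended short names to Creative Commons URIs and rejects SA and ND clauses. It assumes version 4.0 of CC-BY and CC-BY-NC, and the function name is hypothetical; change the URIs if you adopt another version.

```python
# Assumption: version 4.0 of CC-BY and CC-BY-NC; change the URIs if you adopt another version.
RECOMMENDED_LICENSES = {
    "CC0": "http://creativecommons.org/publicdomain/zero/1.0/",
    "CC-BY": "http://creativecommons.org/licenses/by/4.0/",
    "CC-BY-NC": "http://creativecommons.org/licenses/by-nc/4.0/",
}
DISCOURAGED_CLAUSES = {"SA", "ND"}


def license_uri(short_name: str) -> str:
    """Turn a short license name such as "CC-BY" into a machine-readable URI."""
    key = short_name.strip().upper().replace(" ", "-")
    clauses = set(key.split("-")[1:])  # "CC-BY-NC-SA" -> {"BY", "NC", "SA"}
    blocked = clauses & DISCOURAGED_CLAUSES
    if blocked:
        raise ValueError(f"{short_name}: Canadensys advises against {'/'.join(sorted(blocked))} clauses")
    if key not in RECOMMENDED_LICENSES:
        raise ValueError(f"{short_name}: expected one of {', '.join(RECOMMENDED_LICENSES)}")
    return RECOMMENDED_LICENSES[key]
```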

7. List necessary metadata terms

Who took the photograph? When? Can I reuse it? Answers to these questions and others should be provided in the metadata along with the image itself. The Audubon Media Description extension thoroughly represents many metadata elements for biodiversity images and is used by iDigBio. Canadensys has elected to use the Simple Multimedia extension to Darwin Core. It allows for the representation of essential metadata without being overly verbose and is easily associated with checklist and occurrence records.

Start by selecting the terms from either extension that are relevant to your multimedia. You can consult our interpretation of the Simple Multimedia extension terms and follow our recommendations for their use.
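Once you have chosen your terms, a small check can flag records with missing answers before they are published. This Python sketch lists the term names of the GBIF Simple Multimedia extension and assumes a hypothetical local selection of required terms (who, when, and reuse conditions). The function name and the choice of required terms are illustrative only.

```python
from __future__ import annotations

# Term names of the GBIF Simple Multimedia extension to Darwin Core.
SIMPLE_MULTIMEDIA_TERMS = frozenset({
    "type", "format", "identifier", "references", "title", "description",
    "created", "creator", "contributor", "publisher", "audience", "source",
    "license", "rightsHolder",
})

# Hypothetical local selection answering "who took it, when, and can I reuse it?"
REQUIRED_TERMS = ("identifier", "type", "format", "created", "creator", "license")


def check_record(record: dict[str, str]) -> list[str]:
    """Return the required terms left empty; reject names outside the extension."""
    unknown = sorted(set(record) - SIMPLE_MULTIMEDIA_TERMS)
    if unknown:
        raise ValueError(f"not Simple Multimedia terms: {', '.join(unknown)}")
    return [term for term in REQUIRED_TERMS if not record.get(term, "").strip()]
```

In a record, `type` would typically hold a DCMI Type value such as `StillImage`, and `format` a MIME type such as `image/jpeg`.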

8. Remodel your database

Adjust your database schema to accommodate image file paths, URLs of specimen images, and their metadata (see step 7), and to link each imageID to its occurrenceID.
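As one possible shape for such a schema, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. The table and column names are hypothetical. The key design choice is a separate image table with a foreign key to the occurrence table, so that one occurrence can have any number of images.

```python
import sqlite3

# Hypothetical schema: one occurrence can have many images (one-to-many).
SCHEMA = """
CREATE TABLE IF NOT EXISTS occurrence (
    occurrenceID  TEXT PRIMARY KEY,
    catalogNumber TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS image (
    imageID      TEXT PRIMARY KEY,
    occurrenceID TEXT NOT NULL REFERENCES occurrence (occurrenceID),
    filePath     TEXT NOT NULL UNIQUE,  -- archival master on your own storage
    publicURL    TEXT UNIQUE,           -- exchange copy served online
    created      TEXT,                  -- metadata columns chosen in step 7
    creator      TEXT,
    license      TEXT,
    rightsHolder TEXT
);
"""


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database with foreign-key checks switched on."""
    connection = sqlite3.connect(path)
    # SQLite only enforces REFERENCES constraints when this pragma is enabled.
    connection.execute("PRAGMA foreign_keys = ON")
    connection.executescript(SCHEMA)
    return connection
```

With foreign keys enforced, an image cannot be linked to an occurrenceID that does not exist, which catches typos at data entry rather than at publication.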

9. Adjust your data export script to produce a Multimedia text file

Visit our 7-step guide for a refresher on how to publish data in our Integrated Publishing Toolkit (IPT). The export of multimedia data follows a very similar process. We do not provide specific guidance here because of the diversity of database software used by collections managers. If you need more assistance with database design or data export, please contact us.
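Whatever your database, the export usually ends with a delimited text file containing one row per image. The Python sketch below writes such a tab-separated file with the standard `csv` module, assuming each row is a dictionary (for example, built from a database query). The column order and function name are hypothetical. The `coreid` column repeats the occurrenceID of the matching core record, and the other headers reuse Simple Multimedia term names, which should make the mapping in the IPT more straightforward.

```python
from __future__ import annotations

import csv
from typing import Iterable, Mapping, TextIO

# Hypothetical column order; "coreid" holds the occurrenceID of the core record.
COLUMNS = ["coreid", "identifier", "type", "format", "created", "creator", "license", "rightsHolder"]


def write_multimedia_file(rows: Iterable[Mapping[str, str]], out: TextIO) -> int:
    """Write one tab-separated line per image and return the number of data rows."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in rows:
        # Missing values become empty cells; keys not listed in COLUMNS are dropped.
        writer.writerow({column: row.get(column, "") for column in COLUMNS})
        count += 1
    return count
```

To produce the file, open it with `open("multimedia.txt", "w", newline="", encoding="utf-8")` and pass the file object as `out`.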

10. Map your multimedia data to the extension

Once you have uploaded and mapped your core dataset in the IPT (according to the 7-step guide), you can upload your multimedia dataset and map your fields with the Simple Multimedia extension terms.

Why are images so important to biodiversity science?

  • Images associated with observation data
    • To verify species identification without a physical specimen
    • To provide information on the habitat and surrounding community of species
  • Images associated with specimen data
    • To capture the appearance of specimens if they are at risk of deteriorating
    • To secure this heritage for posterity (whatever happens to the physical object)
    • To make the collection accessible to people around the world
    • To provide a record of the information on the original label
    • To nurture citizen science projects (crowdsourcing) and accelerate digitization
  • Images of organisms taken before collection
    • To facilitate identification by providing images of different parts of the organism before curation
    • To provide rich taxonomic and ecological information
  • Images of fieldnotes associated with observations or specimen data
    • To preserve a collector’s annotations in their original language
  • Images associated with a taxon in a checklist
    • To refer to the type specimen
    • To highlight specific characters useful for identification


Citation

Like all documents on this website, this guide is published under CC-BY. The preferred citation is:

Shorthouse, D. & B. Rivière. 2014. 10-step guide to managing images with your biodiversity data. Canadensys. https://community.canadensys.net/publication/multimedia-publication-guide