Delightful Explorations
Friday, 30 November 2012
The Canadensys Explorer is reaching 1M collection records. As this landmark nears, we realize that its dynamic map is becoming less usable. The recognition of patterns and the ability to zoom in to specific collection localities are becoming challenging tasks. With the help of Vizzuality, we are investigating dynamic clustering methods that are triggered when most of the country is visible to you but revert to single point display when you zoom in.
This is what we currently show on the Explorer:
You can filter records using various criteria (e.g. scientific name, dataset name, date) and the point collection localities are redisplayed. We would like the Explorer to retain its great performance but we would also like to make it more usable. Some of the clustering techniques we are testing are grid-based, k-means-based, and country-based.
Grid-based Clustering
This algorithm splits the entire world into equal-area boxes. We then group the points that fall within each box and display one marker per box. This technique can also be used to generate heat maps.
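As a rough illustration of the grouping step, here is a minimal Python sketch. It is not the Explorer's code: the function name `grid_cluster` and the `cell_deg` parameter are made up for this example, and it uses fixed-size cells in degrees, which are simple to compute but are not truly equal-area (cells shrink towards the poles).

```python
import math
from collections import Counter

def grid_cluster(points, cell_deg=1.0):
    """Count (lat, lon) points per grid cell.

    Assumption: cells are cell_deg x cell_deg in degrees, so they are
    equal-angle rather than equal-area.
    """
    counts = Counter()
    for lat, lon in points:
        # Shift so that indices start at 0 at the south-west corner of the world.
        row = math.floor((lat + 90.0) / cell_deg)
        col = math.floor((lon + 180.0) / cell_deg)
        counts[(row, col)] += 1
    return counts

# Two records near Montreal share a cell; one near Vancouver is on its own.
records = [(45.5, -73.6), (45.6, -73.5), (49.3, -123.1)]
cells = grid_cluster(records, cell_deg=1.0)
```

The per-cell counts could drive either the size of a cluster marker or the colour of a heat map cell. A larger `cell_deg` would suit low zoom levels, a smaller one higher zoom levels.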
k-Means Clustering
This is an iterative algorithm that splits the world into a predetermined number of clusters. Each point belongs to the cluster whose mean is closest to it. k-means tends to produce clusters of similar size, and it does not always produce a pleasing result.
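To make the iteration concrete, here is a minimal pure-Python sketch of the standard k-means loop (assign each point to its closest mean, then recompute the means). It is only an illustration under simplifying assumptions: the function names are hypothetical, latitude and longitude are treated as flat planar coordinates, and the starting means are picked at random from the points with a fixed seed.

```python
import random

def assign(points, means):
    """Put each point in the cluster whose mean is closest (squared distance)."""
    clusters = [[] for _ in means]
    for p in points:
        nearest = min(
            range(len(means)),
            key=lambda i: (p[0] - means[i][0]) ** 2 + (p[1] - means[i][1]) ** 2,
        )
        clusters[nearest].append(p)
    return clusters

def centroid(members):
    """Arithmetic mean of a non-empty list of (lat, lon) points."""
    n = len(members)
    return (sum(m[0] for m in members) / n, sum(m[1] for m in members) / n)

def kmeans(points, k, iterations=20, seed=0):
    """Return (means, clusters) after at most `iterations` refinement rounds."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # assumption: start from k random points
    for _ in range(iterations):
        clusters = assign(points, means)
        # An empty cluster keeps its previous mean.
        new_means = [centroid(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:  # nothing moved, so we have converged
            break
        means = new_means
    return means, assign(points, means)

records = [(45.0, -73.0), (45.2, -73.2), (45.1, -73.1),
           (49.0, -123.0), (49.2, -123.2)]
means, clusters = kmeans(records, k=2)
```

Because the number of clusters is fixed in advance, a dynamic map would have to choose `k` per zoom level, and every pan or filter would rerun the loop over the visible points.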
Country-based Clustering
This is a relatively simple algorithm that counts the number of georeferenced localities within each country's borders. However, the approach runs into problems when a country is a split territory (e.g. the USA) or when specimens were collected in the ocean (e.g. algae).
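The counting step, and the two problem cases, can be sketched in Python as follows. This is an illustration only: the function names and the toy rectangular "countries" are invented, borders are assumed to be simple polygons of (lat, lon) vertices, and a standard ray-casting test stands in for whatever spatial query a real map server would use. A country is given as a list of polygons so that a split territory can have several parts, and points that fall in no polygon (e.g. ocean records) are counted under `None`.

```python
from collections import Counter

def point_in_ring(lat, lon, ring):
    """Ray-casting point-in-polygon test; ring is a list of (lat, lon) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        lat1, lon1 = ring[i]
        lat2, lon2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside

def count_by_country(points, countries):
    """Count points per country; countries maps name -> list of rings."""
    counts = Counter()
    for lat, lon in points:
        name = next(
            (c for c, rings in countries.items()
             if any(point_in_ring(lat, lon, r) for r in rings)),
            None,  # no country matched, e.g. a record collected at sea
        )
        counts[name] += 1
    return counts

# Toy data: "A" is a split territory with two parts, "B" has one part.
toy_countries = {
    "A": [[(0, 0), (0, 10), (10, 10), (10, 0)],
          [(20, 20), (20, 30), (30, 30), (30, 20)]],
    "B": [[(0, 20), (0, 30), (10, 30), (10, 20)]],
}
toy_points = [(5, 5), (25, 25), (5, 25), (50, 50)]
totals = count_by_country(toy_points, toy_countries)
```

The counts themselves are easy to obtain. The open question is where to draw a single marker for a country like "A", and what to do with the records counted under `None`.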
Other Solutions
Other algorithms like DBSCAN or Expectation-Maximization could be tested but we are concerned that these are too slow to be useful in a dynamic map.
Recent Work
The CartoDB team at Vizzuality has been very active and we are thankful for their technical support and the open source software they produce. They recently released a hexagon density map that we think is interesting.
We Want Your Feedback
Before we devote more time to this, we would like your feedback. Are you satisfied with the current view of specimens in the Explorer? Are you concerned that >1M points will be less useful at initial zoom levels? Would you appreciate a cleaner presentation? If so, which of the above methods is most pleasing? Do you have experience with clustering methods, and can you offer advice?