[Taxacom] iSpecies

Wed Jan 27 03:29:28 CST 2016

This thread is getting more and more interesting. A few quick comments.

1) Scope of iSpecies - As Tony points out, iSpecies is grabbing information from elsewhere, saving a few mouse clicks. Apart from adding more sources (and hence potentially saving a few more clicks), the real value of this approach will be whether it reveals information that you might not get from visiting each source separately. This requires integrating the data from across the sources, rather than simply redisplaying it. This is ultimately what I am after; the iSpecies demo is simply a “least common denominator” proof of concept. Ideally, integrating across sources would make use of consistent identifiers (something we’ve not been good at providing).
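
As a rough illustration of what integrating (rather than redisplaying) could mean, here is a minimal Python sketch that groups records from several sources by a shared identifier and reports the fields on which the sources disagree. The record layout, the function name integrate, and the source and identifier names are all hypothetical; none of this is how iSpecies itself works.

```python
from collections import defaultdict

def integrate(records):
    """Group (source, identifier, field, value) tuples by identifier and
    return the fields on which two or more sources disagree.

    The tuple layout is an assumption made for this sketch only.
    """
    # merged[identifier][field][source] = value
    merged = defaultdict(lambda: defaultdict(dict))
    for source, identifier, field, value in records:
        merged[identifier][field][source] = value

    conflicts = {}
    for identifier, fields in merged.items():
        for field, by_source in fields.items():
            # More than one distinct value means the sources disagree.
            if len(set(by_source.values())) > 1:
                conflicts[(identifier, field)] = dict(by_source)
    return conflicts

# Hypothetical records from two made-up sources sharing one identifier.
records = [
    ("source_a", "id1", "family", "X"),
    ("source_b", "id1", "family", "Y"),
    ("source_a", "id1", "rank", "species"),
    ("source_b", "id1", "rank", "species"),
]
conflicts = integrate(records)
```

The disagreement on "family" only becomes visible once both records are keyed on the same identifier, which is why consistent identifiers matter here.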

2) Obviously I’m biased, but I’ve enjoyed playing with the demo. I’ve found journals that I didn’t realise were online (and hence added more articles to http://bionames.org ), and I’ve found cases where there are obvious problems with the data, e.g. http://ispecies.org/?q=Sotalia%20fluviatilis . I’m also realising the Open Tree of Life tree is very poorly resolved :(

3) Problems with data lead us to annotation, a topic which keeps coming up and one we’ve failed to tackle with much success. One development that looks interesting is http://hypothes.is , which aims to provide a tool for annotating the web at large, including scholarly content: https://hypothes.is/annotating-all-knowledge/ . I’ve played with it with BioStor; see http://iphylo.blogspot.co.uk/2015/09/hypothesis-revisited-annotating.html and a live example at http://biostor.org/reference/147608/page/1 (click on the little “<” arrow on the top right to see the annotations). We need consistent, stable identifiers for the things we care about, and ideally a way of identifying annotators (e.g., ORCID); then we could really start to see some interesting things happen.

4) Tools for annotating data don’t solve the issue of whether anybody will take the annotations and fix the problem. People seem to assume this will happen, but I suspect for the vast majority of data sets it’s unlikely. Resolving issues with data can be very time consuming; who has the resources to do this across all our data? I suspect we should be thinking of automated techniques for taking original data and subsequent annotations and computing the probability that the statements made are correct. This is something the big search engines like Google are doing: machine learning on a massive scale to build “knowledge bases”.
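
To make the idea of computing a probability from annotations a little more concrete, here is a toy Python sketch, not a description of what any search engine actually does. It treats each annotation as an independent vote for or against a statement and combines the votes with a prior by adding log-odds (a naive Bayes update). The function name, the prior, and the per-annotator reliability values are all assumptions made for illustration.

```python
import math

def posterior_probability(prior, votes):
    """Return the probability that a statement is correct.

    prior: assumed probability the statement is correct before annotation
           (strictly between 0 and 1).
    votes: list of (agrees, reliability) pairs, where reliability is the
           assumed probability that the annotator is right (strictly
           between 0 and 1). Annotators are assumed to be independent.
    """
    log_odds = math.log(prior / (1 - prior))
    for agrees, reliability in votes:
        # Each vote shifts the log-odds by the log likelihood ratio.
        weight = math.log(reliability / (1 - reliability))
        log_odds += weight if agrees else -weight
    return 1 / (1 + math.exp(-log_odds))

# One fairly reliable annotator confirming a statement with a neutral prior.
confirmed = posterior_probability(0.5, [(True, 0.8)])

# Two fairly reliable annotators disputing a statement that looked safe.
disputed = posterior_probability(0.9, [(False, 0.8), (False, 0.8)])
```

Under these assumptions, a statement could be flagged for attention once its probability drops below some threshold, without anyone having to adjudicate every annotation by hand.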

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email:  Roderic.Page at glasgow.ac.uk
Tel:  +44 141 330 4778
Skype:  rdmpage
Facebook:  http://www.facebook.com/rdmpage
LinkedIn:  http://uk.linkedin.com/in/rdmpage
Twitter:  http://twitter.com/rdmpage
Blog:  http://iphylo.blogspot.com
ORCID:  http://orcid.org/0000-0002-7101-9767
Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
ResearchGate:  https://www.researchgate.net/profile/Roderic_Page