[Taxacom] A question for GBIF regarding data harvests from iNaturalist

John Shuey jshuey at TNC.ORG
Wed Dec 22 15:11:53 CST 2021


These iNaturalist threads highlight the pros and cons of citizen science data repositories (FYI - I’ve found similar problems in Biosis data). Mostly I’m here to point out that these sites can be useful, but also to qualify their use as a data source. First, I’ll note that I’m interested in Mesoamerican Lepidoptera - primarily Belize - so there is an abundance of data in such sites, especially iNaturalist. And second, I have never actually contributed data to sites like this, but I have provided IDs and, obviously, I have skimmed data.

iNaturalist Belize butterfly data summary - There are over 4,000 records of butterflies from Belize representing ~520 species (~49% of the known species pool). The data are dominated by common, widespread species of disturbed habitats, especially if the bug is big, pretty and feeds on garden flowers. As in all ecological samples, 75% of the records come from the most common 10-15% of species.

On the downside, many records are over-identified or unidentifiable (hence not Research Grade, RG). Research grade identifications are often total bullcrap. The problems are mostly associated with bad photos and/or over-identification, where hacks assign easy names to species complexes that cannot be IDed from photos. Two idiots who agree = one research grade record. If you are going to use this data, every record must be verified. After my review, about 55% of the iNat butterfly records from Belize are usable (we’re talking butterflies here - imagine the issues with curculionids). And of the 520 “species” recorded from Belize, about 85% are represented by records that are solid enough for my use.
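As a rough illustration of what those percentages imply in absolute terms, here is a minimal back-of-the-envelope sketch. Every input is one of the approximate figures quoted above ("over 4,000" is treated as a lower bound), and the variable names are purely illustrative, so the outputs are ballpark numbers rather than measured results.

```python
# Rough arithmetic on the approximate figures quoted in this message.
records_total = 4000          # "over 4,000" records, so a lower bound
species_recorded = 520        # ~520 species recorded on iNaturalist
share_of_known_pool = 0.49    # ~49% of the known species pool
usable_record_share = 0.55    # ~55% of records usable after review
solid_species_share = 0.85    # ~85% of species backed by solid records

usable_records = round(records_total * usable_record_share)
solid_species = round(species_recorded * solid_species_share)
known_pool = round(species_recorded / share_of_known_pool)
solid_share_of_pool = solid_species / known_pool

print(f"usable records: at least ~{usable_records}")
print(f"species with solid records: ~{solid_species}")
print(f"implied known species pool: ~{known_pool}")
print(f"solid species as share of pool: ~{solid_share_of_pool:.0%}")
```

On these assumptions, roughly 2,200 records and about 440 species survive review, or a little over 40% of the known species pool.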

On the upside - there are some truly amazing records hidden in this mess. There are true gems in the remaining data set, including around 10 species not otherwise known from Belize (and a score of major habitat/range extensions). At least one easily recognized new species has been photographed more times than I have netted it… Unique to Belize and butterflies, many of the photographers are tourists, and they tend to visit during the northern hemisphere winter (Belize’s dry season), which I generally avoid. Their observations help fill that seasonal gap, during which species richness is at its lowest. So, these data are a pretty nice supplement to my time in the field!

But the bottom line is - I find it hard to believe that any person or organization is skimming off iNat RG records as valid without any QA/QC.

John Shuey

From: Taxacom <taxacom-bounces at mailman.nhm.ku.edu> On Behalf Of Stephen Thorpe via Taxacom
Sent: Wednesday, December 22, 2021 3:35 PM
To: jmiller at gbif.org; Taxacom <taxacom at mailman.nhm.ku.edu>
Subject: [Taxacom] A question for GBIF regarding data harvests from iNaturalist

Hi Joe,
As you know, GBIF periodically harvests Research Grade observations from iNaturalist. What isn't quite clear, but which I think would be well worth clarifying, if you could, please, is what happens to observations which drop back out of Research Grade? Do they drop out of GBIF at the next harvest? This is important for the reason that there are two types of cases, and the consequences are very different for observations of each type: (1) observations of well-known species; and (2) observations reliant on expert IDs.
For type (1) observations, it can be reasonably assumed that dropping back out of RG will rarely happen, and if it does happen for inadequate reasons, then the community ID will be restored fairly quickly, since it involves a well-known species that many iNat users are familiar with.
For type (2) observations, however, IDs may be based on just a couple of experts. An RG observation of this kind can be dropped out of RG by any iNat user who chooses to disagree for whatever reason, be it scientific or personal or whatever. The lack of further experts means that RG is unlikely to be restored very easily!
So, my question is, for type (2) observations that were RG long enough to have been harvested by GBIF, if they subsequently drop out of RG on iNat, do they drop out of GBIF at the next data harvest? If so, then data already in GBIF, harvested from iNat, is vulnerable to the whims of single users on iNat, which, to my mind at least, is a concern!
Cheers, Stephen
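
Whether GBIF actually removes such records at the next harvest is exactly the open question here, so the following is only a sketch of how one might check it from the outside: compare the sets of observation IDs in two successive snapshots and list those that disappeared. The function name and the IDs are hypothetical, and no real iNaturalist or GBIF API call is shown.

```python
def dropped_between_harvests(previous_harvest, current_harvest):
    """Return IDs present in the previous harvest but absent from the current one."""
    return sorted(set(previous_harvest) - set(current_harvest))


# Hypothetical observation IDs, for illustration only.
earlier_snapshot = {101, 102, 103, 104}
later_snapshot = {101, 103, 104, 105}

dropped = dropped_between_harvests(earlier_snapshot, later_snapshot)
print(dropped)
```

Any ID in the result was harvested once and is missing later; it would still need to be checked against iNat to see whether a lost RG status, rather than a deletion or licence change, explains the disappearance.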

