[Taxacom] A question for GBIF regarding data harvests from iNaturalist
Stephen Thorpe
stephen_thorpe at yahoo.co.nz
Wed Dec 22 15:45:13 CST 2021
John Shuey said "But the bottom line is - I find it hard to believe that anyone or organization is skimming off iNat RG records as valid without any QA/QC."
Well, they are! By "skimming off", you mean harvesting. Your comment highlights a possible misunderstanding of the nature of records and identifications. There is a continuum in "validity". At least iNat RG records are not based on a single person's unchallenged identification, which is the case for many data sources straight out of collections. Any taxonomist is capable of making a mistake, and some are definitely sloppier than others. iNat RG records are not necessarily any less valid than anything else. It is just that each source of data/records has its own problems, and iNat is no exception. Only by taking the time to look into a source can we properly judge its validity.
Stephen
On Thursday, 23 December 2021, 10:12:09 am NZDT, John Shuey via Taxacom <taxacom at mailman.nhm.ku.edu> wrote:
These iNaturalist threads highlight the pros and cons of citizen science data repositories (FYI, I’ve found similar problems in Biosis data). Mostly I’m here to point out that these sites can be useful, but also to qualify their use as a data source. First, I’ll note that I’m interested in Mesoamerican Lepidoptera, primarily Belize, so there is an abundance of data in such sites, especially iNaturalist. And second, I have never actually contributed data to sites like this, but I have provided IDs and, obviously, I have skimmed data.
iNaturalist Belize butterfly data summary - There are over 4,000 records of butterflies from Belize representing ~520 species (~49% of the known species pool). The data are dominated by common, widespread species of disturbed habitats, especially if the bug is big, pretty and feeds on garden flowers. As with all ecological samples, 75% of the records come from the most common 10-15% of species.
On the downside, many records are over-identified or unidentifiable (hence not RG). Research grade identifications are often total bullcrap. The problems are mostly associated with bad photos and/or over-identification, where hacks assign easy names to species complexes that cannot be IDed from photos. Two idiots who agree = one research grade record. If you are going to use this data, every record must be verified. So after my review, about 55% of the iNat butterfly data in Belize are useable (we’re talking butterflies here; imagine the issues with curculionids). And of the 520 “species” recorded from Belize, about 85% represent records that are solid enough for my use.
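As a rough check on what those percentages mean in absolute terms, the arithmetic can be sketched as below. The variable names are illustrative only, "over 4,000" is treated as a lower bound of 4,000, and the results are approximations derived from the rounded figures quoted above, not counts reported in the message.

```python
# Figures quoted in the message (all rounded by the author).
records_total = 4000          # "over 4,000 records": used here as a lower bound
species_recorded = 520        # "~520 species"
share_of_known_pool = 0.49    # "~49% of the known species pool"
usable_record_share = 0.55    # "about 55% ... are useable"
solid_species_share = 0.85    # "about 85% ... solid enough"

# Derived, approximate quantities (assumptions follow from the rounding above).
usable_records = round(records_total * usable_record_share)
solid_species = round(species_recorded * solid_species_share)
implied_species_pool = round(species_recorded / share_of_known_pool)

print(f"usable records (at least): ~{usable_records}")
print(f"species with solid records: ~{solid_species}")
print(f"implied known species pool: ~{implied_species_pool}")
```

On these assumptions, roughly 2,200 of the records and about 440 of the 520 species survive review, against an implied pool of a little over 1,000 known species.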
On the upside - there are some truly amazing records hidden in this mess. There are true gems in the remaining data set, including around 10 species otherwise not known from Belize (and a score of major habitat/range extensions). At least one easily recognized new species has been photographed more times than I have netted it… Unique to Belize and butterflies, many of the photographers are tourists, and they tend to visit during the northern hemisphere winter (Belize’s dry season), which I generally avoid. Their observations help fill that seasonal gap, during which species richness is at its lowest. So, these data are a pretty nice supplement to my time in the field!
But the bottom line is - I find it hard to believe that anyone or organization is skimming off iNat RG records as valid without any QA/QC.
John Shuey
From: Taxacom <taxacom-bounces at mailman.nhm.ku.edu> On Behalf Of Stephen Thorpe via Taxacom
Sent: Wednesday, December 22, 2021 3:35 PM
To: jmiller at gbif.org; Taxacom <taxacom at mailman.nhm.ku.edu>
Subject: [Taxacom] A question for GBIF regarding data harvests from iNaturalist
Hi Joe,
As you know, GBIF periodically harvests Research Grade observations from iNaturalist. What isn't quite clear, and would be well worth clarifying if you could, is what happens to observations which drop back out of Research Grade. Do they drop out of GBIF at the next harvest? This matters because there are two types of cases, and the consequences are very different for each: (1) observations of well-known species; and (2) observations reliant on expert IDs.
For type (1) observations, it can be reasonably assumed that dropping back out of RG will rarely happen, and if it does happen for inadequate reasons, then the community ID will be restored fairly quickly, since it involves a well-known species that many iNat users are familiar with.
For type (2) observations, however, IDs may be based on just a couple of experts. An RG observation of this kind can be dropped out of RG by any iNat user who chooses to disagree for whatever reason, be it scientific or personal or whatever. With no further experts available, RG status is unlikely to be restored very easily!
So, my question is, for type (2) observations that were RG long enough to have been harvested by GBIF, if they subsequently drop out of RG on iNat, do they drop out of GBIF at the next data harvest? If so, then data already in GBIF, harvested from iNat, is vulnerable to the whims of single users on iNat, which, to my mind at least, is a concern!
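To make the question concrete, here is a minimal sketch of the two possible harvest behaviours. Everything in it is hypothetical: the `Observation` class, the two policy functions and the placeholder taxon names are illustrations only, and nothing here describes how GBIF or iNaturalist actually implement harvesting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    obs_id: int
    taxon: str            # placeholder labels, not real records
    research_grade: bool


def harvest_mirror(source):
    """Hypothetical policy A: the harvested copy mirrors the current RG set."""
    return {o.obs_id: o for o in source if o.research_grade}


def harvest_append_only(previous, source):
    """Hypothetical policy B: anything harvested once is kept, even if it later loses RG."""
    merged = dict(previous)
    merged.update({o.obs_id: o for o in source if o.research_grade})
    return merged


# First harvest: both observations are Research Grade.
first = [
    Observation(1, "well-known species", True),   # type (1)
    Observation(2, "expert-ID species", True),    # type (2)
]
# Before the second harvest, a single user's disagreement drops observation 2 out of RG.
second = [
    Observation(1, "well-known species", True),
    Observation(2, "expert-ID species", False),
]

snapshot = harvest_mirror(first)
mirror_after = harvest_mirror(second)
append_after = harvest_append_only(snapshot, second)

# IDs that were harvested once but are gone after the second harvest under policy A.
dropped = sorted(set(snapshot) - set(mirror_after))
print("dropped under mirror policy:", dropped)
print("kept under append-only policy:", sorted(append_after))
```

Under the mirroring policy the type (2) observation disappears at the second harvest; under the append-only policy it stays. Which of these (if either) matches the real behaviour is exactly what is being asked.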
Cheers, Stephen
_______________________________________________
Taxacom Mailing List
Send Taxacom mailing list submissions to: taxacom at mailman.nhm.ku.edu
For list information; to subscribe or unsubscribe, visit: http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
You can reach the person managing the list at: taxacom-owner at mailman.nhm.ku.edu
The Taxacom email archive back to 1992 can be searched at: http://taxacom.markmail.org
Nurturing nuance while assailing ambiguity for about 34 years, 1987-2021.