[Taxacom] A question for GBIF regarding data harvests from iNaturalist (Alec McClay)

Stephen Thorpe stephen_thorpe at yahoo.co.nz
Fri Dec 24 14:26:35 CST 2021


Hi Mark,

You are correct, but please be aware that in cases of misidentified RG observations, they will drop back out of GBIF if the misidentification is subsequently corrected on iNat. This can be seen as an improvement in data quality over time.

What I am talking about, however, is the reverse. GBIF did have (maybe still does have, for the moment) a solid portfolio of RG Balta bicolor observations, sourced from iNat, and due largely to my work. Danilo Hegg, however, unilaterally decided that both I and our official government biosecurity authority (MPI) were wrong, and so rolled back all the IDs, without any prior discussion. He was not even aware of the relevant facts when he did this! He interpreted those facts, when I informed him of them, in a highly biased way, having already made up his mind on the matter. Nobody else on iNat has the relevant knowledge of the taxa concerned to make a meaningful judgement. It really comes down to whether or not I was justified in following the ID by MPI, and I suggest this is a perfectly reasonable thing to do, in the absence of any convincing evidence to the contrary (and merely looking superficially like a roach in the genus Ellipsidion is not convincing evidence!)

So, here we have a problem in which good, solid data, already on GBIF, can be removed by the actions of a single stubborn and misguided iNat user. This does not lead to an improvement in data quality over time, quite the reverse!

Cheers, Stephen
    On Saturday, 25 December 2021, 09:11:10 am NZDT, Mark Egger via Taxacom <taxacom at mailman.nhm.ku.edu> wrote:  
 
As a “curator” for a few genera of flowering plants on iNaturalist, I would urge great caution in assessing the reliability of “data harvests” from the site, even/especially for “Research Grade” observations. In the genera I monitor, I very frequently come across RG identifications that are clearly incorrect. This stems from the fact that one inexperienced user can “confirm” a machine-generated suggestion, and then this faulty identification can be confirmed by an equally inexperienced observer, often a friend of the original poster. That’s all that is needed to make an observation RG! I’ve seen cases where an obviously incorrect identification has been confirmed by as many as 4 or 5 users. Of course, herbarium specimens are subject to incorrect identifications as well and still end up in public databases, but the sheer number of observations coming in daily to iNaturalist makes it particularly subject to these sorts of errors, especially when a certain percentage of the posters don’t seem able to distinguish a red fall leaf from a red-colored inflorescence. While curators find and correct many such cases, many more undoubtedly escape timely detection.

That being said, iNaturalist is a wonderful tool for collecting observations from regions that are understudied, and real discoveries can be made there. For instance, I’ve seen posts of what are clearly undescribed or little-known species, especially from Mexico. But the fact remains that bulk data imports from iNat should be analyzed carefully prior to using them, especially for distributional studies. And it is also vital to understand that the label of RG does not at all mean that the identification has been made by experts in the given group. Sometimes this is the case, but in many others it is definitely not.

Mark

> On Dec 24, 2021, at 10:00 AM, taxacom-request at mailman.nhm.ku.edu wrote:
> 
> Message: 1
> Date: Fri, 24 Dec 2021 12:17:47 -0500
> From: Alec McClay <alec.mcclay at shaw.ca>
> To: taxacom at mailman.nhm.ku.edu
> Subject: Re: [Taxacom] A question for GBIF regarding data harvests
>     from    iNaturalist
> Message-ID: <eccd6f76-0c80-8ea0-8ce2-6a09189a75b3 at shaw.ca>
> Content-Type: text/plain; charset=UTF-8; format=flowed
> 
> According to this discussion on the iNaturalist forum 
> https://forum.inaturalist.org/t/observations-of-cultivated-plants-on-gbif/5296/4 
> (from someone who works for GBIF) "For the record, the good iNat folks 
> control what goes in the dataset that we ingest. If they 
> add/update/delete a record, we do the same when ingesting".
> 
> Merry Christmas and Happy New Year to all.
> 
> Alec.
> 
> On 2021-12-23 1:00 p.m., taxacom-request at mailman.nhm.ku.edu wrote:
>> Date: Wed, 22 Dec 2021 20:35:02 +0000 (UTC)
>> From: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>
>> To: "jmiller at gbif.org" <jmiller at gbif.org>, Taxacom
>>     <taxacom at mailman.nhm.ku.edu>
>> Subject: [Taxacom] A question for GBIF regarding data harvests from
>>     iNaturalist
>> Message-ID: <1223057220.1448512.1640205302753 at mail.yahoo.com>
>> Content-Type: text/plain; charset=UTF-8
>> 
>> Hi Joe,
>> As you know, GBIF periodically harvests Research Grade observations from iNaturalist. What isn't quite clear, and what I think would be well worth clarifying, if you could, is what happens to observations that drop back out of Research Grade. Do they drop out of GBIF at the next harvest? This matters because there are two types of cases, and the consequences are very different for observations of each type: (1) observations of well-known species; and (2) observations reliant on expert IDs.
>> For type (1) observations, it can be reasonably assumed that dropping back out of RG will rarely happen, and if it does happen for inadequate reasons, then the community ID will be restored fairly quickly, since it involves a well-known species that many iNat users are familiar with.
>> For type (2) observations, however, IDs may be based on just a couple of experts. An RG observation of this kind can be dropped out of RG by any iNat user who chooses to disagree, for whatever reason, be it scientific, personal, or otherwise. With no further experts available, RG status is unlikely to be restored easily!
>> So, my question is, for type (2) observations that were RG long enough to have been harvested by GBIF, if they subsequently drop out of RG on iNat, do they drop out of GBIF at the next data harvest? If so, then data already in GBIF, harvested from iNat, is vulnerable to the whims of single users on iNat, which, to my mind at least, is a concern!
>> Cheers, Stephen
> 
> -- 
> Alec McClay
> 12 Roseglen Private
> Ottawa, ON K1H 1B6
> Canada
> 613-739-8499 (home)
> 343-988-4077 (mobile)
> 
> 
> 
> ------------------------------
> 
> Subject: Digest Footer
> 
> Taxacom Mailing List
> 
> Send Taxacom mailing list submissions to taxacom at mailman.nhm.ku.edu
> For list information; to subscribe or unsubscribe, visit: http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
> You can reach the person managing the list at: taxacom-owner at mailman.nhm.ku.edu
> The Taxacom email archive back to 1992 can be searched at: http://taxacom.markmail.org
> 
> Nurturing nuance while assailing ambiguity for about 34 years, 1987-2021.
> 
> 
> ------------------------------
> 
> End of Taxacom Digest, Vol 188, Issue 18
> ****************************************
