[Taxacom] Paywall our taxonomic tidbit

Mon Jan 18 02:02:47 CST 2016

In a sense, this whole discussion is misguided. The Internet is not about articles as we knew them in the pre-digital age, and not about PDFs, even though we can ship them via email or in some cases access them with a mouse click (Open Access).

The Internet is about linking data and building a knowledge management system or knowledge graph. This goes well beyond the sum of the data in the articles. Paywalls are walls that inhibit building such a network. If we maintain them, we cannot make use of the new properties that the Internet provides us.

Open linked data also allows text and data mining over potentially the entire corpus, not only of taxonomic literature but well beyond.

Taxonomic articles can be very rich in data. This allows others to look at our contributions in a way we don't. Bibliographic citations allow us to build citation networks and measure the use of our literature. Taxonomic treatments and the citations they include allow the catalogue of life to be built up by machine. DNA barcodes and collection codes allow us to understand who has used which specimens, from which collection, and where.

A network allows us to enter our knowledge from very different angles, such as a specimen, a location, a collector or a DNA sequence, and to ask questions such as: Who collected in location x in a given period? How widely are species distributed in a given area? What is the host of a particular species? All this comes in addition to being able to look at a single treatment, a single article or a single key.
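
As a rough illustration of such questions, here is a minimal Python sketch over a handful of linked records. Everything in it is an assumption made for illustration: the record fields, the specimen codes, the names "Aus bus" and "Site X", and the helper functions are all invented and do not come from GBIF, Plazi or any other real service.

```python
from datetime import date

# Hypothetical occurrence records. All codes, names and places are invented.
records = [
    {"specimen": "SPEC-001", "species": "Aus bus", "collector": "A. Collector",
     "location": "Site X", "date": date(1998, 5, 14), "host": "Planta una"},
    {"specimen": "SPEC-002", "species": "Aus bus", "collector": "B. Gatherer",
     "location": "Site Y", "date": date(2003, 8, 2), "host": "Planta una"},
    {"specimen": "SPEC-003", "species": "Cus dus", "collector": "A. Collector",
     "location": "Site X", "date": date(2010, 3, 21), "host": None},
    {"specimen": "SPEC-004", "species": "Aus bus", "collector": "C. Finder",
     "location": "Site X", "date": date(1999, 11, 30), "host": "Planta dua"},
]


def collectors_at(records, location, start, end):
    """Who collected at `location` between `start` and `end` (inclusive)?"""
    return sorted({r["collector"] for r in records
                   if r["location"] == location and start <= r["date"] <= end})


def localities_of(records, species):
    """At which locations has `species` been recorded?"""
    return sorted({r["location"] for r in records if r["species"] == species})


def hosts_of(records, species):
    """Which hosts have been recorded for `species`?"""
    return sorted({r["host"] for r in records
                   if r["species"] == species and r["host"] is not None})


print(collectors_at(records, "Site X", date(1995, 1, 1), date(2000, 12, 31)))
print(localities_of(records, "Aus bus"))
print(hosts_of(records, "Aus bus"))
```

The same records answer all three questions only because each one carries its specimen, collector, location and host together. Locked inside separate paywalled PDFs, these links would have to be rebuilt by hand.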

Many of the elements for this have been developed outside our community, and we just need to make use of them. DOIs, supported by CrossRef and DataCite, are the unit for citing and identifying articles. Persistent identifiers are being used for authors (e.g. ORCID). Solutions adopted within our community exist for specimens (e.g. HTTP URIs in CETAF and among other signatories of the Bouchout Declaration), barcodes, names (e.g. LSIDs in ZooBank) and treatments (HTTP URIs in Plazi). All of them are deployed and are crystallization points for the big network, because they are used for cross-referencing, within the sciences and beyond. Wikidata makes use of PIDs from NCBI, ITIS, GBIF, Plazi and EOL, all of which hold data that originated from published records.

But more importantly, we have one of the most advanced publication schemes in the sciences with the Biodiversity Data Journal, ZooKeys and the remainder of the Pensoft journals. These offer more than a PDF or HTML version of the content: at the moment of publication, the data within is pushed either directly to GBIF, EOL or Plazi, or from the latter on to NCBI and Wikidata.

These Open Access journals, paid upfront, are much cheaper than the average Daniel cites below. More importantly, any PDF produced now needs somebody to extract the data within it (add the names to dedicated databases, extract body lengths as traits, extract observation records, images, tables, treatments and bibliographic records) if we want to make this piece of knowledge accessible on the Internet and open to efficient data mining. So it is not just publishing or access costs that count, but the almost insurmountable costs of reuse that prevent our biodiversity knowledge from becoming part of the global knowledge graph, the cloud or just our cultural heritage.
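
To give a feel for what one such extraction step involves, here is a minimal Python sketch that pulls a body length out of free treatment text with a regular expression. The treatment sentence, the pattern and the function name are hypothetical illustrations, not the workflow of Plazi or any publisher, and real descriptions vary far more than this pattern covers.

```python
import re

# Hypothetical treatment text, invented for illustration.
treatment = ("Aus bus sp. nov. Body length 4.2-5.1 mm. Head black. "
             "Material examined: 3 workers, Site X, 14.v.1998.")

# Matches "Body length 4.2 mm" or a range such as "Body length 4.2-5.1 mm"
# (a hyphen or an en dash, written as \u2013, may separate the two values).
BODY_LENGTH = re.compile(
    r"[Bb]ody length\s+(\d+(?:\.\d+)?)(?:\s*[-\u2013]\s*(\d+(?:\.\d+)?))?\s*mm"
)


def extract_body_length(text):
    """Return (min_mm, max_mm) for the first body length found, else None."""
    match = BODY_LENGTH.search(text)
    if match is None:
        return None
    low = float(match.group(1))
    # A single value is reported as a range with identical bounds.
    high = float(match.group(2)) if match.group(2) else low
    return (low, high)


print(extract_body_length(treatment))
```

Every trait, name, record and citation needs its own rule of this kind, or a person to do it by hand, which is where the cost of reuse comes from when the data is not delivered in structured form at publication.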

There is no way around Open Access. If we don't do it, and our knowledge is really that relevant, we will pay dearly in the very near future: the big publishers will not only ask huge amounts of money to access our journals or to produce them OA but, more importantly, they will charge exorbitant access fees for the knowledge base they create from all the data we deliver to them for free. And if you want to do science, you will depend on this access, which most of us will not be able to afford.

So the discussion must be about how we build a knowledge management system that makes us part of the bigger picture. I think the dire state of biodiversity, under increasing pressure, and the exciting, rapidly developing genomic data make it imperative that we too use state-of-the-art tools to communicate our science and provide access to it, especially as our community is among the leaders in this area in the sciences.

Donat

-----Original Message-----
From: Taxacom [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Daniel Mietchen
Sent: Monday, January 18, 2016 3:48 AM
To: Taxacom <taxacom at mailman.nhm.ku.edu>
Subject: Re: [Taxacom] Paywall our taxonomic tidbit

It may be worth considering here that in the current system, billions of dollars are going to the publishing industry every year already (globally, and across all disciplines), and have been doing so for many years.

From http://doi.org/10.1038/495426a : "Data from the consulting firm Outsell in Burlingame, California, suggest that the science-publishing industry generated $9.4 billion in revenue in 2011 and published around 1.8 million English-language articles — an average revenue per article of roughly $5,000. Analysts estimate profit margins at 20–30% for the industry, so the average cost to the publisher of producing an article is likely to be around $3,500–4,000."

Most of this is through subscriptions (by libraries, corporations or individuals), some of it through advertising, some from other sources (e.g. database access, membership schemes). Most of this is invisible to most researchers, the exceptions being things like page charges or color figure charges in traditional venues or OA fees more recently.

Now consider a thought experiment: If every single one of the ca. 2 million articles we publish every year were published for an OA fee in the PLOS ONE range (ca. USD 1,500), that would cost USD 3 billion altogether, which is roughly the amount of *profit* the publishing industry is making now.
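
The arithmetic behind this comparison can be checked with a few lines of Python. The inputs are the figures quoted above (2011 revenue and article count, the 20-30% margin estimate, ca. 2 million articles a year, a fee of ca. USD 1,500). The variable names are just labels chosen for this sketch.

```python
# Figures quoted above from the Outsell data for 2011.
revenue_2011 = 9.4e9   # USD, science-publishing industry revenue
articles_2011 = 1.8e6  # English-language articles published

revenue_per_article = revenue_2011 / articles_2011

# Analysts' estimated profit margins of 20-30%.
profit_low = 0.20 * revenue_2011
profit_high = 0.30 * revenue_2011

# Thought experiment: every article published for a PLOS ONE-range OA fee.
articles_per_year = 2e6
oa_fee = 1500          # USD per article
total_oa_cost = articles_per_year * oa_fee

print(f"Revenue per article: USD {revenue_per_article:,.0f}")
print(f"Estimated profit:    USD {profit_low / 1e9:.2f}-{profit_high / 1e9:.2f} billion")
print(f"Total OA cost:       USD {total_oa_cost / 1e9:.1f} billion")
```

With these inputs the total OA bill comes to USD 3.0 billion, just above the USD 1.88-2.82 billion profit range implied by the quoted margins, so "roughly the amount of profit" holds as an order-of-magnitude statement.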

While many traditional publishers (and especially their hybrid journals) hover well above those 1,500 dollars, many newer ones have OA fees well below that, often due to more efficient workflows. So if OA at the efficiency of PLOS ONE or better were to replace the traditional publishing model, this would mean significant savings (billions per year eventually) for the scientific community - and thus the public - which we could use to build an infrastructure that would make scholarly communication more efficient, to include things beyond PDF and discovery mechanisms beyond citations and journal TOC alerts.

Besides, the educational value of a paywall to lay readers interested in taxonomy rarely tops that of a relevant OA paper.

Daniel
_______________________________________________
Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
The Taxacom Archive back to 1992 may be searched at: http://taxacom.markmail.org

Celebrating 29 years of Taxacom in 2016.