
The flowering ORCID

As part of my Large Datasets campaign, I have now downloaded and processed the latest data from ORCID. This yielded 655,706 people (47,435, or 7%, already in Wikidata) and 13,438,786 publications with a DOI or PubMed ID (1,079,305, or 8%, in Wikidata). To be precise, these are publications-per-person counts, so the same paper might be counted multiple times; however, they still correspond to 1,033,146 unique Wikidata items, so the difference is small.

Number of papers, ORCID, first, and last name

Looking at the data, there are 14,883 authors with ten or more papers already on Wikidata who either do not have an item, or whose item has no ORCID ID associated. So I am now setting a bot (my trusted Reinheitsgebot) to work on creating items for those authors, and then changing the appropriate "author name string" statement on each paper to a proper "author" statement, preserving qualifiers and references, and adding the original name string as a new qualifier (like so).
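To make the statement change a bit more concrete, here is a minimal sketch in Python. It is not the bot's actual code: the statement is modelled as a simplified dictionary rather than real Wikidata JSON, the function name `to_author_statement` and the item ID `Q_NEW_AUTHOR` are made up for illustration, and the choice of P1932 as the qualifier that holds the original name string is an assumption (the post only says "a new qualifier").

```python
import copy

AUTHOR = "P50"                # author (item)
AUTHOR_NAME_STRING = "P2093"  # author name string
NAME_QUALIFIER = "P1932"      # assumption: qualifier used to keep the original string


def to_author_statement(statement, author_qid):
    """Turn a simplified 'author name string' statement into an 'author' statement.

    Existing qualifiers and references are copied over unchanged, and the
    original name string is appended as an extra qualifier value.
    """
    if statement["property"] != AUTHOR_NAME_STRING:
        raise ValueError("expected an author name string statement")
    qualifiers = copy.deepcopy(statement.get("qualifiers", {}))
    qualifiers.setdefault(NAME_QUALIFIER, []).append(statement["value"])
    return {
        "property": AUTHOR,
        "value": author_qid,
        "qualifiers": qualifiers,
        "references": copy.deepcopy(statement.get("references", [])),
    }


# Hypothetical input: a name string with a series ordinal and one reference.
old = {
    "property": "P2093",
    "value": "R. Price",
    "qualifiers": {"P1545": ["3"]},
    "references": [{"P248": "Q_SOME_SOURCE"}],
}
new = to_author_statement(old, "Q_NEW_AUTHOR")
print(new)
```

The input statement is left untouched, so a caller could still remove the old name string only after the new author statement has been saved.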

By chance, one of the most prolific authors of scientific publications not yet on Wikidata turned out to be a (distant) colleague of mine, Rick Price, who is now linked as the author of ~100 papers.

I have now set the bot to create the author items for the authors with >=10 papers on Wikidata. I am aware that ORCID authorships are essentially “self-reported”, but I do check that a paper is not claimed by two people with the same surname in the ORCID dataset (in which case I pass it over). Please report any systematic (!) bot malfunctions to me through the usual channels.
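The same-surname check could look roughly like the following Python sketch. This is an illustration under assumptions, not the bot's real implementation: claims are represented as hypothetical `(orcid, surname, paper_id)` tuples, the function names are invented, surnames are compared case-insensitively, and the ORCID IDs and DOIs in the example are made up.

```python
from collections import defaultdict


def ambiguous_claims(claims):
    """Return the (paper_id, surname) pairs claimed by more than one ORCID."""
    claimants = defaultdict(set)
    for orcid, surname, paper_id in claims:
        claimants[(paper_id, surname.casefold())].add(orcid)
    return {key for key, orcids in claimants.items() if len(orcids) > 1}


def safe_claims(claims):
    """Keep only claims whose paper is not contested by a same-surname claimant."""
    ambiguous = ambiguous_claims(claims)
    return [
        (orcid, surname, paper_id)
        for orcid, surname, paper_id in claims
        if (paper_id, surname.casefold()) not in ambiguous
    ]


# Made-up example data: two Smiths claim paper "a", so both are passed over.
claims = [
    ("0000-0001-0000-0001", "Smith", "10.1000/a"),
    ("0000-0001-0000-0002", "Smith", "10.1000/a"),
    ("0000-0001-0000-0003", "Jones", "10.1000/a"),
    ("0000-0001-0000-0001", "Smith", "10.1000/b"),
]
kept = safe_claims(claims)
print(kept)
```

In this example the Jones claim on paper "a" and the Smith claim on paper "b" survive, since neither is contested by another person with the same surname.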

Update: This will create up to 263,893 new author (P50) links on Wikidata.