Recommendations

Reading Recommending Images to Wikidata Items by Miriam, which highlights missing areas of image coverage in Wikidata (despite being the most complete site in the WikimediaVerse, image-wise), and strategies to address the issue, I was reminded of an annoying problem I have run into a few times.

My WD-FIST tool relies primarily on SPARQL to find items that might need images, and that usually works well. However, some larger queries time out, either in SPARQL itself or in the subsequent image discovery and filtering steps. Getting a list of all items about women with image candidates works occasionally, but not reliably; all humans is out of the question.
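To give a rough idea of the kind of query involved, here is a minimal sketch in Python that builds a SPARQL query for items of a given class that have no image (P18) yet. This is not the query WD-FIST actually runs; the function name `build_missing_image_query` and its parameters are made up for illustration. Q5 (human), P31 (instance of), P21 (sex or gender), Q6581072 (female) and P18 (image) are the real Wikidata identifiers.

```python
from typing import Optional


def build_missing_image_query(
    class_qid: str = "Q5",
    gender_qid: Optional[str] = None,
    limit: int = 1000,
) -> str:
    """Build a SPARQL query for items of a class that lack an image (P18).

    Hypothetical helper for illustration only, not WD-FIST's own code.
    """
    lines = [
        "SELECT ?item WHERE {",
        f"  ?item wdt:P31 wd:{class_qid} .",  # instance of the given class
    ]
    if gender_qid is not None:
        # Optional restriction on "sex or gender" (P21)
        lines.append(f"  ?item wdt:P21 wd:{gender_qid} .")
    # Keep only items that have no image statement at all
    lines.append("  FILTER NOT EXISTS { ?item wdt:P18 ?image }")
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


# Example: women (Q6581072) without an image, capped at 500 results
women_query = build_missing_image_query(gender_qid="Q6581072", limit=500)
```

Without a `LIMIT`, or with a class as large as "human", a query of this shape is exactly the sort that tends to hit the timeout.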

So I started an extension to WD-FIST: a caching mechanism that runs some large queries in a slightly different way, on a regular basis, and offers the results in the well-known WD-FIST interface. My first attempt is “humans”, and you can see some results here. As of now, there are 275,500 candidate images for 160,508 items; the link shows you all images that are used on three or more Wikipedias associated with the same item (to improve the signal-to-noise ratio).
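The "three or more Wikipedias" filter can be sketched like this. It assumes the cached candidates are available as a mapping from item to image to the set of wikis using that image; that data shape, the function name `filter_by_wiki_usage` and the sample values are all invented for illustration and say nothing about how the cache is really stored.

```python
from typing import Dict, Set

# item ID -> image file name -> set of wikis using that image for the item
Candidates = Dict[str, Dict[str, Set[str]]]


def filter_by_wiki_usage(candidates: Candidates, min_wikis: int = 3) -> Candidates:
    """Keep only images used on at least `min_wikis` wikis for the same item."""
    filtered: Candidates = {}
    for item, images in candidates.items():
        kept = {
            image: wikis
            for image, wikis in images.items()
            if len(wikis) >= min_wikis
        }
        if kept:  # drop items that have no remaining candidates
            filtered[item] = kept
    return filtered


# Made-up sample data, purely for illustration
sample: Candidates = {
    "Q1": {
        "Portrait.jpg": {"enwiki", "dewiki", "frwiki"},
        "Signature.png": {"enwiki"},
    },
    "Q2": {"Logo.svg": {"enwiki", "dewiki"}},
}
strong_candidates = filter_by_wiki_usage(sample)
```

In this sample only `Portrait.jpg` for `Q1` survives; `Q2` drops out entirely because its one candidate is used on just two wikis.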

One drawback of this system is that it produces some “false positive” items: because it bypasses SPARQL, it picks up some items that link to “human” (Q5), but not via “instance of” (P31). Also, matching an image to an item, or using “ignore” on an image, might not be reflected immediately on reload, but the daily update should take care of that.
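To make the false-positive case concrete, here is a small sketch of a check that separates "links to Q5 somehow" from "is an instance of Q5". It assumes each item's statements are available as a simple mapping from property ID to a list of target item IDs; that shape and both function names are hypothetical, and the cached update does not currently do this.

```python
from typing import Dict, List

# property ID -> list of target item IDs, e.g. {"P31": ["Q5"]}
Claims = Dict[str, List[str]]


def links_to(claims: Claims, target_qid: str = "Q5") -> bool:
    """True if any statement on the item points at the target, via any property."""
    return any(target_qid in targets for targets in claims.values())


def is_instance_of(claims: Claims, class_qid: str = "Q5") -> bool:
    """True only if the item points at the class via "instance of" (P31)."""
    return class_qid in claims.get("P31", [])


# Made-up examples: a real human, and an item that is a subclass (P279) of human
person: Claims = {"P31": ["Q5"]}
subclass_item: Claims = {"P279": ["Q5"]}

# A false positive links to Q5 but is not an instance of it
false_positive = links_to(subclass_item) and not is_instance_of(subclass_item)
```

A check like this could be run over the cached items to weed out the false positives, at the cost of loading the statements for every item.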

Update code is here.