Tolkien Linked Open Data Project

Initial Concept

From its inception, the Digital Tolkien Project has intended to produce Linked Data. The home page has always mentioned “Linked Open Data around people, places, events” but other high-level goals such as “Citation schemes, chronology, and bibliography in modern electronic formats” essentially amount to some sort of Linked Data approach too. Even the “Machine-actionable invented language description” relates to Linguistic Linked Open Data.

The Digital Tolkien Project has also always had the goal of building something that stands in the same relation to resources like the Tolkien Gateway as Wikidata or DBpedia do to Wikipedia. Wikidata and DBpedia are Linked Open Data repositories.

At the heart of the principles of Linked Data is the idea of identifying things with URIs. Structured data about those things can then be expressed using a controlled vocabulary and serialized in formats like JSON-LD. Linked Open Data is just Linked Data that has an open license.
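
As a minimal sketch of what this could look like, the following Python builds a small JSON-LD description of one secondary-world character. The base URI (tolkien-lod.example.org), the path layout, and the choice of schema.org terms are all assumptions made for illustration; none of them has been agreed upon.

```python
import json

# Hypothetical base URI: the neutral domain name has not been chosen yet.
BASE = "https://tolkien-lod.example.org/"


def describe(entity_path, name, type_uri):
    """Return a minimal JSON-LD document describing one identified thing."""
    return {
        # Map the short key "name" onto a property URI from a vocabulary.
        # schema.org is used here only as a stand-in controlled vocabulary.
        "@context": {"name": "http://schema.org/name"},
        # The URI that identifies the thing being described.
        "@id": BASE + entity_path,
        "@type": type_uri,
        "name": name,
    }


frodo = describe(
    "character/frodo-baggins",   # hypothetical path and slug
    "Frodo Baggins",
    "http://schema.org/Person",  # assumed type, for illustration only
)
print(json.dumps(frodo, indent=2))
```

The point of the sketch is only that the identifier ("@id") is separate from the vocabulary ("@context" and "@type"), so the two can be agreed upon at different times.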

The power of Linked Data is that it does not require the information to be centralized. If there is agreement on the identifying URIs and the controlled vocabulary, different groups can contribute information in an interoperable way. I am therefore proposing that the Digital Tolkien Project work with a range of other projects to agree on identifiers for Tolkien Linked Open Data. To encourage interoperability, I propose the domain name used for the URIs not be digitaltolkien.com but rather something more neutral, even if the Digital Tolkien Project takes the responsibility of managing the project and infrastructure. The structured data and controlled vocabulary can follow once identifiers start to be agreed upon.

The sorts of things we will want to develop identifiers for include (but are not limited to):

  • passages of published texts (based on the citation system efforts of DTP, LRC, etc.)
  • works and editions and manifestations (based on FRBRoo and some of the initial bibliographic modelling work)
  • artwork
  • manuscripts
  • secondary-world characters
  • secondary-world places
  • secondary-world events
  • lexical items (English, foreign, and invented)
  • letters and other primary-world documents
  • primary-world people (much of this will already exist)
  • primary-world places (much of this will already exist)

Many identifiers can build on existing work: TCG Letter numbers, passage citations, Tolkien Art Index, eldamo, etc. Entities can easily be linked to alternative identifiers (OED entries, VIAF, geonames.org, etc., as well as Wikidata and DBpedia themselves where entries exist) and also to web pages (e.g. the relevant entries in the TCG Letters Guide, the Tolkien Gateway, etc.).
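
One way such links might be recorded is sketched below, using the standard owl:sameAs and rdfs:seeAlso properties. The entity URI and both target URIs are placeholders on example.org hosts, not real identifiers in any of the services named above, and the helper name link_entity is hypothetical.

```python
import json

# Standard linking properties from OWL and RDF Schema.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def link_entity(entity_uri, same_as=(), see_also=()):
    """Build a JSON-LD node linking an entity to other identifiers and pages."""
    node = {"@id": entity_uri}
    if same_as:
        # Other URIs that identify the very same thing.
        node[OWL_SAME_AS] = [{"@id": uri} for uri in same_as]
    if see_also:
        # Web pages that are about the thing, rather than identifiers for it.
        node[RDFS_SEE_ALSO] = [{"@id": uri} for uri in see_also]
    return node


bree = link_entity(
    "https://tolkien-lod.example.org/place/bree",  # hypothetical URI
    same_as=["https://other-dataset.example.org/entity/PLACEHOLDER"],
    see_also=["https://wiki.example.org/Bree"],
)
print(json.dumps(bree, indent=2))
```

Because owl:sameAs is a strong claim of identity, a real scheme might prefer a weaker property such as skos:closeMatch for some links; that choice belongs to the later vocabulary discussion.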

Controlled vocabularies can be layered on top of these for different purposes. There is a lot of existing work on controlled vocabularies that is relevant but the identifiers are orthogonal to this (one of the virtues of the Linked Data approach).

One of the things all this would enable is better linking of resources in a decentralized way. For example, the Prancing Pony Podcast could index their episodes to the passages or Philology Fair words discussed. Signum could index Exploring Lord of the Rings. Scholarly articles in Mallorn and other journals discussing characters or passages or words could be indexed. Ardacraft locations could be linked to passages of texts. Artwork could be linked to characters or places or events. Jordan Rannell’s soundscapes could link timecodes to passages.
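
The sketch below illustrates the decentralized part of this idea: two independent groups each publish their own statements, and a third party merges them simply because both used the same passage URI. Every URI here, including the "discusses" predicate and the passage identifier, is invented for illustration.

```python
from collections import defaultdict

# Hypothetical shared identifiers that the groups would have agreed upon.
BASE = "https://tolkien-lod.example.org/"
DISCUSSES = BASE + "vocab/discusses"      # hypothetical predicate
PASSAGE = BASE + "passage/example-1"      # hypothetical passage URI

# Statements published independently by two groups, as
# (subject, predicate, object) triples.
podcast_triples = {
    ("https://podcast.example.org/episode/1", DISCUSSES, PASSAGE),
}
journal_triples = {
    ("https://journal.example.org/article/7", DISCUSSES, PASSAGE),
    ("https://journal.example.org/article/8", DISCUSSES, BASE + "passage/example-2"),
}


def build_index(*sources):
    """Merge triples from any number of sources into a passage -> resources index."""
    index = defaultdict(set)
    for triples in sources:
        for subject, predicate, obj in triples:
            if predicate == DISCUSSES:
                index[obj].add(subject)
    return index


index = build_index(podcast_triples, journal_triples)
for resource in sorted(index[PASSAGE]):
    print(resource)
```

Neither group needed to know about the other; the merge works only because the passage URI is shared.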

This sort of thing can be built (even by independent groups of people) if we first have agreed-upon identifiers. A consistent ontology and controlled vocabulary can come next.

The next steps are to:

  • get interested parties in the loop
  • set up a regular discussion cycle
  • continue to develop the types of entities we want to identify
  • agree upon a URI scheme for each entity type
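
To make the last step more concrete, here is one possible shape for a per-type URI scheme, written as a small Python sketch. The base URI, the entity type names (loosely drawn from the list of things to identify earlier in this document), and the slug rules are all assumptions, not decisions.

```python
import re

# Hypothetical neutral base URI; the real domain is still to be decided.
BASE = "https://tolkien-lod.example.org/"

# Assumed entity type names for illustration only.
ENTITY_TYPES = {
    "passage", "work", "artwork", "manuscript", "character",
    "place", "event", "lexical-item", "letter", "person",
}

# Assumed rule: lowercase alphanumeric chunks joined by "-" or ".".
LOCAL_ID = re.compile(r"[a-z0-9]+(?:[-.][a-z0-9]+)*")


def mint_uri(entity_type, local_id):
    """Return the URI for an entity, rejecting unknown types and malformed ids."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    if not LOCAL_ID.fullmatch(local_id):
        raise ValueError(f"malformed local id: {local_id!r}")
    return f"{BASE}{entity_type}/{local_id}"


def parse_uri(uri):
    """Split a URI minted by mint_uri back into (entity_type, local_id)."""
    if not uri.startswith(BASE):
        raise ValueError(f"not under {BASE}: {uri!r}")
    entity_type, _, local_id = uri[len(BASE):].partition("/")
    if entity_type not in ENTITY_TYPES or not LOCAL_ID.fullmatch(local_id):
        raise ValueError(f"not a valid entity URI: {uri!r}")
    return entity_type, local_id


uri = mint_uri("character", "frodo-baggins")
print(uri)
print(parse_uri(uri))
```

Keeping the entity type in the path is only one option; opaque identifiers are another, with different trade-offs for readability and stability.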

The artifacts that will be produced out of this project (at various stages) are:

  • documentation on identifier schemes
  • documentation on controlled vocabularies and schemas
  • Linked Open Data available as a data dump
  • Linked Open Data available via API (e.g. SPARQL)
  • tooling around both one-off and on-going extraction of data from other sources
  • tooling around annotation, ontology editing, etc. (e.g. rimbë)