Seeing Your Data Differently Through The Lens Of Linked Data

by Erin Canning, Ontology Systems Analyst at LINCS (Linked Infrastructure for Networked Cultural Scholarship); @eecanning on Twitter

Linked data is a way of structuring data so that it can be connected to data from other sources and queried all together. For museums, this discussion often comes up with regard to collections data: how can different cultural heritage institutions structure their data so that someone could ask a question of multiple museum databases at the same time? For example: “Show me all the museums with works made by Rebecca Belmore.” This involves two components:

Shared references to authorities, so that we can say with certainty that “Rebecca Belmore” in two databases is the same artist, and not different artists who happen to have the same name.
Shared structure, so that when we ask about “works made by” somebody, all of the data sources understand what this concept of something being made by somebody actually is.

With shared authorities we can link together many datasets, and with shared structures we can ask questions formulated in the same way.
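
To make the two components concrete, here is a minimal sketch in Python. It assumes each museum publishes statements of the form (thing, relationship, thing). The authority URI, the "madeBy" relationship, and the museum records are all hypothetical placeholders on example domains, not real identifiers or real holdings.

```python
# Hypothetical shared authority URI and shared relationship name.
# A real project would take these from an established authority file
# and a published ontology.
BELMORE = "http://example.org/authority/rebecca-belmore"
MADE_BY = "http://example.org/vocab/madeBy"

# Placeholder records: each museum publishes (thing, relationship, thing) statements.
museum_a = [
    ("http://museum-a.example/object/101", MADE_BY, BELMORE),
    ("http://museum-a.example/object/102", MADE_BY,
     "http://example.org/authority/another-artist"),
]
museum_b = [
    ("http://museum-b.example/work/7", MADE_BY, BELMORE),
]


def works_made_by(artist_uri, *datasets):
    """Return every work linked to artist_uri by MADE_BY, across all datasets."""
    return [
        subject
        for dataset in datasets
        for (subject, predicate, obj) in dataset
        if predicate == MADE_BY and obj == artist_uri
    ]


# One question, asked of both museums at once.
belmore_works = works_made_by(BELMORE, museum_a, museum_b)
```

The shared authority URI is what lets the two lists be searched as one; the shared relationship name is what lets a single question be asked of both.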

So what is the value of linking collections data? Doing so can help museums to share information about their collections with new audiences, to support researchers and academics in asking comprehensive and complex research questions, and to add new information to their own databases by pulling in knowledge from external sources. Linked data can help museums to share information out, and bring information in: after all, there are many experts working at institutions around the world.

There are also internal benefits to cultural heritage institutions engaging in linked data solely within their own institutions, even if they are not sharing that data out. For example, if multiple internal databases are structured to share references to the same authorities, then a change only needs to occur once, in one database at the authority level, in order to be made across the many places that data is used. The LUX project at Yale University has been exploring such benefits, discussed recently by Timothy Thompson and Robert Sanderson at the 2021 LD4 conference.
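
The "change it once" benefit can be sketched in a few lines of Python. This is only an illustration under assumed names: the authority ID, the two internal databases, and their field names are invented here and do not describe any particular institution's systems.

```python
# A single internal authority file (hypothetical ID and fields).
authorities = {"agent/1": {"label": "Belmore, Rebecca"}}

# Two separate internal databases store only a reference to the authority,
# never their own copy of the name.
collections_db = [{"object": "obj-101", "maker": "agent/1"}]
exhibitions_db = [{"exhibition": "exh-5", "featured_artist": "agent/1"}]


def label_for(authority_id):
    """Look up the current display label for an authority reference."""
    return authorities[authority_id]["label"]


# The correction is made once, at the authority level...
authorities["agent/1"]["label"] = "Rebecca Belmore"

# ...and both databases see it, because they hold references rather than copies.
collections_label = label_for(collections_db[0]["maker"])
exhibitions_label = label_for(exhibitions_db[0]["featured_artist"])
```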

Engaging in linked data also offers a second opportunity: it can give museums, and those who work with museum data, a way to see their data in a new light as it gets transformed into a new structure. The databases we use to hold information do not just store data, but work to create meaning. The structure of the database defines what fields are allowed, which other fields they relate to, and what kinds of relationships are permissible. As such, this structure creates a world of meaning for the data that then gets entered, and tells users what kinds of information can be considered valid data. After all, if there is no field in the database for the kind of information or relationship that someone might want to record, then that information must sit outside of the institutional source of information. This is not about a cat sitting in any box regardless of fit so much as there not being a box for the cat to sit in at all.

Therefore, exploring new or alternative data structures gives museums an opportunity to consider what information infrastructures exist in the context of their data and what this has meant for how they think about their data. To begin with, linked data ontologies allow users to precisely define the nature of the relationships that exist between different fields: what does it mean for one field to reference another, or for a foreign key constraint to exist in a particular area? Linked data, structured as triples (subject-predicate-object), allows us to name that relationship.
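
As a rough illustration of what "naming the relationship" means, the Python sketch below starts from a database row where a foreign key links an object to a person without saying how, and turns it into a triple that states the relationship outright. The row, the helper function, and the relationship names are hypothetical.

```python
# A relational row: the foreign key tells us the object and the person are
# related, but not whether the person made, owned, or donated the object.
object_row = {"object_id": 101, "person_id": 7}


def row_to_triple(row, predicate):
    """Turn a foreign-key row into a triple with an explicitly named relationship."""
    subject = f"object/{row['object_id']}"
    obj = f"person/{row['person_id']}"
    return (subject, predicate, obj)


# The same pair of records can carry quite different meanings once the
# relationship is named.
donated = row_to_triple(object_row, "wasDonatedBy")
made = row_to_triple(object_row, "wasMadeBy")
```

In the row, the meaning of the link lives in someone's head or in a data dictionary; in the triple, it is part of the data.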

Linked data structures can also do more than just name the kinds of connections that exist in our data. One of the most established ontologies for cultural heritage data, CIDOC CRM, transforms data further by placing it in an event-centric point of view. The CRM focuses on events (actions, occurrences, labour) as central to the creation of any information. In this structure, a painting does not have a painter (artwork<->person, or artwork<->hasA<->person), but a painter took actions, used materials and space and influences, to create a painting at some point in time (artwork<->event<->person, or artwork<->cameFrom<->event<->performedBy<->person). In this view, the data structure says that not only can we think about naming what the relationship between an object and a creator might be, but we can pull it apart even further to focus on the labour that took place to create (or conserve, move, share, etc.) the object in the first place. In a collections database the object is the central element around which all other data revolves; in this structure, the work that people do is the item of central concern. It allows us to view objects from a viewpoint that prioritizes people.
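
The difference between the two shapes can be sketched in Python. The records below are invented placeholders. The property labels in the event-centric version (P108i_was_produced_by, P14_carried_out_by, P126_employed, P4_has_time-span) follow CIDOC CRM naming in place of the informal "cameFrom" and "performedBy" above, and should be checked against the published specification before being relied on.

```python
# Direct shape: the artwork simply "has" a maker.
direct = [("artwork/1", "hasMaker", "person/1")]

# Event-centric shape: a production event sits between artwork and person,
# and can carry the materials and time of the work as well.
event_centric = [
    ("artwork/1", "P108i_was_produced_by", "production/1"),
    ("production/1", "P14_carried_out_by", "person/1"),
    ("production/1", "P126_employed", "material/oil-paint"),
    ("production/1", "P4_has_time-span", "timespan/1"),
]


def objects_of(triples, subject, predicate):
    """Return everything that subject points to via predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def makers_of(triples, artwork):
    """Follow artwork -> production event -> person."""
    makers = []
    for event in objects_of(triples, artwork, "P108i_was_produced_by"):
        makers.extend(objects_of(triples, event, "P14_carried_out_by"))
    return makers


def events_carried_out_by(triples, person):
    """Start from the person instead: which acts of labour did they perform?"""
    return [s for (s, p, o) in triples if p == "P14_carried_out_by" and o == person]


makers = makers_of(event_centric, "artwork/1")
labour = events_carried_out_by(event_centric, "person/1")
```

The second query is the one the direct shape cannot answer well: it begins with the person and returns their work, rather than beginning with the object.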

CIDOC CRM is the core ontology that we use at my workplace, LINCS, a digital humanities infrastructure project that seeks to bring together research datasets made by humanities and social science researchers along with those of cultural heritage institutions. This event-centric framing supports a level of abstraction that helps us to build connections across these diverse datasets, but it also brings into focus the people who are hidden in every level of the data: the first question we have to ask of any dataset coming into LINCS is, what acts of labour were involved in making the things described in this data happen?

The work that it takes to create data is often hidden, especially as far as the history of museum documentation is concerned. The relationship between museum documentation and labour has a deep history, and I would encourage interested folks to check out work such as Hannah Turner’s recent book Cataloguing Culture for a deeper dive. In the meantime, it is a worthwhile exercise to start to think about what the data structures you use mean for how you understand your data, and to ask what other ways of understanding might be possible.
