Linked Data Explained: You’re No Dummy

Linked data has been a hot topic lately for both web developers and librarians. But unless you are highly technical, many of the online definitions of linked data might make your eyes glaze over and leave you wondering just what exactly this thing is that everyone is talking about. Don’t worry: you are not a dummy. But computers definitely are, which is exactly why we need linked data on the web.

Why linked data?

To understand linked data, it helps to first understand why it is important. The Web, as it was first developed, consists of a series of documents (or web pages) linked together. Traditionally these pages contained unstructured information, formatted simply as text. Users could click on links and visit pages on a variety of websites, learning about a subject as they surfed the web.

But while computers and search engines can index millions of web pages, they aren’t very smart about making sense of all that information. Search engines can only “read” a web page as text and “see” which sites are linked. To a computer, the text on a web page is just a pile of keywords. Humans, on the other hand, can understand how different pieces of information relate to each other, and it is these relationships that add meaning to data.

In order for computers to understand these relationships, we need to add structure to the data. Humans can deconstruct the information on a web page and mark it up using tags that computers can understand, which enables search engines to find information for users more effectively. Even more importantly, search engines can link this data across websites, creating large graphs of information built from pages across the entire web. Search engines and third-party applications can use these knowledge graphs to help users quickly find the answers they need and access more complete information across multiple pages and “silos” of information. Data once hidden away in a database becomes more visible and findable once linked data structure and relationships are added.

Adding structure to data

To add structure to data, linked data relies on a model called RDF (Resource Description Framework). To understand how RDF adds structure to information, think back to your elementary school grammar lessons. We learned about proper sentence structure, where a sentence contains a subject (a noun), a predicate (a verb), and an object (a noun). RDF uses this same concept to create what is called a “triple”. Triples matter because they specify relationships between keywords and allow computers to link information across the World Wide Web. For example, think of a simple fact expressed as a triple: (“John Steinbeck”)(“wrote”)(“East of Eden”). In addition to expressing relationships as triples, linked data also employs common vocabularies for specific terms, so that “John Steinbeck” would be identified as the name of a person, and “East of Eden” would be identified as the name of a book. One commonly used vocabulary comes from schema.org.
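
For readers who like to see things in code, the Steinbeck triple can be sketched in a few lines of Python. This is only an illustration under some assumptions: the `Triple` type and the `as_schema_org_book` helper are made up for this example, and real RDF identifies things with full URIs rather than plain names. The `Book` and `Person` types and the `name` and `author` properties do come from schema.org, which describes the relationship from the book's side ("author") rather than the writer's ("wrote").

```python
import json
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object (illustrative type, not a real RDF API)."""
    subject: str
    predicate: str
    obj: str


# The fact from the text, written as a triple.
fact = Triple("John Steinbeck", "wrote", "East of Eden")


def as_schema_org_book(triple: Triple) -> dict:
    """Describe a 'wrote' triple with schema.org terms, in JSON-LD style.

    Hypothetical helper: it only handles the single 'wrote' relationship.
    """
    if triple.predicate != "wrote":
        raise ValueError("this sketch only understands the 'wrote' predicate")
    return {
        "@context": "https://schema.org",
        "@type": "Book",              # "East of Eden" is the name of a book
        "name": triple.obj,
        "author": {
            "@type": "Person",        # "John Steinbeck" is the name of a person
            "name": triple.subject,
        },
    }


markup = as_schema_org_book(fact)
print(json.dumps(markup, indent=2))
```

The printed result is the kind of structured description a web page could embed so that a search engine sees a book and its author instead of a pile of keywords.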

Linked data graphs

Adding structure to your data is the necessary ingredient, but seeing how this data connects and relates across the web is where linked data becomes really powerful. For example, your web page could mention that “East of Eden” is a book by John Steinbeck. Another web page says that John Steinbeck lived in Salinas, California. Now perhaps there is another web page on another website that says that “East of Eden” features a character named Cathy Ames. Yet another site states that “East of Eden” is set in Salinas, California. When each website properly marks up this data with its relationships, a computer can combine the statements and work out that John Steinbeck created a character named Cathy Ames, and that she lived in Salinas, California. Each piece of structured information builds upon the others, creating a large graph of information that can be used by search engines or other applications.
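
The way separate statements combine into one graph can be sketched in Python. The four "sites" and the predicate names below are invented for illustration, and the step from "the book is set in Salinas" to "the character lived in Salinas" is a deliberate simplification that matches the example in the text rather than a rule a real system would apply blindly.

```python
# Each hypothetical website contributes its own statements as
# (subject, predicate, object) tuples.
site_a = [("East of Eden", "author", "John Steinbeck")]
site_b = [("John Steinbeck", "lived in", "Salinas, California")]
site_c = [("East of Eden", "features character", "Cathy Ames")]
site_d = [("East of Eden", "set in", "Salinas, California")]

# Merging is a simple union: identical names become the same node in the graph.
graph = set(site_a) | set(site_b) | set(site_c) | set(site_d)


def objects(subject, predicate):
    """All objects linked from `subject` by `predicate`."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def subjects(predicate, obj):
    """All subjects linked to `obj` by `predicate`."""
    return sorted(s for s, p, o in graph if p == predicate and o == obj)


def creators_of(character):
    """Follow character -> book -> author across the merged graph."""
    found = set()
    for book in subjects("features character", character):
        found.update(objects(book, "author"))
    return sorted(found)


def homes_of(character):
    """Follow character -> book -> setting (simplifying assumption from the text)."""
    found = set()
    for book in subjects("features character", character):
        found.update(objects(book, "set in"))
    return sorted(found)


print(creators_of("Cathy Ames"))
print(homes_of("Cathy Ames"))
```

No single site said who created Cathy Ames or where she lived; both answers only appear once the statements from all four sites sit in the same graph.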

Linked data graph example using John Steinbeck

Google uses this type of information graph to automatically generate answers to user queries.

Google screenshot for the query "Who is Cathy Ames?"

A website or app could also use this type of linked data to automatically generate pages listing the places where famous people lived or where books are set. The possibilities for making information more findable and ready to use are endless.
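
As a rough sketch of that idea, the Python below groups a handful of triples by place and renders a plain-text "page" for each one. The data, the predicate names, and the `build_place_pages` and `render` helpers are all assumptions made for this example, not part of any real linked data service.

```python
from collections import defaultdict

# A tiny, hand-written set of (subject, predicate, object) statements.
triples = [
    ("John Steinbeck", "lived in", "Salinas, California"),
    ("East of Eden", "set in", "Salinas, California"),
    ("East of Eden", "author", "John Steinbeck"),
]


def build_place_pages(statements):
    """Group people and books by the place they are linked to."""
    pages = defaultdict(lambda: {"residents": [], "books": []})
    for subject, predicate, obj in statements:
        if predicate == "lived in":
            pages[obj]["residents"].append(subject)
        elif predicate == "set in":
            pages[obj]["books"].append(subject)
        # Other predicates (such as "author") are ignored in this sketch.
    return dict(pages)


def render(place, page):
    """Turn one place's entry into a few lines of plain text."""
    lines = [place]
    lines += [f"  Lived here: {name}" for name in page["residents"]]
    lines += [f"  Set here: {title}" for title in page["books"]]
    return "\n".join(lines)


pages = build_place_pages(triples)
for place, page in pages.items():
    print(render(place, page))
```

Adding more triples from more sources would grow the pages automatically, with no one having to write them by hand.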

Linked data opportunities for libraries

While linked data creates opportunities for all kinds of organizations to make their content more discoverable across the web, this is particularly true for libraries. Libraries share a mission to make information easily accessible to all, and individual libraries have unique collections and content that contribute to the entirety of human knowledge. As libraries increasingly adapt to the digital landscape, one huge opportunity is converting library metadata into linked data. Exposing library catalog information as linked data can give your holdings more visibility and highlight unique items that relate to other items across the web. Making resources more findable will help your users locate the information they need for their research and work. For a good example, see OCLC’s WorldCat, which, as of April 2014, includes 197 million bibliographic work descriptions as linked data.

Want to learn more about linked data?

Watch Tim Berners-Lee’s TED Talk: “The Next Web”