Linked-Data in the Academic Bibliography

In light of our recent study, the "DRIVER Technology Watch Report", we've started a little experiment: adding semantic annotations to all records stored in the Academic Bibliography. With the help of minor changes to the application, which are invisible to the human eye but easily readable by machines harvesting our web pages, we hope to gain insight into new models for data exchange.

Traditionally, library applications exchanged metadata by providing regular dumps of all the data contained in their databases. These dumps, in the MARC interchange format, allowed for periodic synchronization of remote databases. Although MARC is still widely used in libraries, getting access to up-to-date dumps became very cumbersome.

In the 1990s, during the advent of the Internet, several proposals saw the light of day to provide easier, simplified access to library data sets. One of the more successful solutions was the Open Archives Initiative Protocol for Metadata Harvesting [OAI-PMH]. Herbert Van de Sompel, of Ghent University, played a leading role in the development of OAI-PMH. A protocol, persistent links and identifiers were introduced to give 24/7 access to the complete data sets of applications such as institutional repositories and library catalogs (see http://biblio.ugent.be/oai and http://aleph.ugent.be:8080/OAI/rug01 for the local OAI-PMH endpoints of the Academic Bibliography and the Aleph catalog, respectively).
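To give an idea of what harvesting looks like in practice, here is a minimal Python sketch. It builds a ListRecords request URL and reads record identifiers and the resumption token from a response page. The verb, the oai_dc metadata prefix and the XML namespace come from the OAI-PMH 2.0 specification; the sample response and its identifiers are hand-written for illustration and are not real output from our endpoint.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces all others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written sample page (hypothetical identifiers), not a real response.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://biblio.ugent.be/oai"))
print(parse_page(SAMPLE))
```

A real harvester would fetch each URL, call parse_page on the body, and repeat with the returned token until no token is left.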

Many examples of easier access to data sets using OAI-PMH are visible on the Internet, most notably OAIster (a US initiative providing access to 23 million publications worldwide) and DRIVER (a European initiative providing access to 1 million full-text articles in Europe). For further information on programming tools to access OAI-PMH collections look here.

Although the OAI-PMH protocol is very widely adopted in the library and e-learning community, it has had mixed acceptance outside this world. Google supported OAI-PMH for some time to update the Google Scholar data set but later switched to Sitemaps (see the sitemap of the Academic Bibliography here). Microsoft currently supports OAI-PMH in its Zentity platform, but it is not clear how OAI-PMH data sets are used in its Bing search engine.

In the DRIVER technology study our team at Ghent University (Karen Van Godtsenhoven, Peter Reyniers and I) looked into contemporary solutions for data exchange on the World Wide Web. I would like to highlight two of them: Linked Data and microformats.

Linked Data



Linked Data, also called the Semantic Web (with capital 'S') or Web 3.0, got a huge boost after Berners-Lee's presentation at TED at the beginning of this year. Using the existing HTTP protocol (esp. content negotiation), Cool URIs and RDF, a web of interlinked, semantically enriched web pages is made available to the world. Where Web 1.0 pages focused on providing data that are interpretable by humans, the Semantic Web adds interpretability for machines. Berners-Lee outlines the four principles of Linked Data:

  1. Use URIs to identify things (e.g. persons, organizations)
  2. Use HTTP URIs so that these things can be referred to and looked up (e.g. https://biblio.ugent.be/person/801000413319 or https://biblio.ugent.be/organization/GE07)
  3. Provide useful information when the URI is looked up (e.g. we provide a list of all publications of a person or organization in HTML but also in RDF format, via content negotiation)
  4. Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web (e.g. we link to the other publications of co-authors and to the phonebook of the university)
We've also added Linked Data to each record of the Academic Bibliography (see https://biblio.ugent.be/ore/664840). This representation shows the OAI-ORE view of a publication, which is a new OAI protocol and format in line with Linked Data. A great overview of the possibilities of Linked Data and OAI-ORE can be found on the webpage of the Search & Find mini conference we had last year at Ghent University with expert speakers such as Ivan Herman (W3C) and Herbert Van de Sompel (LANL).
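Content negotiation boils down to sending an Accept header along with an ordinary HTTP GET. The Python sketch below only prepares such a request and does not send it. The helper name rdf_request is made up for this example, and application/rdf+xml is the registered media type for RDF/XML; which RDF serialization our server actually returns is an assumption here.

```python
from urllib.request import Request


def rdf_request(uri, media_type="application/rdf+xml"):
    """Prepare (but do not send) a GET request that asks for RDF instead of HTML."""
    return Request(uri, headers={"Accept": media_type})


# Same URI a browser would use; only the Accept header differs.
req = rdf_request("https://biblio.ugent.be/person/801000413319")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the lookup; a browser asking for text/html at the same URI gets the human-readable page.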

Microformats

Microformats are the semantic web (with lower-case 's'). They tend to be a more lightweight solution for adding semantic annotations to web pages. This technique gained popularity in the blogosphere in the early 2000s, when bloggers started adding blogroll hyperlinks to each other. This was done by adding rel="friend" attributes to HTML anchors. Invisible to the human eye, these attributes can be picked up by services such as Technorati and Feedster to show how various blogs are interconnected.
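As a rough illustration of how a crawler could pick up such rel attributes, here is a small Python sketch using only the standard library. The blogroll markup and the example.org addresses are invented for this example; it is not how Technorati or Feedster actually work.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collect (href, rel values) for every anchor that carries a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        rel = attr.get("rel")
        if rel and attr.get("href"):
            # rel holds a space-separated list of relationship values.
            self.links.append((attr["href"], rel.split()))


# Invented blogroll snippet; the last link has no rel and is skipped.
SAMPLE_HTML = """
<ul>
  <li><a href="http://alice.example.org/" rel="friend met">Alice</a></li>
  <li><a href="http://bob.example.org/" rel="friend">Bob</a></li>
  <li><a href="http://news.example.org/">News</a></li>
</ul>
"""

parser = RelLinkParser()
parser.feed(SAMPLE_HTML)
for href, rels in parser.links:
    print(href, rels)
```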

Started at grassroots level, microformats gained huge popularity. Bill Gates stated in the opening keynote of Mix06:

"We need microformats and to get people to agree on them. It is going to bootstrap exchanging data on the Web [...] we need them for things like contact cards, events, directions [...]"

Yahoo, Google and Microsoft are adding support for indexing microformats in their search engines. Popular formats are:

  • hCard for personal or organization contact info
  • hCalendar for event descriptions and timelines
  • hAtom for syndicated content as might appear in an RSS feed
  • hReview to record review ratings such as "8.5 out of 10"
  • XFN to track relationships on the social graph in a lightweight fashion
[source: http://developer.yahoo.net/blog/archives/2008/05/are_microformat.html]

The Academic Bibliography adds hCard microformats to the personal web pages of each author and to the full record views. Firefox has a nice add-on called Operator to view these annotations in web pages. We hope our microformats will be picked up by the big search engines to ease the discovery of publications created by our researchers.
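To show what a machine sees in an hCard, here is a simplified Python sketch that pulls the fn (formatted name) and org properties out of elements marked class="vcard". The class names come from the hCard format; the sample markup, the name and the example.org address are invented and are not copied from our pages. The parser assumes each property contains plain text only and that the card holds no unclosed void tags such as a bare br.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Collect 'fn' and 'org' text from elements inside class="vcard"."""

    PROPERTIES = ("fn", "org")

    def __init__(self):
        super().__init__()
        self.cards = []
        self._depth = 0       # nesting depth inside the current vcard, 0 = outside
        self._current = None  # property whose text is being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
        elif "vcard" in classes:
            self.cards.append({})
            self._depth = 1
        if self._depth:
            for name in self.PROPERTIES:
                if name in classes:
                    self._current = name

    def handle_data(self, data):
        if self._current and data.strip():
            self.cards[-1][self._current] = data.strip()

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            self._current = None


# Invented sample; the final paragraph is outside any vcard and is ignored.
SAMPLE_HTML = """
<div class="vcard">
  <a class="fn url" href="http://example.org/jane">Jane Example</a>
  <span class="org">Ghent University</span>
</div>
<p class="fn">not part of a card</p>
"""

parser = HCardParser()
parser.feed(SAMPLE_HTML)
print(parser.cards)
```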


We encourage reuse of our data. If you want or need access to our data sets, please take a look at our Linked Data, our microformats, or one of the numerous other APIs we provide.

About this Entry

This page contains a single entry by Patrick Hochstenbach published on October 7, 2009 3:59 PM.
