In light of our recent study "DRIVER Technology Watch Report", we've started a little experiment: adding semantic annotations to all records stored in the Academic Bibliography. With the help of minor changes to the application, which are invisible to the human eye but easily readable by machines harvesting our web pages, we hope to gain insight into new models for data exchange.
Traditionally, library applications exchanged metadata by providing regular dumps of all the data contained in their databases. These dumps, in the MARC interchange format, allowed for periodic synchronization of remote databases. Although MARC is still widely used in libraries, getting access to up-to-date dumps has become very cumbersome.
In the 1990s, with the advent of the Internet, several proposals saw the light of day to provide easier, simplified access to library data sets. One of the more successful solutions was the Open Archives Initiative Protocol for Metadata Harvesting [OAI-PMH]. Herbert Van de Sompel, of Ghent University, played a leading role in the development of OAI-PMH. A protocol, persistent links and identifiers were introduced to provide 24/7 access to the complete datasets of applications such as institutional repositories and library catalogs (see http://biblio.ugent.be/oai and http://aleph.ugent.be:8080/OAI/rug01 for the local OAI-PMH endpoints of the Academic Bibliography and the Aleph catalog, respectively).
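To make the harvesting idea a bit more concrete, here is a minimal Python sketch of how a client might build a ListRecords request and read the response. It is only an illustration, not the code we run: the helper names (`list_records_url`, `parse_list_records`) and the sample response are made up for this example, and it assumes the endpoint serves the standard `oai_dc` metadata format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 responses carrying Dublin Core.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=OAI_NS),
            "title": record.findtext(".//dc:title", namespaces=OAI_NS),
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return records, (token or None)

# Made-up response, trimmed to the elements this sketch reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_RESPONSE)
```

A real harvester would fetch `list_records_url(...)`, parse the result, and repeat with the returned resumption token until none is left.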
Many examples of easier access to data sets using OAI-PMH are visible on the Internet, most notably OAIster (a US initiative providing access to 23 million publications worldwide) and DRIVER (a European initiative providing access to 1 million full-text articles in Europe). For further information on programming tools to access OAI-PMH collections, look here.
Although the OAI-PMH protocol is very widely adopted in the library and e-learning communities, it has met with mixed acceptance outside this world. Google supported OAI-PMH for some time to update the Google Scholar data set, but later switched to Sitemaps (see the sitemap of the Academic Bibliography here). Microsoft currently supports OAI-PMH in its Zentity platform, but it is not clear how OAI-PMH datasets are used in its Bing search engine.
In the DRIVER technology study, our team at Ghent University (Karen Van Godtsenhoven, Peter Reyniers and myself) looked into contemporary solutions for data exchange on the World Wide Web. I would like to highlight two of them: Linked Data and microformats.
Linked Data
Linked Data, also called the Semantic Web (with a capital 'S') or Web 3.0, gained a huge boost after Berners-Lee's presentation at TED at the beginning of this year. Using the existing HTTP protocol (especially content negotiation), Cool URIs and RDF, a web of interlinked, semantically enriched web pages is made available to the world. Whereas Web 1.0 web pages focused on providing data that are interpretable by humans, the Semantic Web adds interpretability for machines. Berners-Lee outlines the four principles of Linked Data:
- Use URIs to identify things (e.g. persons, organizations)
- Use HTTP URIs so that these things can be referred to and looked up (e.g. https://biblio.ugent.be/person/801000413319 or https://biblio.ugent.be/organization/GE07)
- Provide useful information when the URI is looked up (e.g. we provide a list of all publications of a person or organization in HTML, but also in RDF format via content negotiation)
- Include links to other, related URIs in the exposed data to improve discovery of related information on the Web (e.g. we link to the other publications of co-authors and to the phonebook of the university)
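Content negotiation, as used in the third principle, boils down to the client stating which representation it prefers in the HTTP `Accept` header. The Python sketch below shows the idea. The function names are hypothetical, and `application/rdf+xml` is an assumption about which RDF serialization a server offers; other servers may prefer Turtle or another format.

```python
import urllib.request

def rdf_request(uri, media_type="application/rdf+xml"):
    """Prepare a GET request that asks for an RDF representation of `uri`.

    The media type is an assumption; adjust it to what the server supports.
    """
    return urllib.request.Request(uri, headers={"Accept": media_type})

def fetch(request, timeout=10):
    """Send the request and return (content type, body bytes)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type(), response.read()

# Same URI a browser would use for the HTML page, but with a different Accept header.
request = rdf_request("https://biblio.ugent.be/person/801000413319")
```

Calling `fetch(request)` performs the actual lookup. Comparing the returned content type with the one requested is a quick way to check whether the server honoured the preference or fell back to HTML.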
Microformats
Microformats are the semantic web (with a lower-case 's'). They tend to be a more lightweight solution for adding semantic annotations to web pages. This technique gained popularity in the blogosphere in the early 2000s, when bloggers started to add blogroll hyperlinks to each other. This was done by adding rel="friend" attributes to HTML anchors. Invisible to the human eye, these attributes can be picked up by services such as Technorati and Feedster to show how various blogs are interconnected.
Started at the grassroots level, microformats gained huge popularity. Bill Gates stated in the opening keynote of Mix06:
"We need microformats and to get people agree on them. It is going to bootstrap exchanging data on the Web [...] we need them for things like contact cards, events, directions [...]"
Yahoo, Google and Microsoft are adding support for indexing microformats in their search engines. Popular formats are:
- hCard for personal or organization contact info
- hCalendar for event descriptions and timelines
- hAtom for syndicated content as might appear in an RSS feed
- hReview to record review ratings such as "8.5 out of 10"
- XFN to track relationships on the social graph in a lightweight fashion
The Academic Bibliography adds hCard microformats to the personal web pages of each author and to the full record views. For Firefox, a nice add-on called Operator makes these annotations visible in web pages. We hope our microformats will be picked up by the big search engines to ease the discovery of publications created by our researchers.
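For readers curious how a machine picks up such annotations, here is a rough Python sketch that extracts a few hCard properties from HTML using only the standard library. It is not how Operator or the search engines do it, and it only handles well-formed markup. The class `HCardParser`, the property subset and the sample markup (including the placeholder name "Jane Doe") are invented for this illustration; only the class names `vcard`, `fn`, `org`, `title`, `email` and `url` come from the hCard format itself.

```python
from html.parser import HTMLParser

class HCardParser(HTMLParser):
    """Collect a few simple hCard properties from well-formed HTML."""

    TEXT_PROPERTIES = {"fn", "org", "title", "email"}  # small subset of hCard
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per class="vcard" element
        self._stack = []  # per open element: (is_vcard_root, text_property)

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        attributes = dict(attrs)
        classes = set((attributes.get("class") or "").split())
        is_root = "vcard" in classes
        if is_root:
            self.cards.append({})
        in_card = is_root or any(root for root, _ in self._stack)
        prop = None
        if in_card:
            prop = next(iter(sorted(classes & self.TEXT_PROPERTIES)), None)
            if "url" in classes and attributes.get("href"):
                self.cards[-1]["url"] = attributes["href"]
        self._stack.append((is_root, prop))

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.cards:
            return
        # Attribute the text to the innermost enclosing property element.
        for _, prop in reversed(self._stack):
            if prop:
                card = self.cards[-1]
                card[prop] = (card.get(prop, "") + " " + text).strip()
                break

# Invented sample markup; the person name is a placeholder.
SAMPLE_HTML = """
<div class="vcard">
  <a class="fn url" href="https://biblio.ugent.be/person/801000413319">Jane Doe</a>
  <span class="org">Ghent University</span>
</div>
"""

parser = HCardParser()
parser.feed(SAMPLE_HTML)
print(parser.cards)
```

The point of the sketch is that the annotations are nothing more than agreed-upon class names on ordinary HTML elements, which is why they stay invisible to human readers.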
We encourage reuse of our data. If you want or need access to our datasets, please take a look at our Linked Data and microformats, or at one of the numerous other APIs we provide:
- OAI-PMH - http://biblio.ugent.be/oai [OAI-PMH data synchronization protocol]
- SRU - http://biblio.ugent.be/sru [SRU search protocol. see: documentation]
- Dublin Core - https://biblio.ugent.be/dc/664840 [DC/XML representation of record number 664840]
- OAI-ORE - https://biblio.ugent.be/ore/664840 [RDF representation of record number 664840]
- MODS - https://biblio.ugent.be/mods/664840 [MODS representation of record number 664840]
- MPEG-21/DIDL - https://biblio.ugent.be/didl/664840 [DIDL representation of record number 664840]
- METS - https://biblio.ugent.be/mets/664840 [METS representation of record number 664840]
- Sitemaps - https://biblio.ugent.be/siteindex.xml
- RSS
  - https://biblio.ugent.be/person/rss/801000413319 (RSS 1.0 feed for all publications by the author with UGent identifier 801000413319)
  - https://biblio.ugent.be/organization/rss/WE05 (RSS 1.0 feed for all publications by UGent department WE05)
  - https://biblio.ugent.be/classification/rss/A1 (RSS 1.0 feed for all UGent A1 classification publications)
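The per-record and RSS URLs in the list above follow a regular pattern, so a script can derive them from an identifier. The small Python sketch below does just that. The helper names are hypothetical, and the patterns are inferred from the example URLs in the list rather than from any formal API description.

```python
BASE_URL = "https://biblio.ugent.be"

# Path segments taken from the example URLs listed above.
RECORD_FORMATS = ("dc", "ore", "mods", "didl", "mets")
RSS_KINDS = ("person", "organization", "classification")

def record_url(record_id, fmt):
    """Return the URL of one record in the given metadata format."""
    if fmt not in RECORD_FORMATS:
        raise ValueError(f"unknown record format: {fmt!r}")
    return f"{BASE_URL}/{fmt}/{record_id}"

def rss_url(kind, identifier):
    """Return the RSS 1.0 feed URL for a person, organization or classification."""
    if kind not in RSS_KINDS:
        raise ValueError(f"unknown feed kind: {kind!r}")
    return f"{BASE_URL}/{kind}/rss/{identifier}"

# All listed representations of record 664840.
urls = [record_url(664840, fmt) for fmt in RECORD_FORMATS]
```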