Archive for the ‘semantic web’ Category

Ontology Evaluation – an Essential Part of Ontology Engineering

July 26th, 2012

James Malone reflects on a panel discussion on evaluation and reuse of ontologies. He wants there to be “a formal, objective and quantifiable process” for “making public judgements on ontologies”. Towards that, he suggests that we need:

  1. A formal set of engineering principles for systematic, disciplined, quantifiable approach to the design, development, operation, and maintenance of ontologies
  2. The use of test driven development, in particular using sets of (if appropriate, user collected) competency questions which an ontology guarantees to answer, with examples of those answers – think of this as similar to unit testing
  3. Cost benefit analysis for adopting frameworks such as upper ontologies, this includes aspects such as cost of training for use in development, cost to end users in understanding ontologies built using such frameworks, cost benefits measured as per metrics such as those above (e.g. answering competency questions) and risk of adoption (such as significant changes or longer term support).

- James Malone, in Why choosing ontologies should not be like choosing Pepsi or Coke, about his International Conference on Biomedical Ontology panel ‘How to deal with sectarianism in biomedical ontology’.
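Malone’s second suggestion, competency questions as unit tests, is concrete enough to sketch in code. Here is a minimal Python sketch with an invented toy ontology and invented questions (these are illustrations, not examples from Malone’s talk):

```python
# Sketch: competency questions as unit tests for an ontology.
# The ontology is a toy set of triples; the questions are hypothetical.

facts = {
    ("Insulin", "is_a", "Hormone"),
    ("Hormone", "regulates", "Metabolism"),
}

# Each competency question pairs a human-readable question with a check
# against the ontology and the answer the ontology guarantees to give.
competency_questions = [
    ("Is insulin a hormone?",
     lambda kb: ("Insulin", "is_a", "Hormone") in kb,
     True),
    ("Does some hormone regulate metabolism?",
     lambda kb: any(p == "regulates" and o == "Metabolism"
                    for (s, p, o) in kb),
     True),
]

def run_competency_tests(kb, questions):
    """Return the questions whose answer differs from the guaranteed one."""
    return [q for (q, check, expected) in questions if check(kb) != expected]

failures = run_competency_tests(facts, competency_questions)
print("unmet:", failures)  # unmet: []
```

A real suite would issue SPARQL ASK queries against the ontology rather than Python predicates, but the shape is the same: each question carries an answer the ontology guarantees, and a failing question is a failing test.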

Posted in semantic web | Comments (0)

Karen Coyle on Library Linked Data: let’s create data not records

January 12th, 2012

There have been some interesting posts on BIBFRAME recently (I’ve noted a few of them).

Karen Coyle also pointed to her recent blog post on transforming bibliographic data into RDF. As she says, for a real library linked data environment,

we need to be creating data, not records, and that we need to create the data first, then build records with it for those applications where records are needed.

Posted in information ecosystem, library and information science, semantic web | Comments (1)

A Review of Argumentation for the Social Semantic Web

December 6th, 2011

I’m very pleased to share our “A Review of Argumentation for the Social Semantic Web”.

You are very warmly invited to review this paper. You can post the review as a comment to the manuscript page publicly at SWJ’s website. Informal comments by email are also welcome.

Open review

I adore SWJ’s open review process: publicly available manuscripts are useful. In 11 months the landing page has had “1208 reads” and I’m sure that not all of those are mine! Further, knowing who reviewed a paper can add credibility to the process. (It means quite a lot to me when Simon Buckingham-Shum says “I anticipate that this will become a standard reference for the field.”!)

Two earlier versions

The paper evolved from my first year Ph.D. report. In the process of defining my Ph.D. topic, I reviewed the state of the art of argumentation for the Social Semantic Web. This was further developed in conversations with my coauthors, my colleague Tudor Groza and my advisor Alexandre Passant.

The outdated first journal submission and second journal submission are available; May’s reviews refer to the first version. A cover letter responding to the reviews summarizes what has changed. I share these since I am always encouraged by seeing how others’ work and ideas have developed over time!

So read the most recent version, and let us know what you think!

Posted in argumentative discussions, PhD diary, semantic web, social semantic web, social web | Comments (0)

Code4Lib 2012 talk proposals are out

November 21st, 2011

Code4Lib 2012 talk proposals are now on the wiki. This year there are 72 proposals for 20-25 slots. I pulled out the talks mentioning semantics (linked data, semantic web, microdata, RDF) for my own convenience (and maybe yours).

Property Graphs And TinkerPop Applications in Digital Libraries

  • Brian Tingle, California Digital Library

TinkerPop is an open source software development group focusing on technologies in the graph database space.
This talk will provide a general introduction to the TinkerPop Graph Stack and the property graph model it uses. The introduction will include code examples and explanations of the property graph models used by the Social Networks in Archival Context project, and show how the historical social graph is exposed as a JSON/REST API implemented by a TinkerPop Rexster Kibble that contains the application’s graph theory logic. Other graph database applications possible with TinkerPop, such as RDF support and citation analysis, will also be discussed.

HTML5 Microdata and Schema.org

  • Jason Ronallo, North Carolina State University Libraries

When the big search engines announced support for HTML5 microdata and the Schema.org vocabularies, the balance of power for semantic markup in HTML shifted.

  • What is microdata?
  • Where does microdata fit with regards to other approaches like RDFa and microformats?
  • Where do libraries stand in the worldview of Schema.org, and what can they do about it?
  • How can implementing microdata and Schema.org optimize your sites for search engines?
  • What tools are available?

“Linked-Data-Ready” Software for Libraries

  • Jennifer Bowen, University of Rochester River Campus Libraries

Linked data is poised to replace MARC as the basis for the new library bibliographic framework. For libraries to benefit from linked data, they must learn about it, experiment with it, demonstrate its usefulness, and take a leadership role in its deployment.

The eXtensible Catalog Organization (XCO) offers open-source software for libraries that is “linked-data-ready.” XC software prepares MARC and Dublin Core metadata for exposure to the semantic web, incorporating FRBR Group 1 entities and registered vocabularies for RDA elements and roles. This presentation will include a software demonstration, proposed software architecture for creation and management of linked data, a vision for how libraries can migrate from MARC to linked data, and an update on XCO progress toward linked data goals.

Your Catalog in Linked Data

  • Tom Johnson, Oregon State University Libraries

Linked Library Data activity over the last year has seen bibliographic data sets and vocabularies proliferating from traditional library sources. We’ve reached a point where regular libraries don’t have to go it alone to be on the Semantic Web. There is a quickly growing pool of things we can actually “link to”, and everyone’s existing data can be immediately enriched by participating.

This is a quick and dirty road to getting your catalog onto the Linked Data web. The talk will take you from start to finish, using Free Software tools to establish a namespace, put up a SPARQL endpoint, make a simple data model, convert MARC records to RDF, and link the results to major existing data sets (skipping conveniently over pesky processing time). A small amount of “why linked data?” content will be covered, but the primary goal is to leave you able to reproduce the process and start linking your catalog into the web of data. Appropriate documentation will be on the web.

NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis

  • Jeremy Nelson, Colorado College

In October, the Library of Congress issued a news release, “A Bibliographic Framework for the Digital Age”, outlining a list of requirements for a New Bibliographic Framework Environment. Responding to this challenge, this talk will demonstrate a Redis FRBR datastore proof-of-concept that, with a lightweight python-based interface, can meet these requirements.

Because FRBR is an entity-relationship model, it is easily implemented as key-value pairs within the primitive data structures provided by Redis. Redis’ flexibility makes it easy to associate arbitrary metadata and vocabularies, like MARC, METS, VRA or MODS, with FRBR entities and inter-operate with legacy and emerging standards and practices like RDA Vocabularies and Linked Data.

ALL TEH METADATAS! or How we use RDF to keep all of the digital object metadata formats thrown at us.

  • Declan Fleming, University of California, San Diego

What’s the right metadata standard to use for a digital repository? There isn’t just one standard that fits documents, videos, newspapers, audio files, local data, etc. And there is no standard to rule them all. So what do you do? At UC San Diego Libraries, we went down a conceptual level and attempted to hold every piece of metadata and give each holding place some context, hopefully in a common namespace. RDF has proven to be the ideal solution, and allows us to work with MODS, PREMIS, MIX, and just about anything else we’ve tried. It also opens up the potential for data re-use and authority control as other metadata owners start thinking about and expressing their data in the same way. I’ll talk about our workflow which takes metadata from a stew of various sources (CSV dumps, spreadsheet data of varying richness, MARC data, and MODS data), normalizes them into METS by our Metadata Specialists who create an assembly plan, and then ingests them into our digital asset management system. The result is a beautiful graph of RDF triples with metadata poised to be expressed as HTML, RSS, METS, XML, and opens linked data possibilities that we are just starting to explore.

UDFR: Building a Registry using Open-Source Semantic Software

  • Stephen Abrams, Associate Director, UC3, California Digital Library
  • Lisa Dawn Colvin, UDFR Project Manager, California Digital Library

Fundamental to effective long-term preservation analysis, planning, and intervention is a deep understanding of the diverse digital formats used to represent content. The Unified Digital Format Registry project (UDFR) will provide an open source platform for an online, semantically-enabled registry of significant format representation information.

We will give an introduction to the UDFR tool and its use within a preservation process.

We will also discuss our experiences of integrating disparate data sources and models into RDF: describing our iterative data modeling process and decisions around integrating vocabularies, data sources and provenance representation.

Finally, we will share how we extended an existing open-source semantic wiki tool, OntoWiki, to create the registry.

saveMLAK: How Librarians, Curators, Archivists and Library Engineers Work Together with Semantic MediaWiki after the Great Earthquake of Japan

  • Yuka Egusa, Senior Researcher of National Institute of Educational Policy Research
  • Makoto Okamoto, Chief Editor of Academic Resource Guide (ARG)

On March 11th, 2011, the biggest earthquake and tsunami in Japan’s history struck a large area of the country’s northeast. Many people have worked together to help those in the affected area. In the library community, a wiki named "savelibrary" was launched the day after the earthquake to share information about damage and rescue efforts. Later, museum curators, archivists and people from community learning centers started similar projects. In April we joined a combined project, "saveMLAK", and launched a wiki site using Semantic MediaWiki.

As of November 2011, information on over 13,000 cultural organizations has been posted to the site by 269 contributors since the launch. The gathered information is organized into wiki categories by facility type, such as library, museum, school, etc. We have held eight edit-a-thons to encourage people to contribute to the wiki.

We will report on our activity: how the libraries and museums were damaged and have been recovered through great effort, and how we can build a new style of collaboration among the MLAK community, wiki contributors and other volunteer communities in a crisis.

Conversion by Wikibox, tweaked in TextWrangler. Trimmed email addresses; otherwise these are as-written. Did I miss one? Let me know!

Posted in computer science, library and information science, scholarly communication, semantic web | Comments (0)

Web of data for books?

November 5th, 2011

If you were building a user interface for the Web of data, for books, it just might look like Small Demons.

Unfortunately you can’t see much without logging in, so go get yourself a beta account. (I’ve already complained about asking for a birthday. My new one is 29 Feb 1904, you can help me celebrate in 2012!)

Their data on Ireland is pretty sketchy so far. They do offer to help you buy Guinness on Amazon though. :)

Posted in books and reading, library and information science, semantic web, social semantic web | Comments (0)

Frank van Harmelen’s laws of information

November 1st, 2011

What are the laws of information? Frank van Harmelen proposes seven laws of information science in his keynote to the Semantic Web community at ISWC2011.[1]

  1. Factual knowledge is a graph.[2]
  2. Terminological knowledge is a hierarchy.
  3. Terminological knowledge is much smaller[3] than the factual knowledge.
  4. Terminological knowledge is of low complexity.[4]
  5. Heterogeneity is unavoidable.[5]
  6. Publication should be distributed, computation should be centralized, for speed: “The Web is not a database, and I don’t think it ever will be.”
  7. Knowledge is layered.
What do you think? If they are laws, can they be proven/disproven?
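Laws 1 and 2 can be illustrated in miniature. In this Python sketch (the facts and terms are invented, not van Harmelen’s own examples), facts are edges of a graph, while terminology forms a hierarchy you can walk upward:

```python
# Law 1: factual knowledge as ground instances of binary predicates,
# i.e. edges in a graph. Law 2: terminological knowledge as a hierarchy.
# All names here are invented for illustration.

facts = [  # (subject, predicate, object) edges of a factual graph
    ("Dublin", "capital_of", "Ireland"),
    ("Ireland", "member_of", "EU"),
]

hierarchy = {  # term -> broader term: a tree, not a general graph
    "capital": "city",
    "city": "settlement",
    "settlement": "place",
}

def broader_terms(term, hierarchy):
    """Walk up the terminological hierarchy from a term to the root."""
    chain = []
    while term in hierarchy:
        term = hierarchy[term]
        chain.append(term)
    return chain

print(broader_terms("capital", hierarchy))  # ['city', 'settlement', 'place']
```

Law 3 then says that, at scale, the `hierarchy` stays small while the `facts` graph grows by orders of magnitude.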

Semantic Web vocabularies in the Tower of Babel

I wish every presentation came with this sort of summary: slides and transcript, presented in a linear fashion. But these laws deserve more attention and discussion, especially from information scientists. So I needed something even punchier to share (prioritized thanks to Karen).

  1. He presents them as “computer science laws” underlying the Semantic Web; yet they are laws about knowledge. This makes them candidate laws of information science, in my terminology. []
  2. “The vast majority of our factual knowledge consists of simple relationships between things, represented as a ground instance of a binary predicate. And lots of these relations between things together form a giant graph.” []
  3. by 1-2 orders of magnitude []
  4. This is seen in “the unreasonable effectiveness of low-expressive KR”: “the information universe is apparently structured in such a way that the double exponential worst-case complexity bounds don’t hit us in practice.” []
  5. But heterogeneity is solvable through mostly social, cultural, and economic means (algorithms contribute a little bit). []

Posted in computer science, information ecosystem, library and information science, PhD diary, semantic web | Comments (0)

Quantified Self Europe, two talks proposed

October 12th, 2011

Thanksgiving weekend doesn’t really register in Europe. But this year it will for me: I’m going to Amsterdam for Quantified Self Europe, since I’m lucky enough to have a scholarship covering conference fees.

Today I proposed two talks:

  1. Weight and exercise tracking (which I’ve been doing in various forms for 19 months, currently using a Philips DirectLife exercise monitor and a normal scale, with data collected using the Hacker’s Diet tools). Mainly, these are less integrated than they could be, and I’d like to advocate interoperability, APIs, and uniform formats, while hopefully getting some ideas from the audience about quick hacks to improve my current system.
  2. Lifetracking, privacy & the surveillance society. This brings together two themes: First, how individuals’ lifetracking can be seen as a re-enactment of privacy, with changed ideas of what that means (e.g. panopticon, sousveillance, etc.). Second, the increased awareness of the wealth of personal data held by corporations (e.g. German politician Malte Spitz sued to get six months of his telecom data). The boundary between public life and private life is continually shifting as communication technology and social norms evolve; this talk investigates how lifetracking and the quantified self movement push the privacy/publicity boundaries in multiple ways. QS increases the public audience for data-driven stories of private lives while also highlighting the need for individuals to control access to and the disposition of their own personal data.

Ironically, self-surveillance was an academic interest of mine before it became a personal one: back in 2009, Nathan Yau and I wrote a paper for the ASIST Bulletin about self-surveillance (PDF) [less pretty in HTML]. It helped interest me in the Semantic Web, too: putting data in standard formats would make it easier to build data-driven visualizations, so lifetracking and the quantified self movement are a great use case for the (social) Semantic Web. QS also shows how privacy cuts both ways, and could provide an early-adopter audience for the kind of fine-grained privacy tools a colleague is developing.

(A first reply to Nic’s encouragement)

Posted in semantic web, social semantic web | Comments (0)

Reading Ontologically?

July 24th, 2011

What are the right ontologies for reading? And what kind of ontology support would let books recombine themselves, on the fly, in novel ways?

Today, keyword searches within books and book collections are commonplace, highlighting a word in your ebook reader can bring up a definition, and dictionaries grab recent examples of word use from microblogs.[1] But can’t we do more? What kind of synthesis do we need (and what is possible) to support readers of literature, classics, and humanities texts?

Current approaches seem to aim at analysis (e.g. getting an overview of the literary works of a period with “distant reading”/“macroanalysis”) and at creating flexible critical editions (e.g. structural, sometimes overlapping markup, as in TEI-based editions and projects like Wendell Piez’ Sonneteer[2]). I would call these “sensemaking” approaches rather than tools for reading.

I was intrigued by the Bible Ontology[3] because of their tagline: “ever wanted to read and study the Bible Ontologically?” Yet I don’t really know what they mean by reading ontologically[4].

Of course, they have recorded various pieces of data. For instance, for Rebekah, we see her children, siblings, birthplace, book and chapters she figures in, etc.:

[Image: Rebekah’s entry on the Bible Ontology site]

They offer a SPARQL endpoint, so you can query it. For instance, to find all the married women[6] (live query result):

PREFIX bop: <>
select ?s ?o where {?s bop:isWifeOf ?o }

Intense and long-term work has gone into Bible concordances, scholarship, etc., so it seems like a great use case for “reading ontologically”. With theologians and others looking at the site, using the SPARQL endpoint, etc., perhaps someone will be able to tell me what that means!
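The endpoint’s prefix URI was lost above, so here is a local Python sketch of the same query shape, matching ?s bop:isWifeOf ?o over a toy triple list (the triples are invented illustrations, not data from the site):

```python
# A local sketch of the "all married women" query:
#   select ?s ?o where { ?s bop:isWifeOf ?o }
# The triples are invented examples, not the Bible Ontology's actual data.

triples = [
    ("Rebekah", "isWifeOf", "Isaac"),
    ("Rebekah", "isMotherOf", "Jacob"),
    ("Sarah", "isWifeOf", "Abraham"),
]

def select_pairs(triples, predicate):
    """Return (subject, object) pairs matching ?s <predicate> ?o."""
    return [(s, o) for (s, p, o) in triples if p == predicate]

wives = select_pairs(triples, "isWifeOf")
print(wives)  # [('Rebekah', 'Isaac'), ('Sarah', 'Abraham')]
```

Against the real endpoint, the SPARQL engine does this pattern matching over the whole dataset; the listing just makes the variable-binding step visible.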

  1. In 2003, Gregory Crane wrote that “Already the books in a digital library are beginning to read one another and to confer among themselves before creating a new synthetic document for review by their human readers.” When I first read it in 2006, that article seemed incredibly visionary to me. Yet these commonplace “syntheses” no longer seem extraordinary to me. []
  2. currently offline, but brilliant; do check back, meanwhile see also his Digital Humanities 2010 talk notes []
  3. It’s a bit disingenuous to advertise their work as an ontology: in fact they have applied the ontology, rather than just creating it. []
  4. even though I’ve given a talk about supporting reading with ontologies! []
  5. The most meaningful of their terms is the bop:isRelatedInEvent, perhaps since these events, like Isaac_blesses_Jacob, would require more analysis to discern. []
  6. Gender is not recorded so we can’t (yet) ask for all the women overall, though I’ve just asked about this. []

Posted in books and reading, future of publishing, semantic web | Comments (0)

Enabling a Data Web: Is RDF the only choice?

July 8th, 2011

I’ve been slow in blogging about the Web Science Summer School being held in Galway this week. Check Clare Hooper’s blog for more reactions (starting from her day one post from two days ago).

Wednesday was full of deep and useful talks, but I have to start at the beginning, so I had to wait for a copy of Stefan Decker’s slides.

Hidden in the orientation to DERI, there are a few slides (12-19) which will be new to DERIans. They’re based on an argument Stefan made to the database community recently: any data format enabling the data Web is “more or less” isomorphic to RDF.

The argument goes:
The three enablers for the (document) Web were:

  1. scalability
  2. no censorship
  3. a positive feedback loop (exploiting Metcalfe’s Law)[1].

Take these as requirements for the data Web. Enabling Metcalfe’s Law, according to Stefan, requires:

  1. Global Object Identity.
  2. Composability: The value of data can be increased if it can be combined with other data.

The bulk of his argument focuses on this composability feature. What sort of data format allows composability?

It should:

  1. Have no schema.
  2. Be self-describing.
  3. Be “object centric”. In order to integrate information about different entities, data must be related to these entities.
  4. Be graph-based, because object-centric data sources, when composed, result in a graph in the general case.

Stefan’s claim is that any data format that fulfills the requirements is “more or less” isomorphic to RDF.
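Stefan’s composability point can be sketched: two object-centric sources that share global identifiers compose by simple union of their edges, and the result is a graph in the general case. A Python sketch with invented identifiers:

```python
# Two object-centric data sources using shared global identifiers.
# Composing them is just set union of their (subject, property, value)
# edges; the result is a graph in the general case. IDs are invented.

source_a = {
    ("urn:p1", "name", "Ada"),
    ("urn:p1", "knows", "urn:p2"),
}
source_b = {
    ("urn:p2", "name", "Grace"),
    ("urn:p2", "knows", "urn:p1"),  # adds a cycle: a graph, not a tree
}

def compose(*sources):
    """Compose data sources: union of edges, joined on global identifiers."""
    merged = set()
    for source in sources:
        merged |= source
    return merged

graph = compose(source_a, source_b)
print(len(graph), sorted({s for (s, p, o) in graph}))  # 4 ['urn:p1', 'urn:p2']
```

The global identifiers do all the work: because both sources say “urn:p2”, the union joins their statements without any schema negotiation, which is the self-describing, schema-free property the argument asks for.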

Several parts of this argument confuse me. First, it’s not clear to me that a positive feedback loop is the same as exploiting Metcalfe’s Law. Second, can’t information be composed even when it is not object-centric? (Is it obvious that entities are required, in general?) Third, I vaguely understand that composing object-centric data sources results in a (possibly disjoint) graph: but are graphs the only/best way to think about this? Further, how can I convince myself of this (presumably obvious) fact about data integration?

  1. The value of a communication network is proportional to the number of connections between nodes, or n^2 for n nodes []

Posted in information ecosystem, semantic web | Comments (1)

Extended deadline for STLR 2011

April 29th, 2011

We’ve extended the STLR 2011 deadline due to several requests; submissions are now due May 8th.

JCDL workshops are split over two half-days, and we are lucky enough to have *two* keynote speakers: Bernhard Haslhofer of the University of Vienna and Cathy Marshall of Microsoft Research.

Consider submitting!

The 1st Workshop on Semantic Web Technologies for Libraries and Readers

STLR 2011

June 16 (PM) & 17 (AM) 2011
Co-located with the ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2011 Ottawa, Canada

While Semantic Web technologies are successfully being applied to library catalogs and digital libraries, the semantic enhancement of books and other electronic media is ripe for further exploration. Connections between envisioned and emerging scholarly objects (which are doubtless social and semantic) and the digital libraries in which these items will be housed, encountered, and explored have yet to be made and implemented. Likewise, mobile reading brings new opportunities for personalized, context-aware interactions between reader and material, enriched by information such as location, time of day and access history.

This full-day workshop, motivated by the idea that reading is mobile, interactive, social, and material, will be focused on semantically enhancing electronic media as well as on the mobile and social aspects of the Semantic Web for electronic media, libraries and their users. It aims to bring together practitioners and developers involved in semantically enhancing electronic media (including documents, books, research objects, multimedia materials and digital libraries) as well as academics researching more formal aspects of the interactions between such resources and their users. We also particularly invite entrepreneurs and developers interested in enhancing electronic media using Semantic Web technologies with a user-centered approach.

We invite the submission of papers, demonstrations and posters which describe implementations or original research that are related (but are not limited) to the following areas of interest:

  • Strategies for semantic publishing (technical, social, and economic)
  • Approaches for consuming semantic representations of digital documents and electronic media
  • Open and shared semantic bookmarks and annotations for mobile and device-independent use
  • User-centered approaches for semantically annotating reading lists and/or library catalogues
  • Applications of Semantic Web technologies for building personal or context-aware media libraries
  • Approaches for interacting with context-aware electronic media (e.g. location-aware storytelling, context-sensitive mobile applications, use of geolocation, personalization, etc.)
  • Applications for media recommendations and filtering using Semantic Web technologies
  • Applications integrating natural language processing with approaches for semantic annotation of reading materials
  • Applications leveraging the interoperability of semantic annotations for aggregation and crowd-sourcing
  • Approaches for discipline-specific or task-specific information sharing and collaboration
  • Social semantic approaches for using, publishing, and filtering scholarly objects and personal electronic media


*EXTENDED* Paper submission deadline: May 8th 2011
Acceptance notification: June 1st 2011
Camera-ready version: June 8th 2011



Each submission will be independently reviewed by 2-3 program committee members.


Workshop organizers:

  • Alison Callahan, Dept of Biology, Carleton University, Ottawa, Canada
  • Dr. Michel Dumontier, Dept of Biology, Carleton University, Ottawa, Canada
  • Jodi Schneider, DERI, NUI Galway, Ireland
  • Dr. Lars Svensson, German National Library


Please use PDF format for all submissions. Semantically annotated versions of submissions, and submissions in novel digital formats, are encouraged and will be accepted in addition to a PDF version.

All submissions must adhere to the following page limits:
Full length papers: maximum 8 pages
Demonstrations: 2 pages
Posters: 1 page

Use the ACM template for formatting.

Submit using EasyChair.

Posted in future of publishing, library and information science, PhD diary, scholarly communication, semantic web, social semantic web | Comments (2)