Factor-based summarization

December 13th, 2011
by jodi

Factor-based summarization of reviews is useful:

I’m currently looking for a review of social media summarization. Any pointers?

Posted in argumentative discussions, PhD diary, social web | Comments (0)

A Review of Argumentation for the Social Semantic Web

December 6th, 2011
by jodi

I’m very pleased to share our “A Review of Argumentation for the Social Semantic Web”.

You are very warmly invited to review this paper. You can post the review as a comment to the manuscript page publicly at SWJ’s website. Informal comments by email are also welcome.

Open review

I adore SWJ’s open review process: publicly available manuscripts are useful. In 11 months the landing page has had “1208 reads” and I’m sure that not all of those are mine! Further, knowing who reviewed a paper can add credibility to the process. (It means quite a lot to me when Simon Buckingham-Shum says “I anticipate that this will become a standard reference for the field.”!)

Two earlier versions

The paper evolved from my first-year Ph.D. report. In the process of defining my Ph.D. topic, I reviewed the state of the art of argumentation for the Social Semantic Web. This was further developed in conversations with my coauthors, my colleague Tudor Groza and my advisor Alexandre Passant.

The outdated first journal submission and second journal submission are available; May’s reviews refer to the first version. A cover letter responding to the reviews summarizes what has changed. I’m sharing these because I am always encouraged by seeing how others’ work and ideas have developed over time!

So read the most recent version, and let us know what you think!

Updated 2012-08-09 to update links to the “final” version.

Posted in argumentative discussions, PhD diary, semantic web, social semantic web, social web | Comments (0)

Quantified Self & Privacy: followup

December 4th, 2011
by jodi

Our breakout session on privacy was well-attended, with about 15 people, mainly coming from backgrounds in healthcare, insurance, and the like. There was a quite active discussion.

Alpha release privacy icon indicating "your data may be bartered or sold".

In addition to the questions I shared earlier, I was asked to give pointers to a few things I mentioned.

Aza Raskin asked, Could we make privacy policies as simple as CreativeCommons has made licensing? There’s now an alpha release of the privacy icons. Earlier info about the project is also in Aza’s blog.


Several people were familiar with “Say Everything”, a 2007 NY Magazine piece on how publicly many young people were living their lives:

…the idea of a truly private life is already an illusion. Every street in New York has a surveillance camera. Each time you swipe your debit card at Duane Reade or use your MetroCard, that transaction is tracked. Your employer owns your e-mails. The NSA owns your phone calls. Your life is being lived in public whether you choose to acknowledge it or not.

So it may be time to consider the possibility that young people who behave as if privacy doesn’t exist are actually the sane people, not the insane ones.

But I came across that article in conjunction with a story about a teenager whose publicity had caused her embarrassment and harm, when a sports blogger posted her picture along with a 4-paragraph note excerpted as “Meet pole vaulter Allison Stokke. . . . Hubba hubba and other grunting sounds.”

…what could she do now, when a search for her name in Yahoo! revealed almost 310,000 hits? “It’s not like I could e-mail everybody on the Internet,” Stokke said.

She felt like

Her body had been stolen and turned into a public commodity, critiqued in fan forums devoted to everything from hip-hop to Hollywood.


danah boyd, who has extensively researched teens and the Internet, has also written and spoken sagely about privacy.

Here’s an excerpt from a talk she gave, “The Future of Privacy: How Privacy Norms Can Inform Regulation”:

All too often, folks presume that privacy is about hiding information or controlling access to information. This is a very limited view. For teens, privacy has more to do with feeling safe and in control of a situation, trusting people and systems, and leveraging an understood context for intimacy.

Let me ground this in an example. If I’m dealing with an illness, I’m not hiding it from people just because I’m not talking about it. If I choose to share my illness, I’m probably not going to start by standing up in the middle of the town square and shouting loudly to everyone who could possibly hear that I’m ill. I may start by gathering my family and sharing in an intimate situation where I feel supported. I open up to them, make myself vulnerable, in exchange for support. This is privacy. I also have expectations about that social situation. I expect my family to respect the situation in which I shared something deeply personal. I _trust_ them to understand how far that information is supposed to be spread. Any one of them is capable of breaking my trust, telling someone against my wishes and expectations, but what’s at stake is the relationship. My agency, my power, in that situation does not stem from me locking my family in a closet after I told them something personal. It comes from the social expectation that they respect the context of the situation.

There are certain structural assumptions baked into this unmediated scenario. First, and most importantly, there is an assumption in everyday interactions that conversations are private-by-default, public through effort. In unmediated situations, publicity takes effort. We have to consciously tell other people what we hear. Shouting to the entire town square is a lot harder than telling just a few people. Even when we share in public places, there’s a huge difference between sitting in a cafe talking with a friend and screaming to the entire room. Sure, people can overhear us in the cafe. And they do. But that doesn’t mean that they’re in the conversation. Sociologist Erving Goffman noted that there’s a societal value of “civil inattention.” Even when we can overhear conversations, we generally try to not listen. Doing so is a way of indicating that we respect others’ space. This isn’t universal and people are always jumping into conversations that they’re not a part of. But all of the parties know that they’re “butting in.”

What’s different about the Internet is not about a radical shift in social norms. What’s different has to do with how the architecture shifts the balance of power in terms of visibility. In online public spaces, interactions are public-by-default, private-through-effort, the exact opposite of what we experience offline. There is no equivalent to the cafe where you can have a private conversation in public with a close friend without thinking about who might overhear. Your online conversations are easily overheard. And they’re often persistent, searchable, and easily spreadable. Online, we have to put effort into limiting how far information flows. We have to consciously act to curb visibility. This runs counter to every experience we’ve ever had in unmediated environments.

When people participate online, they don’t choose what to publicize. They choose what to limit others from seeing. Offline, it takes effort to get something to be seen. Online, it takes effort for things to NOT be seen. This is why it appears that more is public. Because there’s a lot of content out there that people don’t care enough about to lock down. I hear this from teens all the time. “Public by default, privacy when necessary.” Teens turn to private messages or texting or other forms of communication for intimate interactions, but they don’t care enough about certain information to put the effort into locking it down. But this isn’t because they don’t care about privacy. This is because they don’t think that what they’re saying really matters all that much to anyone. Just like you don’t care that your small talk during the conference breaks are overheard by anyone. Of course, teens aren’t aware of how their interactions in aggregate can be used to make serious assumptions about who they are, who they know, and what they might like in terms of advertising. Just like you don’t calculate who to talk to in the halls based on how a surveillance algorithm might interpret your social network.

I’d particularly recommend “Making Sense of Privacy and Publicity” and “Privacy and Publicity in the Context of Big Data”. As danah points out, social media conversations essentially *require* sharing personally identifying information. But it’s personally embarrassing information that people don’t want spread.

Posted in random thoughts, social web | Comments (0)

Quantified Self & Privacy: some brief thoughts before the breakout session

November 26th, 2011
by jodi

Today at 3 (in Heidelberg) I’m running a breakout session on QS & Privacy: How can we ensure privacy as we share our data stories? What rights and responsibilities do we have? Where is the public-private boundary?

Here are a few provocative thoughts from conversations so far today.

Body Blogger Kiel Gilleade talked about heart rate this morning:

My boss called me to ask whether I was working on a deadline because my heart rate was in the green zone rather than in the red zone like the last paper-writing deadline.

He observes: situational & contextual info is crucial for interpretation.

Tom Hume tweeted:

You don’t control your identity. It’s manufactured by those around you. #qs2011

Joshua Kauffman tweeted:

There is no such thing as personal health data. All matters of health are socially shared and derived. #qs2011

Posted in random thoughts, social web | Comments (1)

Exercise & Weight tracking, Quantified Self Europe

November 26th, 2011
by jodi

My talk on exercise and weight tracking at QuantifiedSelfEurope was video’d and will be world-viewable on the Quantified Self blog at some point in the future.

So far being visible has been beneficial, so despite the challenge, I’m sharing slides. As I said during the talk, an exercise monitor is something of a conversation piece: to ask a fat person “Are you trying to lose weight?” is generally rude. But an exercise monitor has been a point of entry for me into useful conversations and interesting ideas (like weight averaging and tracking tools).

I’ll add video when that’s available.

Posted in random thoughts, social web | Comments (2)

Quantified Self Europe, Saturday morning.

November 26th, 2011
by jodi

What is this Quantified Self stuff, anyway? Here’s a brief intro (prettier PDF version) Nathan Yau and I wrote.

This weekend I’m in Amsterdam for Quantified Self Europe. So far this morning I’ve met Arduino hackers, seen several talks about monitoring heart rate (continuously, cool, or even with swimming goggles) and lung capacity. Oh, and given a talk about Exercise and Weight tracking.

There’s lots of blogging/photoblogging going on. Twitter hashtag (formerly #QSelfEurope) is #QS2011.

Posted in information ecosystem, random thoughts, social web | Comments (0)

Code4Lib 2012 talk proposals are out

November 21st, 2011
by jodi

Code4Lib2012 talk proposals are now on the wiki. This year there are 72 proposals for 20-25 slots. I pulled out the talks mentioning semantics (linked data, semantic web, microdata, RDF) for my own convenience (and maybe yours).
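A filter like this could be sketched in a few lines of Python. This is only an illustration, not the process actually used for this post: the patterns mirror the terms listed above, while the `mentions_semantics` helper and the last sample title are hypothetical.

```python
import re

# Patterns for the terms listed above. "semantic" also covers "semantic web",
# "semantics" and "semantically"; "linked data" may be written with a hyphen.
KEYWORD_PATTERNS = [r"linked[\s-]+data", r"semantic", r"microdata", r"rdf"]

# \b at the start keeps "rdf" from matching in the middle of another word,
# while still allowing suffixed forms such as "RDFa".
PATTERN = re.compile(r"\b(?:" + "|".join(KEYWORD_PATTERNS) + r")", re.IGNORECASE)


def mentions_semantics(text):
    """Return True if the text mentions any of the semantics-related terms."""
    return PATTERN.search(text) is not None


titles = [
    "HTML5 Microdata and Schema.org",
    "“Linked-Data-Ready” Software for Libraries",
    "UDFR: Building a Registry using Open-Source Semantic Software",
    "An Example Talk About Discovery Layers",  # hypothetical, for contrast
]

selected = [title for title in titles if mentions_semantics(title)]
for title in selected:
    print(title)
```

In practice you would match against the full proposal text rather than just the title, since several abstracts mention RDF or linked data only in the body.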

Property Graphs And TinkerPop Applications in Digital Libraries

  • Brian Tingle, California Digital Library

TinkerPop is an open source software development group focusing on technologies in the graph database space.
This talk will provide a general introduction to the TinkerPop Graph Stack and the property graph model is uses. The introduction will include code examples and explanations of the property graph models used by the Social Networks in Archival Context project and show how the historical social graph is exposed as a JSON/REST API implemented by a TinkerPop rexster Kibble that contains the application’s graph theory logic. Other graph database applications possible with TinkerPop such as RDF support, and citation analysis will also be discussed.

HTML5 Microdata and Schema.org

  • Jason Ronallo, North Carolina State University Libraries

When the big search engines announced support for HTML5 microdata and the schema.org vocabularies, the balance of power for semantic markup in HTML shifted.

  • What is microdata?
  • Where does microdata fit with regards to other approaches like RDFa and microformats?
  • Where do libraries stand in the worldview of Schema.org and what can they do about it?
  • How can implementing microdata and schema.org optimize your sites for search engines?
  • What tools are available?

“Linked-Data-Ready” Software for Libraries

  • Jennifer Bowen, University of Rochester River Campus Libraries

Linked data is poised to replace MARC as the basis for the new library bibliographic framework. For libraries to benefit from linked data, they must learn about it, experiment with it, demonstrate its usefulness, and take a leadership role in its deployment.

The eXtensible Catalog Organization (XCO) offers open-source software for libraries that is “linked-data-ready.” XC software prepares MARC and Dublin Core metadata for exposure to the semantic web, incorporating FRBR Group 1 entities and registered vocabularies for RDA elements and roles. This presentation will include a software demonstration, proposed software architecture for creation and management of linked data, a vision for how libraries can migrate from MARC to linked data, and an update on XCO progress toward linked data goals.

Your Catalog in Linked Data

  • Tom Johnson, Oregon State University Libraries

Linked Library Data activity over the last year has seen bibliographic data sets and vocabularies proliferating from traditional library sources. We’ve reached a point where regular libraries don’t have to go it alone to be on the Semantic Web. There is a quickly growing pool of things we can actually “link to”, and everyone’s existing data can be immediately enriched by participating.

This is a quick and dirty road to getting your catalog onto the Linked Data web. The talk will take you from start to finish, using Free Software tools to establish a namespace, put up a SPARQL endpoint, make a simple data model, convert MARC records to RDF, and link the results to major existing data sets (skipping conveniently over pesky processing time). A small amount of “why linked data?” content will be covered, but the primary goal is to leave you able to reproduce the process and start linking your catalog into the web of data. Appropriate documentation will be on the web.

NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis

  • Jeremy Nelson, Colorado College

In October, the Library of Congress issued a news release, “A Bibliographic Framework for the Digital Age” outlining a list of requirements for a New Bibliographic Framework Environment. Responding to this challenge, this talk will demonstrate a Redis (http://redis.io) FRBR datastore proof-of-concept that, with a lightweight python-based interface, can meet these requirements.

Because FRBR is an Entity-Relationship model; it is easily implemented as key-value within the primitive data structures provided by Redis. Redis’ flexibility makes it easy to associate arbitrary metadata and vocabularies, like MARC, METS, VRA or MODS, with FRBR entities and inter-operate with legacy and emerging standards and practices like RDA Vocabularies and LinkedData.

ALL TEH METADATAS! or How we use RDF to keep all of the digital object metadata formats thrown at us.

  • Declan Fleming, University of California, San Diego

What’s the right metadata standard to use for a digital repository? There isn’t just one standard that fits documents, videos, newspapers, audio files, local data, etc. And there is no standard to rule them all. So what do you do? At UC San Diego Libraries, we went down a conceptual level and attempted to hold every piece of metadata and give each holding place some context, hopefully in a common namespace. RDF has proven to be the ideal solution, and allows us to work with MODS, PREMIS, MIX, and just about anything else we’ve tried. It also opens up the potential for data re-use and authority control as other metadata owners start thinking about and expressing their data in the same way. I’ll talk about our workflow which takes metadata from a stew of various sources (CSV dumps, spreadsheet data of varying richness, MARC data, and MODS data), normalizes them into METS by our Metadata Specialists who create an assembly plan, and then ingests them into our digital asset management system. The result is a HTML, RSS, METS, XML, and opens linked data possibilities that we are just starting to explore.

UDFR: Building a Registry using Open-Source Semantic Software

  • Stephen Abrams, Associate Director, UC3, California Digital Library
  • Lisa Dawn Colvin, UDFR Project Manager, California Digital Library

Fundamental to effective long-term preservation analysis, planning, and intervention is the deep understanding of the diverse digital formats used to represent content. The Unified Digital Format Registry project (UDFR, https://bitbucket.org/udfr/main/wiki/Home) will provide an open source platform for an online, semantically-enabled registry of significant format representation information.

We will give an introduction to the UDFR tool and its use within a preservation process.

We will also discuss our experiences of integrating disparate data sources and models into RDF: describing our iterative data modeling process and decisions around integrating vocabularies, data sources and provenance representation.

Finally, we will share how we extended an existing open-source semantic wiki tool, OntoWiki, to create the registry.

saveMLAK: How Librarians, Curators, Archivists and Library Engineers Work Together with Semantic MediaWiki after the Great Earthquake of Japan

  • Yuka Egusa, Senior Researcher of National Institute of Educational Policy Research
  • Makoto Okamoto, Chief Editor of Academic Resource Guide (ARG)

In March 11th 2011, the biggest earthquake and tsunami in the history attacked a large area of northern east region of Japan. A lot of people have worked together to save people in the area. For library community, a wiki named “savelibrary” was launched for sharing information on damages and rescues on the next day of the earthquake. Later then people from museum curators, archivists and community learning centers started similar projects. In April we joined to a project “saveMLAK”, and launched a wiki site using Semantic MediaWiki under http://savemlak.jp/.

As of November 2011, information on over 13,000 cultural organizations are posted on the site by 269 contributors since the launch. The gathered information are organized along with Wiki categories of each type of facilities such library, museum, school, etc. We have held eight edit-a-thons to encourage people to contribute to the wiki.

We will report our activity, how the libraries and museums were damaged and have been recovered with lots of efforts, and how we can do a new style of collaboration with MLAK community, Wiki and other voluntary communities at the crisis.


Conversion by Wikibox, tweaked in Textwrangler. Trimmed email addresses, otherwise these are as-written. Did I miss one? Let me know!

Posted in computer science, library and information science, scholarly communication, semantic web | Comments (0)

Argumentation on Twitter

November 19th, 2011
by jodi

Here’s an argument made on Twitter:

Difference between cakes and biscuits? When stale, cakes go hard, biscuits go soft. Hence Jaffa Cakes are cakes. (Was official EU ruling).

I just love this example:

  1. First, you can find it with “hence” (see cue phrases from an appendix to Marcu’s thesis).
  2. Second, the notion of this EU (tax) ruling amuses me.
  3. Third, it shows that 140 characters is enough for a complex argumentative structure. This has three main claims: When stale, cakes go hard, biscuits go soft; Jaffa Cakes are cakes; and [Jaffa Cakes are cakes due to] official EU ruling.
  4. Enthymemes anyone?
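To make the first point concrete, here is a minimal sketch of spotting cue phrases in a tweet. It is an assumption-laden toy: only “hence” comes from the example above, the other cue phrases are picked for illustration rather than copied from Marcu’s appendix, and `find_cue_phrases` is a hypothetical helper name.

```python
import re

# "hence" is the cue phrase from the tweet above; the remaining entries are
# illustrative assumptions, not a reproduction of Marcu's list.
CUE_PHRASES = ["hence", "therefore", "because", "thus"]


def find_cue_phrases(text):
    """Return the cue phrases found in text, ordered by first appearance."""
    lowered = text.lower()
    hits = []
    for phrase in CUE_PHRASES:
        # Whole-word match, so "thus" would not fire inside a longer word.
        match = re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
        if match:
            hits.append((match.start(), phrase))
    return [phrase for _, phrase in sorted(hits)]


tweet = (
    "Difference between cakes and biscuits? When stale, cakes go hard, "
    "biscuits go soft. Hence Jaffa Cakes are cakes. (Was official EU ruling)."
)
print(find_cue_phrases(tweet))  # prints ['hence']
```

A match only flags a candidate: as the example shows, deciding whether the text is an argument or an explanation still takes a human (or a much smarter program).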

It’s hard, though, to draw the line between an argument and an explanation in this context.

Jaffa Cakes, for you North American readers, are a common dessert-y snack in Ireland and the UK. They are vaguely like the Kandy Kakes found in the Philadelphia area/East Coast, but usually have an orange filling.

Posted in argumentative discussions, PhD diary, random thoughts, social web | Comments (4)

Time-based comments

November 14th, 2011
by jodi

I’ve been digging SoundCloud lately.

Today I noticed time-based comments in their tracks. It’s a bit disorienting to have comments pop up as you’re listening. Maybe after adjusting, there’s a pleasant sense of having a conversation going on around you. Definitely feels like you’ve got company!

Comments pop up as the track plays

Avatars appear below the track to indicate that there are comments, and you can scroll over avatars to read comments. You can also hide the comments if you prefer.

Entering a comment from the timeline


Comments are indicated by avatar icons in the full view.

Avatar icons appear in the overview

Example track due to Duncan.
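One way to think about time-based comments as data is a list of comments sorted by their position in the track, queried for whatever falls inside the stretch currently playing. The sketch below is my own guess at a minimal model, not SoundCloud’s implementation; `TimedComment`, `comments_between`, and the sample comments are all hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedComment:
    position: float  # seconds from the start of the track
    author: str
    text: str


def comments_between(comments, start, end):
    """Return comments positioned in [start, end).

    Assumes `comments` is already sorted by position.
    """
    positions = [c.position for c in comments]
    lo = bisect_left(positions, start)
    hi = bisect_left(positions, end)
    return comments[lo:hi]


# Hypothetical sample data, sorted by position in the track.
comments = sorted(
    [
        TimedComment(12.5, "alice", "Love this bass line"),
        TimedComment(5.0, "bob", "Great intro"),
        TimedComment(40.0, "carol", "Nice drop"),
        TimedComment(12.9, "dave", "Agreed!"),
    ],
    key=lambda c: c.position,
)

# Comments that would pop up while seconds 12 to 13 are playing.
for comment in comments_between(comments, 12.0, 13.0):
    print(f"{comment.position:>5.1f}s {comment.author}: {comment.text}")
```

Hiding the comments, as the player allows, would then just mean skipping this lookup while the track plays.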

Posted in argumentative discussions, information ecosystem, PhD diary, social web | Comments (0)

YouTube “I dislike this” button

November 14th, 2011
by jodi

A few weeks ago, I noticed something new on YouTube: an “I dislike this” button.

I wonder how long that’s been there?

When I talk about online argumentation, a frequent comment is “too bad there’s only +1 and Like; we need more expressivity”.

See related discussions:

Posted in argumentative discussions, information ecosystem, PhD diary, social web | Comments (1)