Archive for the ‘information ecosystem’ Category

Understanding Wikipedia through the evolution of a single page

August 26th, 2011

“The only constant is change.” – Heraclitus

How well do you know Wikipedia? Get to know it a little better by looking at how your favorite article changes over time. To inspire you, here are two examples.

Jon Udell’s screencast about ‘Heavy Metal Umlaut’ is a classic, looking back (in 2005) at the first two years of that article. It points out the accumulation of information, vandalism (and its swift reversion), formatting changes, and issues around the verifiability of facts.

In a recent article for the Awl ((The Awl is *woefully* distracting. I urge you not to follow any links. (Thanks a lot Louis!) )), Emily Morris sifts through 2,303 edits of ‘Lolita’ to pull out nitpicking revision comments, interesting diffs, and statistics.

Posted in books and reading, future of publishing, information ecosystem, library and information science, social web | Comments (0)

Citation management means different things to different people

August 3rd, 2011

I got to talking with a mathematician friend about citation management. We came to the conclusion that “manage PDFs” is my primary goal while “get out good citations” is his primary goal. I thought it would be interesting to look at his requirements.

His ideal program would

  1. Organize the PDFs (Papers does this, when it doesn’t botch the author names and the title) preferably in the file system, so I can use Dropbox
  2. Get BibTeX entries from MathSciNet, ACM, etc. EXACTLY AS THEY ARE
  3. Have some decent way to organize notes by “project” or something

He doesn’t care about:

  1. Typing \cite
  2. A “unified” bibliographic database
  3. Social bibliographies (though I am not against them; it is just not a burning issue)

He says:

I guess the point is that, if I am writing something and I know I want to cite it, and I know there is an “official” BibTeX for it, I just need a way to get that more quickly than:

  1. Type the URL
  2. Click on “Proxy this” in my bookmarks bar
  3. Search for the paper
  4. Copy/paste the BibTeX
  5. Edit the cite key to something mnemonic
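Step 5, at least, is easy to script. A minimal Python sketch; `set_cite_key` is a hypothetical helper of mine, not part of any existing tool:

```python
import re

def set_cite_key(bibtex: str, new_key: str) -> str:
    """Replace the cite key of a BibTeX entry with a mnemonic one."""
    # The cite key sits between the entry's opening brace and the first comma.
    return re.sub(r'(@\w+\s*\{)[^,]*,',
                  lambda m: m.group(1) + new_key + ',',
                  bibtex, count=1)

entry = """@article{MR511410,
  author = {Asimow, L. and Roth, B.},
  title = {The rigidity of graphs},
  year = {1978},
}"""

print(set_cite_key(entry, "AR78").splitlines()[0])  # @article{AR78,
```

The other steps (proxying, searching, downloading) are the part that actually costs him time, and those are bound up with publisher websites.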

He followed up with an example of the awful, lossy markup Papers produces, which loses information including the ISSN and DOI; he prefers the minimalist BibTeX. (He later added: “I understated how bad papers is. The real papers entry (top) not only has screwy names, but junk instead of the full journal name. The papers cite key is meaningless noise too (but mathscinet is meaningful noise).”) To get around this, he does the same search/download “a million times”.

Papers2 BibTeX:
@article{AR78,
author = {L Asimow and B Roth},
journal = {Trans. Amer. Math. Soc.},
title = {The rigidity of graphs},
pages = {279--289},
volume = {245},
year = {1978},
}

The AMS version of the same BibTeX:
@article {AR78,
    AUTHOR = {Asimow, L. and Roth, B.},
     TITLE = {The rigidity of graphs},
   JOURNAL = {Trans. Amer. Math. Soc.},
  FJOURNAL = {Transactions of the American Mathematical Society},
    VOLUME = {245},
      YEAR = {1978},
     PAGES = {279--289},
      ISSN = {0002-9947},
     CODEN = {TAMTAM},
   MRCLASS = {57M15 (05C10 52A40 53B50 73K05)},
  MRNUMBER = {511410 (80i:57004a)},
MRREVIEWER = {G. Laman},
       DOI = {10.2307/1998867},
       URL = {http://dx.doi.org/10.2307/1998867},
}

I’ve just discovered that BibDesk’s ((See also A short review of BibDesk from MacResearch)) ‘minimize’ does what he wants: its output is quite close to the Papers2 version above, but with the correct AMS data:

@article{AR78,
	Author = {Asimow, L. and Roth, B.},
	Journal = {Trans. Amer. Math. Soc.},
	Pages = {279--289},
	Title = {The rigidity of graphs},
	Volume = {245},
	Year = {1978}}
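That ‘minimize’ behavior could be approximated in a few lines. A naive sketch, assuming flat brace-delimited field values with no nested braces; the field whitelist is my own guess at what “minimal” should keep:

```python
import re

KEEP = {"author", "title", "journal", "pages", "volume", "year"}

def minimize(bibtex: str) -> str:
    """Strip a BibTeX entry down to its core fields (drops ISSN, DOI, MR data, ...)."""
    header = re.match(r'@\w+\s*\{[^,]*,', bibtex.strip()).group(0)
    # Naive field parse: assumes each value is a single {...} group without nesting.
    fields = re.findall(r'(\w+)\s*=\s*(\{[^{}]*\})', bibtex)
    body = [f"\t{name.capitalize()} = {value}," for name, value in fields
            if name.lower() in KEEP]
    return "\n".join([header, *body]).rstrip(",") + "}"
```

Running it over the AMS entry above yields something very close to the BibDesk output, minus the MR and DOI fields.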

I’d still like to understand the impact the non-minimal BibTeX is having; it could be that bad citation styles are causing part of the problem.

While we have different needs for citation management, we’re both annoyed by the default filenames many publishers use – like fulltext.pdf and sdarticle.pdf. But I’ll tolerate these, as long as I can get to them from a database index with a nice frontend.

We of course moved on to discussing how research needs an iTunes or, as Geoff Bilder has called it, an iPapers.

This blog post brought to you by Google chat and the number 3.

Posted in books and reading, information ecosystem, library and information science, scholarly communication | Comments (0)

QOTD: “move the computation to the data”: the future of nonconsumptive research with Google Books

July 16th, 2011

Douglas Knox touches on the future of “distant reading” ((What’s “distant reading”? Think “text mining of literature”–but it’s deeper than that. It’s also been called the macroeconomics of literature (“macroanalysis”).)):

For rights management reasons and also for material engineering reasons, the research architecture will move the computation to the data. That is, the vision of the future here is not one in which major data providers give access to data in big downloadable chunks for reuse and querying in other contexts, but one in which researchers’ queries are somehow formalized in code that the data provider’s servers will run on the researcher’s behalf, presumably also producing economically sized result sets.

There are also some implicit research goals here for those working in cyberinfrastructure, in digital humanities support, and in text mining aimed at supporting humanities scholars:

Whatever we mean by “computation,” that is, can’t be locked up in an interface that tightly binds computation and data. Readers already need (and for the most part do not have) our own agents and our own data, our own algorithms for testing, validating, calibrating, and recording our interaction with the black boxes of external infrastructure.

This kind of blackbox infrastructure contrasts with scholars “using technology critically and experimentally, fiddling with knobs to see what happens, and adjusting based on what they find” when they are “free to write short scripts and see results in quick cycles of exploration”.

I’m pulling these out of context — from Douglas’ post on the Digital Humanities 2011 conference.

Posted in books and reading, information ecosystem | Comments (0)

Enabling a Data Web: Is RDF the only choice?

July 8th, 2011

I’ve been slow in blogging about the Web Science Summer School being held in Galway this week. Check Clare Hooper’s blog for more reactions (starting from her day one post from two days ago).

Wednesday was full of deep and useful talks, but I have to start at the beginning, so I had to wait for a copy of Stefan Decker’s slides.

Hidden in the orientation to DERI, there are a few slides (12-19) which will be new to DERIans. They’re based on an argument Stefan made to the database community recently: any data format enabling the data Web is “more or less” isomorphic to RDF.

The argument goes:
The three enablers for the (document) Web were:

  1. scalability
  2. no censorship
  3. a positive feedback loop (exploiting Metcalfe’s Law) ((The value of a communication network is proportional to the number of connections between nodes, or n^2 for n nodes)).

Take these as requirements for the data Web. Enabling Metcalfe’s Law, according to Stefan, requires:

  1. Global Object Identity.
  2. Composability: The value of data can be increased if it can be combined with other data.

The bulk of his argument focuses on this composability feature. What sort of data format allows composability?

It should:

  1. Have no schema.
  2. Be self-describing.
  3. Be “object centric”. In order to integrate information about different entities, data must be related to these entities.
  4. Be graph-based, because object-centric data sources, when composed, result in a graph in the general case.

Stefan’s claim is that any data format that fulfills the requirements is “more or less” isomorphic to RDF.
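A toy illustration of the composability point, with identifiers invented for the example: two object-centric sources, each a set of facts about globally identified entities, compose into a single edge set, i.e. a graph:

```python
# Two object-centric sources: facts keyed by global identifiers (invented here).
source_a = {"paper:AR78": {"title": "The rigidity of graphs",
                           "author": "person:asimow"}}
source_b = {"person:asimow": {"name": "L. Asimow"},
            "paper:AR78": {"year": "1978"}}

def compose(*sources):
    """Merge object-centric sources into (subject, predicate, object) edges."""
    edges = set()
    for source in sources:
        for subject, properties in source.items():
            for predicate, obj in properties.items():
                edges.add((subject, predicate, obj))
    return edges

graph = compose(source_a, source_b)
# The shared identifier "paper:AR78" links the two sources into one graph;
# without global identity, the union would stay disjoint.
```

This is just RDF-style triples in miniature, which is of course the point of the argument.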

Several parts of this argument confuse me. First, it’s not clear to me that a positive feedback loop is the same as exploiting Metcalfe’s Law. Second, can’t information be composed even when it is not object-centric? (Is it obvious that entities are required, in general?) Third, I vaguely understand that composing object-centric data sources results in a (possibly disjoint) graph: but are graphs the only/best way to think about this? Further, how can I convince myself of this (presumably obvious) fact about data integration?

Posted in information ecosystem, semantic web | Comments (1)

Monetization is key to protecting Internet freedom

May 21st, 2011

The long-term freedom of the Internet may depend, in part, on convincing the big players of the content industry to modernize their business models.

Motivated by “protecting” the content industry, the U.S. Congress is discussing proposed legislation that could be used to seize domain names and force websites (even search engines) to remove links.

Congress doesn’t yet understand that there are already safe and effective ways to counter piracy — which don’t threaten Internet freedom. “Piracy happens not because it is cheaper, but because it is more convenient,” as Arvind Narayanan reports, musing on a conversation with Congresswoman Lofgren.

What the Congresswoman was saying was this:

  1. The only way to convince Washington to drop this issue for good is to show that artists and musicians can get paid on the Internet.
  2. Currently they are not seeing any evidence of this. The Congresswoman believes that new technology needs to be developed to let artists get paid. I believe she is entirely wrong about this; see below.
  3. The arguments that have been raised by tech companies and civil liberties groups in Washington all center around free speech; there is nothing wrong with that but it is not a viable strategy in the long run because the issue is going to keep coming back.

Arvind’s response is that the technology needed is already here. That’s old news to technologists, but the technology sector needs to educate Congress, who may not have the time and skills to get this information by themselves.

The dinosaurs of the content industries need to adapt their business models. Piracy is not correlated with a decrease in sales. Piracy happens not because it is cheaper, but because it is more convenient. Businesses need to compete with piracy rather than trying to outlaw it. Artists who’ve understood this are already thriving.

Posted in future of publishing, information ecosystem, intellectual freedom | Comments (0)

QOTD: Stop crippling ebooks: invent new business models instead

May 16th, 2011

Holding on to old business models is not the way to endear yourself to customers.

But unfortunately this is also, simultaneously, a bad time to be a reader. Because the dinosaurs still don’t get it. Ten years of object lessons from the music industry, and they still don’t get it. We have learned, painfully, that media consumers—be they listeners, watchers, or readers—want one of two things:

  • DRM-free works for a reasonable price
  • or, unlimited single-payment subscription to streaming/DRMed works

Give them either of those things, and they’ll happily pay. Look at iTunes. Look at Netflix. But give them neither, and they’ll pirate. So what are publishers doing?

  • Refusing to sell DRM-free books. My debut novel will be re-e-published by the Friday Project imprint of HarperCollins UK later this year; both its editor and I would like it to be published without DRM; and yet I doubt we will be able to make that happen.
  • crippling library e-books
  • and not offering anything even remotely like a subscription service.

– Jon Evans, When Dinosaurs Ruled the Books, via James Bridle’s Stop Press

Eric Hellman is one of the pioneers of tomorrow’s ebook business models: his company, Gluejar, uses a crowdfunding model to re-release books under Creative Commons licenses. Authors and publishers are paid; fans pay for the books they’re most interested in; and everyone can read and distribute the resulting “unglued” ebooks. Everybody wins.

Posted in books and reading, future of publishing, information ecosystem | Comments (0)

Apple seizes control of iOS purchase chain: enforces 30% cut for Apple by prohibiting sales-oriented links from apps to the Web

February 16th, 2011

Apple’s press release about its “new subscription services” seems at first innocuous, and the well-crafted quote ((

“Our philosophy is simple—when Apple brings a new subscriber to the app, Apple earns a 30 percent share; when the publisher brings an existing or new subscriber to the app, the publisher keeps 100 percent and Apple earns nothing,” said Steve Jobs, Apple’s CEO. “All we require is that, if a publisher is making a subscription offer outside of the app, the same (or better) offer be made inside the app, so that customers can easily subscribe with one-click right in the app. We believe that this innovative subscription service will provide publishers with a brand new opportunity to expand digital access to their content onto the iPad, iPod touch and iPhone, delighting both new and existing subscribers.”

– Steve Jobs at “Apple Launches Subscriptions on the App Store“)) from Steve Jobs has been widely reposted:
“when Apple brings a new subscriber to the app, Apple earns a 30 percent share; when the publisher brings an existing or new subscriber to the app, the publisher keeps 100 percent and Apple earns nothing.” Yet analysts reading between the lines have been less than pleased.

Bad for publishers

The problems for publishers? (See also “Steve Jobs to pubs: Our way or highway“)

  • Apple takes a 30% cut of all in-app purchases ((Booksellers call this “the agency model“.))
  • Apps may not bypass in-app purchase: apps may not link to an external website (such as Amazon) ((Apple has confirmed that Kindle’s “Shop in Kindle Store” must be removed.)) that allows customers to buy content or subscriptions.
  • Content available for purchase in the app cannot be cheaper elsewhere.
  • The customer’s demographic information resides with Apple, not with the publisher. Customers must opt-in to share their name, email, and zipcode with the publisher, though Apple will of course have this information.
  • Limited reaction time; changes will be finalized by June 30th.

Bad for customers?

And there are problems for customers, too.

  • Reduction of content available in apps (likely for the near-term).
  • More complex, clunky purchase workflows (possible).
    Publishers may sell material only outside of apps, from their own website, to avoid paying 30% to Apple. Will we see a proliferation of publisher-run stores?
  • Price increases to cover Apple’s commission (likely).
    If enacted, these must apply to all customers, not just iOS device users.
  • Increased lockdown of content in the future (probable).
    Apple already prevents some iBooks customers from reading books they bought and paid for, using extra DRM that affects some jailbroken devices, even though jailbreaking is explicitly legal in the United States and carrier-unlocked, SIM-free phones are not available in the U.S.
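The arithmetic behind the likely price increase is worth spelling out: a publisher that wants to net the same amount after Apple’s 30% cut must divide its old price by 0.7. A quick sketch; the $9.99 example price is mine:

```python
APPLE_CUT = 0.30  # Apple's share of in-app purchases

def sticker_price(target_net: float) -> float:
    """Price needed so the seller still nets `target_net` after Apple's cut."""
    return target_net / (1 - APPLE_CUT)

# A book that used to net the publisher $9.99 would need to sell for ~$14.27;
# since in-app prices may not be undercut elsewhere, everyone pays the increase.
print(round(sticker_price(9.99), 2))
```

That is roughly a 43% sticker-price increase, not 30%, which is easy to miss.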

More HTML5 apps?

The upside? Device-independent HTML5 apps may see wider adoption. HTML5 mobile apps work well on iOS, on other mobile platforms, and on laptops and desktops.

For ebooks, HTML5 means Ibis Reader and Book.ish. For publishers looking to break free of Apple, yet satisfy customers, Ibis Reader may be a particularly good choice: this year they are focusing on licensing Ibis Reader, as Liza Daly’s Threepress announced in a savvy and well-timed post, anticipating Apple’s announcement. Having been a beta tester of Ibis Reader, I can recommend it!

If you know of other HTML5 ebook apps, please leave them in the comments.

Posted in books and reading, future of publishing, information ecosystem, iOS: iPad, iPhone, etc. | Comments (0)

6 quotes from Beyond the PDF – Annotations sessions

January 19th, 2011

Moderator Ed Hovy picked out 6 quotes to summarize Beyond the PDF’s sessions on Annotation.

Papers are stories that persuade with data.

But as authors we are lazy and undisciplined.

Communicating between humans and humans and humans and machines.

I should be interested in ontologies, but I just can’t work up the enthusiasm.

Christmas tree of hyperlinks.

You will get sued.

Posted in future of publishing, information ecosystem | Comments (1)

Wanted: the ultimate mobile app for scholarly ereading

January 7th, 2011

Nicole Henning suggests that academic libraries and scholarly presses work together to create the ultimate mobile app for scholarly ereading. I think about this a bit differently, in terms of functional requirements.

The main functions are obtaining materials, reading them, organizing them, keeping them, and sharing them.

For obtaining materials, the key new requirement is to simplify authentication: handle campus authentication systems and personal subscriptions. Multiple credentialed identities should be supported. A secondary consideration is that RSS feeds (e.g. for journal tables of contents) should be supported.

For reading materials, the key requirement is to support multiple formats in the same application. I don’t know of a web app or mobile app that supports PDF, EPUB, and HTML. Reading interfaces matter: look to Stanza and Ibis Reader for best-in-class examples.

For organizing materials, the key is synergy between the user’s data and existing data. Allow tags, folders, and multiple collections. But also leverage existing publisher and library metadata. Keep it flexible, allowing the user to modify metadata for personal use (e.g. for consistency or personal terminology) and to optionally submit corrections.

For keeping materials, import, export, and sync content from the user’s chosen cloud-based storage and WebDAV servers. No other device (e.g. laptop or desktop) should be needed.

For sharing materials, support lightweight micropublishing on social networks and email; networks should be extensible and user-customizable. Sync to or integrate with citation managers and social cataloging/reading list management systems.

Regardless of the ultimate system, I’d stress that device independence is important, meaning that an HTML5 website would probably be the place to start: look to Ibis Reader as a model.

Posted in books and reading, future of publishing, information ecosystem, library and information science, scholarly communication | Comments (5)

Searching for LaTeX code (Springer only)

January 6th, 2011

Springer’s LaTeX search service (example results) allows searching for LaTeX strings or finding the LaTeX equations in an article. Since LaTeX is used to mark up equations in many scientific publications, this could be an interesting way to find related work or view an equation-centric summary of a paper.

You can provide a LaTeX string, and Springer says that besides exact matches they can return similar LaTeX strings:
[screenshot: exact matches to a LaTeX search]

Or, you can search by DOI or title to get all the equations in a given publication:
[screenshot: results for a particular title]

Under each equation in the search results you can click “show LaTeX code”:
[screenshot: show the LaTeX code for an equation]

Right now it just searches Springer’s publications; Springer would like to add open access databases and preprint servers. Coverage even in Springer journals seems spotty: I couldn’t find two particular discrete math papers, so I’ve written Springer for clarification. As far as I can tell, there’s no way to get from SpringerLink to this LaTeX search yet: it’s a shame, because “show all equations in this article” would be useful, even with the proviso that only LaTeX equations were shown.
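On open access LaTeX sources you could approximate such an equation-centric summary yourself. A minimal sketch that only handles $$…$$ and plain equation environments (inline math, align, and friends omitted):

```python
import re

EQUATION = re.compile(
    r'\$\$(.+?)\$\$'                                  # display math: $$ ... $$
    r'|\\begin\{equation\}(.+?)\\end\{equation\}',    # equation environments
    re.DOTALL)

def extract_equations(tex: str) -> list[str]:
    """Return the display equations found in a LaTeX source string."""
    return [(a or b).strip() for a, b in EQUATION.findall(tex)]

sample = r"Rigidity needs $$e \ge 2v - 3$$ edges; \begin{equation}a^2+b^2=c^2\end{equation}"
print(extract_equations(sample))  # ['e \\ge 2v - 3', 'a^2+b^2=c^2']
```

Real-world LaTeX would of course need a proper parser; this is just to show the idea.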

A nice touch is their sandbox where you can test LaTeX code, with a LaTeX dictionary conveniently below.

via Eric Hellman

Posted in future of publishing, information ecosystem, library and information science, math, scholarly communication | Comments (1)