Archive for July, 2011

Annotating PDFs on an iPad: GoodReader and iAnnotatePDF

July 31st, 2011

Colleagues were interested in my recommendations for iPad annotation: GoodReader and iAnnotatePDF. Here’s a brief comparison.

Both save Acrobat-compatible annotations, which can be exported as text (for instance, to see everything you’ve highlighted in yellow). Both also offer synching and support multiple annotation styles. The exact annotation workflow and navigation differ somewhat.

GoodReader’s main strength is the ability to easily pinpoint the exact boundaries of an annotation: a circular magnifying ‘loupe’ window automatically pops up. GoodReader also warns you when scanned images don’t have text behind them. Offering to OCR them would be a welcome, though challenging, enhancement: it would be enough to put them into an OCR queue that Acrobat Pro could watch and act on. One weakness (for me at least) is that to get the tool menu, you must tap in the middle of the screen. My fingers seem to expect it to pop up when I tap on the right-hand side of the screen: sometimes that advances the page, but sometimes it just changes the view on the current page. Further, I find its small black-and-white icons somewhat confusing.

I prefer iAnnotatePDF, especially because it saves annotations by default, has customizable navigation, and has clearer icons. Its key strength is that annotations are auto-saved, with ‘undo’, ‘delete’, and ‘edit’ functions. Further, the annotation type is maintained between annotations until you (say) put down the highlighter by clicking an x. This has a small downside: to switch pages, I have to close the annotation tool I’m currently using. Another weakness is that there’s a limited time window for editing existing annotations. Just after they are created, annotations can be adjusted, for instance to move the boundaries of text highlights and underlines. Once this period has expired, annotations can still be deleted, but their locations cannot be adjusted (as far as I can tell). A further weakness is that interacting with image-only PDFs can be confusing. Without any text, some functions (text highlight, text underline, …) just don’t work, and there is no warning or notice.

I would be interested in hearing comparisons of the syncing functionality, as well as comparisons to PDFExpert.

Criterion | GoodReader | iAnnotatePDF
Page view | default is snap to page (double-page spreads show left-to-right) | flow (can see parts of 2 pages at once, top-to-bottom)
Saving annotations | must save each annotation | annotations save automatically
Navigation | tap left/right to navigate back/forward; scrolling only shows the same page | tap, slide, or swipe to navigate (customizable)
Toolbar | tap in the middle | tap on the right
Icons | black & white; some are obscure | medium-sized, in color; some are clearly understandable

Posted in books and reading, iOS: iPad, iPhone, etc. | Comments (1)

How do you organize papers on your iPad?

July 31st, 2011

You read papers, right? How do you store and organize them? I’m looking for advice on a workflow for annotating PDFs and syncing between devices.

I’m striking out on iPad apps for organizing scholarly papers. Papers2 doesn’t pull annotated copies back. Mendeley Lite doesn’t even let me log in[1]. Zotero, which has been my main reference manager for at least 5 years, doesn’t offer an iPad app.

For annotation, I like iAnnotatePDF and GoodReader (and I’m getting ready to try PDFExpert). What I don’t know is how to keep filenames manageable when the documents originate in another iPad app rather than on the desktop.

The only ideas I have left involve either spending more time with file managers or relying on the synching inside the annotation tools.

Reference managers/PDF managers:

  1. Try Sente
  2. See what DEVONthink Pro Office can do, maybe with the Zotero export others have worked on. Surely that’s overkill?

Synching from annotation tools:

iAnnotate or GoodReader can “watch” folders. The main challenge will be coming up with a sufficiently small collection of PDFs to sync back and forth to the iPad.

  1. Stick with Zotero, maybe with files renamed by Zotfile, then use iAnnotatePDF’s “watch folder” feature to keep in sync.
  2. Stick with Papers2, manually manage the file synch for everything I’ve annotated, then watch its data directory with iAnnotatePDF as above.
  3. Try Mendeley, watch its data directory with iAnnotatePDF as above.

Thoughts and suggestions? What would you do?

  1. In Mendeley v1.3.1 (build 19), when I enter my login details, the only option is ‘close’. After closing, Mendeley reports “Not logged in”. Yes, I’ve double-checked my password!

Posted in books and reading, iOS: iPad, iPhone, etc. | Comments (1)

Papers2 does not integrate with external iPad applications in the way I expected

July 31st, 2011

Papers2 does not integrate with external iPad applications in the way I expected. I use iPad applications like GoodReader, iAnnotatePDF, and PDFExpert to read and annotate papers.

The functionality I expected was:

  • Export from Papers to an external PDF annotation application
  • When I reopen Papers, the annotated PDF is shown in my library

However, here is what happens:

  • Export from Papers to an external PDF annotation application. It renames the file, using a random string as the filename.
  • When I reopen Papers, only the original (unannotated) PDF is in my library.
  • Alternatively, when I export from the external application, the annotated file is imported as a *new* PDF, unconnected to the original, with a random string used for the filename.

I started using Papers because managing filenames in iAnnotate wasn’t working: I couldn’t figure out which files were which. So this is absolutely key for me.


This is a bug report to Papers2, copied here since bug reports are private. Any workarounds or suggestions for alternate annotation/reference management workflows would be very welcome.

This annotation environment completely failed to meet my expectations: I expected to ‘Open In’ an annotation application; in fact there’s just ‘Export’ and ‘Import’, meaning that the annotated file isn’t automatically stored in the Papers2 library.

Posted in books and reading, iOS: iPad, iPhone, etc. | Comments (1)

GetSatisfaction’s “feedback-as-you-type”

July 24th, 2011

GetSatisfaction does so many things right. Smart, immediate feedback is one example.

A couple of weeks ago, I noticed this message while adding a post:
“EASE UP ON THE ALL CAPS IN YOUR TITLE. It looks like you’re shouting”
Feedback from GetSatisfaction: STOP SHOUTING

This is great in several ways:

  1. It’s immediate.
  2. It makes a single, clear, personalized[1] suggestion.
  3. It uses a familiar analogy (“shouting”), which helps explain the perceived problem.
  4. It’s not enforced: this nudges the poster, but leaves them to make up their own mind.
  5. It hints at humor/puts the shoe on the other foot (by USING CAPS FOR THE START OF THE MESSAGE).
  6. It’s not overwhelming.

Like their mood feedback, it’s lightweight and appears to be effective.
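GetSatisfaction doesn’t say how it decides that a title is shouting. As a minimal sketch, assuming a simple rule (flag a title when most of its letters are capitals), a check like the one below could run on every keystroke. The function names, the 0.6 threshold, and the 8-letter minimum are hypothetical choices for illustration, not GetSatisfaction’s.

```python
from typing import Optional

# Hypothetical values chosen for illustration; not GetSatisfaction's actual rule.
SHOUTING_THRESHOLD = 0.6  # fraction of letters that must be capitals
MIN_LETTERS = 8           # ignore very short titles such as "PDF" or "iOS"


def looks_like_shouting(title: str) -> bool:
    """Return True if most of the letters in the title are capitals."""
    letters = [c for c in title if c.isalpha()]
    if len(letters) < MIN_LETTERS:
        return False
    capitals = sum(1 for c in letters if c.isupper())
    return capitals / len(letters) >= SHOUTING_THRESHOLD


def feedback_as_you_type(title: str) -> Optional[str]:
    """Return one suggestion to show next to the title field, or None.

    The title is never rejected: the poster is nudged, not forced.
    """
    if looks_like_shouting(title):
        return "EASE UP ON THE ALL CAPS IN YOUR TITLE. It looks like you're shouting"
    return None


for draft in ["iPad sync fails", "IPAD SYNC FAILS AGAIN"]:
    print(repr(draft), "->", feedback_as_you_type(draft))
```

Returning a message instead of blocking the post mirrors point 4 in the list: the check nudges, and the poster decides.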

Figuring out appropriate ways of presenting people with the “right” feedback at the right time will be important for a lot of the work I’m doing!
  1. i.e., specific to the situation

Posted in PhD diary, random thoughts, social web | Comments (0)

Reading Ontologically?

July 24th, 2011

What are the right ontologies for reading? And what kind of ontology support would let books recombine themselves, on the fly, in novel ways?

Today, keyword searches within books and book collections are commonplace, highlighting a word in your ebook reader can bring up a definition, and dictionaries grab recent examples of word use from microblogs.[1] But can’t we do more? What kind of synthesis do we need (and what is possible) for supporting readers of literature, classics, and humanities texts?

Current approaches seem to aim at analysis (e.g. getting an overview of the literary works of a period with “distant reading”/“macroanalysis”) and at creating flexible critical editions (e.g. structural, sometimes overlapping markup, as in TEI-based editions and projects like Wendell Piez’ Sonneteer[2]). I would call these “sensemaking” approaches rather than tools for reading.

I was intrigued by the Bible Ontology[3] because of their tagline: “ever wanted to read and study the Bible Ontologically?” Yet I don’t really know what they mean by reading ontologically[4].

Of course, they have recorded various pieces of data. For instance, for Rebekah, we see her children, siblings, birthplace, book and chapters she figures in, etc.:

[Image: Rebekah’s entry, from the Bible Ontology]

They offer a SPARQL endpoint, so you can query the data yourself. For instance, to find all the married women[6] (live query result):

PREFIX bop: <>
select ?s ?o where {?s bop:isWifeOf ?o }
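For readers new to SPARQL, here is roughly what that query does, sketched in Python rather than run against their endpoint. A triple pattern containing variables (?s, ?o) is matched against every stored (subject, predicate, object) triple, and each match yields one row of variable bindings. The triples below are a small, hand-written, hypothetical sample in the style of the Bible Ontology, not its actual data, and the `match` helper handles only a single triple pattern.

```python
# Hypothetical sample triples; names only mimic the Bible Ontology's style.
Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    ("bop:Rebekah", "bop:isWifeOf", "bop:Isaac"),
    ("bop:Sarah", "bop:isWifeOf", "bop:Abraham"),
    ("bop:Rachel", "bop:isWifeOf", "bop:Jacob"),
    ("bop:Isaac", "bop:isRelatedInEvent", "bop:Isaac_blesses_Jacob"),
]


def match(pattern: Triple, triples: list[Triple]) -> list[dict[str, str]]:
    """Return one binding (variable -> value) per triple matching the pattern.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    results = []
    for triple in triples:
        binding: dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:  # no break: every position matched
            results.append(binding)
    return results


# Roughly: SELECT ?s ?o WHERE { ?s bop:isWifeOf ?o }
for row in match(("?s", "bop:isWifeOf", "?o"), TRIPLES):
    print(row["?s"], "is the wife of", row["?o"])
```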

Intense and long-term work has gone into Bible concordances, scholarship, etc., so it seems like a great use case for “reading ontologically”. With theologians and others looking at the site, using the SPARQL endpoint, etc., perhaps someone will be able to tell me what that means!

  1. In 2003, Gregory Crane wrote that “Already the books in a digital library are beginning to read one another and to confer among themselves before creating a new synthetic document for review by their human readers.” When I first read it in 2006, that article seemed incredibly visionary to me. Yet these commonplace “syntheses” no longer seem extraordinary to me.
  2. Currently offline, but brilliant; do check back. Meanwhile, see also his Digital Humanities 2010 talk notes.
  3. It’s a bit disingenuous to advertise their work as an ontology: in fact they have applied the ontology, rather than just creating it.
  4. Even though I’ve given a talk about supporting reading with ontologies!
  5. The most meaningful of their terms is bop:isRelatedInEvent, perhaps since these events, like Isaac_blesses_Jacob, would require more analysis to discern.
  6. Gender is not recorded, so we can’t (yet) ask for all the women overall, though I’ve just asked about this.

Posted in books and reading, future of publishing, semantic web | Comments (0)

QOTD: “move the computation to the data”: the future of nonconsumptive research with Google Books

July 16th, 2011

Douglas Knox touches on the future of “distant reading”[1] with Google Books.[2]

For rights management reasons and also for material engineering reasons, the research architecture will move the computation to the data. That is, the vision of the future here is not one in which major data providers give access to data in big downloadable chunks for reuse and querying in other contexts, but one in which researchers’ queries are somehow formalized in code that the data provider’s servers will run on the researcher’s behalf, presumably also producing economically sized result sets.
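A toy sketch of that architecture, assuming nothing about how Google (or anyone else) actually implements it: the provider keeps the texts, the researcher supplies a function, and only a small aggregate travels back. The corpus, `run_on_provider`, `word_counts`, and the `MAX_RESULTS` cap are all hypothetical.

```python
from collections import Counter
from typing import Callable

# Hypothetical provider-side corpus (public-domain opening lines). In the
# architecture Knox describes, texts like these would never be downloaded.
CORPUS = {
    "moby_dick": "call me ishmael some years ago never mind how long precisely",
    "two_cities": "it was the best of times it was the worst of times",
}

# Hypothetical cap standing in for "economically sized result sets".
MAX_RESULTS = 5


def run_on_provider(query: Callable[[str], Counter]) -> list[tuple[str, int]]:
    """Run the researcher's query over each text on the provider's side.

    Only the aggregated top counts are returned, never the texts themselves.
    """
    total: Counter = Counter()
    for text in CORPUS.values():
        total += query(text)
    return total.most_common(MAX_RESULTS)


def word_counts(text: str) -> Counter:
    """The researcher's query, "formalized in code": count word occurrences."""
    return Counter(text.split())


print(run_on_provider(word_counts))
```

The researcher can still edit `word_counts` and rerun it; what never leaves the provider is `CORPUS` itself.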

There are also some implicit research goals for those in cyberinfrastructure and digital humanities support, and for text miners aiming to support humanities scholars:

Whatever we mean by “computation,” that is, can’t be locked up in an interface that tightly binds computation and data. Readers already need (and for the most part do not have) our own agents and our own data, our own algorithms for testing, validating, calibrating, and recording our interaction with the black boxes of external infrastructure.

This kind of black-box infrastructure contrasts with “using technology critically and experimentally, fiddling with knobs to see what happens, and adjusting based on what they find”, which is possible when a scholar is “free to write short scripts and see results in quick cycles of exploration”.

I’m pulling these quotes out of context; they come from Douglas’ post on the Digital Humanities 2011 conference.

  1. What’s “distant reading”? Think “text mining of literature”, but it’s deeper than that. It’s also called the macroeconomics of literature (“macroanalysis”) and …
  2. By the way, what approach is HathiTrust taking?

Posted in books and reading, information ecosystem | Comments (0)

Enabling a Data Web: Is RDF the only choice?

July 8th, 2011

I’ve been slow in blogging about the Web Science Summer School being held in Galway this week. Check Clare Hooper’s blog for more reactions (starting from her day one post from two days ago).

Wednesday was full of deep and useful talks, but I wanted to start at the beginning, so I waited for a copy of Stefan Decker’s slides.

Hidden in the orientation to DERI, there are a few slides (12-19) which will be new to DERIans. They’re based on an argument Stefan made to the database community recently: any data format enabling the data Web is “more or less” isomorphic to RDF.

The argument goes:
The three enablers for the (document) Web were:

  1. scalability
  2. no censorship
  3. a positive feedback loop (exploiting Metcalfe’s Law)[1].

Take these as requirements for the data Web. Enabling Metcalfe’s Law, according to Stefan, requires:

  1. Global Object Identity.
  2. Composability: The value of data can be increased if it can be combined with other data.

The bulk of his argument focuses on this composability feature. What sort of data format allows composability?

It should:

  1. Have no schema.
  2. Be self-describing.
  3. Be “object-centric”: in order to integrate information about different entities, data must be related to these entities.
  4. Be graph-based, because object-centric data sources, when composed, result in a graph in the general case.

Stefan’s claim is that any data format that fulfills the requirements is “more or less” isomorphic to RDF.
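A toy illustration of the composability requirements, in Python rather than RDF: two hypothetical sources, published independently and without a shared schema, describe entities as (subject, predicate, object) triples using the same global identifiers. With global identity, composing them is just set union, and the result is a graph in which paths appear that neither source contains alone. The `ex:` names, predicates, and helper functions are made up for this sketch; it illustrates the argument, not Stefan’s formalism.

```python
from collections import defaultdict

# A triple is (subject, predicate, object); "ex:" names are made-up global IDs.
Triple = tuple[str, str, str]

# Two hypothetical sources published independently, with no shared schema.
SOURCE_A: set[Triple] = {
    ("ex:StefanDecker", "ex:worksAt", "ex:DERI"),
    ("ex:DERI", "ex:locatedIn", "ex:Galway"),
}
SOURCE_B: set[Triple] = {
    ("ex:WebScienceSummerSchool", "ex:heldIn", "ex:Galway"),
    ("ex:Galway", "ex:locatedIn", "ex:Ireland"),
}


def compose(*sources: set[Triple]) -> set[Triple]:
    """With global identifiers, composing sources is plain set union."""
    merged: set[Triple] = set()
    for source in sources:
        merged |= source
    return merged


def about(entity: str, triples: set[Triple]) -> set[Triple]:
    """Object-centric view: every statement mentioning the entity, from any source."""
    return {t for t in triples if entity in (t[0], t[2])}


def connected(start: str, goal: str, triples: set[Triple]) -> bool:
    """Treat the triples as an undirected graph and search for a path (DFS)."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for s, _, o in triples:
        neighbours[s].add(o)
        neighbours[o].add(s)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return False


merged = compose(SOURCE_A, SOURCE_B)
print(connected("ex:StefanDecker", "ex:WebScienceSummerSchool", SOURCE_A))  # False
print(connected("ex:StefanDecker", "ex:WebScienceSummerSchool", merged))    # True
```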

Several parts of this argument confuse me. First, it’s not clear to me that a positive feedback loop is the same as exploiting Metcalfe’s Law. Second, can’t information be composed even when it is not object-centric? (Is it obvious that entities are required, in general?) Third, I vaguely understand that composing object-centric data sources results in a (possibly disconnected) graph: but are graphs the only, or the best, way to think about this? Further, how can I convince myself of this (presumably obvious) fact about data integration?

  1. The value of a communication network is proportional to the number of possible connections between its nodes: n(n-1)/2 for n nodes, which grows as n^2.
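The arithmetic behind that footnote, worked through: n nodes admit n(n-1)/2 distinct pairs, so the count grows roughly as n^2/2, and doubling a network roughly quadruples its possible connections. A quick check (the function name is ours, for illustration):

```python
def possible_connections(n: int) -> int:
    """Distinct pairs among n nodes: n choose 2 = n(n - 1) / 2."""
    return n * (n - 1) // 2


for n in (10, 100, 1000):
    pairs = possible_connections(n)
    # The ratio to n**2 approaches 1/2, i.e. the count grows as n**2.
    print(n, pairs, pairs / n**2)
```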

Posted in information ecosystem, semantic web | Comments (1)