Onward and upward

September 4th, 2009
by jodi

Today is my last day at Appalachian State University.

Monday I begin a new adventure as community organizer, helping launch Acawiki, a “wiki for academic research”. The brainchild of Neeru Paharia, Acawiki strives to make research papers easier to access and understand. Go write your own summary!

The next month will find me living in Massachusetts, my adult home, while preparing for a move to Ireland!

In October, I’ll be joining the Social Software Unit at DERI for a fellowship. The group does fascinating work on social software and the semantic web. This is a 3- (or 4-) year Ph.D. project, where I’ll be working on modeling online discussions/arguments. More about that soon!

I’m looking for practical advice of all sorts—about community organizing, about moving to Ireland and living abroad, about success in Ph.D. studies. Consider this your personal solicitation for tips, tricks, and advice!

Posted in computer science, higher education, library and information science, random thoughts | Comments (6)

Organizing a PDF library: Mendeley for information extraction, Zotero for open source goodness

August 27th, 2009
by jodi

I’ve been using Zotero for a while now. I make no secret of the fact that I’m a big fan. In early July I was testing out Mendeley to give a workshop with a colleague who’s been excited about it.

I wanted to see whether Mendeley could reduce any of my pain points. While I’m not moving to Mendeley*, I do plan to take advantage of its whizz-bang PDF organization. When Mendeley offers Zotero integration, I think I’ll be set.

*Zotero is open source; Mendeley is merely free at the moment. Zotero also offers web archiving features while Mendeley is strictly for PDF organization.

I spend a lot of time reading and pulling materials into my library; I spend far less time organizing materials. So I decided I’d try the PDF metadata functions of each. Zotero can pull in materials lots of different ways, but it doesn’t yet have a “pull this PDF in from this URL” button for reports and things that aren’t in databases. I don’t want to spend my time typing up metadata (I’m lazy and busy, what can I say), but I do want to have an organized library. (Hey, got an organizing business? I’d pay for your services.) So the “get metadata for this PDF” features are of prime interest to me.

I usually have a “to read” pile lying around. I did a very non-scientific test, starting with a folder of 44 PDFs (“PDFs to read”). I dragged them into each program.

Zotero had a small point of failure: I expected “get PDF metadata” to be in the Preferences menu, but I had to look up its location on their website. Happily, it’s easy to find from the Support page of zotero.org: Retrieve PDF Metadata. The page explains that metadata comes from Google Scholar, based on the DOI if it’s embedded. That sounds like a reasonable methodology, but one that’s only going to work for recent journal articles and books published by e-savvy publishers. Most of the files I dump into “PDFs to read” are preprints from personal websites or reports from nonprofits’ websites. DOIs aren’t expected in that context.

Of my 44 test cases, Zotero says “No matching references found.” on 26 of them. Results from the 18 “successful” matches are spottier. The first one I checked leads me to believe that things haven’t changed since the last time I tried out this feature, maybe 8 or 10 months ago. It’s an article called A New Approach to Search [PDF], by Joe Weinman, and it’s available from his website. I can identify the source as Business Communications Review, October 2007 from small type in the footer. So can Mendeley. But Zotero calls it Peters, R. S. 1970. Ethics and education. Allen & Unwin Australia. I’m not really sure why. Google search, perhaps?

Zotero’s ‘identification’ of the next article is even stranger:
Capital, R. Sheriff’s Office moves to new facility. Cell 224: 6547. (Notice: the title and journal don’t even belong together!) This article is actually the contest-winning federated search article published by Computers in Libraries [PDF]. It’s available from the publisher’s website. While Information Today publishes some great articles about technology, their HTML doesn’t have any semantic information. Since no one’s yet written a screenscraper for their site, Zotero can’t auto-grab the metadata. But Mendeley successfully identifies this PDF, too.

I wondered whether Mendeley was grabbing metadata from the files so I took a closer look at these two files. Nope, there was very little usable metadata. (Adobe Bridge is great for reading XMP metadata.) Furthermore, the first article (by Weinman) lists its creator as Sharon Wallach; clearly neither program is pulling that.

Onward and upward: overall there are 4 bad identifications and 22 good identifications of the 44, from Zotero. The false positive score of 9% is the part that bothers me the most.
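For anyone checking my arithmetic, here is a minimal sketch of how I get that 9%. It assumes "false positive score" simply means bad identifications divided by all the PDFs I dragged in; the variable names are mine, not anything from Zotero.

```python
# Counts from my informal test of Zotero's "Retrieve PDF Metadata" feature.
total_pdfs = 44          # PDFs dragged in from the "PDFs to read" folder
bad_identifications = 4  # confidently labeled with the wrong citation

# Assumed definition: wrong identifications as a share of all files tested.
false_positive_rate = bad_identifications / total_pdfs

print(f"{false_positive_rate:.0%}")  # prints 9%
```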

Mendeley does better but it’s not perfect. At first it appears to have identified all 44 PDFs, but there’s a fair bit of missing information (for instance, 13 are missing the “Published in” field). When I looked closely, I found 26 with bad data, 4 that could be improved, and 2 that weren’t identified. Which means I’m satisfied with only 12 of these, but there’s another important factor: Mendeley marks these files as ‘unreviewed’, meaning that the metadata is suspect until I review and/or correct it. So the false positives are easy to detect. This is reassuring. Especially since (unlike Zotero) only one of Mendeley’s identifications was worse than none at all, and it was dead easy to spot:
Fohjoft, W. J., Jg, J. T., Vtfe, T. F., Jo, F., Epo, O., Bcpvu, N. E., et al. (n.d.). !12 3/4 “#$%&$’,5.
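A side observation on that garbled citation: the "surnames" read as ordinary English words if every letter is moved back one place in the alphabet. The sketch below shows only that. Why the text came out shifted is my guess (perhaps a font with a non-standard character mapping in the PDF), and `shift_back` is a helper made up for this illustration, not part of Mendeley.

```python
import string

def shift_back(text: str, shift: int = 1) -> str:
    """Move each ASCII letter back by `shift` places, wrapping around the alphabet."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
        else:
            out.append(ch)  # leave punctuation, digits and spaces alone
    return "".join(out)

# The "surnames" from the citation Mendeley produced.
garbled_surnames = ["Fohjoft", "Jg", "Vtfe", "Jo", "Epo", "Bcpvu"]
decoded = [shift_back(name) for name in garbled_surnames]

print(decoded)  # ['Engines', 'If', 'Used', 'In', 'Don', 'About']
```

So Mendeley was probably reading real words off the page; the text it extracted was off by one letter.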

It’s interesting to look at where Mendeley fails: non-scientific articles and documents with non-standard title pages. Mendeley chokes on Open Provenance Model and Funny in Farsi (no metadata at all) and labels a Master’s report only with the year (2000).

I’m most curious about Funny in Farsi; I would expect better metadata from Random House, but sure enough Bridge doesn’t find any. I like Mendeley’s auto-rename feature, but on the files it doesn’t label, that renaming is a big disadvantage: filenames are often reasonable metadata. These three filenames (opm-v1.01.pdf, Funny_in_Farsi.pdf, and 2576.pdf) give either information about the contents or a chance at refinding the file with a search engine. For opm-v1.01.pdf, googling the filename finds it immediately. For Funny_in_Farsi.pdf, searching for Funny in Farsi provides 8 search results, and a savvy searcher could get more metadata (e.g. the publisher’s name) from the results. Searching for 2576.pdf clarke open source finds the third.

I’m also interested in what neither Zotero nor Mendeley got right. Neither correctly identified a PDF with Highlights of the National Museum of American History. Drag and drop of citations (with ugly special characters and all) gives

Zotero:
Parton, J. 2004. Revolutionary Heroes and Other Historical Papers. Kessinger Publishing.

Mendeley:
Museum, N., & History, A. (2008). Star-Spangled Banner, 1814. Smithsonian.

Neither does well on the Palmer report, either:

Zotero:
Bird, A. 1994. Careers as repositories of knowledge: a new perspective on boundaryless careers. Journal of Organizational Behavior: 325-344.

Mendeley:
Factors, I., Palmer, C. I., Teffeau, P. I., Newton, P. C., Assistant, R., Research, I., et al. (2008). No title. Library, (August).

With a closer look, you can see Mendeley takes the authors as:
Factors, Identifying
Palmer, C I C Institutional Repository Development Final Report Carole L
Teffeau, Principal Investigator Lauren C
Newton, Project Coordinator Mark P
Assistant, Research
Research, Informatics
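Those "authors" look like what you would get by treating each chunk of title-page text as a personal name and taking the last word as the surname. Here is a minimal sketch of that naive rule. It is only my guess at the behavior, not Mendeley's actual algorithm; `naive_author` is a made-up helper, and the input strings are reconstructed from the output above rather than copied from the report.

```python
def naive_author(chunk: str) -> str:
    """Naive rule: the last word is the surname, everything before it is the given name."""
    words = chunk.split()
    if len(words) < 2:
        return chunk.strip()  # nothing to split into surname and given name
    return f"{words[-1]}, {' '.join(words[:-1])}"

# Hypothetical title-page chunks, reconstructed from Mendeley's output.
title_page_chunks = [
    "Identifying Factors",
    "Principal Investigator Lauren C Teffeau",
    "Project Coordinator Mark P Newton",
    "Research Assistant",
]

for chunk in title_page_chunks:
    print(naive_author(chunk))
```

Run on those chunks, the rule turns a title fragment into "Factors, Identifying" and a job title into "Assistant, Research", much like the list above.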

If you want more details, please leave a comment or drop me a line; I had hoped to add info but decided just to push this out of my queue. I was thinking about it because Mendeley really does help me review the papers I’ve been meaning to read. Guess it’s time to think about that Mendeley to Zotero workflow again!

Posted in information ecosystem, reviews | Comments (7)

…a silver moon in the open skies and a single flag unfurled

July 20th, 2009
by jodi

Apollo 11 makes me think of Hope Eyrie.

Hope Eyrie is a lovely commemoration of the Apollo landing. It’s probably the best known and most celebrated filk song.

I love the lyrics, especially the chorus:
“But the Eagle has landed; tell your children when.
Time won’t drive us down to dust again.”

This video (a tad cheesy in places) is captioned with the lyrics:

Each of these videos features the music of Hope Eyrie, written by Leslie Fish & performed by Julia Ecklar. Thanks to the generosity of Prometheus Music, you can download an mp3 from http://www.totouchthestars.com (a fun space-related album from a great filk label).

Posted in random thoughts | Comments (1)

Paper as a Social Object: “creating conversations, collecting scribbles, instigating adventures”

June 19th, 2009
by jodi

I love it when paper and digital formats are both used for what they do best. Like the Incidental:
“The Incidental is [a] feedback loop made out of paper and human interactions – timebound, situated and circulating in a place.” [Schulz and Webb]

annotated incidental 4/25/09

“Over in Milan at the Salone di Mobile they’ve created a thing called The Incidental. It’s like a guide to the event but it’s user generated and a new one is printed every day. When I say user generated, I mean that literally. People grab the current day’s copy and scribble on it. So they annotate the map with their personal notes and recommendations. Each day the team collect the scribbled on ones, scan them in and print an amalgamated version out again. You have to see it, to get it. But it’s great to see someone doing something exciting with ‘almost instant’ printing and for a real event and a real client too.

The actual paper is beautiful and very exciting. It has a fabulous energy that has successfully migrated from the making of the thing to the actual thing. Which is also brilliant and rare.” [Ben Terrett as quoted by Schulz and Webb]

The Incidental was created at and for Milan’s furniture/design fair with funding by The British Council.

Posted in future of publishing, information ecosystem | Comments (0)

JCDL 2009 Poster Session in Second Life

June 18th, 2009
by jodi

Last night I popped into Second Life for a poster session. JCDL 2009 is going on in Austin this week, and several of the posters were on display in the Digital Preserve region of SL. Chris Beer asked for some screenshots.

Here’s the whole poster space from outside. (Click each image for the ginormous full-size screenshot.)
Poster Session Entrance
My avatar (TR Telling) is in a bright orange UIUC GSLIS T-shirt, thanks to a class tour Richard Urban led last year. With a closer look, you can spot the screen that was used to project MinuteMadness.

Here are two posters, “Finding Centuries-Old Hyperlinks” and “Toward Automatic Generation of Image-Text Document Surrogates to Optimize Cognition”.
Two Posters: "Finding Centuries-Old Hyperlinks" and "Toward Automatic Generation of Image-Text Document Surrogates to Optimize Cognition"
Poster numbers were used for the best poster competition, I believe.

Large text-sizes really help viewing from afar; deft users can get a closer view with ‘mouse look’. I took a second screenshot of the “Finding Centuries-Old Hyperlinks” poster since it was my favorite. Xiaoyue (Elaine) Wang and Eamonn Keogh suggest cross-referencing manuscript pages using icon similarity.
Closer View of "Finding Centuries-Old Hyperlinks"
Handouts could be really useful for a SL poster session; I had to settle for taking screenshots. Clicking on the poster could give a copy of the poster, which could include links to more information. A mailbox could facilitate sending messages to the presenters.

One presenter ‘attended’ from New York. Several people are gathered around her poster, which generated a lot of discussion.
postertalk
In the left corner you can see one of the more visually striking posters, a study of LIS students’ impressions of the Kindle, after using it for something like 3 weeks.

To the right of the entrance is a sign that says “What did you think?”, which linked to a comment form to be completed on the Web. I succeeded at that box, but wasn’t able to figure out how to submit a second, in-world comment form.

My avatar is just stepping down from a rotating lazy-susan which held a striking comment box. Getting a comment form and filling it out was straightforward. However, dragging and dropping the form back onto the box, as suggested, didn’t work for me.

I had several interesting conversations, most notably a chat outside in the Poster Garden with Javier Velasco Martin who helped build and furnish the Preserve. Ed Fox was easily identifiable: his avatar’s first name is EdFox. For social gatherings, handles are useful, but for professional gatherings it can be reassuring to know who you’re talking with.

Here’s one last look at the dome from the outside. I love the bright aqua JCDL lettering. And, what trip to Second Life would be complete without some flying?
Flying by the JCDL Poster Session Dome
With a closer look, you can see the large comment box in the center of the dome.

Posted in computer science, future of publishing, higher education, library and information science | Comments (1)

Surveillance, Personal Edition

June 2nd, 2009
by jodi

Have you ever kept a calendar, tracked what you eat, or saved receipts? Simple data, like how much soda you drink, can tell a story:

Giving up Coke (or not) by Tim Graham


In fact, what you drink can tell several stories. Here’s a more elaborate example, also by Tim Graham:
"I drink therefore I am" by Tim Graham


This is what we call self-surveillance.

What is self-surveillance? Read my article! (Also in PDF). Or Nathan Yau’s blog.

Also added to the publications page: Nathan Yau & Jodi Schneider, “Self-Surveillance,” Bulletin of the American Society for Information Science and Technology Vol. 35, No. 5, June/July 2009, 24-30. [HTML][PDF]. Thanks to Diane Neal (NCCU/U. Western Ontario), who edited the special section on Visual Representation, Search and Retrieval for this issue, and to the Bulletin’s editor Irene Travis and designer Carla Badaracco (who made the 16 figures work for screen and print).

Hat tip to Jenny Levine, whose “How Public is your Privacy” often comes to mind.

Posted in information ecosystem, library and information science | Comments (0)

Wolfram|Alpha Roundup

May 20th, 2009
by jodi

I don’t usually go in for roundups. But the chatter about Wolfram|Alpha is so fun and so contradictory, I just had to collect it.

First, what is Wolfram|Alpha?

Let’s start with the tweets:

Working with WolframAlpha reminds me of playing Adventure, Zork and such- “I Wonder if phrasing it this way will work…”. Fun with NLP. – Geoffrey Bilder

wolframalpha thinks star trek is a movie, House (character) is unicode x2302, but is knows that the Boss is Bruce Springsteen – Eric Hellman

Onward to reviews. I’ll give you four types:

  1. “What can it do?”
    Mashable’s 10 Easter Eggs and 10 More

  2. “Incredible potential”
    James Hendler says:

    …a useful tool for some fields, and mainly a play toy beyond that — at least for now.

    But the potential is incredible. I really feel like it ushers in a new generation of Web applications and opens the door for getting people to realize that search is only the very beginning of what the Web is about.

    Jon Udell is hoping “to be able to compute with facts in a more frictionless way.”

  3. “Let’s improve it”
    Deepak Singh wants to enhance Wolfram|Alpha with structured data from other sources like Freebase and the Protein Data Bank.

    “Google, Wikipedia, Wolfram|Alpha, two well established, and one nascent, but together, the three make quite a triumvirate of information, complementing each other well. Add to that sources like Freebase and we continue to move towards a world where information and knowledge at different levels gets increasingly accessible and available. The hope is that as that happens, we can solve new problems, and add to that knowledge at a broader scale than we ever have.”

  4. “All that hype for this?”
    Snark (what else?) from Ted Dziuba at The Register: Wolfram Alpha – a new kind of Fail

    In a more serious vein, David Weinberger sees Wolfram|Alpha’s Achilles’ heel:

Curation is a source of its strength. It increases the reliability of the information, it enables the computations, and it lets the results pages present interesting and relevant information far beyond the simple factual answer to the question. The richness of those pages will be big factor in the site’s success.

Curation is also WA’s limitation.

WA’s big benefit is that it answers questions authoritatively. WA nails facts down. …It thus ends conversation. Google and Wikipedia aim at continuing and even provoking conversation.

Buzz started in March, with raves from Nova Spivack and Doug Lenat. Rudy Rucker soon followed.

Wolfram|Alpha favicon
I also really go in for the favicon.ico. Equals sign, check. Homage to Mathematica, check. Ahem.

See also: Google Squared, launching soon according to TechCrunch’s demo and Search Engine Land’s post.

Posted in math, reviews | Comments (0)

Stop Intellectual Apartheid

March 30th, 2009
by jodi

A call to action from BYU English professor Gideon Burton: Stop intellectual apartheid!

Let me illustrate how academic institutions enforce Intellectual Apartheid through a simple experiment you can perform right now. Let’s say that you are researching lingering effects of South Africa’s apartheid and you discovered (as I did using Google Scholar) a recent article, “Fantasmatic Transactions: On the Persistence of Apartheid Ideology” (published in Subjectivity in July, 2008 by D. Hook). Now for the experiment: click on this link to the full text of the article.

One of two things just occurred. Either you just gained immediate access to a PDF version of the full article; or, more likely, an authentication window popped up requesting your login credentials. It turns out that Palgrave-Macmillan publishes Subjectivity, and through their website one can get access to this article for a mere $30. Alternatively, one may subscribe to the journal for $503 per year.

You really don’t need to go to the developing world to recognize that advanced knowledge is a big club with stiff entrance fees. Even middle class Americans will think twice before throwing down $30 for a scholarly article. How likely will this knowledge ever reach scholars in Mexico or India? And just how broadly can the editors of Subjectivity expect it to reach when subscribing costs $503/year?

Gideon also gives suggestions for scholars, librarians, and administrators.

via Cameron Neylon on friendfeed

Posted in future of publishing, higher education, information ecosystem, library and information science | Comments (1)

Horizon scanning and the digital underbelly

March 29th, 2009
by jodi

Gaynor Backhouse writes a great post about libraries, holding out for “a guided tour of the library’s digital underbelly”. My favorite part is her metaphor about horizon-scanning:

Horizon scanning is a bit like doing a jigsaw you’ve bought from a car boot sale: first of all, it comes in a plastic bag, so there’s no picture to guide you. Secondly, you can see from the myriad sizes of the different pieces that there’s more than one puzzle in there and, thirdly, you know, even as you are handing over your money, that you won’t have all the pieces to complete any one, particular puzzle. [JISC Libraries of the Future | Holding out for a hero: technology, the future and the renaissance of the university librarian.]

Gaynor manages JISC’s TechWatch, keeping up with tech trends for libraries.

I’m not quite sure what the library’s “digital underbelly” is. But this sampling of news art strikes me as one possible example.

Graphics section of the Chicago Tribune, September 9, 1938

The Art the Message: The Story Behind the Chicago Tribune Collection has the same feel as the behind-the-scenes tour Gaynor Backhouse described: “secret stuff” that only the curators know about. This collection was saved by Janet A. Ginsburg, who edits news aggregator trackernews.net and curates a collection of news retrospectives, hosted at her personal site.

For access to the physical collection (now known as the Janet A. Ginsburg Chicago Tribune Collection of the Michigan State University News Archive) contact MSU Communication professor Lucinda Davenport. Images from Janet’s news art exhibit can also be seen at Brainpickings and (with Portuguese commentary) at Segunda Língua. Found via Janet’s comment on Steven Berlin Johnson’s SXSW talk, Old Growth Media And The Future Of News.

Posted in library and information science, old newspapers | Comments (0)

Yes!

March 28th, 2009
by jodi
sms by amf on flickr

Web acceptance letters are now old hat: Newly admitted students at Baylor can get a text message acceptance note.

Since 2006, Creighton University has texted acceptance letters (via SMS bulk sender Dynmark), with messages like “Katie, congratulations. You’ve been admitted to Creighton!”.

Princeton’s acceptance notes made news a while back:

Source: Howard Wainer, “Clear Thinking Made Visible: Redesigning Score Reports for Students,” Chance 15 (Winter 2002), pp. 56-58. via Tufte. Wainer is also the author of Graphic Discovery: A Trout in the Milk and Other Visual Adventures, a very readable classic in statistics and information visualization. If you’ve meant to read Tufte but keep putting it off, this is the book for you.

Posted in higher education | Comments (0)