Posts Tagged ‘zotero’

Sente, a first look

August 1st, 2011

Today I’ve been testing out Sente, on the theory that it might help me organize the PDFs I’m annotating on my iPad.

The desktop application is geared to Mac users who really care about bibliographies, with several fantastic features.

I like Sente’s statuses: read/unread, Recently Modified, and Recently Added are automatically tracked, and you can rate items. I especially like the workflow statuses, which match some of my common tasks:

  • Get Full Text
  • Discuss Further
  • Cite
  • Do Not Cite

“Sort by citation” is surprisingly illuminating: I didn’t realize how many papers from “Discourse Studies” I’d been looking at recently.

Another great feature that could be easily and fruitfully added to most other bibliographic managers: title case and exact case lists (I am *so* sick of seeing lowercased ‘wikipedia’ in bibliographies!), which you can very easily customize.
Sente also has a journal dictionary: you can assign abbreviations and ISSNs (authority control, yippee!).

Their visual display could use an update (thankfully it’s on the way), and I find their icons confusing (maybe ‘pencil’ for ‘note’ is sensible, but what in the world about ‘four dots in a diamond shape’ says ‘abstract’ to you?).

I tested the Zotero import. As I wrote to Sente’s developers, there are some issues:

In testing it out on my large (5000+ item) Zotero library I see that:

  1. HTML attachments are not copied into the Sente library
  2. Image attachments are not copied into the Sente library
  3. Text note attachments are not copied into the Sente library
  4. Subcollections are not preserved

Since then, I’ve noticed that the keywords don’t get imported. Further, the “date added” and “date modified” fields are not preserved, but instead now reflect the import date and time (as I noted on Twitter). But I do like their duplicate detection. Along with promising to consolidate matched items, they provide a report about the discarded matches. For instance:

Rule “DOI rule” flagged these two references as possible duplicates:
Vilar, P., & Žumer, M. (2008). Perceptions and importance of user friendliness of IR systems according to users’ individual characteristics and academic discipline. Journal of the American Society for Information Science & Technology, 59(12), 1995-2007. doi:Article
Quick-Response Barcodes. (2008). Library Technology Reports, 44(5), 46-47. doi:Article
However, the match was rejected because the references differ in: Article Title, pages, Publication Title, URL, Volume, Issue.
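
To make the logic of that report concrete, here is a minimal Python sketch of a two-stage duplicate check: a rule flags candidate pairs, and a field comparison then rejects pairs that differ too much. This is not Sente’s actual implementation; the `Reference` class, the field names, and the "reject on any difference" policy are assumptions made for illustration. Both records above carry the literal text "Article" in their DOI field, which is presumably why the DOI rule fired.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    # Hypothetical record layout; Sente's real schema is not shown in the post.
    title: str
    publication: str
    volume: str
    issue: str
    pages: str
    doi: str


# Fields compared after a rule has flagged a pair (an assumed list).
CHECK_FIELDS = ("title", "publication", "volume", "issue", "pages")


def doi_rule(a: Reference, b: Reference) -> bool:
    """Flag a pair when both DOI fields are non-empty and identical."""
    return bool(a.doi) and a.doi == b.doi


def differing_fields(a: Reference, b: Reference) -> list:
    """Return the names of the compared fields whose values differ."""
    return [name for name in CHECK_FIELDS if getattr(a, name) != getattr(b, name)]


def review_pair(a: Reference, b: Reference):
    """Return a (status, differing_fields) tuple for one pair of references."""
    if not doi_rule(a, b):
        return "not flagged", []
    diffs = differing_fields(a, b)
    if diffs:
        return "rejected", diffs
    return "merged", []


vilar = Reference(
    title="Perceptions and importance of user friendliness of IR systems",
    publication="Journal of the American Society for Information Science & Technology",
    volume="59", issue="12", pages="1995-2007",
    doi="Article",  # bad value copied from the report above
)
barcodes = Reference(
    title="Quick-Response Barcodes",
    publication="Library Technology Reports",
    volume="44", issue="5", pages="46-47",
    doi="Article",  # same bad value, so the DOI rule matches
)

status, diffs = review_pair(vilar, barcodes)
print(status, diffs)
```

The point of the second stage is that a bad identifier shared by two records gets caught before anything is merged, which is what the report shows happening.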

I have played briefly with Sente’s free iPad viewer, but not yet with their paid ($19.99) app, which allows annotation. Based on reviews (why no permalinks, Apple?), “Export seems to be an option but crucially, import is not.” However, if Sente’s annotation is enough, there’s hope, since the documentation of the Sync functionality already in the current (6.2) version and the description of Sync for the planned 6.5 release (via this) are *very* promising: “As you read a PDF on your iPad on the bus ride home, highlighting passages and taking notes, the highlighting and notes appear in all copies by the time you arrive home.”

By Sente user standards, I am far from a power user: the biggest databases seem to be about 10 times the size of mine. This could be an improvement over Zotero, where my library speed can’t quite keep up some days. I’d be *very* interested to hear from enthusiastic Sente users. Switching seems quite feasible, and it’s probably worth checking out their iPad app.

The main obvious concerns I have are about notetaking and portability. Notetaking of offline/non-fulltext items is important but doesn’t seem to have been a particular focus of development. Portability is incredibly important: I need to ensure that export (and ideally import) brings along files and notes as well as PDFs.

I’ve been thinking of direct, in-file PDF annotation as the best possible way to ensure that my annotations outlive my reference manager. Should I rethink that? So far (according to their draft manual as above): “Highlighting created in Sente 6.2 is not stored in the PDF itself — it is stored in the library database. This change has several very positive effects, notably on syncing.” Let me know what you think in the comments!

Posted in books and reading, library and information science, reviews, scholarly communication | Comments (2)

Organizing a PDF library: Mendeley for information extraction, Zotero for open source goodness

August 27th, 2009

I’ve been using Zotero for a while now. I make no secret of the fact that I’m a big fan. In early July I was testing out Mendeley to give a workshop with a colleague who’s been excited about it.

I wanted to see whether Mendeley could reduce any of my pain points. While I’m not moving to Mendeley*, I do plan to take advantage of its whiz-bang PDF organization. When Mendeley offers Zotero integration, I think I’ll be set.

*Zotero is open source; Mendeley is merely free at the moment. Zotero also offers web archiving features, while Mendeley is strictly for PDF organization.

I spend a lot of time reading and pulling materials into my library; I spend far less time organizing materials. So I decided I’d try the PDF metadata functions of each. Zotero can pull in materials in lots of different ways, but it doesn’t yet have a “pull this PDF in from this URL” button for reports and things that aren’t in databases. I don’t want to spend my time typing up metadata (I’m lazy and busy, what can I say), but I do want to have an organized library. (Hey, got an organizing business? I’d pay for your services.) So the “get metadata for this PDF” features are of prime interest to me.

I usually have a “to read” pile lying around. I did a very non-scientific test, starting with a folder of 44 PDFs (“PDFs to read”). I dragged them into each program.

Zotero had a small point of failure: I expected “get PDF metadata” to be in the Preferences menu, but I had to look up its location on their website. Happily, it’s easy to find from the Support page of zotero.org: Retrieve PDF Metadata. The page explains that metadata comes from Google Scholar, based on the DOI if it’s embedded. That sounds like a reasonable methodology, but one that’s only going to work for recent journal articles and books published by e-savvy publishers. Most of the files I dump into “PDFs to read” are preprints from personal websites or reports from nonprofits’ websites. DOIs aren’t expected in that context.

Of my 44 test cases, Zotero says “No matching references found.” on 26 of them. Results from the 18 “successful” matches are spottier. The first one I checked leads me to believe that things haven’t changed since the last time I tried out this feature, maybe 8 or 10 months ago. It’s an article called A New Approach to Search [PDF], by Joe Weinman, and it’s available from his website. I can identify the source as Business Communications Review, October 2007 from small type in the footer. So can Mendeley. But Zotero calls it Peters, R. S. 1970. Ethics and education. Allen & Unwin Australia. I’m not really sure why. Google search, perhaps?

Zotero’s ‘identification’ of the next article is even stranger:
Capital, R. Sheriff’s Office moves to new facility. Cell 224: 6547.

(Notice: the title and journal don’t even belong together!) This article is actually the contest-winning federated search article published by Computers in Libraries [PDF]. It’s available from the publisher’s website. While Information Today publishes some great articles about technology, their HTML doesn’t have any semantic information. Since no one’s yet written a screenscraper for their site, Zotero can’t auto-grab the metadata. But Mendeley successfully identifies this PDF, too.

I wondered whether Mendeley was grabbing metadata from the files so I took a closer look at these two files. Nope, there was very little usable metadata. (Adobe Bridge is great for reading XMP metadata.) Furthermore, the first article (by Weinman) lists its creator as Sharon Wallach; clearly neither program is pulling that.

Onward and upward: overall, Zotero produced 4 bad identifications and 22 good identifications out of the 44. The false positive rate of 9% (4 of 44) is the part that bothers me the most.

Mendeley does better, but it’s not perfect. At first it appears to have identified all 44 PDFs, but there’s a fair bit of missing information (for instance, 13 are missing the “Published in” field). When I looked closely, I found that 26 had bad data, 4 could be improved, and 2 weren’t identified. That means I’m satisfied with only 12 of these, but there’s another important factor: Mendeley marks these files as ‘unreviewed’, meaning that the metadata is suspect until I review and/or correct it. So the false positives are easy to detect. This is reassuring, especially since (unlike Zotero) only one of Mendeley’s identifications was worse than none at all, and it was dead easy to spot:
Fohjoft, W. J., Jg, J. T., Vtfe, T. F., Jo, F., Epo, O., Bcpvu, N. E., et al. (n.d.). !12 3/4 “#$%&$’,5.
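
For anyone who wants to check the arithmetic, the counts reported above can be tallied in a few lines of Python. The numbers come straight from this post; the variable names and the "satisfactory" label for the 12 files I was happy with are just my shorthand for this sketch.

```python
TOTAL_PDFS = 44  # size of the "PDFs to read" test folder

# Zotero: number of wrong identifications reported above.
zotero_bad = 4
zotero_false_positive_rate = zotero_bad / TOTAL_PDFS

# Mendeley: outcomes of the close review reported above.
mendeley = {
    "bad data": 26,
    "could be improved": 4,
    "not identified": 2,
    "satisfactory": 12,
}

# The four Mendeley outcomes should account for every file in the folder.
assert sum(mendeley.values()) == TOTAL_PDFS

print(f"Zotero false positives: {zotero_bad}/{TOTAL_PDFS} = {zotero_false_positive_rate:.0%}")
for outcome, count in mendeley.items():
    print(f"Mendeley {outcome}: {count}/{TOTAL_PDFS} = {count / TOTAL_PDFS:.0%}")
```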

It’s interesting to look at where Mendeley fails: non-scientific articles and documents with non-standard title pages. Mendeley chokes on Open Provenance Model and Funny in Farsi (no metadata at all) and labels a Master’s report only with the year (2000).

I’m most interested in Funny in Farsi; I would expect better metadata from Random House, but sure enough Bridge doesn’t find any. I like Mendeley’s auto-rename feature, but on the files it doesn’t label, that renaming is a big disadvantage: filenames are often reasonable metadata. These three filenames (opm-v1.01.pdf, Funny_in_Farsi.pdf, and 2576.pdf) give either information about the contents or a chance at refinding the file with a search engine. For opm-v1.01.pdf, googling the filename finds it immediately. For Funny_in_Farsi.pdf, searching for Funny in Farsi provides 8 search results, and a savvy searcher could get more metadata (e.g. the publisher’s name) from the results. Searching for 2576.pdf clarke open source finds the third.
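
The filename-as-metadata idea is easy to sketch. The helper below is hypothetical (`filename_to_query` is a name made up here, not a feature of Zotero or Mendeley): it strips the extension, turns underscores into spaces, and optionally appends extra search terms, roughly mirroring the manual searches described above.

```python
import re
from pathlib import PurePath


def filename_to_query(filename: str, extra_terms=()) -> str:
    """Build a search-engine query from a PDF filename.

    Drops the final extension, replaces runs of underscores or whitespace
    with a single space, and appends any extra terms the searcher knows.
    """
    stem = PurePath(filename).stem  # "Funny_in_Farsi.pdf" -> "Funny_in_Farsi"
    words = re.sub(r"[_\s]+", " ", stem).strip()
    return " ".join([words, *extra_terms])


for name in ("opm-v1.01.pdf", "Funny_in_Farsi.pdf"):
    print(filename_to_query(name))

# An opaque numeric filename needs help from terms remembered about the document.
print(filename_to_query("2576.pdf", extra_terms=["clarke", "open", "source"]))
```

Whether to keep the extension in the query is a judgment call; for an opaque name like 2576.pdf, searching the full filename may work better than the bare number.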

I’m also interested in what neither Zotero nor Mendeley got right. Neither correctly identified a PDF with Highlights of the National Museum of American History. Drag and drop of citations (with ugly special characters and all) gives:

Zotero:
Parton, J. 2004. Revolutionary Heroes and Other Historical Papers. Kessinger Publishing.

Mendeley:
Museum, N., & History, A. (2008). Star-Spangled Banner, 1814. Smithsonian.

Neither does well on the Palmer report, either:

Zotero:
Bird, A. 1994. Careers as repositories of knowledge: a new perspective on boundaryless careers. Journal of Organizational Behavior: 325-344.

Mendeley:
Factors, I., Palmer, C. I., Teffeau, P. I., Newton, P. C., Assistant, R., Research, I., et al. (2008). No title. Library, (August).

With a closer look, you can see Mendeley takes the authors as:
Factors, Identifying
Palmer, C I C Institutional Repository Development Final Report Carole L
Teffeau, Principal Investigator Lauren C
Newton, Project Coordinator Mark P
Assistant, Research
Research, Informatics
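
One guess at how author lists like this arise: a parser that treats the last word of each title-page line as the family name and everything before it as given names. The Python sketch below is purely illustrative and is not Mendeley’s actual algorithm; the input lines are assumptions, reconstructed by reversing the output above, and the messier Palmer line is left out.

```python
def naive_author(line: str) -> str:
    """Treat the last word as the family name and the rest as given names.

    Assumes the line contains at least one word.
    """
    *given, family = line.split()
    return f"{family}, {' '.join(given)}" if given else family


# Hypothetical title-page lines, guessed from the author list above.
title_page_lines = [
    "Identifying Factors",
    "Principal Investigator Lauren C Teffeau",
    "Project Coordinator Mark P Newton",
    "Research Assistant",
    "Informatics Research",
]

for line in title_page_lines:
    print(naive_author(line))
```

With these inputs, job titles and report-title fragments end up in the name fields, in the same way they do in the list above.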

If you want more details, please leave a comment or drop me a line; I had hoped to add info but decided just to push this out of my queue. I was thinking about it because Mendeley really does help me review the papers I’ve been meaning to read. Guess it’s time to think about that Mendeley to Zotero workflow again!

Posted in information ecosystem, reviews | Comments (7)