Posts Tagged ‘taxonomies’

A Taxonomy for Decisions

November 4th, 2010

Tim van Gelder provides a taxonomy for decisions:

  1. Intuitive Decisions
  2. Technical Decisions
  3. Deliberative Decisions
  4. Bureaucratic Decisions

Deliberative and bureaucratic decisions are, I think, the most important for collaborative decision-making. Intuitive decisions, made quickly by an individual, are least important for collaboration. Technical decisions have the most interesting description: they are “made by following some well-defined technical procedure”; arguably they are not decisions.

Can you spot any overlaps or gaps? Discuss at his article.

The argumentation community has given a lot of attention to deliberation; I wonder if that has been influenced by the prevalence of deliberation in decision-making, and the difficulty of formal modelling of bureaucracies.

Posted in argumentative discussions, PhD diary | Comments (0)

A taxonomy of tweets

January 11th, 2010

Here’s a taxonomy of tweets from an experiment at SemanticHacker Blog:

  • User’s current status
  • Private conversations
  • Links to web content
    • links to blog and news articles
    • links to images and videos
    • other links
  • Politics, sports, current events
  • Product recommendations/complaints
  • Advertising: “posted from a company’s twitter account”
  • Spam
  • Other messages “that don’t quite fit under any of the above categories. Fan messages to celebrities, shoutouts to other users, web-based polls and quizzes, and so on.”

via Hak-Lae Kim on twitter

Posted in social web | Comments (0)

What types of data do social networks have? See Schneier’s Taxonomy.

November 20th, 2009

Rights to data may depend, says Bruce Schneier, on what type of data it is and who provided it. He provides a useful enumeration:

1. Service data. Service data is the data you need to give to a social networking site in order to use it. It might include your legal name, your age, and your credit card number.

2. Disclosed data. This is what you post on your own pages: blog entries, photographs, messages, comments, and so on.

3. Entrusted data. This is what you post on other people’s pages. It’s basically the same stuff as disclosed data, but the difference is that you don’t have control over the data — someone else does.

4. Incidental data. Incidental data is data the other people post about you. Again, it’s basically the same stuff as disclosed data, but the difference is that 1) you don’t have control over it, and 2) you didn’t create it in the first place.

5. Behavioral data. This is data that the site collects about your habits by recording what you do and who you do it with.

See Schneier’s post for discussion. Via a pointer on Rob Styles’ blog, in turn via Rob’s tweet.

Have you come across other taxonomies for social networking data?

Here’s a simple but far less expressive way to characterize data on social networks: is it “about you” or “from you”? It can be the first, the second, neither, or both. “Aboutness”, however, is ontologically challenging. Any use for this?
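One way to make the “about you” / “from you” question concrete is to treat it as two independent yes/no flags, which gives four combinations. The sketch below is only an illustration: the `DataItem` class, its field names, and the sample items are hypothetical, and deciding the flags for real data is exactly where “aboutness” gets hard.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DataItem:
    """A piece of social network data (hypothetical model, for illustration)."""
    description: str
    about_you: bool  # does the data describe you?
    from_you: bool   # did you provide or create it?


def quadrant(item: DataItem) -> str:
    """Name which of the four about/from combinations an item falls into."""
    if item.about_you and item.from_you:
        return "both"
    if item.about_you:
        return "about you"
    if item.from_you:
        return "from you"
    return "neither"


# The two flags are independent, so there are exactly four combinations.
ALL_QUADRANTS = {
    quadrant(DataItem("", about, frm))
    for about, frm in product([True, False], repeat=2)
}

# Sample items; the flag values are one reading, not a settled mapping.
examples = [
    DataItem("your own profile photo", about_you=True, from_you=True),
    DataItem("a friend's photo of you", about_you=True, from_you=False),
    DataItem("your comment about a friend's trip", about_you=False, from_you=True),
    DataItem("a stranger's post about the weather", about_you=False, from_you=False),
]

for item in examples:
    print(f"{quadrant(item):>9}: {item.description}")
```

Two booleans cannot say who controls the data, which is the distinction Schneier’s “entrusted” and “incidental” types rely on; that is what “far less expressive” costs.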

Collaboration/shared control isn’t considered in this taxonomy. For instance, “entrusted data” doesn’t capture the notion of “shared data” in a collaborative system such as Wave, a wiki, or perhaps even email.

For behavioral data in libraries, see also “intentional data”, as used by Lorcan Dempsey, going back to 2005 (and many times since) [for instance, in discussion with “emergent knowledge”]. I prefer “behavioral data” since much data about intention is by no means deliberate/intentional!

Posted in social web | Comments (3)

PARC’s Mr. Taggy uses context from social tags

March 1st, 2009

PARC’s Augmented Social Cognition team is doing really interesting work. From time to time, new projects surface on their blog.

Last week, PARC announced the site Mr.Taggy.com, a search engine based on social bookmarking tags:

The problem with using social tags is that they contain a lot of noise, because people often use different words to mean the same thing or the same words to mean different things. The TagSearch algorithm is part of our ongoing research to reduce the noise while amplifying the information signal from social tags.

Mr. Taggy uses “related tags” to reduce the noise.

Filtering makes a difference:

Mr. Taggy results for void

Mr. Taggy search results for void, filtered by semantic web

Searchers can thumbs-up or thumbs-down each result to provide further context.

Posted in library and information science | Comments (1)

NYTimes Topics: Quirky, Useful Classification, Finding Aid

October 23rd, 2008

Yesterday the NYTimes announced a new API, TimesTags, “based on the taxonomy and controlled vocabulary used by Times indexers since 1851”. The browseable version of this vocabulary, http://topics.nytimes.com/ , is a great entry into NYTimes articles published since 1981.

NYTimes Topics

Ed Summers did some scraping while also asking the Open NYTimes team for a SKOS version. Meanwhile, I’m playing around with the classification (online and scraped). Its quirks seem to reflect how it’s been used, and how it has evolved over time. Classification systems can highlight the material classified; they also tend to give insight into the worldview of the people classifying materials or creating the system. The interplay makes integration of classification systems, such as through topic maps, an interesting research area. But that’s a topic for another day.

Here are some things I’ve noticed while playing around with the vocabulary.

Overall Structure

The NYTimes’ main navigation lists 15 sections. The NYTimes taxonomy has 3 top-level categories: news, opinion, and reference. 7 sections fit within the news taxonomy. Opinion has its own category. Travel is an explicit subject within the reference category. Technology, arts, and style are topical, drawing primarily on the reference category. (Cooking, however, is similar to travel in its treatment.) The 3 advertising sections (jobs, real estate, and auto) are already classified, and thus, out of scope.

The remaining 7 sections we dub “news”. Here are examples of taxonomy terms, showing the category structure:

News

  1. World: international/countriesandterritories
    http://topics.nytimes.com/top/news/international/countriesandterritories/canada
  2. U.S.: national/usstatesterritoriesandpossessions/
    http://topics.nytimes.com/top/news/national/usstatesterritoriesandpossessions/michigan
  3. N.Y. / Region: newyorkandregion, nyregion
    http://topics.nytimes.com/top/news/newyorkandregion/columns/lens/
    http://topics.nytimes.com/top/news/nyregion/columns/clydehaberman/
    nyregion and newyorkandregion are both used, but they are not interchangeable (in the sense that there aren’t redirects).
  4. Business: business/companies
    http://topics.nytimes.com/top/news/business/companies/spicy-pickle-franchising-inc
  5. Science: science/topics
    http://topics.nytimes.com/top/news/science/topics/quasars

  6. Health: health/diseasesconditionsandhealthtopics
    http://topics.nytimes.com/top/news/health/diseasesconditionsandhealthtopics/amnesia
    As the name (diseases, conditions, health topics) suggests, this encompasses a wide range of topics: particular drugs such as Ritalin, categories of drugs such as antibiotics, topics such as smoking, sleep, teenage pregnancy, and twins, and professional groups such as surgery and surgeons.
  7. Sports: sports, olympics
    http://topics.nytimes.com/top/news/sports/baseball/majorleague/philadelphiaphillies
    http://topics.nytimes.com/top/news/sports/probasketball/nationalbasketballassociation/atlantahawks

    Beyond sports, subcategory names vary considerably. Other sections, such as for the Olympics, are outside the main hierarchy:
    http://topics.nytimes.com/olympics/2008/swimming

Opinion: opinion

http://topics.nytimes.com/top/opinion/editorialsandoped/oped/columnists/bobherbert
http://topics.nytimes.com/top/opinion/thepubliceditor/calame
Again, beyond opinion, there is variation. However, editorialsandoped is the main subcategory.

Reference: reference

http://topics.nytimes.com/top/reference/timestopics/organizations/m/mozilla_foundation

http://topics.nytimes.com/top/reference/timestopics/subjects/s/swimming

Travel is handled as a subject: http://topics.nytimes.com/top/reference/timestopics/subjects/t/travel_and_vacations
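To explore the category structure in scraped URLs, it helps to split each one into its top-level category and the path beneath it. This is a minimal sketch using only the URL patterns listed above; the function name `parse_topic_url` and the returned dictionary shape are my own choices, not part of any NYTimes API.

```python
from urllib.parse import urlparse

# The three top-level categories observed under /top/.
TOP_LEVEL = {"news", "opinion", "reference"}


def parse_topic_url(url: str) -> dict:
    """Split a Times Topics URL into its top-level category and sub-path."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # URLs in the main hierarchy start with /top/<category>/...
    if len(segments) >= 2 and segments[0] == "top" and segments[1] in TOP_LEVEL:
        return {"category": segments[1], "path": segments[2:]}
    # Anything else (e.g. the Olympics pages) sits outside the main hierarchy.
    return {"category": None, "path": segments}


urls = [
    "http://topics.nytimes.com/top/news/science/topics/quasars",
    "http://topics.nytimes.com/top/opinion/thepubliceditor/calame",
    "http://topics.nytimes.com/top/reference/timestopics/subjects/s/swimming",
    "http://topics.nytimes.com/olympics/2008/swimming",
]
for url in urls:
    print(parse_topic_url(url))
```

The length of `path` gives a rough measure of category depth, and a `category` of `None` flags pages that live outside the three-way split.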

Spelling Discrepancies

Drugs (Pharmaceuticals) has two spellings: drugs_pharmaceuticals and drugspharmaceuticals are aliases.

E TRADE Financial Corporation and E*Trade Financial Corporation, however, appear to be an error: the two pages have some data in common and other data not in common. Either it’s an error or there’s a bizarre story behind it.
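Spelling variants like these can be surfaced mechanically by collapsing each name to a normalized key and grouping the names that collide. The sketch below assumes a simple rule (lowercase, keep only letters and digits); the helper names are hypothetical, and a collision only marks a candidate pair for review. It cannot tell a deliberate alias from an error.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def group_aliases(names):
    """Group names that collapse to the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    # Only keys shared by more than one spelling are potential aliases.
    return {key: found for key, found in groups.items() if len(found) > 1}


names = [
    "drugs_pharmaceuticals",
    "drugspharmaceuticals",
    "E TRADE Financial Corporation",
    "E*Trade Financial Corporation",
    "quasars",
]
for key, found in group_aliases(names).items():
    print(key, "<-", found)
```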

Differences in usage

Where to put recipes

Apples is a subcategory of cooking (e.g. apples):
http://topics.nytimes.com/top/reference/timestopics/subjects/c/cooking_and_cookbooks/apples

Perhaps because apples tend to be used as a cultural reference? Still, where do apple recipes belong?

Pumpkins, on the other hand, has a subcategory for recipes:
http://topics.nytimes.com/top/reference/timestopics/subjects/p/pumpkins/recipes

Dogs are in science, but fossils are not

While most subjects are classified only alphabetically, there are exceptions. Compare fossils to dogs.
Fossils is a plain-old subject (subjects/f):
http://topics.nytimes.com/top/reference/timestopics/subjects/f/fossils/

Dogs, however, is a science topic (news/science/topics):
http://topics.nytimes.com/top/news/science/topics/dogs/
I wonder if that’s because dogs are a more common subject than fossils?

Saying what you mean

Disambiguation, eh? Here, shrimp is a topic within science, so don’t expect recipes (except in the ads):
http://topics.nytimes.com/top/news/science/topics/shrimp

Category structure

Prominent subtopics

Subtopics are sometimes listed at the top level. For instance, United States Attorneys seems to contain United States Attorneys: Editorials & Opinion. Both are listed at the top of the topics tree.

I find it fascinating that Cookies and Cookies, Recipes are separate topics. Again, culturally justified.

Depth of categories

There may be several levels of subcategories, e.g.

http://topics.nytimes.com/top/news/science/topics/space_shuttle/atlantis

http://topics.nytimes.com/top/reference/timestopics/subjects/w/wines/alsace

Mixing of keyword and controlled terminology

I’m surprised to find “hot dogs” as the top two “articles about dogs”, after some nice featured content. NYTimes may also want to refine handling of multiword terms.

Hot dogs turn up in dogs

Another example is “Baby Quasar(Skin Care Devise)” showing up under quasars.
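The difference between keyword matching and controlled terms is easy to show in miniature. This is a toy sketch, not how the NYTimes search actually works: the headlines and index terms are invented, and both function names are hypothetical.

```python
def keyword_match(term: str, headlines):
    """Naive keyword search: any headline containing the term as a substring."""
    return [h for h in headlines if term.lower() in h.lower()]


def controlled_match(term: str, indexed):
    """Controlled-vocabulary lookup: only items an indexer assigned the term to."""
    return [headline for headline, terms in indexed if term in terms]


# Hypothetical headlines and index terms, invented for this example.
indexed = [
    ("Training Dogs to Detect Disease", {"dogs"}),
    ("The Best Hot Dogs in the City", {"hot dogs", "restaurants"}),
    ("A New Look at Distant Quasars", {"quasars"}),
]
headlines = [headline for headline, _ in indexed]

print(keyword_match("dogs", headlines))   # includes the hot dogs headline
print(controlled_match("dogs", indexed))  # only the article indexed under "dogs"
```

Treating “hot dogs” as a single multiword term in the index is what keeps it out of the results for “dogs”.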

By versus About

Times writers (e.g. Tom Zeller Jr.) are listed in italics and classified as people. The ‘by’ versus ‘about’ distinction is made primarily in meta tags. “PSST” seems to identify Times writers. For instance, compare the meta tags from Tom Zeller Jr.’s page:

<meta name="PT" content="Topic" />
<meta name="CG" content="Times Topics" />
<meta name="GTN" content="Zeller, Tom Jr." />
<meta name="PST" content="People" />
<meta name="PSST" content="Writer" />

to those on (non-Times) writer Toni Morrison’s page:

<meta name="PT" content="Topic" />
<meta name="CG" content="Times Topics" />
<meta name="GTN" content="Morrison, Toni" />
<meta name="PST" content="People" />
<meta name="SCG" content="The Public Editor" />
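A scraper could use these tags to separate ‘by’ from ‘about’. Here is a minimal sketch with Python’s standard `html.parser`; it assumes that a `PSST` value of `Writer` marks a Times writer, which is only my reading of the two pages compared here, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs into a dict."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.meta[attributes["name"]] = attributes["content"]


def read_meta(html: str) -> dict:
    """Parse an HTML fragment and return its meta name/content pairs."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta


def is_times_writer(html: str) -> bool:
    """Assumption: PSST == "Writer" marks a Times writer."""
    return read_meta(html).get("PSST") == "Writer"


zeller = """
<meta name="GTN" content="Zeller, Tom Jr." />
<meta name="PST" content="People" />
<meta name="PSST" content="Writer" />
"""
morrison = """
<meta name="GTN" content="Morrison, Toni" />
<meta name="PST" content="People" />
<meta name="SCG" content="The Public Editor" />
"""

print(is_times_writer(zeller), is_times_writer(morrison))
```

Both pages carry `PST` of `People`, so that tag alone cannot make the distinction; only the presence of `PSST` differs.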

Final thoughts

The world of electronic publishing blurs the lines between producers and indexers. Archival content, served up by organization, person, or topic, is a great offering. The secondary publishing market (abstracting, indexing, etc.) is changing quickly. Source-based browsing, as at NYTimes Topics, is part of that change.

Posted in old newspapers, reviews | Comments (2)