{"id":153,"date":"2008-10-23T00:14:31","date_gmt":"2008-10-23T04:14:31","guid":{"rendered":"http:\/\/jodischneider.com\/blog\/?p=153"},"modified":"2008-10-23T07:47:34","modified_gmt":"2008-10-23T11:47:34","slug":"nytimes-topics-quirky-useful-classification-finding-aid","status":"publish","type":"post","link":"https:\/\/jodischneider.com\/blog\/2008\/10\/23\/nytimes-topics-quirky-useful-classification-finding-aid\/","title":{"rendered":"NYTimes Topics: Quirky, Useful Classification, Finding Aid"},"content":{"rendered":"<p>Yesterday the NYTimes <a href=\"http:\/\/open.blogs.nytimes.com\/2008\/10\/21\/announcing-the-timestags-api\/\">announced<\/a> a new API, TimesTags, &#8220;based on the taxonomy and controlled vocabulary used by Times indexers since 1851&#8221;. The browseable version of this vocabulary, <a href=\"http:\/\/topics.nytimes.com\/\">http:\/\/topics.nytimes.com\/<\/a> , is a great entry into NYTimes articles published since 1981.<br \/>\n<div id=\"attachment_169\" style=\"width: 310px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/jodischneider.com\/blog\/wp-content\/uploads\/2008\/10\/timestopics.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-169\" src=\"https:\/\/jodischneider.com\/blog\/wp-content\/uploads\/2008\/10\/timestopics-300x218.png\" alt=\"NYTimes Topics\" title=\"topics.nytimes.com\" width=\"300\" height=\"218\" class=\"size-medium wp-image-169\" srcset=\"https:\/\/jodischneider.com\/blog\/wp-content\/uploads\/2008\/10\/timestopics-300x218.png 300w, https:\/\/jodischneider.com\/blog\/wp-content\/uploads\/2008\/10\/timestopics.png 620w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-169\" class=\"wp-caption-text\">NYTimes Topics<\/p><\/div><\/p>\n<p>Ed Summers did some <a href=\"http:\/\/inkdroid.org\/bzr\/nytags\/\">scraping<\/a> while also asking the Open NYTimes team for a SKOS version. 
Meanwhile, I&#8217;m playing around with the classification (both online and scraped). Its quirks seem to reflect how it has been used and how it has evolved over time. Classification systems highlight the material classified, but they also tend to reveal the worldview of the people who create the system or classify the materials. That interplay makes integrating classification systems, <a href=\"http:\/\/topicmaps.bouvet.no\/blog\/2008\/10\/22\/the-new-york-times-wants-standardized-tags-for-news-sites\/\">such as through topic maps<\/a>, an interesting research area. But that&#8217;s a topic for another day.<\/p>\n<p>Here are some things I&#8217;ve noticed while playing around with the vocabulary.<\/p>\n<h3>Overall Structure<\/h3>\n<p>The NYTimes&#8217; main navigation lists 15 sections. The NYTimes taxonomy has 3 top-level categories: <strong>news<\/strong>, <strong>opinion<\/strong>, and <strong>reference<\/strong>. Seven sections fit within the news category. Opinion has its own category. Travel is an explicit subject within the reference category. Technology, arts, and style are topical, drawing primarily on the reference category. (Cooking, however, is treated like travel, as an explicit subject.) The 3 advertising sections (jobs, real estate, and auto) are already classified and are thus out of scope.<\/p>\n<p>The remaining 7 sections fall under &#8220;news&#8221;. 
Here are examples of taxonomy terms, showing the category structure:<\/p>\n<h4>News<\/h4>\n<ol>\n<li><strong>World<\/strong>: international\/countriesandterritories<br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/news\/international\/countriesandterritories\/canada\">http:\/\/topics.nytimes.com\/top\/<strong>news\/international\/countriesandterritories<\/strong>\/canada<\/a><\/li>\n<li><strong>U.S.<\/strong>: national\/usstatesterritoriesandpossessions<br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/news\/national\/usstatesterritoriesandpossessions\/michigan\">http:\/\/topics.nytimes.com\/top\/news\/<strong>national\/usstatesterritoriesandpossessions\/<\/strong>michigan<\/a><\/li>\n<li><strong>N.Y. \/ Region<\/strong>: newyorkandregion, nyregion<br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/news\/newyorkandregion\/columns\/lens\/\">http:\/\/topics.nytimes.com\/top\/news\/<strong>newyorkandregion<\/strong>\/columns\/lens\/<\/a><br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/news\/nyregion\/columns\/clydehaberman\/\">http:\/\/topics.nytimes.com\/top\/news\/<strong>nyregion<\/strong>\/columns\/clydehaberman\/<\/a><br \/>\n<strong>nyregion<\/strong> and <strong>newyorkandregion<\/strong> are both used, but they are not interchangeable: neither redirects to the other.<\/li>\n<li><strong>Business<\/strong>: business\/companies<br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/news\/business\/companies\/spicy-pickle-franchising-inc\">http:\/\/topics.nytimes.com\/top\/<strong>news\/business\/companies<\/strong>\/spicy-pickle-franchising-inc<\/a><\/li>\n<li><strong>Science<\/strong>: science\/topics<br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/news\/science\/topics\/quasars\">http:\/\/topics.nytimes.com\/top\/<strong>news\/science\/topics<\/strong>\/quasars<\/a><\/li>\n<li><strong>Health<\/strong>: health\/diseasesconditionsandhealthtopics<br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/news\/health\/diseasesconditionsandhealthtopics\/amnesia\">http:\/\/topics.nytimes.com\/top\/<strong>news\/health\/diseasesconditionsandhealthtopics<\/strong>\/amnesia<\/a><br \/>\nAs the name (diseases, conditions, health topics) suggests, this covers a wide range: particular drugs such as Ritalin; classes of drugs such as antibiotics; topics such as smoking, sleep, teenage pregnancy, and twins; and fields and professions such as surgery and surgeons.<\/li>\n<li><strong>Sports<\/strong>: sports, olympics<br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/news\/sports\/baseball\/majorleague\/philadelphiaphillies\">http:\/\/topics.nytimes.com\/top\/news\/<strong>sports\/<em>baseball\/majorleague<\/em><\/strong>\/philadelphiaphillies<\/a><br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/news\/sports\/probasketball\/nationalbasketballassociation\/atlantahawks\">http:\/\/topics.nytimes.com\/top\/news\/<strong>sports\/<em>probasketball\/nationalbasketballassociation<\/em><\/strong>\/atlantahawks<\/a><br \/>\nBeyond <strong>sports<\/strong>, subcategory names vary considerably. 
Other sections, such as the Olympics, sit outside the main hierarchy entirely:<br \/>\n<a href=\"http:\/\/topics.nytimes.com\/olympics\/2008\/swimming\">http:\/\/topics.nytimes.com\/<strong>olympics\/<em>2008<\/em><\/strong>\/swimming<\/a><\/li>\n<\/ol>\n<h4>Opinion<span style=\"color: #888888;\">: opinion<\/span><\/h4>\n<p><a href=\"http:\/\/topics.nytimes.com\/top\/opinion\/editorialsandoped\/oped\/columnists\/bobherbert\">http:\/\/topics.nytimes.com\/top\/<strong>opinion\/editorialsandoped\/<em>oped\/columnists<\/em><\/strong>\/bobherbert<\/a><br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/opinion\/thepubliceditor\/calame\">http:\/\/topics.nytimes.com\/top\/<strong>opinion\/thepubliceditor<\/strong>\/calame<\/a><br \/>\nAgain, beyond <strong>opinion<\/strong>, there is variation, although <em>editorialsandoped<\/em> is the main subcategory.<\/p>\n<h4>Reference<span style=\"color: #888888;\">: reference<\/span><\/h4>\n<p><a href=\"http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/organizations\/m\/mozilla_foundation\">http:\/\/topics.nytimes.com\/top\/reference\/<strong>timestopics\/<em>organizations\/m<\/em><\/strong>\/mozilla_foundation<\/a><br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/subjects\/s\/swimming\">http:\/\/topics.nytimes.com\/top\/reference\/<strong>timestopics\/<em>subjects\/s<\/em><\/strong>\/swimming<\/a><br \/>\n<strong>Travel<\/strong> is handled as a subject: <a href=\"http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/subjects\/t\/travel_and_vacations\">http:\/\/topics.nytimes.com\/top\/reference\/<strong>timestopics\/<em>subjects\/t<\/em><\/strong>\/travel_and_vacations<\/a><\/p>\n<h3>Spelling Discrepancies<\/h3>\n<p>Drugs (Pharmaceuticals) has two spellings: <a href=\"http:\/\/topics.nytimes.com\/top\/news\/health\/diseasesconditionsandhealthtopics\/drugs_pharmaceuticals\">drugs_pharmaceuticals<\/a> and <a 
href=\"http:\/\/topics.nytimes.com\/top\/news\/health\/diseasesconditionsandhealthtopics\/drugspharmaceuticals\">drugspharmaceuticals<\/a>; the two URLs are aliases for the same topic.<\/p>\n<p><a href=\"http:\/\/topics.nytimes.com\/top\/news\/business\/companies\/e-trade-financial-corporation\">E TRADE Financial Corporation<\/a> and <a href=\"http:\/\/topics.nytimes.com\/top\/news\/business\/companies\/etrade_financial_corporation\">E*Trade Financial Corporation<\/a>, however, look like a mistake rather than aliases: the two pages share some data but differ on the rest. Either it is an error or there is a bizarre story behind it.<\/p>\n<h3>Differences in usage<\/h3>\n<h4>Where to put recipes<\/h4>\n<p>Apples is a subcategory of cooking:<br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/subjects\/c\/cooking_and_cookbooks\/apples\">http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/subjects\/c\/cooking_and_cookbooks\/apples<\/a><br \/>\nPerhaps because apples tend to be used as a cultural reference? Still, where do apple recipes belong?<\/p>\n<p>Pumpkins, on the other hand, has a subcategory for recipes:<br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/subjects\/p\/pumpkins\/recipes\">http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/subjects\/p\/pumpkins\/recipes<\/a><\/p>\n<h4>Dogs are in science, but fossils are not<\/h4>\n<p>While most subjects are classified only alphabetically, there are exceptions. 
Compare fossils to dogs.<br \/>\nFossils is a plain-old subject (subjects\/f):<br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/subjects\/f\/fossils\/\">http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/subjects\/f\/fossils\/<\/a><br \/>\nDogs, however, is a science topic (news\/science\/topics):<br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/news\/science\/topics\/dogs\/\">http:\/\/topics.nytimes.com\/top\/news\/science\/topics\/dogs\/<\/a><br \/>\nI wonder whether that&#8217;s because dogs are a more common subject than fossils.<\/p>\n<h4>Saying what you mean<\/h4>\n<p>Disambiguation, eh? Here, shrimp is a topic within science, so don&#8217;t expect recipes (except in the ads):<br \/>\n<a href=\"http:\/\/topics.nytimes.com\/top\/news\/science\/topics\/shrimp\">http:\/\/topics.nytimes.com\/top\/news\/science\/topics\/shrimp<\/a><\/p>\n<h3>Category structure<\/h3>\n<h4>Prominent subtopics<\/h4>\n<p>Subtopics are sometimes listed at the top level. For instance, <a href=\"http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/subjects\/u\/united_states_attorneys\/index.html\">United States Attorneys<\/a> seems to contain <a href=\"http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/subjects\/u\/united_states_attorneys\/editorials_and_opinion\/index.html\">United States Attorneys: Editorials &amp; Opinion<\/a>, yet both are listed at the top of the topics tree.<\/p>\n<p>I find it fascinating that <a href=\"http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/subjects\/c\/cookies\">Cookies<\/a> and <a href=\"http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/subjects\/c\/cookies\/recipes\">Cookies, Recipes<\/a> are separate topics. 
Again, that seems culturally justified.<\/p>\n<h4>Depth of categories<\/h4>\n<p>There may be several levels of subcategories, e.g.:<\/p>\n<p><a href=\"http:\/\/topics.nytimes.com\/top\/news\/science\/topics\/space_shuttle\/atlantis\">http:\/\/topics.nytimes.com\/top\/news\/science\/topics\/space_shuttle\/atlantis<\/a><\/p>\n<p><a href=\"http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/subjects\/w\/wines\/alsace\">http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/subjects\/w\/wines\/alsace<\/a><\/p>\n<h3><strong>Mixing of keyword and controlled terminology<\/strong><\/h3>\n<p>I&#8217;m surprised that, after some nice featured content, the top two &#8220;articles about dogs&#8221; are about &#8220;hot dogs&#8221;. The NYTimes may also want to refine its handling of multiword terms.<\/p>\n<div id=\"attachment_164\" style=\"width: 274px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/jodischneider.com\/blog\/wp-content\/uploads\/2008\/10\/dogs-or-hot-dogs.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-164\" class=\"size-medium wp-image-164\" title=\"dogs-or-hot-dogs\" src=\"https:\/\/jodischneider.com\/blog\/wp-content\/uploads\/2008\/10\/dogs-or-hot-dogs-264x300.png\" alt=\"Hot dogs turn up in dogs\" width=\"264\" height=\"300\" srcset=\"https:\/\/jodischneider.com\/blog\/wp-content\/uploads\/2008\/10\/dogs-or-hot-dogs-264x300.png 264w, https:\/\/jodischneider.com\/blog\/wp-content\/uploads\/2008\/10\/dogs-or-hot-dogs.png 618w\" sizes=\"auto, (max-width: 264px) 100vw, 264px\" \/><\/a><p id=\"caption-attachment-164\" class=\"wp-caption-text\">Hot dogs turn up in dogs<\/p><\/div>\n<p>Another example is &#8220;<a href=\"http:\/\/topics.nytimes.com\/top\/news\/science\/topics\/quasars\/index.html?query=BABY%20QUASAR%20(SKIN%20CARE%20DEVICE)&amp;field=des&amp;match=exact\">Baby Quasar (Skin Care Device)<\/a>&#8221; showing up under 
<a href=\"http:\/\/topics.nytimes.com\/top\/news\/science\/topics\/quasars\/index.html\">quasars<\/a>.<\/p>\n<h3>By versus About<\/h3>\n<p>Times writers (e.g. <a href=\"http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/people\/z\/tom_jr_zeller\/index.html\">Tom Zeller Jr.<\/a>) are listed in italics and classified as people. The &#8216;by&#8217; versus &#8216;about&#8217; distinction is made primarily in meta tags, where &#8220;PSST&#8221; seems to identify Times writers. For instance, compare the meta tags from Tom Zeller Jr.&#8217;s page:<\/p>\n<p style=\"padding-left: 30px;\">&lt;meta name=&quot;PT&quot; content=&quot;Topic&quot; \/&gt;<br \/>\n&lt;meta name=&quot;CG&quot; content=&quot;Times Topics&quot; \/&gt;<br \/>\n&lt;meta name=&quot;GTN&quot; content=&quot;Zeller, Tom Jr.&quot; \/&gt;<br \/>\n&lt;meta name=&quot;PST&quot; content=&quot;People&quot; \/&gt;<br \/>\n&lt;meta name=&quot;PSST&quot; content=&quot;Writer&quot; \/&gt;<\/p>\n<p>with those on (non-Times) writer <a href=\"http:\/\/topics.nytimes.com\/top\/reference\/timestopics\/people\/m\/toni_morrison\/\">Toni Morrison&#8217;s page<\/a>, which shows no PSST tag:<\/p>\n<p style=\"padding-left: 30px;\">&lt;meta name=&quot;PT&quot; content=&quot;Topic&quot; \/&gt;<br \/>\n&lt;meta name=&quot;CG&quot; content=&quot;Times Topics&quot; \/&gt;<br \/>\n&lt;meta name=&quot;GTN&quot; content=&quot;Morrison, Toni&quot; \/&gt;<br \/>\n&lt;meta name=&quot;PST&quot; content=&quot;People&quot; \/&gt;<br \/>\n&lt;meta name=&quot;SCG&quot; content=&quot;The Public Editor&quot; \/&gt;<\/p>\n<h3>Final thoughts<\/h3>\n<p>The world of electronic publishing blurs the lines between producers and indexers. Archival content, served up by organization, person, or topic, is a great offering. The secondary publishing market (abstracting, indexing, etc.) is changing quickly. 
Source-based browsing, as at NYTimes Topics, is part of that change.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Yesterday the NYTimes announced a new API, TimesTags, &#8220;based on the taxonomy and controlled vocabulary used by Times indexers since 1851&#8221;. The browseable version of this vocabulary, http:\/\/topics.nytimes.com\/ , is a great entry into NYTimes articles published since 1981. Ed Summers did some scraping while also asking the Open NYTimes team for a SKOS version. [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12,4],"tags":[32,29,27,30,31],"class_list":["post-153","post","type-post","status-publish","format-standard","hentry","category-old-newspapers","category-reviews","tag-browsing","tag-classification","tag-nytimes","tag-tagging","tag-taxonomies"],"_links":{"self":[{"href":"https:\/\/jodischneider.com\/blog\/wp-json\/wp\/v2\/posts\/153","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jodischneider.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jodischneider.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jodischneider.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/jodischneider.com\/blog\/wp-json\/wp\/v2\/comments?post=153"}],"version-history":[{"count":20,"href":"https:\/\/jodischneider.com\/blog\/wp-json\/wp\/v2\/posts\/153\/revisions"}],"predecessor-version":[{"id":174,"href":"https:\/\/jodischneider.com\/blog\/wp-json\/wp\/v2\/posts\/153\/revisions\/174"}],"wp:attachment":[{"href":"https:\/\/jodischneider.com\/blog\/wp-json\/wp\/v2\/media?parent=153"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jodischneider.com\/blog\/wp-json\/wp\/v2\/categories?post=153"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jodischneider.com\/bl
og\/wp-json\/wp\/v2\/tags?post=153"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}