For rights management reasons and also for material engineering reasons, the research architecture will move the computation to the data. That is, the vision of the future here is not one in which major data providers give access to data in big downloadable chunks for reuse and querying in other contexts, but one in which researchers’ queries are somehow formalized in code that the data provider’s servers will run on the researcher’s behalf, presumably also producing economically sized result sets.
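The shape of that arrangement can be sketched in a few lines of code. This is a toy illustration only, not any real provider's API: `TextProvider` and `submit_query` are invented names. The point is the division of labor — the texts stay on the provider's side, the researcher ships a function, and only a small result set comes back.

```python
# Hypothetical sketch of the "move the computation to the data" pattern.
# TextProvider and submit_query are invented names, not a real API.
from collections import Counter

class TextProvider:
    """Simulates a data provider that holds a corpus it will not release in bulk."""
    def __init__(self, corpus):
        self._corpus = corpus  # the full texts never leave the provider's side

    def submit_query(self, query_fn):
        """Run the researcher's code against the corpus and return only
        a small result set, never the texts themselves."""
        return [query_fn(title, text) for title, text in self._corpus.items()]

# Researcher's side: formalize a question as code and ship it over.
provider = TextProvider({
    "Middlemarch": "it was a dark and stormy night in Middlemarch",
    "Bleak House": "fog everywhere fog up the river fog down the river",
})

def count_fog(title, text):
    # the "query": how often does the word "fog" occur in each text?
    return (title, Counter(text.split())["fog"])

results = provider.submit_query(count_fog)
```

In this sketch the researcher sees only `results`, a list of (title, count) pairs, which is the "economically sized result set" the architecture envisions.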
There are also some implicit research goals here for those working in cyberinfrastructure, in digital humanities support, and in text mining aimed at supporting humanities scholars:
Whatever we mean by “computation,” that is, can’t be locked up in an interface that tightly binds computation and data. We already need (and for the most part do not have) our own agents and our own data: our own algorithms for testing, validating, calibrating, and recording our interactions with the black boxes of external infrastructure.
This kind of black-box infrastructure contrasts with “using technology critically and experimentally, fiddling with knobs to see what happens, and adjusting based on what they find,” which becomes possible when a scholar is “free to write short scripts and see results in quick cycles of exploration.”
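As a toy illustration of that quick exploratory cycle (entirely hypothetical, tied to no particular tool or corpus): a script short enough to rewrite and rerun in seconds, with a literal “knob” — here, a minimum word length — to fiddle with and observe the effect.

```python
# A short exploratory script of the kind the quotation describes.
# The "knob" is min_len; tweak it, rerun, and see what changes.
from collections import Counter

text = ("it was the best of times it was the worst of times "
        "it was the age of wisdom it was the age of foolishness")

def top_words(text, min_len=3, n=3):
    """Most common words at least min_len characters long."""
    words = [w for w in text.split() if len(w) >= min_len]
    return Counter(words).most_common(n)

# Fiddling with the knob:
print(top_words(text, min_len=1))  # short function words dominate
print(top_words(text, min_len=4))  # longer content words surface
```

Each rerun answers a slightly different question; that tight loop between adjusting a parameter and seeing the result is exactly what a locked interface forecloses.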
- What’s “distant reading”? Think “text mining of literature,” but it’s deeper than that. It has also been called the macroeconomics of literature (“macroanalysis”).
- By the way, what approach is HathiTrust taking?