Mapping the Community III: a Model

Jul 19, 2013

Mapping an academic field is not something new. In fact, mapping academic fields is itself an academic field, largely associated with library and information science, but scattered about. So if one wants to map a field, there is an established way to do it.

Let’s take a look. Alan Porter & Scott Cunningham’s Tech Mining is primarily about how to use large online databases to find out what academics and fellow travelers are publishing, patenting, or otherwise producing. I haven’t gotten my copy yet – the library only has an ebook, and these are difficult to deal with – but here are some observations.

First, Porter and Cunningham make it clear that their techniques are designed to survey a mature science, in the following sense. In 1962, Thomas Kuhn published a book, The Structure of Scientific Revolutions, in which he proposed that science does not develop smoothly. Instead, he claimed that a scientific field can develop somewhat steadily as practitioners explore what is at hand, under a consensus paradigm. That paradigm is the consensus on the basic facts, common perspective, and general agenda of what’s important. Every once in a while, a paradigm shift radically alters the basic facts, perspective and agenda of the practitioners as they replace an old paradigm with a new one; we are familiar with shifts like the revolution in quantum mechanics in the early twentieth century, or of plate tectonics in the latter. We could consider the long steady periods as mature science.
Another way of looking at this is via David Hilbert’s analogy of a science as a tree. Following my exploration of a similar metaphor, one could consider a scientific field as a vine or tree, slowly exploring the features of some edifice. Every once in a while, there is a dramatic rearrangement, but then things return to normal. What Porter and Cunningham outline is a methodology for mapping out the structure of this tree / vine, under the assumption that it is at some level of maturity.
Second, this is not a philosophy book; it is a book describing how to mine data. The data consists of records in vast and idiosyncratic databases like Thomson Reuters’ somewhat snooty Web of Science (which includes the Science Citation Index), the Online Computer Library Center’s WorldCat (catalog) (which supposedly contains practically everything), and Google Scholar, among others. The book is about how to get meaningful data, which is the rub. Suppose, for example, you wanted to find the articles published last year on “Mathematical Crystallography.” Unfortunately, nothing requires scholars to classify their papers as “Mathematical Crystallography” if that is one of the primary or secondary subjects of the paper. Instead, one relies on authors, editors, and reviewers to insert keywords, codes, and review text that will somehow indicate whether or not this is a paper on “Mathematical Crystallography.” Consequently, when searching for scholarly activity in “Mathematical Crystallography,” there is the old problem of what statisticians call “false positives” (falsely identifying a book on crystal gazing as a work in “Mathematical Crystallography”) versus “false negatives” (falsely failing to recognize an article on symmetries of unbounded polytopes as a work in “Mathematical Crystallography”). There is always a tradeoff between the two, and Porter and Cunningham provide techniques for minimizing both errors.
One strategy that should be useful in a (let’s be honest) scattered and marginal field like “Mathematical Crystallography” arises from Porter & Cunningham’s observation that “Most papers are rarely read; few are heavily cited (the most common number of citations to a paper is zero). A few papers and authors in any specialty are cited repeatedly. Those papers already cited become easier to find, and more attractive to scientists looking for key references. As a result ‘the rich get richer.’ This is known as ‘the Matthew Effect.’” One starts with a few highly cited works in the subject, or a few works that seem central to the subject, looks at the works that they cite or that cite them, picks up the keywords and codes of these primary and secondary works, and then uses them in searches.
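To make the last two points concrete, here is a minimal sketch, in Python, of the seed-paper strategy followed by a check of false positives and false negatives. Everything in it is an assumption made for illustration: the six toy records, their keywords, their citation links, and the function names are all hypothetical, and nothing here is taken from Porter & Cunningham's own software or from any real database.

```python
from collections import Counter

# Hypothetical toy records: keywords, works cited, and whether a human
# reader would call the work "Mathematical Crystallography".
RECORDS = {
    "A": {"keywords": {"space group", "lattice"}, "cites": {"B", "C"}, "relevant": True},
    "B": {"keywords": {"space group", "tiling"}, "cites": set(), "relevant": True},
    "C": {"keywords": {"lattice", "polytope"}, "cites": {"B"}, "relevant": True},
    "D": {"keywords": {"crystal", "divination"}, "cites": set(), "relevant": False},
    "E": {"keywords": {"polytope", "symmetry"}, "cites": {"C"}, "relevant": True},
    "F": {"keywords": {"lattice", "cryptography"}, "cites": set(), "relevant": False},
}

def neighbours(seed_ids, records):
    """Return the seeds plus the works they cite and the works citing them."""
    seeds = set(seed_ids)
    found = set(seeds)
    for rid, rec in records.items():
        if rid in seeds:
            found |= rec["cites"]      # works a seed cites
        if rec["cites"] & seeds:
            found.add(rid)             # works that cite a seed
    return found

def harvest_keywords(ids, records, min_count=2):
    """Keep keywords that appear in at least min_count of the given works."""
    counts = Counter(kw for rid in ids for kw in records[rid]["keywords"])
    return {kw for kw, n in counts.items() if n >= min_count}

def search(keywords, records):
    """Return every record sharing at least one keyword with the query."""
    return {rid for rid, rec in records.items() if rec["keywords"] & keywords}

def precision_recall(hits, records):
    """Precision falls with false positives; recall falls with false negatives."""
    relevant = {rid for rid, rec in records.items() if rec["relevant"]}
    true_positives = len(hits & relevant)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

core = neighbours({"A"}, RECORDS)
query = harvest_keywords(core, RECORDS)
hits = search(query, RECORDS)
print(sorted(core), sorted(query), sorted(hits), precision_recall(hits, RECORDS))
```

In this toy corpus the harvested query pulls in record F (a lattice cryptography paper, a false positive) and misses record E (a polytope symmetry paper, a false negative), which is the tradeoff described above in miniature. Lowering `min_count` would catch E at the cost of a noisier query.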

I decided to look at some papers on this kind of thing, and thanks to some kindly assistance from Mike Grienesen at UC Davis, I decided to look at a few publications on nano-stuff. That is also the route that led me from logic to crystallography (which, you must admit, is a strange transition), and it happens to be all the rage. In fact, the papers I looked at agreed that nano-stuff is all the rage. So here are a few papers, in chronological order, which should give us an idea of what this sort of investigation might consist of, and what sort of light it might shed.

Plenty of room, plenty of history by Chris Toumey, Nature Nanotechnology 4 (2009), 783 – 784. Toumey explores the effect of Richard Feynman’s 1959 talk There’s Plenty of Room at the Bottom (which was followed up by a 1983 talk on Infinitesimal Machinery; see also Feynman on Tiny Machines on YouTube). Legend has it that nano-stuff arose out of the first talk, but Toumey found only seven citations as of 1980 (although Toumey reports that nano-stuff-ologists recall talking about it at coffee shops; this brings us to the ugly subject of the reliability of memory, so moving right along…). But then the scanning (tunneling) microscope was invented in 1981, Eric Drexler published Engines of Creation in 1986, and in 1991, Don Eigler and Erhard Schweizer announced that they had written “IBM” in xenon atoms on a nickel surface. Perhaps the moment that things happened was around 1980, not 1960. Toumey does not address the usual history of science question – whether Feynman was simply prescient in anticipating something that would be in the air decades down the road – but he does ask what, exactly, was the effect of Feynman’s speech.
An empirical analysis of nanotechnology research domains by Nazrul Islam and Kumiko Miyazaki, Technovation 30:4 (2010), 229 – 237. “This paper attempts to answer the questions: (1) Which areas of nanotechnologies are currently state of the art and how mature are they? (2) How is the involvement of organizations, regions or countries in the development of nanotechnology knowledge? (3) Which areas of research are most important for specific types of organizations and for specific regions?” One technique described here is to define “research domains” (e.g. “nanomaterials”), which one then defines (e.g. “Nanomaterials concern [the] control of the structure of materials at nanoscale with great potential to create a range of advanced materials with novel characteristics, functions and applications.”). This paper inadvertently introduces a problem for researchers: a lot of this research into academic research involves expensive databases and software that your university might not subscribe to.
Refining search terms for nanotechnology by Alan Porter, Jan Youtie, Philip Shapira & David Schoeneck, J. Nanopart. Res 10 (2008), 715 – 728. This paper concentrates on a “bootstrapping approach” for searches analogous to (3) above in Porter & Cunningham, which Porter et al describe as more effective than just tracking codes and keywords, but also more labor intensive. They outline a sort of sociological survey – complete with questionnaires sent to samples of the relevant population – for getting keywords that maximize “recall” (i.e., minimize false negatives) while maximizing “precision” (i.e., minimizing false positives). One problem with nano-stuff is the use of “nano-anything” as a keyword: it is a trendy keyword that increases funding prospects.
How interdisciplinary is nanotechnology? by Alan Porter & Jan Youtie, J Nanopart Res (2009) 11:1023–1041. This article starts with a substantial literature review addressing the question: is nanoscience a field or a collection of somewhat disconnected “mono-disciplinary” fields? What I found most interesting in this article were two recommendations. One is that researchers in a multidisciplinary field learn how to conduct data mining in that field – if only to get some idea of the area they are in. The other recommendation is worth quoting in full:
We suggest two additional paths to nurture crossdisciplinary research. First, to enhance understanding of findings in other disciplines, we encourage attention be given to the language used to present essential findings. Authors and editors should strive to assure that the essential findings of nano-relevant research are presented so as to be as accessible as possible to researchers from other disciplines. For instance, work presented in a materials science journal may well hold high value for a nano-bio researcher. Minimizing jargon and acronyms (and we know that we use them here!), and checking understandability by researchers from other disciplines, should reduce the barriers to nano research knowledge transfer.
This advice should remind readers of Mike Zaworotko’s third challenge.
Nanoscience and Nanotechnology: Evolving Definitions and Growing Footprint on the Scientific Landscape by Michael Grienesen & Minghua Zhang, Small 7:20 (2011), 2836 – 2839. Recalling that Islam & Miyazaki followed Socrates’ advice to first define your terms, Grienesen and Zhang discuss the difficulties in defining one’s terms, beginning with Porter et al’s bugaboo, the keyword “nano-*”. They developed a narrow query, a boolean combination of 270 terms beginning with “nano-”, and conducted a survey using Web of Science (snooty, remember). While the narrow query found 80 % of the “nano*” records for the year 2010, it found only 22 % of the “nano*” records for the year 1991, suggesting that it took a while for researchers to settle on common keywords. They also looked at the proportion of each nation’s entire scientific output that falls under their narrow query. For 2010 in science, there were 475,745 records in the European Union, 404,226 records in the USA, and 131,742 in China. In nanoscience, using their query, 5.24 % of the European Union’s output was in nanoscience, 4.57 % of the USA’s output was in nanoscience, and 15.32 % of China’s (China was surpassed only by Singapore, at 16.41 %).
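The percentages in that last paper are easier to compare once they are turned back into rough record counts. The short Python sketch below does that arithmetic. It assumes that each percentage applies to the 2010 record total quoted beside it, and since the percentages are rounded, the results are approximations of my own rather than figures reported in the paper; the names `OUTPUT_2010` and `nano_records` are just labels for this illustration.

```python
# Figures as quoted above for 2010:
# (total science records, percent of them matched by the narrow nano query)
OUTPUT_2010 = {
    "European Union": (475_745, 5.24),
    "USA": (404_226, 4.57),
    "China": (131_742, 15.32),
}

def nano_records(total, percent):
    """Approximate number of records matched, from a total and a rounded percent."""
    return round(total * percent / 100)

estimates = {region: nano_records(total, pct)
             for region, (total, pct) in OUTPUT_2010.items()}

# List regions from the largest estimated nano output to the smallest.
for region, count in sorted(estimates.items(), key=lambda item: -item[1]):
    print(f"{region}: about {count:,} nano records")
```

If the totals and percentages are meant to be multiplied this way, China's estimated count comes out a little above the USA's even though its total output is about a third the size.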

Of course, this only gives us an idea of what this kind of study looks like. There are several differences between an investigation of “nano-science / technology” and “mathematical crystallography”.

Scale. Nano-stuff is a vast field – or vast array of fields – and it might be more accurate to compare nano-stuff with all of crystallography. Mathematical crystallography is merely the theoretical support for crystallographic design and analysis (and hence computation), with tendrils extending into mathematics, physics, chemistry, and other areas. Our project has a smaller, if comparably diffuse, subject.
Activity. Nano-stuff is taking off. Meanwhile, at the SIAM Mathematical Aspects of Materials Science conference (below), only two of the eleven presentations in the Mathematical Crystallography minisymposia had an audience of over twenty. I strongly believe that mathematical crystallography is desperately needed for crystallographic design and analysis to achieve their goals, but at the moment, that need has not translated into economic demand.

But at least, this gives us an idea of how to proceed.

One Response to “Mapping the Community III: a Model”

Gregory McColm says:

July 22, 2013 at 9:36 pm

Mike O’Keeffe questions Porter and Cunningham’s claim that “the most common number of citations to a paper is zero,” and he sent me a paper that begins with the sentence “It is a sobering fact that some 90% of papers that have been published in academic journals are never cited. Indeed, as many as 50% of papers are never read by anyone other than their authors, referees and journal editors.” But Mike claims that this is not true of articles published in “respectable” journals.

I am more familiar with the respectable journals in mathematics, published by Springer, Kluwer, Elsevier (boo, hiss!), and other less evil corporations, various societies by obscure printers, various campus presses, and the like. Of course, mathematics is highly fragmented and full of niche publishers whose specialty is publishing pieces that look like they have “potential.” Then there are what Anil Nerode called the “write-only” journals, tiny presses indistinguishable from the niche publishers (there is no clear line between them), weird open-access journals with huge page charges that send spam invitations to potential authors and editors, and of course, publishers that wind up on the pages of the Chronicle of Higher Education because they sued a blog that said derogatory things about them.

I would suspect that lots of mathematics articles are never cited, a few are cited frequently by a small number of authors, and a very few get a wide number of citations. Where the latter are published is an interesting question.

Anyway, lacking data, in today’s searches in the Web of Science, I just checked among all the articles how many had no citations. In all the searches, it was about ten to twenty percent. For the Web of Science, that means that about ten to twenty percent of the articles published in WOS-listed journals have not been cited by articles in WOS journals – yet. I noticed another interesting thing, hunting among the Fourier spaces: in today’s areas, the publication rate was almost flat over the last two decades, but the citation rate was exploding exponentially. Perhaps people are taking Gian-Carlo Rota’s advice and citing articles generously. Indeed, old mathematics papers were very skimpy on their citations; it is the modern papers that have one or more pages of citations. Perhaps the notion that most papers are never cited is out of date.

But it leaves open the question of whether most papers are read.

BTW, about publishers suing bloggers, the latest antic is posted on Inside Higher Ed.
