Where Do Citation Metrics Come From?
When citation metrics are written about in the popular press, they are frequently critiqued for their (mis)use and abuse. There’s a lot to be said about why metrics don’t measure what they say they do. The journal impact factor (JIF), for example, highlights how frequently articles in a specific journal are cited within the last few years, and the h-index produces a similar metric at the author level. Both indicators assume a norm: that in an ideal world, the articles that are the “most valuable” would be recognized universally, and others writing about the topic would change their citation practices to include the new valuable piece. There are numerous problems with this normative assumption. No one reads everything; topics and papers don’t fit neatly into citable categories; citation practices have been historically racist and elitist.* There are numerous other reasons citation doesn’t straightforwardly identify inherent value, but one of the best is that “value” is a deliberative topic, not just an aggregate of popular practice.
Citations are meaningful somehow, though. Writers include citations, and placing them within writing transforms the literary space of the text. How are they meaningful in a particular situation? It varies. Authors include citations for reasons ranging from paying homage to substantiating claims to identifying methodology.† In that last sentence, I included a citation because I cribbed that list from “When to Cite,” a hybrid scholarly/tutorial article written by Eugene Garfield, the founder of Web of Knowledge, one of the big three citation databases. Garfield’s article is a normative “how to” article, and his list is mostly his personal opinion. I could have just as easily looked at one of my own articles, described why I cited a particular source (at least what I now remember), and given other reasons. Why did I choose Garfield instead of myself or another source about writing? Mostly because he’s famous for inventing citation metrics, I knew about the article already, and his article was easy to find with keywords about citation. That of course doesn’t even begin to describe the ways that readers of this might make sense of why they cite when writing, especially if you believe, as I do, that much of what is written is beyond anyone’s intentionality.
So citations perform meaningfulness multivocally and differently at every point of material production. Writers think of them differently as they are positioned in texts. Editors look at them with their own eyes. Readers make sense of them given their own context. Each person also rereads them with new eyes. And on and on. Interpretation varies while the material of citation stays the same.
When citations are aggregated as metrics, their meaningfulness is transformed in a new way. Much like public polls produce something like “public opinion” as a technique of aggregation, citation metrics produce something like “scholarly value.”‡ JIF is calculated by dividing the number of citations a journal received in a given year to articles it published in the previous two years by the number of articles it published in those two years. This aggregate value represents the journal’s approximate number of citations per article. The JIF citation metric depends on a vast number of assumptions. The most basic is the existence of a journal that has published citable articles for the last two years. A more fundamental assumption is that a list of every citation to that journal exists somewhere, ready to reference. This list doesn’t exist. One of the major differences between an impact factor from Web of Science, Scopus, and Google Scholar is the (incomplete) list of citations that each has managed to compile. It’s been well documented that these different databases highlight differently curated sets of data and often produce wildly different metrics.
Each aggregated bibliometric value depends on a foundational infrastructure that provides the raw material. The metrics are constructed from what that infrastructure makes available. Each aggregated value flattens out the gaps and specificities of missing parts of the infrastructure. For instance, the Journal of the Medical Humanities includes a variety of genres in its pages, including poetry. Poetry usually isn’t cited, at least not in the same way that a JAMA article would be. Aggregated metrics miss nuance that makes a difference and produce numbers that don’t highlight those differences.
It’s become popular to use the aggregates as evidence for evaluation of individual, publishing, and disciplinary value. At my home institution, a variety of metrics are used to divide public funding among every school in the state. Sometimes some metrics work well, especially if the person or thing being evaluated fits the normative assumptions valued by the metric. Just as often, metrics overlook value and provide poor evidence for assessing it. For example, the Quarterly Journal of Speech is frequently esteemed as the most important journal for rhetoricians in communication departments, primarily because it’s one of the longest running. If you compare its 2017 impact factor (.46) to that of Communication Monographs (1.738), it doesn’t come out so well. Communication Monographs is a more eclectic journal, though. Its topics often appeal to a generalist audience. The pool of potential citing documents is bigger. Yet it would be a mistake to suggest that Quarterly is less important for people who focus on rhetorical scholarship in communication. You couldn’t learn a lot about rhetorical theory from reading Communication Monographs.
That doesn’t even begin to get at the problems with citation metrics. In 2018, Paula Chakravartty, Rachel Kuo, Victoria Grubbs, and Charlton McIlwain pointed out how citational practices in communication forward systemic racism.§ The academic journal system started in Europe and has been overwhelmingly sustained and forwarded by a labor force that is to this day predominantly white.¶ This means both that the scholarly topics of concern emerged from white in-groups and that the majority of editors and supporters are enculturated in that legacy of racism. There have been and continue to be problems of access in education and community that affect what topics and people end up in the pages of the journals. Differences in service loads, teaching expectations, funding, and much more are glossed over by performance metrics, even though they affect access and opportunity for publishing or citing.# Read the article. The same issues are affected by gender, too. Although there is evidence that gender is less attenuated by citation metrics than in previous decades,** every step toward better inclusion and diversity is met with two back.††
The double edge of aggregate citation metrics is that they perform and provide material evidence of what should be valued. Each time a metric is invoked as evidence of something, it lends additional credibility to the metric as evidence. Metrics postulate an invisible norm, which is often that the highest number of citations or mentions is inherently valuable. That norm produces incentives that feed back into the maintenance and care of the infrastructure. If a journal is given better funding or receives more recognition for a higher impact factor, it is incentivized to maximize that impact factor. To say that Communication Monographs is valuable because of its higher impact factor is to simultaneously suggest that the practices that enable that journal are the important ones. If that metric is tied to better funding or more support for that journal, it undercuts the value of specialist journals like Quarterly Journal of Speech or Communication and Critical/Cultural Studies (JIF = .767). Metrics silently lend support to the disparities and differences that plague academic labor. Aggregates flatten contextualized meaning to provide evidence of normative behavior. If you are an academic writer who has ever thought twice about where to send your writing based on a metric of some sort, you have participated in that norm (guilty here). The norm supports existing academic infrastructure, an infrastructure that does not work for many of the problems faced in the 21st century. Metrics reinforce the status quo when thought of as indicators of value.
But metrics could instead be looked at as entry points for examination. Each performance metric can be examined for the assumptions and material it reinforces, the ones that support normative infrastructure. Since JIF measures and evaluates journals, one way to examine infrastructure would be by looking for what Sara Ahmed calls “strategic inefficiencies,” the points in production that slow the work of people advocating change. Anyone who has attempted to publish in a journal will be able to tell you about how strategic inefficiencies affected them. (Raise your hand if you have a peer review story.) Collecting these stories, each meaningful in its own way, helps to articulate and forward where value is being manipulated by a metric. Another way to open up the black box of metrics is to read them against their own grain. In a previous post I conducted a co-citation analysis of several rhetoric journals to identify which citations are grouped together frequently. A typical analysis of co-citation patterns looks at frequent co-citations as foundational research for a field. A different way to look at them would be to see their authors as in-groups/out-groups/gatekeepers in a profession that is just as much defined by who you know as by what you know.
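For readers who haven’t seen one, the counting step at the core of a co-citation analysis can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis from the earlier post: the function name `cocitation_counts` and the single-letter source labels are hypothetical, and a real analysis would first have to clean and match reference strings pulled from a citation database.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of sources appears in the same reference list."""
    counts = Counter()
    for refs in reference_lists:
        # De-duplicate and sort so that ("A", "B") and ("B", "A") count as one pair.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical reference lists from four citing articles.
reference_lists = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C"],
    ["A", "B", "D"],
]

counts = cocitation_counts(reference_lists)
for pair, n in counts.most_common(3):
    print(pair, n)
```

The code stops at the tally. Whether a frequently co-cited pair is read as a field’s foundation or as a marker of an in-group is an interpretive choice made afterward.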
This is all just to say these metrics work both ways, as evidence of both functioning and crumbling infrastructure, and as Shannon Mattern has pointed out, “To fill in the gaps in this literature, to draw connections among different disciplines, is an act of repair or, simply, of taking care — connecting threads, mending holes, amplifying quiet voices.”
- *Chakravartty, P., Kuo, R., Grubbs, V., & McIlwain, C. (2018). #CommunicationSoWhite. Journal of Communication, 68(2), 254–266.
- †Garfield, E. (1996). When to cite. The Library Quarterly: Information, Community, Policy, 66(4), 449–458.
- ‡Hauser, G.A. (2010). Vernacular Voices: The Rhetoric of Publics and Public Spheres. Columbia, SC: University of South Carolina Press.
- §Chakravartty, P., Kuo, R., Grubbs, V., & McIlwain, C. (2018). #CommunicationSoWhite. Journal of Communication, 68(2), 254–266.
- ¶Moxham, N., & Fyfe, A. (2018). The Royal Society and the Prehistory of Peer Review, 1665–1965. The Historical Journal, 61(4), 863–889.
- #Gunning, S. (2000). Now That They Have Us, What’s the Point? In S. G. Lim, M. Herrera-Sobek & G. M. Padilla (Eds.), Power, Race, and Gender in Academe (pp. 171–182). New York, NY: Modern Language Association of America.
- **Andersen, J. P., Schneider, J. W., Jagsi, R., & Nielsen, M. W. (2019). Gender Variations in Citation Distributions in Medicine are Very Small and Due to Self-Citation and Journal Prestige. eLife, 8, e45374; Mayer, V., Press, A., Verhoeven, D., & Sterne, J. (2017). How Do We Intervene in the Stubborn Persistence of Patriarchy in Communication Research? In D. T. Scott & A. Shaw (Eds.), Interventions: Communication theory and practice. New York, NY: Peter Lang.
- ††Caruth, G. D., & Caruth, D. L. (2013). Adjunct faculty: Who are these unsung heroes of academe? Current Issues in Education, 16(3), 1–10.