Keith Hart on Fri, 11 Jul 2003 23:24:59 +0200 (CEST)
<nettime> statistical models and society
The following is a brief summary of a paper I have begun to work on, "From bell-curve to power-law: statistical models between national and world society". It is to be presented at the Association of Social Anthropologists Decennial Conference on Anthropology and Science, held at Manchester, 14-18th July, in a panel entitled "Making and abstracting numbers: the culture and politics of counting". The summary stretches my competence and relies perhaps too heavily on A-L Barabasi's Linked: the new science of networks (Perseus, 2002). I would be glad of any suggestions for further reading.

* * * * * *

Statistical patterns can be found empirically in nature and society. Their distribution may even conform to mathematical models. Thus, if two unbiased dice are rolled a thousand times, the number seven will occur with roughly six times the frequency of two or twelve. The resulting histogram will be symmetrical with one peak where the mean, median and mode coincide. Or take a large sample of adult human beings and measure their height. Most cases will fall between five and six feet with very few less than four or more than seven feet. Because this is a continuous variable, the results can be plotted on a graph to which a curve may be fitted. It too will have a single peak with fan tails on the high and low ends. We call this the normal distribution or popularly "the bell-curve". For more than a century statistical inference has largely been based on this curve with its parameters of mean and standard deviation.

More recently, another statistical pattern has been making the headlines. If you score the number of hits on 7,000 websites in a given day and plot them by size and frequency, the curve hugs the vertical and horizontal axes, indicating few very large numbers and many small numbers. If the same data are plotted on a log-log scale, the result is a straight line sloping down from left to right. This is a typical manifestation of something called a "power-law".
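The two patterns just described can be checked with a few lines of Python. What follows is only a minimal sketch: the function names, the exponent of 2 and the sample points are my own illustrative assumptions, not figures taken from the website data mentioned above. The first function enumerates the 36 equally likely outcomes of two dice; the second takes an idealized power-law, y = scale * x**(-exponent), and measures its slope on log-log axes.

```python
import math
from collections import Counter
from itertools import product


def two_dice_counts():
    """Count how many of the 36 equally likely outcomes give each total."""
    return Counter(a + b for a, b in product(range(1, 7), repeat=2))


def loglog_slopes(xs, exponent, scale=1.0):
    """Slopes between successive points of y = scale * x**(-exponent),
    measured after taking logarithms of both x and y."""
    pts = [(math.log(x), math.log(scale * x ** (-exponent))) for x in xs]
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]


counts = two_dice_counts()
# Seven can be made in six ways, two and twelve in one way each.
print(counts[7], counts[2], counts[12])

# Hypothetical exponent of 2: every segment has (approximately) slope -2,
# i.e. the curve is a straight line on log-log axes.
print(loglog_slopes([1, 10, 100, 1000], exponent=2.0))
```

The constant slope is the point: on ordinary axes the same curve hugs both axes, but after taking logarithms the exponent appears directly as the gradient of a straight line.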
A similar formula appears to describe the frequency of words used in natural language; and even the distribution of molecular reactions in cells reveals a few hubs linked to most reactions and many weakly connected molecules. "The new science of networks", basing its statistical approach on the physics of complexity, has been announced by, among others, Albert-Laszlo Barabasi in a recent popular book Linked (2002). Just as, in the late 19th century, the normal distribution seemed to lend unity to statistical patterns emerging in a number of apparently unrelated fields, such as criminology, astronomy and plant genetics, now the power-law appears in fields as disparate as the internet, stock markets, air transport, Hollywood actors' networks, power grids, urban hierarchies and molecular biology.

In this brief speculative exercise, I want to explore the possibility that the forms through which we perceive order in the world are derived in part from our experience of society. This is not to deny the empirical occurrence of phenomena that lend weight to the mathematical models used by statisticians, but rather to suggest that their relative prominence in our collective imagination reflects the way we experience society at different times. This is to revive the proposition made by Durkheim and Mauss in Primitive Classification (1903) that cultural forms are social in origin. They supported this claim with reference to Australian totemism, where the classification of animals corresponds to clan organization, and to Chinese astrology, which reflected the hierarchical organization of that society. Nearer to home, it could be said that Darwin's scheme of evolutionary biology shared many features with the Victorian capitalism of his day: individualism, natural selection as market competition and so on.
So I wish to explore here whether the recent rise to prominence of the power-law distribution, with its premise of extreme inequality, tells us something about our collective experience of society at this time. In particular, I will argue that the normal distribution or bell-curve was very well-suited to the egalitarian and democratic premises of the nation-state form that came to dominate society, at the same time as probabilistic thinking enshrined the bell-curve at the centre of its practice.

The power-law has been known for much longer than the decade or so in which it has achieved greater prominence. In the 19th century, when urban economy was not yet fully subsumed under the logic of nation-states, power-laws were discerned in the dramatically uneven growth of cities. Later both Zipf and Pareto proposed something similar in the form of rank-order distributions, the one for word frequencies and the other for income distribution. Pareto is credited with discovering the 20/80 rule -- the idea that 20% of the people own 80% of the wealth or 20% of journal articles account for 80% of the citations. But the premise of inequality contained in this rule was not adapted to the ideology of mid-20th century society and it remained a marginal anomaly.

Statistics arose in the mid-19th century as a way of regulating people through enumerating them. Towards the end of the century the growing influence of probabilistic thinking (Hacking, The Taming of Chance) began to reveal some regular statistical patterns that could be applied as models to a series of apparently disparate phenomena. The one that attracted the greatest interest was the frequency distribution we know as the bell-curve. In the 20th century this model was the basis for the development of "parametric" statistics, the mainstream approach to statistical inference.
The very word normal says it all -- conformity to a standard revealed by a central tendency, meaning that a population can be described in terms of an average type. The key assumption is randomness. This means that every member of a group has an equal chance of being selected. The democratic premise is obvious. This is an egalitarian as well as an atomistic model. Moreover, the quantities have to be measured on an interval scale, so that size is a continuous variable, not broken up into the separate categories of nominal or ordinal scales.

It is my hypothesis that this image of the natural and social world gained credibility from reflecting the premises of the national societies formed in the second half of the 19th century. In the 20th, the nation-state was the modal form of society. Anthropologists transposed its basic assumptions to ethnographic descriptions of so-called primitive societies, thereby demonstrating that the model of cultural homogeneity was universal.

The power-law is characterized by a few very large quantities and very many small ones. The curve reflects an exponential rate of growth. In the case of the rapidly growing field of network science, it is commonly observed that there are a few hubs with very many links and a large number of weakly-connected nodes. The discovery of power-laws is related to the physics of complexity, the attempt to study interconnectivity in a non-reductionist way (as opposed to the isolated atoms of the random universe). This science is mainly concerned with the construction of order out of chaos and with the properties of transitional phases, as when chaotic water molecules assume the rigid pattern of ice.

Network theory in social science arose in the 1950s as a result of the development of graph theory in mathematics. This theory was based on a number of assumptions that have since come to be seen as unrealistic.
First, the model described an inventory of nodes whose number is fixed and remains unchanged throughout the life of the network. Second, all nodes are equivalent, so that they can only be linked together randomly. These assumptions of randomness, stasis and equivalence were unquestioned for forty years. Territorial society lent some credibility to networks configured in its own image. Thus road maps do not diverge markedly from the model, each centre having roughly the same number of links to the others.

Stanley Milgram conducted an experiment in 1967 to see how many personal links would be needed to connect any two randomly selected individuals in the United States. He found the median number of links was 5.5 and this gave rise to the popular idea of "six degrees of separation", that all humanity is connected on average by six links. This "small world" phenomenon does not sit well with the assumptions of a random universe. Then it was discovered that most Hollywood actors, as measured by appearances in the same film, were linked by two or three degrees to Kevin Bacon (who turned out later to be far from the best connected of actors). Mark Granovetter established in 1973 that distance in networks was reduced by weak links between clusters. And the typical clustering of networks was modeled by Watts and Strogatz in 1998. But until then the basic assumptions of original graph theory still held.

The key shift emerged with the recognition that some nodes in networks are hubs and some persons are "connectors" (Gladwell, The Tipping Point, 2000). People vary enormously in their ability to make social connections and in this they resemble the air traffic grid of the United States, with a few O'Hares and many small airports. By now networks were coming to be seen as both intrinsically unequal in the size distribution of nodes and dependent on a few highly connected individuals. But what produces this effect?
Barabasi and his collaborators at Notre Dame in the late 90s established the fit between the pattern of internet links and the power-law distribution. This led them to characterize such networks as "scale-free", unlike the central tendency and standard deviation of the normal distribution. There is no characteristic node in a continuous hierarchy such as that typical of the power-law. The exponential character of the curve reflects the fact that networks grow over time and the skewed distribution of links may be accounted for by "preferential attachment". Growth with preferences both accounts for the hub phenomenon (early comers tend to attract more links) and requires us to abandon graph theory's key assumptions of randomness, stasis and equivalence.

The rule appears to be consistent with the market principle that "the rich get richer". Indeed in the network economy, as the Microsoft case confirms, it can even be summarized as "winner takes all". The winner in any network is often unpredictable until one node crosses a threshold and takes off. The trick is to find the threshold. When hubs are undermined, the network as a whole is often visited by "cascading failure".

It is clear that the convergence of world markets and the internet has multiplied opportunities for scale-free networks. If corporate hierarchy was well-suited to the era of mass production for national markets, the rise of a web or network model of economy involves a shift from vertical to flat virtual integration, as Castells (The Internet Galaxy, 2001) has long insisted. The detachment of the money circuit from real production and trade (Hart, Money in an Unequal World, 2001) has accelerated recognition that the market is a weighted and directed network, with the mass of ordinary stocks following a few market leaders. Already the power-law has been harnessed to predictive models based on analysis of the movement of the eight or so main stocks in a given sector.
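To make the idea of growth with preferential attachment concrete, here is a minimal sketch in Python. It is a deliberately simplified toy, not Barabasi's own model or data: I assume that each newcomer makes exactly one link, that the network starts from two linked nodes, and the function name, network size and random seed are all hypothetical choices of mine.

```python
import random
from collections import Counter


def grow_network(n_nodes, seed=0):
    """Grow a network one node at a time. Each newcomer makes a single link
    to an existing node chosen with probability proportional to the number
    of links that node already has."""
    rng = random.Random(seed)
    # Start with nodes 0 and 1 joined by one link. `endpoints` records both
    # ends of every link, so a uniform draw from it favours well-linked nodes.
    endpoints = [0, 1]
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        endpoints.extend([new_node, target])
    return Counter(endpoints)  # node -> number of links


degree = grow_network(10_000)
ranked = sorted(degree.values())
print("median links per node:", ranked[len(ranked) // 2])
print("links held by the largest hub:", ranked[-1])
print("five best-connected nodes:", degree.most_common(5))
```

Nodes are numbered in order of arrival, so the last line gives a rough check on whether early comers tend to become the hubs. Replacing the weighted draw with a uniform choice among existing nodes gives a simple point of comparison in which links are attached without preference.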
"Nature normally hates power-laws", says Barabasi who has done more than most to promote their visibility. Hitherto physicists have found them most often near the critical point of phase transitions, as when a metal is magnetized by heat. The bell-curve is empirically preponderant in the natural world, we are told. Interestingly enough, the Americans have long held that income inequality is inevitable, while the Europeans have tended to deny it. Today webloggers or peer-to-peer activists, the radical democratic wing of internet society, accept the fact of the power-law and claim that as long as choices can made freely (equal opportunity), this inequality is acceptable, one might say normal. Even if it can be shown to be regular, exponential growth is unpredictable. Statistical physicists can only say that sometimes a variable crosses a threshold and then it takes off into the curve described by a power-law. The stakes are high, but anyone can play. This whole paradigm shift in scientific and statistical models seems to coincide with the breakdown of the nation-state as the monopolistic framework of society and with it of the corporate premises of 20th century economy, jobs for life and all that. Since the late 70s the neo-liberal consensus has valorized global markets over national economy and the digital revolution of the 90s has given us a new emergent model of society in the network of networks, the internet. This new world market in commodities and information has revealed stark inequality as the norm. Winner takes all is now understood to be a general principle. The egalitarian premises of nation-states, seeking to curb the polarizing tendencies of markets and capitalism, have given way to an emergent world society in which the rich get richer is now taken to be axiomatic. This may be a transitional stage on the way to a new world order capable of curbing the natural excesses of the market. But for now the power-law is king. 
It's a different model of statistics, for sure. Perhaps it captures society poised between national and world forms. Or society between state and market, having reverted to a balance between the two more like that of the mid-19th century, before national regulation aspired to curb the domestic excesses of capitalism. The question before us is whether new political forms will enable humanity to curb the polarities of the network economy or market.

No-one denies that there is an objective basis for these statistical phenomena in nature and society. But the model that attracts most attention in a given period is likely to reflect underlying tendencies in social experience. Having been raised in the heyday of British social democracy, only to face the new liberalism now, I feel like I have had to undergo several radical paradigm shifts. The models of statistical distribution I have discussed briefly here serve as one way of talking about this momentous transition in society and its cultural forms of expression.

Keith Hart
Manchester, 15th July 2003

#  distributed via <nettime>: no commercial use without permission
#  <nettime> is a moderated mailing list for net criticism,
#  collaborative text filtering and cultural politics of the nets
#  more info: majordomo@bbs.thing.net and "info nettime-l" in the msg body
#  archive: http://www.nettime.org contact: nettime@bbs.thing.net