Friday, 17 October 2014

Watched by the Web: Surveillance Is Reborn

Books of The Times

Google does it. Amazon does it. Walmart does it. And, as news reports last week made clear, the United States government does it.

Big Data: A Revolution That Will Transform How We Live, Work, and Think
By Viktor Mayer-Schönberger and Kenneth Cukier
242 pages. Eamon Dolan/Houghton Mifflin Harcourt. $27.

Does what? Uses “big data” analysis of the swelling flood of data that is being generated and stored about virtually every aspect of our lives to identify patterns of behavior and make correlations and predictive assessments.
Amazon uses customer data to give us recommendations based on our previous purchases. Google uses our search data and other information it collects to sell ads and to fuel a host of other services and products.
The National Security Agency, a news article in The Guardian revealed last week, is collecting the phone records of millions of American customers of Verizon — “indiscriminately and in bulk” and “regardless of whether they are suspected of any wrongdoing” — under a secret court order. Under another surveillance program called Prism, The Guardian and The Washington Post reported, the agency has been collecting data from e-mails, audio and video chats, photos, documents and logins, from leading Internet companies like Microsoft, Yahoo, Google, Facebook and Apple, to track foreign targets.
Why spread such a huge net in search of a handful of terrorist suspects? Why vacuum up data so indiscriminately? “If you’re looking for a needle in the haystack, you need a haystack,” Jeremy Bash, chief of staff to Leon E. Panetta, the former director of the Central Intelligence Agency and defense secretary, said on Friday.
In “Big Data,” their illuminating and very timely book, Viktor Mayer-Schönberger, a professor of Internet governance and regulation at the Oxford Internet Institute at Oxford University, and Kenneth Cukier, the data editor for The Economist, argue that the nature of surveillance has changed.
“In the spirit of Google or Facebook,” they write, “the new thinking is that people are the sum of their social relationships, online interactions and connections with content. In order to fully investigate an individual, analysts need to look at the widest possible penumbra of data that surrounds the person — not just whom they know, but whom those people know too, and so on.”
Mr. Cukier and Mr. Mayer-Schönberger argue that big data analytics are revolutionizing the way we see and process the world — they even compare its consequences to those of the Gutenberg printing press. And in this volume they give readers a fascinating — and sometimes alarming — survey of big data’s growing effect on just about everything: business, government, science and medicine, privacy and even on the way we think. Notions of causality, they say, will increasingly give way to correlation as we try to make sense of patterns.
Data is growing incredibly fast — by one account, it is more than doubling every two years — and the authors of this book argue that as storage costs plummet and algorithms improve, data-crunching techniques, once available only to spy agencies, research labs and gigantic companies, are becoming increasingly democratized.
Big data has given birth to an array of new companies and has helped existing companies boost customer service and find new synergies. Before a hurricane, Walmart learned, sales of Pop-Tarts increased, along with sales of flashlights, and so stores began stocking boxes of Pop-Tarts next to the hurricane supplies “to make life easier for customers” while boosting sales. UPS, the authors report, has fitted its trucks with sensors and GPS so that it can monitor employees, optimize route itineraries and know when to perform preventive vehicle maintenance.
Baseball teams like Billy Beane’s Oakland A’s (immortalized in Michael Lewis’s best-seller “Moneyball”) have embraced new number-crunching approaches to scouting players with remarkable success. The 2012 Obama campaign used sophisticated data analysis to build a formidable political machine for identifying supporters and getting out the vote. And New York City has used data analytics to find new efficiencies in everything from disaster response to identifying stores selling bootleg cigarettes and steering overburdened housing inspectors directly to the buildings most in need of their attention. In the years to come, Mr. Mayer-Schönberger and Mr. Cukier contend, big data will increasingly become “part of the solution to pressing global problems like addressing climate change, eradicating disease and fostering good governance and economic development.”
There is, of course, a dark side to big data, and the authors provide an astute analysis of the dangers they foresee. Privacy has become much more difficult to protect, especially with old strategies — “individual notice and consent, opting out and anonymization” — losing effectiveness or becoming completely beside the point.
“The ability to capture personal data is often built deep into the tools we use every day, from Web sites to smartphone apps,” the authors write. And given the myriad ways data can be reused, repurposed and sold to other companies, it’s often impossible for users to give informed consent to “innovative secondary uses” that had not even been imagined when the data was first collected.
The second danger Mr. Cukier and Mr. Mayer-Schönberger worry about sounds like a scenario from the sci-fi movie “Minority Report,” in which predictions seem so accurate that people can be arrested for crimes before they are committed. In the real near future, the authors suggest, big data analysis (instead of the clairvoyant Pre-Cogs in that movie) may bring about a situation “in which judgments of culpability are based on individualized predictions of future behavior.”
Already, insurance companies and parole boards use predictive analytics to help tabulate risk, and a growing number of places in the United States, the authors of “Big Data” say, employ “predictive policing,” crunching data “to select what streets, groups and individuals to subject to extra scrutiny, simply because an algorithm pointed to them as more likely to commit crime.”
Last week an NBC report noted that in so-called signature drone strikes “the C.I.A. doesn’t necessarily know who it is killing”: in signature strikes “intelligence officers and drone operators kill suspects based on their patterns of behavior — but without positive identification.”
One problem with relying on predictions based on probabilities of behavior, Mr. Mayer-Schönberger and Mr. Cukier argue, is that it can negate “the very idea of the presumption of innocence.”
“If we hold people responsible for predicted future acts, ones they may never commit,” they write, “we also deny that humans have a capacity for moral choice.”
At the same time, they observe, big data exacerbates “a very old problem: relying on the numbers when they are far more fallible than we think.” They point to the escalation of the Vietnam War under Robert S. McNamara (who served as secretary of defense to Presidents John F. Kennedy and Lyndon B. Johnson) as a case study in “data analysis gone awry”: a fierce advocate of statistical analysis, McNamara relied on metrics like the body count to measure the progress of the war, even though it became clear that Vietnam was more a war of wills than of territory or numbers.
More recent failures of data analysis include the Wall Street crash of 2008, which was accelerated by hugely complicated trading schemes based upon mathematical algorithms. In his best-selling 2012 book, “The Signal and the Noise,” the statistician Nate Silver, who writes the FiveThirtyEight blog for The New York Times, pointed to failures in areas like earthquake science, finance and biomedical research, arguing that “prediction in the era of Big Data” has not been “going very well” (despite his own successful forecasts in the fields of politics and baseball).
Also, as the computer scientist and musician Jaron Lanier points out in his brilliant new book, “Who Owns the Future?,” there is a huge difference between “scientific big data, like data about galaxy formation, weather or flu outbreaks,” which with lots of hard work can be gathered and mined, and “big data about people,” which, like all things human, remains protean, contradictory and often unreliable.
To their credit, Mr. Cukier and Mr. Mayer-Schönberger recognize the limitations of numbers. Though their book leaves the reader with a keen appreciation of the tools that big data can provide in helping us “quantify and understand the world,” it also warns us about falling prey to the “dictatorship of data.”
“We must guard against overreliance on data,” they write, “rather than repeat the error of Icarus, who adored his technical power of flight but used it improperly and tumbled into the sea.”

Another Book Review

Book Review: Big Data: A Revolution That Will Transform How We Live, Work and Think

In Big Data: A Revolution That Will Transform How We Live, Work and Think, two of the world’s most-respected data experts reveal the reality of a big data world and outline clear and actionable steps that will equip the reader with the tools needed for this next phase of human evolution. Niccolò Tempini finds that rather than showing how the impact of data-driven innovations will advance the march of humankind, the authors merely present a thin collection of happy-ending business stories.
This was originally posted on LSE Review of Books.
Big Data: A Revolution That Will Transform How We Live, Work and Think. Kenneth Cukier and Viktor Mayer-Schönberger. Hodder. March 2013.
My issue with Big Data is that it does not take big data seriously enough. Although the authors have pedigree (an editor at the Economist; a professor at Oxford), this is not an academic text: it belongs to the category of popular essays that attempt to stimulate debate. Anyone who works with data (e.g. technologists, scientists, politicians, consultants) or wonders what will be born of our age of data affluence may come to this book with expectations; unfortunately, it falls short of providing any real answers.

The book paints an impending revolution in mighty strokes. The authors claim the impact of data-driven innovations will advance the march of humankind. What they end up presenting is a thin collection of happy-ending business stories: flight fare prediction, book recommendation, spell-checkers and improved vehicle maintenance. It’s too bad that the book’s scientific champion, Google Flu Trends, a tool which predicts flu rates from search queries, has proven so fallible. Last February it forecast almost twice the number of cases reported in the official count of the Centers for Disease Control and Prevention.
Big data will certainly affect many processes in a range of industries and environments; however, this book gestures at an inevitable social revolution in knowledge-making (‘god is dead’), for which I find no coherent evidence.
The book correctly points out that data is rapidly becoming the “raw material of business”. Many organisations will tap into the new data affluence, the outcome of a long historical process that includes ‘datafication’ (which I define later) and the diffusion of technologies that have tremendously reduced the costs involved in data production, storage and processing.
So, where’s the revolution? The book argues for three rather simplistic shifts.
The first shift: the new world is characterised by “far more data”. The authors argue that, just as a movie emerges from a rapid series of still photographs, sheer quantity of data matters because quantitative changes bring about qualitative ones. The technical equivalent in big data is the ability to survey a whole population instead of just sampling random portions of it.
The second shift is that “looking at vastly more data also permits us to loosen up our desire for exactitude”. Apparently, in big data, “with less error from sampling we can accept more measurement error”. According to the authors, science is obsessed with sampling and measurement error as a consequence of coping in a ‘small data’ world.
It would be amazing if the problems of sampling and measurement error really disappeared when you’re “stuffed silly with data”. But context matters, as Microsoft researcher Kate Crawford cogently argues on her blog. It is tempting to treat a sample as n=all as data approach full coverage, yet researchers still need to account for the representativeness of their sample. Consider how the digital divide (some people are on the Internet, others are not) affects the data available to researchers.
While a missed prediction does little damage when it concerns book recommendations on Amazon, a similar error in policymaking driven by big data is potentially far more serious. Crawford reminds us that Google Flu Trends failed because of measurement error. In big data, data are proxies of events, not the events themselves: Google Flu Trends cannot distinguish with certainty people who have the flu from people who are just searching about it. Google may tune “its predictions on hundreds of millions of mathematical modelling exercises using billions of data points”, but volume is not enough. What matters is the nature of the data points, and Google has apples mixed with oranges.
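To make the representativeness point concrete, here is a minimal Python sketch of the digital-divide problem. Every number in it (a population of 100,000, a 70% online share, a trait held by 60% of online and 30% of offline people) is a hypothetical assumption chosen for illustration, not data from the book or from Crawford. It compares a huge dataset covering every online person with a small random sample of 1,000 drawn from everyone.

```python
import random
from typing import List, Tuple

Person = Tuple[bool, bool]  # (is_online, has_trait)


def make_population(size: int, online_share: float,
                    trait_if_online: float, trait_if_offline: float,
                    seed: int = 42) -> List[Person]:
    """Build a hypothetical population in which the trait is more common online."""
    rng = random.Random(seed)
    people = []
    for _ in range(size):
        online = rng.random() < online_share
        p = trait_if_online if online else trait_if_offline
        people.append((online, rng.random() < p))
    return people


def trait_rate(people: List[Person]) -> float:
    """Share of people in the list who have the trait."""
    return sum(has for _, has in people) / len(people)


# All parameters below are illustrative assumptions.
population = make_population(100_000, online_share=0.7,
                             trait_if_online=0.6, trait_if_offline=0.3)
true_rate = trait_rate(population)

# "n = all" of the online world: every online person, but nobody offline.
online_only = [p for p in population if p[0]]
big_but_biased = trait_rate(online_only)

# A small simple random sample drawn from the whole population.
small_random = trait_rate(random.Random(7).sample(population, 1_000))

print(f"true rate:              {true_rate:.3f}")
print(f"online-only, n={len(online_only)}: {big_but_biased:.3f}")
print(f"random sample, n=1000:  {small_random:.3f}")
```

Because the trait is more common online in this toy setup, the online-only figure should sit near 0.6 while the true rate sits near 0.51, and the random sample should land much closer to the truth despite being roughly seventy times smaller. More rows shrink sampling error, but they do nothing about who is missing from the data.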
The third and most radical shift implies “we won’t have to be fixated on causality [...] the idea of understanding the reasons behind all that happens.” This is a straw man argument. The traditional image of science the authors discuss (fixated on causality, paranoid about exactitude) conflates principles with practices. Correlational thinking has long driven many processes and institutional behaviours in the real world. Nevertheless, “Felix, qui potuit rerum cognoscere causas” (Fortunate is he who was able to know the causes of things), which happens to be the motto of the LSE, is still bedrock in Western political life and philosophy. The authors cannot dismiss causation so cavalierly.
However, it appears that they do. Big data, they say, means that the social sciences “have lost their monopoly on making sense of empirical data, as big-data analysis replaces the highly skilled survey specialists of the past. The new algorithmists will be experts in the areas of computer science, mathematics, and statistics; and they would act as reviewers of big data analyses and predictions.” This is an odd claim, given that the social sciences are thriving precisely because expert narratives are a necessary component of how data becomes operational. This book is itself a shining example that big data speaks whatever narrative experts give it. What close observers know is that even at the most granular level of practice, analytic understanding is necessary when managers attempt to implement these systems in the world.
The book is blinded by its strongest assumption: that quantitative analysis is devoid of qualitative assessment. For the authors, to datafy is merely to put a phenomenon “in a quantified format so it can be tabulated and analysed.” Their argument, that “mathematics gave new meaning to data – it could now be analysed, not just recorded and retrieved”, implies that analysis begins only after phenomena get reduced to quantifiable formats. Human judgement is just an inconvenience of a ‘small data’ world that has no role in the process of making data. This is why they warn that in the impending world of big data, “there will be a special need to carve out a place for the human”.
It is hard to see how imagination and practical context will suddenly cease to play a fundamental role in innovation. But innovation could definitely be jeopardised if big data systems are not recognised for what they are: tools for optimising resource management. Big data may not be an instrument of discovery, but it is certainly a way of managing entities that are already known. Big data promises to be financially valuable precisely because it is primarily a managerial resource (e.g. pricing fares, finding books, moving spare parts).
In the world according to Cukier and Mayer-Schönberger, all the challenges of knowledge-making are about to evaporate. With big data affluence, sampling, exactitude and the pursuit of causality will no longer be issues. The most pressing question becomes data valuation. Now there is a problem the authors are willing to discuss seriously: how can data be transformed into a stable financial asset when most of its utility as a predictive resource is not itself predictable?
So eager are the authors to stress the potential value of big data for organisations (for them, data can only be an asset to a corporation) that they overlook the impact of these systems on other social actors. What if big data environments reconfigure social inequalities? While citizens will take on new responsibilities (like privacy management), only corporate entities will be able to systematically generate, own and exploit big data sets.
Big data is serious. There will be winners and there will be losers. What the public need is a book that explains the stakes so that they can be active participants in this revolution, rather than be passive recipients of corporate competition.
Niccolò Tempini is a PhD Candidate in Information Systems at the London School of Economics and Political Science. You can follow Niccolò on Twitter @tmpncl.
