The migration of information online grabbed the attention of early predictive analysts like Robert Grossman, who is now the director of the Chicago-based National Center for Data Mining. Grossman advises companies that want to use data to target customers better and to improve their profit margins. He and his colleagues have been working for years on statistical analysis methods that chew up complex sets of data and spit out significant patterns that appear in them. Relevant details can be obtained easily from census records, credit report agencies such as Experian and Equifax, and consumer data-mining companies like Phorm. When you have a detailed set of information on a group of people—say, their political views, the kind of homes they live in, and their favorite movie genres—obvious cluster patterns can emerge.
To find these patterns, data miners like Grossman first chart their harvested facts on a scatter plot, an imaginary graph that has as many dimensions as the number of personal characteristics being evaluated, such as age, marital status, gender, and geography. Grossman combines these factors into about 180 segments. A company might then create a dozen different sales offers and target them to specific segments. Some of the targets are straightforward: Newly married women might get ads for furniture. Some are based on more subtle forms of behavior: Single males are more likely to be hit with online ads that move around. And some are just devious. If you have a Gmail account, opening an e-mail will trigger the delivery of ads based not only on your demographics but also on the content of that particular message.
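One way to picture how a handful of characteristics multiplies into roughly 180 segments is a simple cross-product of categories. The sketch below is illustrative only: the age bands, marital categories, regions, and the `offer_for` rule are invented for this example and are not Grossman's actual scheme, which the article does not describe in detail.

```python
from itertools import product

# Hypothetical category lists, chosen only so the combinations total 180.
AGE_BANDS = ["18-24", "25-34", "35-49", "50-64", "65+"]
MARITAL = ["single", "married", "newly married"]
GENDERS = ["female", "male"]
REGIONS = ["Northeast", "Southeast", "Midwest", "Southwest", "West", "Pacific"]

# Every combination of characteristics gets its own segment number.
SEGMENTS = {
    combo: number
    for number, combo in enumerate(product(AGE_BANDS, MARITAL, GENDERS, REGIONS))
}

def age_band(age):
    """Map an age in years to one of the hypothetical age bands."""
    if age < 25:
        return "18-24"
    if age < 35:
        return "25-34"
    if age < 50:
        return "35-49"
    if age < 65:
        return "50-64"
    return "65+"

def segment_of(age, marital, gender, region):
    """Look up the segment number for one person's characteristics."""
    return SEGMENTS[(age_band(age), marital, gender, region)]

def offer_for(marital, gender):
    """Toy targeting rule based on the furniture example in the text."""
    if marital == "newly married" and gender == "female":
        return "furniture ad"
    return "generic ad"
```

With five age bands, three marital categories, two genders, and six regions, the table holds 5 x 3 x 2 x 6 = 180 segments. A real data miner would more likely derive segments from clusters in the data than from a fixed grid like this one.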
Grossman does not share the identities of the firms he works with, but one company that has profited from this type of data mining is 1-800-Flowers, which has been monitoring the behavior of its customers and sifting the data on buying habits since 2003. (1-800-Flowers, like a number of large retailers, uses the business analysis company SAS.) Instead of reaching out to all customers the same way, as advertisers traditionally do, the company targets specific subgroups. According to Aaron Cano, vice president of enterprise customer knowledge at 1-800-Flowers, there are planners and there are last-minute buyers. Planners receive offers in advance of buying occasions. The last-minute types get occasion-reminder e-mail.
When 1-800-Flowers started its analytics program, third-quarter revenues hit $124.1 million—up 7.5 percent over the same quarter of the previous year, even though the economy was recovering from a recession. The company has also increased customer retention rates by more than 15 percent since the program began. Brooks Brothers and The Limited, which also work with SAS, claim similar successes as a result of their data-mining programs.
No one understands the transformative power of data analysis better than Democratic consultant Ken Strasma, who helped propel Barack Obama into office by devising a mathematical model that predicts the political behavior of nearly every eligible voter. Strasma first randomly selected a pool of about 10,000 voters from his database, which includes demographic information on more than 100 million people. His consulting firm next conducted phone interviews with those 10,000 to learn their views on a wide range of political topics.
Armed with that huge data set, Strasma started looking for clusters. He found some strange things. Gin drinkers tend to be Democrats. Military history buffs are generally conservative on social issues. Got call-waiting? You’re probably a Republican. “We come up with correlations that might not be intuitive at all,” Strasma says. “We really don’t get at the whys of it.” But really, the whys don’t matter; only the correlations do.
To figure out the voting behavior of those not surveyed, Strasma applied what is called the nearest-neighbor algorithm. This technique matches each of the 100 million eligible voters in the United States to one of the people surveyed, according to a range of demographic measures. “The ‘distance’ between voters is not physical distance but rather how similar or dissimilar they are, based on these thousands of indicators,” he says. For instance, two voters with similar retail preferences might tend to vote the same way. Strasma’s nearest-neighbor tactic helped the Obama campaign fine-tune its mailings, advertising, and donation efforts along with its drives to get voters to the polls. Whether Strasma’s efforts proved decisive is an open question, but Obama pulled in $745 million from donors, more than twice what John McCain managed.
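The nearest-neighbor idea can be sketched in a few lines. What follows is a minimal illustration, not Strasma's model: the four surveyed voters, their feature values, and the choice of plain Euclidean distance are all assumptions made for the example, and a real system would weigh thousands of indicators rather than four.

```python
import math

# Hypothetical survey results: (feature vector, stated preference).
# Invented features: (age / 100, owns home, drinks gin, has call-waiting).
SURVEYED = [
    ((0.25, 0.0, 1.0, 0.0), "Democrat"),
    ((0.62, 1.0, 0.0, 1.0), "Republican"),
    ((0.40, 1.0, 1.0, 0.0), "Democrat"),
    ((0.55, 1.0, 0.0, 1.0), "Republican"),
]

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(voter, surveyed=SURVEYED):
    """Return the preference of the surveyed voter most similar to `voter`."""
    _, label = min(surveyed, key=lambda item: distance(voter, item[0]))
    return label
```

Calling `predict((0.30, 0.0, 1.0, 0.0))` matches the unsurveyed voter to the first surveyed person, whose features are closest, and borrows that person's stated preference. The "distance" here measures similarity across the chosen indicators, not geography.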
Where companies and politicians see opportunity, outspoken privacy advocates like Christopher Soghoian, a doctoral student at Indiana University, see threats to our personal privacy. There is little regulation limiting what data can be taken and mined: The current canonical law, the federal Privacy Act of 1974, specifies that government agencies must show individuals any personal records about them, but it excludes law enforcement from this provision. It also does not restrict the data-collecting efforts of private companies.
As Acquisti demonstrated, even seemingly innocent information that people routinely display about themselves can be mined to expose more sensitive bits. And often people are not aware of how extensive a data trail they are leaving online. For example, that anonymous post you left on the Web site of your local newspaper? Not so private: In 2008 the Alton Telegraph site was served with a grand jury subpoena demanding the full names and addresses of some anonymous commenters who had hinted that they might have information valuable to a murder investigation. “The judge said the law gives anonymity protection to journalists but not non-journalists,” says Seattle Internet-security expert Bennett Haselton. “If you do something online, it’s logged that it was done from your IP address. People should use common sense.”
In the near future, the challenges to personal privacy will move to another level. The Swedish company Polar Rose has developed software that identifies unlabeled individuals in digital photos, such as those posted all over Facebook, using face-recognition algorithms. Once you tag a friend in one photo, the software will automatically identify when that friend appears in other photos. “All kinds of things could happen,” Soghoian says. “What if you were a health insurance company and you could pinpoint all the people who’ve starred in Jackass-type stunt videos online—and drop them?” More plausibly, what if your insurance company sees photos of you drinking and smoking and adjusts your premiums accordingly?
Medical advances may cause our data clouds to envelop us in new and unexpected ways. Harvard synthetic biologist Yaakov Benenson is developing implantable computers capable of detecting chemical changes inside a cell. Eventually, such devices should enable us to monitor our vital statistics, take diagnostic tests, and receive treatment without ever going to the hospital. The results could be beamed wirelessly to health-care providers, raising the specter of eavesdropping. Researchers are experimenting with genetic profiling to fine-tune cancer treatment or to identify patients with an elevated risk of heart attack. Soon doctors might keep your DNA profile on hand to develop personalized treatments for you; if such information got out, your entire genome could be available for public viewing.
In the quest for total knowledge, Soghoian notes, companies and government officials are likely to leave no stone unturned. A planned new version of a system designed to protect the U.S. government from online spies, dubbed Einstein 3, has the capability to read e-mail that travels over government networks. In response, Ari Schwartz, vice president of the Center for Democracy and Technology, has flagged concerns about the government’s ability to balance surveillance with privacy protection. Any information that leaks out can be mined.
Meanwhile, cell phones are accumulating ever more processing power (Qualcomm’s Snapdragon mobile processor broke the one-gigahertz barrier this year), enabling seamless video watching and recording. “In a not-too-distant future, phones are going to be recording everything we see and hear,” Soghoian warns. That could easily include videos of you going about your business, taken by someone you don’t even know; just look at all the anonymous videos already available on YouTube.
Knowledge may be power, but the runaway growth of our personal data clouds suggests that we may not be happy about where that power ends up. “All of the Facebook interaction, all of the MySpace stuff, all the Expedia travel searches,” Soghoian says, “all those data trails are hanging out forever.”