bushitom

无题

(2011-08-06 17:53:05)

分类: 时评

数据、数据满天飞

Data, data everywhere
数据、数据满天飞

Information has gone from scarce to superabundant. That brings huge new benefits, says Kenneth Cukier (interviewed here)—but also big headaches

信息从过去的短缺变成现在的过度浮滥。Kenneth Cukier说,这带来了新的、庞大的利益,也带来很大的问题。

Feb 25th 2010 | From The Economist print edition

WHEN the Sloan Digital Sky Survey started work in 2000, its telescope in New Mexico collected more data in its first few weeks than had been amassed in the entire history of astronomy. Now, a decade later, its archive contains a whopping 140 terabytes of information. A successor, the Large Synoptic Survey Telescope, due to come on stream in Chile in 2016, will acquire that quantity of data every five days.
当斯隆数字天空观测站(Sloan Digital Sky Survey)在2000年开始运转时,它在新墨西哥州的望远镜,最初几周所收集的数据,比过去全部天文观测所得到的数据还要多。在十年后的现在,它的数据库已经储存了惊人的140TB(terabyte;译注:1000GB)这么多的数据。这个观测站的后继者,位于智利预计2016年运转的巨概观观测站 (Large Synoptic Survey Telescope),则将在五天内就可以取得现在斯隆数据库内所有的数据。

Such astronomical amounts of information can be found closer to Earth too. Wal-Mart, a retail giant, handles more than 1m customer transactions every hour, feeding databases estimated at more than 2.5 petabytes—the equivalent of 167 times the books in America’s Library of Congress (see article for an explanation of how data are quantified). Facebook, a social-networking website, is home to 40 billion photos. And decoding the human genome involves analysing 3 billion base pairs—which took ten years the first time it was done, in 2003, but can now be achieved in one week.
如此天文数字的数据在较接近地表的地方也有。沃尔玛(Wal-Mart),一个零售业的巨擘,每个小时要处理超过一百万件交易,这些交易汇入估计超过2.5PB (petabyte;译注: 一千TB)的数据库,相当于美国国会图书馆所有藏书的一百六十七倍(有关数据如何被量化、见此文)。Facebook,一个供社交与人脉建立的网站,有着四百亿张照片。解码人类基因组需要分析三十亿个碱基对,这工作第一次在2003年被完成时用了十年的时间,现在则只需一周。
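The storage figures quoted above are easier to compare once they are put on a common scale. The following is a minimal sketch, assuming decimal (SI) units where each step up is a factor of 1,000; the helper names `to_bytes` and `convert` are illustrative and not from the article. It derives two numbers implied by the text: the daily data rate of the Large Synoptic Survey Telescope and the size the article implicitly assigns to the books in the Library of Congress.

```python
# Decimal (SI) storage units expressed in bytes. This is an assumption:
# the article does not say whether it uses decimal or binary multiples.
UNITS = {"MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}


def to_bytes(value, unit):
    """Convert a quantity in the given unit to bytes."""
    return value * UNITS[unit]


def convert(value, from_unit, to_unit):
    """Convert a quantity from one storage unit to another."""
    return to_bytes(value, from_unit) / UNITS[to_unit]


# Sloan archive: 140 TB; the successor telescope gathers that much every five days.
sloan_archive_tb = 140
lsst_tb_per_day = sloan_archive_tb / 5

# Wal-Mart: 2.5 PB, said to equal 167 times the Library of Congress books.
walmart_pb = 2.5
loc_books_tb = convert(walmart_pb, "PB", "TB") / 167

print(f"LSST rate: {lsst_tb_per_day:.0f} TB per day")
print(f"Implied Library of Congress books: {loc_books_tb:.1f} TB")
```

Under these assumptions the successor telescope would gather about 28 TB a day, and the comparison implies roughly 15 TB for the library's books. Note that the article says the Wal-Mart databases hold "more than" 2.5 PB, so both derived figures are lower bounds on a rough estimate.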

All these examples tell the same story: that the world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account.
所有这些例子都述说着同一个故事: 这个世界有着不可想象般庞大的数字信息,其数量正以越来越快的速度增长着。这使得我们能完成一些之前不可能完成的事情,例如发现市场趋势,预防疾病,打击犯罪等。管理好的话,数据可以用来解开经济价值的新来源,提供对科学的了解,以及让政府能对其施政负责。

But they are also creating a host of new problems. Despite the abundance of tools to capture, process and share all this information—sensors, computers, mobile phones and the like—it already exceeds the available storage space (see chart 1). Moreover, ensuring data security and protecting privacy is becoming harder as the information multiplies and is shared ever more widely around the world.
但是数据也在创造许多新的问题。尽管有繁多的工具,例如传感器、计算机、以及手机等,可以捕捉、处理、及分享信息,庞大的数据目前已超过现有的储藏空间 (见图一)。不只如此,信息以倍计的成长和它在世界越来越广泛的散布,使得数据安全上的防护以及个人隐私的维护都变得越来越难。

Alex Szalay, an astrophysicist at Johns Hopkins University, notes that the proliferation of data is making them increasingly inaccessible. “How to make sense of all these data? People should be worried about how we train the next generation, not just of scientists, but people in government and industry,” he says.

约翰霍普金斯大学(Johns Hopkins University)的一位天文物理学家Alex Szalay说,数据的激增使其越来越难被使用。”到底要如何将这么多数据理出个头绪呢?人们应该要担心我们如何训练下一代,不只是科学家,还包括政府和企业界的员工。”

“We are at a different period because of so much information,” says James Cortada of IBM, who has written a couple of dozen books on the history of information in society. Joe Hellerstein, a computer scientist at the University of California in Berkeley, calls it “the industrial revolution of data”. The effect is being felt everywhere, from business to science, from government to the arts. Scientists and computer engineers have coined a new term for the phenomenon: “big data”.
在IBM工作,并曾写了二十几本有关社会信息历史的James Cortada说,”这么多的数据使我们处在一个不同的时代”。这样的时代被一位加州大学伯克利分校(University of California in Berkeley)的计算机科学家Joe Hellerstein称做是”数据的工业革命”。其影响遍及所有的地方,包括商业、科学、政府、以及艺术。科学家和计算机工程师替这现象取了一个新名词:”大数据”。

Epistemologically speaking, information is made up of a collection of data and knowledge is made up of different strands of information. But this special report uses “data” and “information” interchangeably because, as it will argue, the two are increasingly difficult to tell apart. Given enough raw data, today’s algorithms and powerful computers can reveal new insights that would previously have remained hidden.
从知识论而言,信息是数据的集合,而知识则是由不同的信息所组成。但是在本篇特别报导里,数据和信息是可以被互换的,因为如同本文将论述的,这两者将越来越难被区分。只要有足够的数据,今天的演算逻辑和快速的计算机就能够揭开对事物新的理解,这在过去是没法办到的。

The business of information management—helping organisations to make sense of their proliferating data—is growing by leaps and bounds. In recent years Oracle, IBM, Microsoft and SAP between them have spent more than $15 billion on buying software firms specialising in data management and analytics. This industry is estimated to be worth more than $100 billion and growing at almost 10% a year, roughly twice as fast as the software business as a whole. Chief information officers (CIOs) have become somewhat more prominent in the executive suite, and a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.
信息管理,也就是帮助企业组织将他们日渐增加的数据理出头绪,这行业正飞快的成长着。最近这几年,Oracle、IBM、Microsoft、和SAP,加起来花了超过一百五十亿美元来购并专业于数据管理及分析的软件公司。这个产业据估计有超过一千亿美元的产值,而其年成长率将近10%,大约是整个软件产业成长率的两倍。首席信息官(Chief information officers (CIOs))在主管层级里有了较重要的角色。同时一种新的专业诞生了,那就是数据科学家,一个综合软件工程师、统计学家、和说书人/艺术家的技能,来将金块从像山一样庞大的数据中挖掘出来的专业。谷歌(Google)的首席经济学家Hal Varian预测统计学家将会变成最热门的工作。他解释说,数据是随处可得,但是将智慧从这些数据中萃取的能力却是极短缺的。

More of everything
各方面的数据都在增加

There are many reasons for the information explosion. The most obvious one is technology. As the capabilities of digital devices soar and prices plummet, sensors and gadgets are digitising lots of information that was previously unavailable. And many more people have access to far more powerful tools. For example, there are 4.6 billion mobile-phone subscriptions worldwide (though many people have more than one, so the world’s 6.8 billion people are not quite as well supplied as these figures suggest), and 1 billion-2 billion people use the internet.
信息暴增的原因有很多。最明显的就是技术。随着数字设备功能的增加而价格崩跌,传感器和各种新奇玩意儿数字化了许多以前无法取得的信息。更多人能取得远较以前强大的工具。例如,现在世界上有四十六亿手机订户(由于一些人有多支手机,这数字对世界六十八亿人代表的手机普及率没有乍看之下多),而有十亿到二十亿人使用互联网。

Moreover, there are now many more people who interact with information. Between 1990 and 2005 more than 1 billion people worldwide entered the middle class. As they get richer they become more literate, which fuels information growth, notes Mr Cortada. The results are showing up in politics, economics and the law as well. “Revolutions in science have often been preceded by revolutions in measurement,” says Sinan Aral, a business professor at New York University. Just as the microscope transformed biology by exposing germs, and the electron microscope changed physics, all these data are turning the social sciences upside down, he explains. Researchers are now able to understand human behaviour at the population level rather than the individual level.
尤有甚者,现在有远多过以前的人使用信息做互动。Cortada先生说,在1990年和2005年之间,有十亿人成了中产阶级,当他们变得更富有,他们也变得更能使用文字,因此为信息膨胀火上加油。这个结果也浮现在政治、经济、和法律这些方面上。纽约大学的一位商学院教授Sinan Aral说,”度量的革命通常发生在科学的革命之前”。也正如显微镜藉由揭露了细菌而改变了生物学,以及电子显微镜改变了物理学,数量庞大的数据现在正将社会科学弄得天翻地覆。研究者现在有能力可以从族群的角度来了解人类行为,而不只是像过去拘限在个人的层次。

The amount of digital information increases tenfold every five years. Moore’s law, which the computer industry now takes for granted, says that the processing power and storage capacity of computer chips double or their prices halve roughly every 18 months. The software programs are getting better too. Edward Felten, a computer scientist at Princeton University, reckons that the improvements in the algorithms driving computer applications have played as important a part as Moore’s law for decades.
数字信息每五年就会增加十倍。目前计算机产业视为理所当然的摩尔定律,则认为计算机芯片的处理能力和储存容量大约每十八个月会增加一倍,或者价格减半。计算机软件也同时在进步中。普林斯顿大学(Princeton University)的一位计算机科学家Edward Felten则认为,驱动计算机应用的算法改进和摩尔定律在过去数十年扮演同样重要的角色。
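The two growth rates quoted above can be compared directly with a little arithmetic. The following is a minimal sketch, assuming steady exponential growth in both cases (the article states only the headline rates); the function names are illustrative. For a quantity that multiplies by $m$ every $y$ years, the annual factor is $m^{1/y}$ and the doubling time is $y \ln 2 / \ln m$ years.

```python
import math


def annual_growth_factor(multiple, years):
    """Yearly multiplier for something that grows `multiple`-fold in `years`."""
    return multiple ** (1 / years)


def doubling_time_months(multiple, years):
    """Months needed to double, given `multiple`-fold growth in `years`."""
    return 12 * years * math.log(2) / math.log(multiple)


def growth_over(months, doubling_months):
    """Total multiplier after `months`, given a fixed doubling time."""
    return 2 ** (months / doubling_months)


# Digital information: tenfold every five years.
data_factor = annual_growth_factor(10, 5)
data_doubling = doubling_time_months(10, 5)

# Moore's law as stated in the article: doubling roughly every 18 months.
moore_five_years = growth_over(60, 18)

print(f"Data grows about {100 * (data_factor - 1):.0f}% a year")
print(f"Data doubles about every {data_doubling:.1f} months")
print(f"Moore's law over five years: about {moore_five_years:.1f}x")
```

On these assumptions, tenfold growth every five years works out to roughly 58% a year and a doubling time of about 18 months, so the stated pace of data growth is almost exactly the pace of Moore's law as the article describes it.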

A vast amount of that information is shared. By 2013 the amount of traffic flowing over the internet annually will reach 667 exabytes, according to Cisco, a maker of communications gear. And the quantity of data continues to grow faster than the ability of the network to carry it all.
这些庞大的信息有很大一部分是被分享的。根据通讯设备制造商Cisco说,到2013年,互联网上每年的数据流量将达到667EB (exabytes,译注: 1000PB)。而且数据量成长的速度将持续高于网络承载的能力。

People have long groused that they were swamped by information. Back in 1917 the manager of a Connecticut manufacturing firm complained about the effects of the telephone: “Time is lost, confusion results and money is spent.” Yet what is happening now goes way beyond incremental growth. The quantitative change has begun to make a qualitative difference.
人们抱怨信息泛滥为时已久。在1917年,一家康乃狄克州制造公司的经理就曾对电话的效应抱怨说,”时间不见了,混乱发生了,而金钱也浪费了”。现在正在发生的改变则是远超过渐进式的成长。数量上的变化正带来质量上的变化。

This shift from information scarcity to surfeit has broad effects. “What we are seeing is the ability to have economies form around the data, and that to me is the big change at a societal and even macroeconomic level,” says Craig Mundie, head of research and strategy at Microsoft. Data are becoming the new raw material of business: an economic input almost on a par with capital and labour. “Every day I wake up and ask, ‘how can I flow data better, manage data better, analyse data better?’” says Rollin Ford, the CIO of Wal-Mart.
从信息短缺到信息泛滥的转变有广泛的影响。微软研究和策略部门的主管Craig Mundie说,”我们看见的,是一种使经济以数据为中心来成长的能力—这对我来说,是在社会甚至总体经济层面上,很大的转变”。数据正逐渐变成商业所需的原物料之一: 一项几乎和资本或劳力一样重要的经济原料。Wal-Mart的CIO Rollin Ford说,”每天醒来的时候,我会问自己,我要如何更好的传输、管理、以及分析数据呢?”

Sophisticated quantitative analysis is being applied to many aspects of life, not just missile trajectories or financial hedging strategies, as in the past. For example, Farecast, a part of Microsoft’s search engine Bing, can advise customers whether to buy an airline ticket now or wait for the price to come down by examining 225 billion flight and price records. The same idea is being extended to hotel rooms, cars and similar items. Personal-finance websites and banks are aggregating their customer data to show up macroeconomic trends, which may develop into ancillary businesses in their own right. Number-crunchers have even uncovered match-fixing in Japanese sumo wrestling.
复杂的计量分析现在正被广泛运用到生活上的各方面,而不只限于过去的火箭轨道分析或者是金融对冲策略分析。例如微软搜索引擎Bing 其中名为Farecast的部分软件,能够分析两千两百五十亿笔航班及价位信息并据以提供客户何时购买机票的建议。同样的概念也被用在旅馆订房,租车,和其他类似的事项上。个人金融的网页和银行正积聚着他们客户的数据,好用来分析总体经济趋势,这样的应用有可能被单独发展成一个附属的行业。分析数据的人甚至能够揭发日本相扑场上的造假事件。

Dross into gold
将残渣变成黄金

“Data exhaust”—the trail of clicks that internet users leave behind from which value can be extracted—is becoming a mainstay of the internet economy. One example is Google’s search engine, which is partly guided by the number of clicks on an item to help determine its relevance to a search query. If the eighth listing for a search term is the one most people go to, the algorithm puts it higher up.
“数据废气”—互联网用户点击后所留下来能被提取价值的轨迹—正成为网络经济的主要支柱。其中一个例子是谷歌的搜索引擎,该引擎部分根据一个对象被点击的次数来帮助决定该对象对搜寻的相关度。如果一个搜寻所得清单中顺位第八的网站是最多人造访的,那么算法会将该网站放到较高的顺位。
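The idea of letting clicks nudge a result upward can be illustrated with a toy example. This is a deliberately simplified sketch and not Google's actual ranking algorithm, which the article does not describe beyond saying that clicks are one of several signals. The function name `rerank_by_clicks`, the page identifiers, and the click counts are all hypothetical.

```python
def rerank_by_clicks(results, clicks):
    """Reorder search results so that more-clicked items come first.

    results: list of result identifiers in their original ranked order.
    clicks:  dict mapping identifier -> observed click count (hypothetical data).

    Python's sort is stable, so results with equal click counts
    (including results with no recorded clicks) keep their original order.
    """
    return sorted(results, key=lambda item: -clicks.get(item, 0))


# Ten results in their original order; the eighth one attracts the most clicks.
original = [f"page{i}" for i in range(1, 11)]
observed_clicks = {"page8": 50, "page1": 20, "page2": 5}

reranked = rerank_by_clicks(original, observed_clicks)
print(reranked[:4])
```

A real system would blend click data with many other relevance signals instead of sorting on clicks alone, which is why the article says the algorithm puts the popular eighth listing "higher up" and does not say it moves to the top.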

As the world is becoming increasingly digital, aggregating and analysing data is likely to bring huge benefits in other fields as well. For example, Mr Mundie of Microsoft and Eric Schmidt, the boss of Google, sit on a presidential task force to reform American health care. “Early on in this process Eric and I both said: ‘Look, if you really want to transform health care, you basically build a sort of health-care economy around the data that relate to people’,” Mr Mundie explains. “You would not just think of data as the ‘exhaust’ of providing health services, but rather they become a central asset in trying to figure out how you would improve every aspect of health care. It’s a bit of an inversion.”
当世界逐渐变得更数字化,收集和分析数据可能也会在其他领域带来庞大的利益。例如说,微软的Mundie和谷歌的老板Eric Schmidt都属于一个总统指派来改革美国健保体制的工作组。Mundie解释说,“在这工作进行不久,Eric和我就都说,‘听着,如果你们真想改革健保,基本上你们就必须以和人们有关的数据为中心来建立一种健保经济’。你不会认为数据只是提供健保过程中所产生的‘废气’,而会将这些数据做为中心资产来改进健保的每一个环节。这有一点颠倒。”

To be sure, digital records should make life easier for doctors, bring down costs for providers and patients and improve the quality of care. But in aggregate the data can also be mined to spot unwanted drug interactions, identify the most effective treatments and predict the onset of disease before symptoms emerge. Computers already attempt to do these things, but need to be explicitly programmed for them. In a world of big data the correlations surface almost by themselves.
可以确定的是,数字记录将会让医生较容易处理事情,降低保险公司和病人的成本,以及改进健保的质量。不只如此,集合起来的健保数据,可以用来发掘不良的医药反应、找出最有效的治疗方法、以及在症状明显前预测疾病的发生。目前计算机已经被用来尝试这些事情,不过程序必须特别为这些工作设计。在一个”大数据” 的世界,相关性几乎是会自己浮现。

Sometimes those data reveal more than was intended. For example, the city of Oakland, California, releases information on where and when arrests were made, which is put out on a private website, Oakland Crimespotting. At one point a few clicks revealed that police swept the whole of a busy street for prostitution every evening except on Wednesdays, a tactic they probably meant to keep to themselves.
有时候,数据会揭露比原本打算公开的更多的讯息。例如,加州奥克兰市公布该市警察逮捕纪录的时间和地点,而这信息被放在一个名为奥克兰犯罪现形(Oakland Crimespotting)的私人网站上。曾经有一段时间,只要在这网站上点击几次,就能发现警察每天晚上都会对一条繁忙的街道进行扫黄,唯独星期三例外,警方可能不希望这个做法公开吧。

But big data can have far more serious consequences than that. During the recent financial crisis it became clear that banks and rating agencies had been relying on models which, although they required a vast amount of information to be fed in, failed to reflect financial risk in the real world. This was the first crisis to be sparked by big data—and there will be more.
不过大数据可以有更严重的后果。在最近的金融危机里,我们理解到,银行和评级机构所仰赖的模式,虽然采用了巨量的信息,却不能反映真实世界的金融风险。这只是第一件由大数据激发的危机,未来还会有更多。

The way that information is managed touches all areas of life. At the turn of the 20th century new flows of information through channels such as the telegraph and telephone supported mass production. Today the availability of abundant data enables companies to cater to small niche markets anywhere in the world. Economic production used to be based in the factory, where managers pored over every machine and process to make it more efficient. Now statisticians mine the information output of the business for new ideas.
信息管理的方式影响了生活的各个层面。在二十世纪开始的时候,电报和电话支持了大量制造所需的信息流通。今天,丰沛数据使得企业能有办法迎合世界各地小而美的市场区块。过去的经济生产是以工厂为中心,其经理详细审查每一部机器和每一个制程来让生产变得更有效率。现在统计学家则从商业的各项信息中发掘新的想法。

“The data-centred economy is just nascent,” admits Mr Mundie of Microsoft. “You can see the outlines of it, but the technical, infrastructural and even business-model implications are not well understood right now.” This special report will point to where it is beginning to surface.
微软的Mundie说,”以数据为中心的经济才刚开始出现”。”你可以看到一个轮廓,但是它在技术、基础建设、和商业模式上代表的含意在目前还不是很清楚”。这篇特别报导将指出这些含意浮现的地方。

这篇文章由刘炜在2010年04月2日10:49发表,分类为金融。