In a tech startup industry that loves its shiny new objects, the
term “Big Data” is in the unenviable position of sounding
increasingly “3 years ago”. While Hadoop was
created in 2006, interest in the concept of “Big Data” reached
fever pitch sometime between 2011 and 2014. This
was the period when, at least in the press and on industry panels,
Big Data was the new “black”, “gold” or “oil”.
However, at least in my conversations with people
in the industry, there’s an increasing sense of having reached some
kind of plateau. 2015 was probably the year when
the cool kids in the data world (to the extent there is such a
thing) moved on to obsessing over AI and its many related concepts
and flavors: machine intelligence, deep learning, etc.
在喜新厌旧的技术初创企业界,已有 3年历史 “大数据” 听起来似乎已经过气了。虽然 Hadoop 在 2006年 已经出来,但
“大数据” 这个概念大概是在 2011 到 2014年 左右才真正火起来的。也就是在这段时间里,至少是在媒体或者专家眼里,“大数据”
成为了新的 “金子” 或者
“石油”。然而,至少在我跟业界人士交谈中,大家越来越感觉到这项技术已经在某种程度上陷入了停滞。2015年可能是数据领域的那些酷小子转移兴趣,开始沉迷于
AI 以及机器智能、深度学习等许多相关概念的年份。
.
Beyond semantics and the inevitable hype cycle, our fourth annual
“Big Data Landscape” (scroll down) is a great opportunity to take a
step back, reflect on what’s happened over the last year or so and
ponder the future of this industry.
抛开不可避免的炒作周期曲线态势不管,我们的 “大数据全景图” 已经进入第 4
个年头了,趁这个时候退一步来反思一下去年发生了什么,思考一下这个行业的未来会怎样是很有意义的。
.
In 2016, is Big Data still a “thing”? Let’s dig in.
那么 2016年 大数据到底还算不算个 “东西” 呢?我们不妨探讨一下。
.
Enterprise Technology = Hard
Work
企业技术=艰苦工作
.
The funny thing about Big Data is, it wasn’t a very likely
candidate for the type of hype it experienced in the first
place.
大数据有趣的一点在于,它不再像当初经历过那样有可能成为炒作的题材了。
.
Products and services that receive widespread interest beyond
technology circles tend to be those that people can touch and feel,
or relate to: mobile apps, social networks,
wearables, virtual reality, etc.
经过炒作周期后仍能引起广泛兴趣的产品和服务往往那些大家能够接触、可以感知,或者与大众相关联的:比如移动应用、社交网络、可穿戴、虚拟现实等。
.
But Big Data, fundamentally, is… plumbing.
Certainly, Big Data powers many consumer or
business user experiences, but at its core, it
is enterprise technology:
databases, analytics, etc: stuff that runs in the back that no one
but a few get to see. And, as anyone who works in that world knows,
adoption of new technologies in the enterprise doesn’t exactly
happen overnight.
但大数据基本上就是管道设施的一种。当然,大数据为许多消费者或商业用户体验提供了动力,但它的核心是企业技术:数据库、分析等,这些东西都是在后端运行的,没几个人能看得见。就像在那个世界工作的任何人都知道那样,用一个晚上的时间就想适应企业端的新技术是不可能的。
.
The early years of the Big Data phenomenon were propelled by a very
symbiotic relationship among a core set of large Internet companies
(in particular Google, Yahoo, Facebook, Twitter,
LinkedIn, etc), which were both heavy users and creators of a core
set of Big Data technologies. Those companies
were suddenly faced with unprecedented volume of data, had no
legacy infrastructure and were able to recruit some of the best
engineers around, so they essentially started building the
technologies they needed. The ethos of open
source was rapidly accelerating and a lot of those new technologies
were shared with the broader world. Over time,
some of those engineers left the large Internet companies and
started their own Big Data startups. Other
“digital native” companies, including many of the budding unicorns,
started facing similar needs as the large Internet companies, and
had no legacy infrastructure either, so they became early adopters
of those Big Data technologies. Early successes
led to more entrepreneurial activity and more VC funding, and the
whole thing was launched.
大数据现象在早期主要是受到了与一批骨干互联网公司(尤其是 Google、Facebook、Twitter
等)的共生关系的推动,这些公司既是核心大数据技术的重度用户,同时也是这些技术的创造者。这些公司突然间面对着规模前所未有的庞大数据时,由于本身缺乏传统的(昂贵的)基础设施,也没有办法招募到一些最好的工程师,所以只好自己动手来开发所需的技术。后来随着开源运动的迅速发展,一大批此类新技术开始共享到更广的范围。然后,一些互联网大公司的工程师离职去创办自己的大数据初创企业。其他的一些
“数字原生”
公司,包括崭露头角的独角兽公司,也开始面临着互联网大公司的类似需求,由于它们自身也没有传统的基础设施,所以自然就成为了那些大数据技术的早期采用者。而早期的成功又导致了更多的创业活动发生,并获得了更多的
VC 资助,从而带动了大数据的起势。
.
Fast forward a few years, and we’re now in the thick of the much
bigger, but also trickier, opportunity: adoption of Big Data
technologies by a broader set of companies, ranging from
medium-sized to the very largest multinationals.
Unlike the “digital native” companies, those
companies do not have the luxury of starting from scratch.
They also have a lot more to lose: in the vast
majority of those companies, the existing technology infrastructure
“does the trick”. It may not have all the bells
and whistles, and many within the organization understand that it
will need to be modernized sooner rather than later, but they’re
not going to rip and replace their mission critical systems
overnight. Any evolution will require processes,
budgets, project management, pilots, departmental deployments, full
security audits, etc. Large corporations are
understandingly cautious about having young startups handle
critical parts of their infrastructure. And, to the despair of some
entrepreneurs, many (most?) still stubbornly refuse to move their
data to the cloud, at least the public one.
快速发展了几年之后,现在我们面临的是更加广阔、但也更加棘手的机遇:让中等规模到跨国公司级别的更大一批企业采用大数据技术。这些公司跟
“数字原生”
公司不一样的是,他们没有从零开始的有利条件。而且他们失去的会更多:这些公司绝大部分的现有技术基础设施都是成功的。那些基础设施当然未必是功能完备的,组织内部许多人也意识到对自己的遗留基础设施进行现代化应该是早点好过晚点,但他们不会一夜间就把自己的关键业务取代掉。任何革命都需要过程、预算、项目管理、试点、局部部署以及完备的安全审计等。大企业对由年轻的初创企业来处理自己基础设施的关键部分的谨慎是可以理解的。还有,令创业者感到绝望的是,许多(还是大多数?)企业仍顽固地拒绝把数据迁移到云端(至少不愿迁移到公有云)。
.
Another key thing to understand: Big Data success is not about
implementing one piece of technology (like Hadoop or anything
else), but instead requires putting together
an assembly line of technologies, people and
processes. You need to capture data, store
data, clean data, query data, analyze data, visualize data.
Some of this will be done by products, and some
of it will be done by humans. Everything needs to
be integrated seamlessly. Ultimately, for all of this to work, the
entire company, starting from senior management, needs to commit to
building a data-driven culture, where Big Data is not “a” thing,
but “the” thing.
还需要理解的另一个关键是:大数据的成功不在于实现技术的某一方面(像 Hadoop
什么的),而是需要把一连串的技术、人和流程糅合到一起。你得捕捉数据、存储数据、清洗数据、查询数据、分析数据并对数据进行可视化。这些工作一部分可以由产品来完成,而有的则需要人来做。一切都需要无缝集成起来。最后,要想让所有这一切发挥作用,整个公司从上到下都需要树立以数据驱动的文化,这样大数据才不仅仅是个
“东西”,而且就是那个(关键的)“东西”。
.
In other words: lots of hard work.
换句话说:有一堆艰苦的工作要做。
.
The Deployment Phase
部署阶段
.
The above explains why, a few years after many of the high profile
startups were launched and the headline-grabbing VC investments
made, we are just hitting the deployment and early maturation phase
of Big Data.
所以,这就是在经过几年引人瞩目的初创企业如雨后春笋冒头,VC 投资频等头条后,我们开始步入大数据的部署期和早期成熟期的原因。
.
The more forward-thinking large companies (call them the “early
adopters” in a traditional technology adoption cycle) started early
experimentation with Big Data technologies
sometime in 2011-2013, launching Hadoop pilots (often because it
was the chic thing to do) or trying out point solutions. They hired
all sorts of people whose job titles didn’t exist previously (such
as “data scientist” or “chief data officer”). They went through
various types of efforts, including dumping all their data in one
central repository or “data lake”, sometimes hoping that magic
would ensue (it generally
didn’t). They gradually built internal
competencies, experimented with different vendors, went from pilots
to departmental deployments in production and are now debating (or,
more rarely, implementing) enterprise-wide roll outs.
In many cases, they are at an important
inflection point where, after several years building Big Data
infrastructure, they don’t have (yet) much to show for it, at least
from the perspective of the business user in their companies.
But a lot of the thankless work has been done,
and the disproportionately impactful phase where applications are
deployed on top of the core architecture is now starting.
更有前瞻性的大公司(姑且称之为传统技术采用周期的 “早期采用者”)在 2011 到 2013年 间开始实验大数据技术,推出了若干的
Hadoop 试点计划(往往是因为赶时髦)或者尝试一些点方案。他们招募了各种各样此前并不存在的岗位(如 “数据科学家” 或
“首席数据官”)。他们进行了各种努力,包括吧全部数据都堆到一个数据容器(“data
lake”),然后希望紧跟着就会发生奇迹(往往不会)。他们逐步建设自己的内部能力,试验了各种供应商,从试点计划到生产中的局部部署,然后到现在争论要不要全企业铺开(全范围铺开实施的情况还很罕见)。许多情况下,他们正处在这样一个重要的拐点上,即经过大数据基础设施的数年建设后,能够展示的成果还不多,至少在公司内部的商业用户看来是这样的。但是大量吃力不讨好的工作已经做完了,现在开始进入到有影响力的应用部署阶段了。只是从目前来看,这种建构在核心架构之上的应用数量还不成比例。
.
The next set of large companies (call them the “early majority” in
the traditional technology adoption cycle) has been staying on the
sidelines for the most part, and is still looking at this whole Big
Data thing with some degree of puzzlement. Up
until recently, they were hoping that a large vendor (e.g., an IBM)
would offer a one-stop-shop solution, but it’s starting to look
like that may not happen anytime soon. They look
at something like our Big Data Landscape with horror, and wonder
whether they seriously need to work with all those startups that
often sound the same, and cobble those solutions together.
They’re trying to figure out whether they should
work sequentially and progressively, building the infrastructure
first, then the analytics then the application layer, or do
everything at the same time, or wait until something much easier
shows up on the horizon.
接下来的一波大公司(称之为传统技术采用周期的
“早期多数使用者”)大多数时候对大数据技术是持观望态度的,对于整个大数据方面的东西,他们还在心存一定程度困惑中观望。直到最近,他们还在指望某个大型供应商(比如
IBM)会提供一个一站式的解决方案,不过现在看来这种情况近期内并不会出现。他们看待这个大数据全景图的态度是心怀恐惧,在想自己是不是真的需要跟这一堆看起来并没有什么不同的初创企业合作,然后修补出各种解决方案。
.
The Ecosystem is Maturing
生态体系正在成熟
.
Meanwhile, on the startup/vendor side, the whole first wave of Big
Data companies (those that were founded in, say, 2009 to 2013) have
now raised multiple VC financing rounds, scaled their
organizations, learned from successes and failures in early
deployments, and now offer more mature, battle-tested products.
A handful are now public companies
(including HortonWorks and New Relic which did
their IPO in December 2014) while others (Cloudera, MongoDB, etc.)
have raised hundreds of millions of dollars.
与此同时,在初创企业 / 供应商这一块,整个第一波的大数据公司(2009 至
2013年间成立的那批)现在已经融了数轮的资金,企业规模已经得到了扩大,并且从早期部署的成功或失败中学到了东西,现在他们已经能够提供更成熟的、经受过考验的产品了。少数一些已经成为了上市公司(包括
2015年 上市的 HortonWorks 和 New Relic),而有的(比如 Cloudera、MongoDB
等)融资已经达上亿美元了。
.
VC investment in the space remains vibrant and the first few week
of weeks of 2016 saw a flurry of announcements of big founding
rounds for late stage Big Data startups: DataDog
($94M), BloomReach ($56M), Qubole ($30M), PlaceIQ
($25M), etc. Big Data startups received $6.64B in
venture capital investment in 2015, 11% of total tech VC.
这个领域的 VC 融资活动仍然很有生气,2016年 的前几周我们见证好几轮相当可观的后期阶段大数据融资事件:DataDog(9400
万美元),BloomReach(5600 万美元),Qubole(3000 万美元),PlaceIQ(2500
万美元)等。2015年大数据初创企业拿到的融资额达到了 66.4 亿美元,占整个技术 VC 总融资额额 11%。
.
M&A activity has remained moderate (we noted 35 acquisitions
since our last landscape, see notes below).
并购活动则开展得中规中矩(自从上一版大数据全景图发布以来完成了 34 项并购,具体可参见附注)
.
With continued influx of entrepreneurial activity and money in the
space, reasonably few exits, and increasingly active tech giants
(Amazon, Google and IBM in particular), the number of companies in
the space keeps increasing, and here’s what
the Big Data Landscape looks like in 2016:
随着该领域的创业活动持续进行以及资金的不断流入,加上适度的少量退出,以及越来越活跃的技术巨头(尤其是
Amazon、Google、IBM),使得这个领域的公司日益增多,最后汇成了这幅 2016 版的大数据全景图。
http://mattturck.com/wp-content/uploads/2016/02/matt_turck_big_data_landscape_v11r.pngBig Data Still a Thing? (The&nb" />
To see the landscape
at full
size, click here.
To view a full list of companies,
click here.
[Note: This is version 2.0 of the landscape and list, both revised
as of February 12, 2016]
Obviously, that’s a lot of companies, and many others were not
included in the chart, deliberately or not (scroll to the bottom of
the post for a few notes on methodology).
显然这张图已经很挤了,而且还有很多都没办法列进去(关于我们的方法论可以参见附注)
.
In terms of fundamental trend, the action (meaning innovation,
launch of new products and companies) has been gradually moving
left to right, from the infrastructure layer (essentially the world
of developers/engineers) to the analytics layer (the world of data
scientists and analysts) to the application layer
(the world of business users and consumers) where “Big Data native
applications” have been emerging rapidly – following more or less
the pattern we expected.
在基本趋势方面,行动开始慢慢从左转到右(即创新、推出新产品和新公司),从基础设施层(开发者 /
工程师的世界)转移到分析层(数据科学家和分析师的世界)乃至应用层(商业用户和消费者的世界),“大数据原生应用”
已经在迅速冒头—这多少符合了我们原先的一些预期。
.
Big Data infrastructure:
Still Plenty of Innovation
大数据基础设施:仍有大量创新
.
It’s now been a decade since Google’s papers on MapReduce and
BigTable led Doug Cutting and Mike Cafarella to
create Hadoop, so the infrastructure layer of Big Data has had the
most time to mature and some key problems there have now been
solved.
Google 关于 MapReduce 和 BigTable 的论文(Cutting 和 MikeCafarella 因为这个而做出了
Hadoop)的诞生问世已有 10年 了,在这段时间里,大数据的基础设施层已经逐渐成熟,一些关键问题也得到了解决。
.
However, the infrastructure space continues to thrive with
innovation, in large part through considerable open source
activity.
但是,基础设施领域的创新仍然富有活力,这很大程度上是得益于可观的开源活动规模。
.
2015 was without a doubt the year of Apache
Spark, an open source framework leveraging
in-memory processing, which was starting to get a lot of buzz when
we published the previous version of our landscape.
Since then, Spark has been embraced by a variety
of players, from IBM to Cloudera, giving it considerable
credibility. Spark is meaningful because it
effectively addresses some of the key issues that were slowing down
the adoption of Hadoop: it is much faster (benchmarks have shown
Spark is 10 to 100 times faster than Hadoop’s MapReduce), easier to
program, and lends itself well to machine learning.
(For more on Spark, see our fireside chat at
our Data
Driven NYC monthly event with Ion Stoica, one
of the key Spark pioneers and CEO of Spark in the cloud company
Databricks, here).
2015年 无疑是 Apache Spark
之年。自我们发布上一版大数据全景图以来,这个利用了内存处理的开源框架就开始引发众多讨论。自那以后,Spark 受到了从 IBM 到
Cloudera 的各式玩家的拥护,让它获得了可观的信任度。Spark 的出现是很有意义的,因为它解决了一些导致 Hadoop
采用放缓的关键问题:Spark 速度变快了很多(基准测试表明 Spark 比 Hadoop 的 MapReduce 快 10 到
100 倍),更容易编程,并且跟机器学习能够很好地搭配。
.
Other exciting frameworks continue to emerge and gain momentum,
such as Flink, Ignite, Samza, Kudu, etc. Some
thought leaders think the emergence of Mesos (a framework to
“program against your datacenter like it’s a single pool of
resources”) dispenses for the need for Hadoop altogether (watch a
great talk on the topic by Stefan Groschupf, CEO of
Datameer, here and learn
more about Mesos by watching Tobi Knaupf of
Mesosphere here).
除了 Spark 以外,还出现了其他的一些令人兴奋的框架,比如 Flink、Ignite、Samza、Kudu
等,这些框架的发展势头也很好。一些思想领袖认为,Mesos(数据中心资源管理系统,把数据中心当作一台大计算资源池进行编程)的出现也刺激了对
Hadoop 的需求。
.
Even in the world of databases, that seemed to have seen more
emerging players than the market could possibly sustain, plenty of
exciting things are happening, from the maturation of graph
databases (watch Emil Eifrem, CEO Neo4j here), the launch of specialized
databases (watch Paul Dix, founder of time series database
InfluxDB here) to the emergence of
CockroachDB, a database inspired by Google Spanner, billed as
offering the best of both the SQL and NoSQL worlds (watch Spencer
Kimball, CEO of Cockroach Labs here). Data
warehouses are evolving as well (watch Bob Muglia, CEO of cloud
data warehouse Snowflake, here).
即便在数据库的世界里,新兴的玩家似乎也越来越多。多到市场已经难以承受的地步,这里发生了很多令人兴奋的事情,从图形数据库(如 Neo4j
)的成熟,到专门数据库的推出(如统计时序数据库 InfluxDB),乃至于 CockroachDB 的出现(受 Google
Spanner 灵感启发诞生的融合了 SQL 与 NoSQL 长处的新型数据库)。数据仓库也在演变(如云数据仓库
Snowflake)。
.
Big Data Analytics: Now with
AI
大数据分析:现在跟 AI 结合了
.
The big trend over the last few months in Big Data analytics has
been the increasing focus on artificial intelligence (in its
various forms and flavors) to help analyze massive amounts of data
and derive predictive insights.
大数据分析过去几个月出现的一股趋势是,越来越关注利用人工智能(形式和风格各异)来帮助分析大规模的数据,从而获得预测性的洞察。
.
The recent resurrection of AI is very much a child of Big Data.
The algorithms behind deep learning (the area of
AI that gets the most attention these days) were for the most part
created decades ago, but it wasn’t until they could be applied to
massive amounts of data cheaply and quickly enough that they lived
up to their full potential (watch Yann LeCun, pioneer of deep
learning and head of AI at Facebook, here). The
relationship between AI and Big Data is so close that some industry
experts now think that AI has regretfully “fallen
in love with Big Data” (watch Gary Marcus, CEO of Geometric
Intelligence here).
其实最近出现复兴的 AI 很大程度上算是大数据的产物。深度学习(最近受到关注最多的 AI
领域)背后的算法基本上是几十年前就诞生了的,但直到最近能够以足够便宜、足够快速地应用到大规模数据之后才发挥出了它的最大潜能。AI
与大数据之间的关系如此紧密,以至于业界专家现在认为 AI 已经令人懊恼地 “与大数据陷入了热恋当中”。
.
In turn, AI is now helping Big Data deliver on its promise.
The increasing focus on AI/machine learning in
analytics corresponds to the logical next step of the evolution of
Big Data: now that I have all this data, what insights am I going
to extract from it? Of course, that’s where data scientists come in
– from the beginning their role has been to implement machine
learning and otherwise come up with models to make sense of the
data. But increasingly, machine intelligence is
assisting data scientists – just by crunching the data, emerging
products can extract mathematical formulas (watch Stephen Purpura,
founder of Context Relevant here) or automatically build and
recommend the data science model that’s most likely to yield the
best results (watch Jeremy Achin, CEO of
DataRobot here). A crop
of new AI companies provide products that automate the
identification of complex entities such as images (watch Richard
Socher, CEO of MetaMind, here; Matthew Zeiler, CEO of
Clarifai, here; and David Luan, CEO of
Dextro here) or provide powerful
predictive analytics (e.g., our portfolio company HyperScience,
currently in stealth).
不过反过来,AI 现在也在帮助大数据实现后者的承诺。分析对 AI/
机器学习越来越多的关注也符合大数据下一步演进的趋势:现在数据我都有了,但究竟从中能得到什么样的洞察呢?当然,这件事情可以让数据科学家来解决,从一开始他们的角色就是实现机器学习,否则的话就得想出模型来发现数据的意义。但是机器智能现在正在逐渐发挥辅助数据科学家的作用—只需要倒腾数据,新兴的产品就能从中提炼出数学公式(如
Context Relevant)或者自动建立和推荐最有可能返回最佳结果的数据科学模型(如 DataRobot)。一批新的 AI
公司提供的产品能够自动识别像图像这样的复杂实体(如 Clarifai、Dextro),或者提供强大的预测性分析(如
HyperScience)。
.
As unsupervised learning based products spread and improve, it will
be interesting to see how their relationship with data scientists
evolve – friend or foe? AI is certainly not going
to replace data scientists any time soon, but expect to see
increasing automation of the simpler tasks that data scientists
perform routinely, and big productivity gains as a result.
同时,随着基于无监督学习的产品的传播和改善,看看它们与数据科学家之间的关系如何演变将非常有趣—将来这两者是敌还是友呢?AI
当然不会很快取代数据科学家的位置,但预计会看到数据科学家通常执行的更简单一点的工作越来越多的自动化,从而可以极大提高生产力。
.
By all means, AI/machine learning is not the only trend worth
noting in Big Data analytics. The general
maturation of Big Data BI platforms and their increasingly strong
real-time capabilities is an exciting trend (watch
Amir Orad, CEO of SiSense here; and Shant Hovespian, CTO
of Arcadia Data here)
但不管怎样,AI/ 机器学习绝不是大数据分析唯一值得关注的趋势。大数据 BI
平台的普遍成熟及其日益增强的实时能力也是一个令人兴奋的趋势(如 SiSense、Arcadia Data 等)。
.
Big Data Applications: A Real
Acceleration
大数据应用:真正的加速
.
As some of the core infrastructure challenges have been solved, the
application layer of Big Data is rapidly building up.
随着一些核心基础设施的挑战得到解决,大数据应用层正在快速构建。
.
Within the enterprise, a variety of tools has appeared to help
business users across many core functions. For
example, Big Data applications in sales and marketing help with
figuring out which customers are likely to buy, renew or churn, by
crunching large amounts of internal and external data, increasingly
in real-time. Customer service applications help personalize
service; HR applications help figure out how to attract and retain
the best employees; etc.
在企业内部,已经出现了各种工具来帮助跨多个核心职能的企业用户。比方说,销售和营销的大数据应用通过处理大规模的内外部数据来帮助找出哪位客户可能会购买、续约或者流失,且速度越来越实时化。客服应用帮助个性化服务。人力应用帮助找出如何吸引和挽留最好的员工等。
.
Specialized Big Data applications have been popping up in pretty
much any vertical, from healthcare (notably in genomics and drug
research) to finance to fashion to law enforcement (watch Scott
Crouch, CEO of Mark43 here).
专门的大数据应用几乎在任何一个垂直行业都有出现,从医疗保健(尤其是基因组学和药物研究)到金融、时尚乃至于执法(如
Mark43)。
.
Two trends are worth highlighting.
有两个趋势值得强调一下:
.
First, many of those applications are “Big Data
Natives” in that they are themselves built on the latest Big Data
technologies, and represent an interesting way for customers to
leverage Big Data without having to deploy underlying Big Data
technologies, since those already come “in a box”, at least for
that specific function – for example, our portfolio company
ActionIQ is built on Spark (or a variation thereof) , so its
customers can leverage the power of Spark in their marketing
department without having to actually deploy Spark themselves – no
“assembly line” in this case.
首先,这些应用很多都是 “大数据原生”
的,本身都是依托在最新的大数据技术基础上开发的,代表了一种客户无须部署底层大数据技术即可利用大数据的有趣方式—因为那些底层技术已经是打包的,至少对于特定功能来说是这样的。比方说,ActionIQ
就是在 Spark 基础上开发的(或者说是 Spark 的一个派生),所以它的客户能够在营销部门利用 Spark
的威力而不需要自己部署 Spark,这种情况下是没有 “装配线” 的。
.
Second, AI has made a powerful appearance at the application level
as well. For example, in the cat and mouse game
that is security, AI is being leveraged extensively to get a leg up
on hackers and identify and combat cyberattacks in real time.
“Artificially intelligent” hedge funds are
starting to appear.
A whole AI-driven digital assistant industry has
appeared over the last year, automating tasks from scheduling
meetings (watch Dennis Mortensen, CEO of
x.ai here) to shopping to bringing
you just about everything. The degree to which
those solutions rely on AI varies greatly, ranging from near 100%
automation to “human in the loop” situations where human
capabilities are augmented by AI – nonetheless, the trend is
clear.
其次,AI 在应用层也有很强大的存在。比方说,在猫捉老鼠的安全领域中,AI
被广泛用来对付黑客,实时识别和对抗网络攻击。去年已经出现了一个 AI 驱动的数字助手行业,支持从任务自动化到会议安排(如
x.ai)以及购物等几乎一切事情。这些解决方案对 AI 的依赖程度不一,从几乎 100%自动化到 “有人参与”
等情况各不相同,但是可以明确的是,人的能力在 AI 帮助下得到了增强。
.
Conclusion
结论
.
In many ways, we’re still in the early innings of the Big Data
phenomenon. While it’s taken a few years,
building the infrastructure to store and process massive amounts of
data was just the first phase. AI/machine
learning is now precipitating a trend towards the emergence of the
application layer of Big Data. The combination
of Big Data and AI will drive incredible innovation across pretty
much every industry. From that perspective, the
Big Data opportunity is probably even bigger than people
thought.
从很多方面来看,我们仍然处在大数据现象的早期发展阶段。尽管已经花费了数年时间,但减少基础设施来存储和处理大规模数据还只是第一阶段。AI/
机器学习已经成为大数据应用层的一股迅猛趋势。大数据与 AI
的结合将会推动很多行业的惊人创新。从这个角度来说,大数据的机会也许要比大家想象的还要大。
.
As Big Data continues to mature, however, the term itself will
probably disappear, or become so dated that nobody will use it
anymore. It is the ironic fate of successful
enabling technologies that they become widespread, then ubiquitous,
and eventually invisible.
然而,随着大数据继续走向成熟,这个术语本身可能会消失,或者变得太过时以至于没有人会再使用这个词。这就是成功赋能技术令人讽刺的命运归宿—由于技术的广泛传播,然后到达无所不在的地步,最后被人熟视无睹。
.
____________________
NOTES:
附注:
.
1) First and foremost, a big thank you to our
FirstMark associate Jim
Hao who did a lot of the heavy lifting on this
project and was immensely helpful
1)由于不可能把大数据的所有公司都列到图表上,所以我们只能按照一定原则筛选部分公司出来,筛选原则一是进行过 1 轮或多轮 VC
融资的初创企业,二是把一些我们特别感兴趣的较早期初创企业列进去。
.
2) As it became very clear very quickly that we couldn’t possibly
fit all companies we wanted on the chart, we ended up
giving priority to startups that have raised one
or several rounds of venture capital financing – certainly an
imperfect criteria (but, hey, we’re VCs…), and we’ve occasionally
made the editorial decision to include earlier stage startups when
we thought they were particularly interesting.
2)值得注意的收购包括:
◦Revolution Analytics(微软 2015年1月 收购)
◦Mortar(DataDog2015年2月 收购)
◦Acunu 和 FoundationDB(2015年3月 被苹果收购)
◦AlchemyAPI(2015年3月 被 IBM 收购)
◦Amiato(2015年4月 被 Amazon 收购)
◦Next Big Sound(2015年5月 被 Pandora 收购)
◦1010Data(Advance/Newhouse 2015年8月 收购)
◦Boundary(BMC 2015年8月 收购)
◦Bime Analytics(Zendesk 2015年10月 收购)
◦CleverSafe(IBM 2015年10月 收购)
◦ParStream(2015年11月 被思科收购)
◦Lex Machine(2015年11月 被 LexisNexis 收购)
◦DataHero(2016年1月 被 Cloudability 收购)
.
3) As always, it is inevitable that we inadvertently missed some
great companies in the process of putting this chart together.
Did we miss yours? Feel free to
add thoughts and suggestions in the comments
4) The chart is in png format, which should
preserve overall quality when zooming, etc.
5) Disclaimer: I’m an investor
through FirstMark in a number of companies mentioned on this Big
Data Landscape, specifically: ActionIQ, Cockroach Labs, Helium,
HyperScience, Kinsa, Sense360 and x.ai. Other
FirstMark portfolio companies mentioned on this chart include
Bluecore, Engagio, HowGood, Payoff. I’m a small
personal shareholder in Datadog and LendingClub (pre-IPO).
6) Notable acquisitions (of all
sizes) include Revolution Analytics (acquired by Microsoft in
January 2015), Pentaho (acquired by Hitachi in February 2015),
Mortar (acquired by Datadog in February 2015), Acunu and
FoundationDB (both acquired by Apple in March 2015), AlchemyAPI
(acquired by IBM in March 2015), Amiato (acquired by Amazon in
April 2015), Next Big Sound (acquired by Pandora in May 2015),
1010Data (acquired by Advance/Newhouse in August
2015), Informatica (acquired/taken private
by a company controlled by the Permira funds and
Canada Pension Plan Investment Board (CPPIB) in August 2015),
Boundary (acquired by BMC in August 2015), Bime
Analytics (acquired by Zendesk in October 2015), CleverSafe
(acquired by IBM in October 2015) ParStream (acquired by Cisco in
November 2015), Lex Machina (acquired by LexisNexis in November
2015) and DataHero (acquired by Cloudability in January 2016).