
bicloud

## Implementing a Market Basket Analysis Algorithm on MapReduce

(2011-08-01 13:31:28)


1 Introduction

2 Related work

3.1 MapReduce principles

3.2 Storage of massive data

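3.3 Problems with MapReduce
(1) For applications that process large amounts of data, this programming model is a giant step backward.
(2) It is not a new approach: many of the algorithms implemented with it are decades old, especially artificial-intelligence algorithms.
(3) Data must be converted into key-value form, which gives up many features of current RDBMSs.
(4) It is incompatible with many existing algorithms and tools.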

4 Market basket analysis algorithm

Transaction 1: cracker, icecream, beer
Transaction 2: chicken, pizza, coke, bread
Transaction 3: baguette, soda, hering, cracker, beer
Transaction 4: bourbon, coke, turkey
Transaction 5: sardines, beer, chicken, coke
Transaction 6: apples, peppers, avocado, steak
Transaction 7: sardines, apples, peppers, avocado, steak
......
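The sample records above follow a simple `Transaction <id>: item, item, ...` layout. As a minimal sketch, assuming each record sits on a single line and that commas separate items, a hypothetical `parse_transaction` helper could turn one record into the item list <Vn> that the Mapper starts from. The regular expression and the helper name are illustrative assumptions, not part of the original implementation.

```python
import re
from typing import List

# Hypothetical record format modeled on the sample lines:
# "Transaction <id>: item, item, ..."
RECORD = re.compile(r"^Transaction\s+\d+:\s*(.*)$")


def parse_transaction(line: str) -> List[str]:
    """Return the items of one transaction record, with surrounding spaces removed."""
    match = RECORD.match(line.strip())
    if match is None:
        raise ValueError(f"not a transaction record: {line!r}")
    # Split on commas and drop empty fields (for example from a trailing comma).
    return [item.strip() for item in match.group(1).split(",") if item.strip()]


if __name__ == "__main__":
    print(parse_transaction("Transaction 1: cracker, icecream, beer"))
```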

Total number of Items: 322,322
Ten most frequent item pairs:

| Item pair | Frequency |
|---|---|
| cracker, beer | 6,836 |
| bourbon, cracker | 5,299 |
| baguette, beer | 5,003 |
| corned, hering | 4,664 |
| beer, hering | 4,566 |
| ...... | |
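The pair frequencies above come from the full data set, not only from the seven transactions listed. When checking a MapReduce job on small inputs, the same kind of ranking can be computed on one machine. The sketch below is an in-memory reference count, not the distributed algorithm: it sorts each transaction's distinct items, counts every unordered pair with `itertools.combinations`, and ranks pairs with `Counter.most_common`. Dropping repeated items inside a transaction is an assumption, and `count_pairs` and `SAMPLE` are hypothetical names.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List

# The seven sample transactions listed above.
SAMPLE = [
    ["cracker", "icecream", "beer"],
    ["chicken", "pizza", "coke", "bread"],
    ["baguette", "soda", "hering", "cracker", "beer"],
    ["bourbon", "coke", "turkey"],
    ["sardines", "beer", "chicken", "coke"],
    ["apples", "peppers", "avocado", "steak"],
    ["sardines", "apples", "peppers", "avocado", "steak"],
]


def count_pairs(transactions: Iterable[List[str]]) -> Counter:
    """Count every unordered item pair across all transactions, in memory."""
    counts: Counter = Counter()
    for items in transactions:
        # Sorting the distinct items makes (a, b) and (b, a) the same key.
        counts.update(combinations(sorted(set(items)), 2))
    return counts


if __name__ == "__main__":
    for pair, n in count_pairs(SAMPLE).most_common(5):
        print(", ".join(pair), n)
```

On the seven sample transactions, (beer, cracker), (chicken, coke) and the six pairs shared by Transactions 6 and 7 each occur twice; every other pair occurs once.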
Mapper
1: Read each transaction of the input file and generate the data set of items:
(<V1>, <V2>, …, <Vn>) where <Vn>: (vn1, vn2, .., vnm)
2: Sort each data set <Vn> to generate the sorted data set <Un>:
(<U1>, <U2>, …, <Un>) where <Un>: (un1, un2, .., unm)
3: Loop While <Un> has a next element;
note: each list Un is handled individually
3.1: Loop For each item from un1 to unm of <Un> with NUM_OF_PAIRS
3.a: generate the data set <Yn>: (yn1, yn2, .., ynl);
ynl: (unx, uny) is the list of self-crossed pairs of (un1, un2, .., unm) where unx ≠ uny
3.b: increment the occurrence count of ynl;
note: (key, value) = (ynl, number of occurrences)
3.2: End Loop For
4: End Loop While
5: The resulting data set is the input of the Reducer: (key, <value>) = (ynl, <number of occurrences>)
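The Mapper steps can be sketched in Python as follows. This is an illustration under assumptions, not the author's code: `mapper` is a hypothetical function that receives one input split as a list of transactions, `NUM_OF_PAIRS` is read as the number of items per key (2 for item pairs), and step 3.b is taken to mean that each mapper accumulates its own occurrence counts before emitting them. A transaction with m distinct items yields m(m-1)/2 pairs, so mapper output grows quadratically with basket size.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

# Assumed meaning of NUM_OF_PAIRS: how many items form one key (2 = item pairs).
NUM_OF_PAIRS = 2


def mapper(split: Iterable[List[str]],
           num_of_pairs: int = NUM_OF_PAIRS) -> Dict[Tuple[str, ...], int]:
    """Map one input split of transactions to (item tuple, local occurrence count)."""
    local_counts: Counter = Counter()
    for items in split:
        # Steps 1-2: take the transaction's items (<Vn>) and sort them (<Un>),
        # so that (a, b) and (b, a) always become the same key. Duplicates are
        # dropped here (an assumption) so every tuple has distinct items.
        sorted_items = sorted(set(items))
        # Steps 3.a-3.b: generate the self-crossed tuples and count each one.
        for key in combinations(sorted_items, num_of_pairs):
            local_counts[key] += 1
    # Step 5: these (key, value) pairs are what the Reducer receives.
    return dict(local_counts)


if __name__ == "__main__":
    split = [["cracker", "icecream", "beer"],
             ["baguette", "soda", "hering", "cracker", "beer"]]
    for key, count in sorted(mapper(split).items()):
        print(key, count)
```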
Reducer
1: Read (ynl, <number of occurrences>) data from multiple nodes
2: Add the values for ynl to obtain (ynl, total number of occurrences)
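On the Reducer side, the partial counts from all mapper nodes are grouped by key and then added up. The sketch below simulates both parts in a single process; `shuffle` and `reducer` are hypothetical names, the node outputs are made-up illustrative values, and in a real MapReduce job the grouping is done by the framework rather than by user code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, ...]


def shuffle(node_outputs: Iterable[Dict[Key, int]]) -> Dict[Key, List[int]]:
    """Group the partial counts emitted by every mapper node by key."""
    grouped: Dict[Key, List[int]] = defaultdict(list)
    for output in node_outputs:
        for key, count in output.items():
            grouped[key].append(count)
    return grouped


def reducer(key: Key, counts: List[int]) -> Tuple[Key, int]:
    """Reducer step 2: add the values for one key to get its total occurrences."""
    return key, sum(counts)


if __name__ == "__main__":
    # Hypothetical partial counts from two mapper nodes.
    node_a = {("beer", "cracker"): 1, ("chicken", "coke"): 1}
    node_b = {("beer", "cracker"): 1, ("bourbon", "coke"): 1}
    totals = dict(reducer(k, v) for k, v in shuffle([node_a, node_b]).items())
    print(totals)
```

Because addition is associative and commutative, the same summing logic could also run as a combiner on each mapper node to shrink the data sent across the network.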

5 Experimental results
Total number of keys in order 2: 212
Total number of items: 1,255,922,927

| Items paired (key) | Frequency (value) |
|---|---|
| cracker, heineken | 208,816,643 |
| bourbon, cracker | 161,866,763 |
| baguette, heineken | 152,824,775 |
| corned_b, hering | 142,469,636 |
| heineken, hering | 139,475,906 |
| bourbon, heineken | 126,310,383 |
| baguette, cracker | 125,699,308 |
| artichok, heineken | 125,180,072 |


Copyright Sina Corporation. All rights reserved.