[Hive] Ending a SQL statement with distribute by rand()_Sunmi

(2020-02-05 09:36:24)

Category: Data Analysis

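distribute by: controls how the map output is distributed, that is, how the map side splits the data among the reducers. Rows are assigned to reducers according to the column(s) listed after distribute by and the number of reducers. By default a hash algorithm is used, so rows with the same value in those columns go to the same reducer.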

select * from mytest distribute by word sort by word;

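Here distribute by is followed by word, which carries a risk of data skew: if a few values of word are far more frequent than the others, the reducers that receive them end up with far more rows than the rest.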
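One way to gauge that risk before running the query is to look at how the rows are spread over the values of word. The query below is a minimal sketch against the same mytest table. The alias cnt and the limit of 10 are illustrative choices, not part of the original example.

```sql
-- Count rows per distinct value of word and list the most frequent values.
-- If a few values account for most of the rows, the reducers that receive
-- those values under "distribute by word" get far more data than the others.
SELECT word, COUNT(*) AS cnt
FROM mytest
GROUP BY word
ORDER BY cnt DESC
LIMIT 10;
```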

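When distribute by is followed by rand(), rows are assigned to reducers at random, which keeps the amount of data in each partition roughly equal.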
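As a sketch, the earlier query on mytest could be rewritten to distribute rows randomly. The reducer count of 10 is an assumed value for illustration. It is set here through the Hadoop property mapreduce.job.reduces (older releases use mapred.reduce.tasks).

```sql
-- Assumed value for illustration: request 10 reducers.
SET mapreduce.job.reduces=10;

-- Each row is routed by a random value instead of by word, so the
-- reducers receive roughly equal shares of the rows.
-- sort by still orders the rows by word within each reducer.
SELECT *
FROM mytest
DISTRIBUTE BY rand()
SORT BY word;
```

Note the trade-off: with rand(), rows that share the same word are no longer guaranteed to land in the same reducer, so this form suits cases where only an even spread of data matters.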

https://blog.csdn.net/lzw2016/article/details/97818080
