[HIVE]hive中使用自定义函数(UDF)实现分析函数row_number的功能
(2011-08-11 16:15:59)
标签:
hiveudf分析函数row_number |
分类: Hadoop Hive |
hive> desc row_number_test;
OK
id1
id2
age
score
name
hive> select * from row_number_test;
OK
2
1
1
2
2
1
1
使用时要先在子查询中进行分区与排序,比如oracle中这样一句SQL:
select row_number() over (partition by id1 order by age desc) from row_number_test;
转换为hive语句应该是:
select row_number(id1) from
如果partition by 两个字段:
select row_number() over (partition by id1,id2 order
by
转换为hive语句应该是:
select row_number(id1,id2)
展示一下查询结果:
1.
select id1,id2,age,score,name,row_number(id1) rn from (select * from row_number_test distribute by id1 sort by id1,age desc) a;
OK
2
2
2
1
1
1
1
2.
select id1,id2,age,score,name,row_number(id1,id2) rn from
(select * from row_number_test distribute by id1,id2 sort
by
OK
2
1
2
1
1
1
2
下面是代码,只实现了接收1个参数和2个参数的evaluator方法,参数再多的照搬代码就可以了,代码仅供参考:
package com.hadoopbook.hive;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;
@UDFType(deterministic = false)
public class Row_number extends UDF {
private static int MAX_VALUE = 50;
private static String comparedColumn[] = new String[MAX_VALUE];
private static int rowNum = 1;
public int evaluate (Object ...args){
String columnValue[] = new String[args.length];
for(int i=0;i<args.length;i++)
columnValue[i] = args[i].toString();
if (rowNum == 1)
{
for(int i=0;i<columnValue.length;i++)
comparedColumn[i] = columnValue[i];
}
for(int i=0;i<columnValue.length;i++)
{
if ( !comparedColumn[i].equals(columnValue[i]) )
{
for (int j=0;j<columnValue.length;j++)
{
comparedColumn[j] = columnValue[j];
}
rowNum = 1;
return rowNum++;
}
}
return rowNum++;
}
public static void main(String args[])
{
Row_number t = new Row_number();
System.out.println(t.evaluate(123));
System.out.println(t.evaluate(123));
System.out.println(t.evaluate(123));
System.out.println(t.evaluate(1234));
System.out.println(t.evaluate(1234));
System.out.println(t.evaluate(1234));
System.out.println(t.evaluate(1235));
}
}