
Paper notes: GLMix: Generalized Linear Mixed Models For Large-Scale Response Prediction

(2016-10-06 09:19:57)
Tags: glm, mf

Category: Data mining

GLMix: Generalized Linear Mixed Models For Large-Scale Response Prediction

1. Background

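The generalized linear model (GLM) is widely used in statistical inference and response prediction. GLMix has delivered good results in LinkedIn applications such as the jobs homepage, article recommendation, bid-based advertising, and job recommendation. In these large-scale settings, finer-grained user-level and item-level features can effectively improve model performance. One common approach is to train ID-level coefficients as part of the model; this formulation is called GLMix (Generalized Linear Mixed Model). The number of IDs is usually enormous, so training a GLMix model on big data is a major challenge. The paper implements GLMix on the Spark computing platform using the bulk synchronous parallel (BSP) paradigm.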

2. Algorithm

2.1 GLM

To predict the response rate of user i to item j:
$g(E[y_{ij}]) = f_1 = X_{ij}' W$, where $X_{ij}$ is the feature vector, $W$ is the coefficient vector, $E[y_{ij}]$ is the expected response, and $g$ is the link function.
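As a concrete illustration of the GLM above, here is a minimal sketch that assumes a logit link (the usual choice for click or apply prediction; the notes themselves do not fix a link function). The names `sigmoid`, `dot`, and `glm_predict`, and all numbers, are made up for this example.

```python
import math

def sigmoid(z):
    """Inverse of the logit link, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def glm_predict(x_ij, w):
    """E[y_ij] = g^{-1}(x_ij' w), assuming g is the logit link."""
    return sigmoid(dot(x_ij, w))

# Hypothetical feature and coefficient values, for illustration only.
x_ij = [1.0, 0.5, -2.0]
w = [0.2, 0.4, 0.1]
p = glm_predict(x_ij, w)
print(p)  # predicted response probability for this (user, item) pair
```

Applying the link function to `p` recovers the linear score `x_ij' w`, which is exactly the relation the formula states.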

2.2 GLMix: Generalized Linear Mixed Model

To predict the response rate of user i to item j:
$g(E[y_{ij}]) = f_1 = X_{ij}' W + X_j' \alpha_i + X_i' \beta_j$. The model parameters are defined at different granularities: $\alpha_i$ is the coefficient vector for user i, and $\beta_j$ is the coefficient vector for item j.
GLMix = GLM + per-user model + per-item model
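The additive structure can be sketched as a scoring function. This is only an illustration: the dictionary layout for `alpha` and `beta`, the function name `glmix_score`, and the choice to contribute zero for an ID with no learned coefficients are all assumptions of this example, not something the notes specify.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def glmix_score(x_ij, x_j, x_i, w, alpha, beta, user, item):
    """Linear score X_ij'W + X_j'alpha_i + X_i'beta_j (before the inverse link).

    alpha maps user id -> per-user coefficients (applied to item features x_j).
    beta maps item id -> per-item coefficients (applied to user features x_i).
    Unknown ids contribute 0 here, so the score falls back to the global model.
    """
    score = dot(x_ij, w)
    if user in alpha:
        score += dot(x_j, alpha[user])
    if item in beta:
        score += dot(x_i, beta[item])
    return score

# Hypothetical values, for illustration only.
w = [0.2, 0.4]
alpha = {"u1": [0.3, -0.1]}
beta = {"j1": [0.5, 0.2]}
known = glmix_score([1.0, 0.5], [1.0, 0.0], [0.0, 1.0], w, alpha, beta, "u1", "j1")
cold = glmix_score([1.0, 0.5], [1.0, 0.0], [0.0, 1.0], w, alpha, beta, "u9", "j1")
print(known, cold)
```

The second call uses a user id that is not in `alpha`, so only the global and per-item terms contribute.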

2.3 GLMix in job recommendation

(1) Global model (fixed effect)
Similarity between the member profile and the job profile, e.g. do the member skills and job skills look similar?
(2) Per-member model (random effect)
E.g. If a member has applied to a job with title = “software engineer”, we will boost “software engineer” jobs more in her results.
(3) Per-job model (random effect)
E.g. If a job gets an apply from a member titled “software engineer”, we will boost this job more for members with this title.
(4) Model application


3. Model extensions

GLM + MF:
$g(E[y_{ij}]) = f_1 + f_2 = X_{ij}' W + u_i' v_j$
where $f_1$ is the GLM term and $f_2$ is the matrix factorization (MF) term.
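A small sketch of the GLM + MF score, where the MF term is the inner product of a user latent vector and an item latent vector. The function name `glm_mf_score` and all numbers are hypothetical; how the latent vectors are learned is outside the scope of this example.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def glm_mf_score(x_ij, w, u_i, v_j):
    """Return (f1, f2, f1 + f2) for one (user, item) pair."""
    f1 = dot(x_ij, w)   # GLM term: X_ij' W
    f2 = dot(u_i, v_j)  # MF term: u_i' v_j
    return f1, f2, f1 + f2

# Hypothetical feature, coefficient, and latent-factor values.
f1, f2, total = glm_mf_score([1.0, 2.0], [0.5, -0.25], [1.0, 0.5], [0.2, 0.4])
print(f1, f2, total)
```

Here the feature-based term happens to be zero, so the whole score comes from the latent factors; this is the kind of signal MF adds when explicit features carry little information.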
The Wide & Deep Learning framework proposed by Google can be expressed in the same way:
GLM + DNN = Wide & Deep Learning
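To make the "GLM + DNN" reading concrete, here is a toy forward pass in which a linear (wide) score and the output of a one-hidden-layer ReLU network (deep) are summed into a single logit. It is a simplified sketch, not Google's implementation: the single hidden layer, the function name `wide_deep_logit`, and every weight are assumptions for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def relu(v):
    """Element-wise max(0, x)."""
    return [max(0.0, t) for t in v]

def wide_deep_logit(x_wide, w_wide, x_deep, hidden_w, hidden_b, out_w, bias):
    """Logit = wide linear score + deep network score + bias."""
    wide = dot(x_wide, w_wide)  # the GLM part
    pre = [dot(row, x_deep) + b for row, b in zip(hidden_w, hidden_b)]
    deep = dot(relu(pre), out_w)  # the DNN part (one hidden layer here)
    return wide + deep + bias

# Hypothetical weights and inputs, for illustration only.
logit = wide_deep_logit(
    x_wide=[1.0, 0.0], w_wide=[0.3, 0.7],
    x_deep=[1.0, -1.0],
    hidden_w=[[1.0, 0.0], [0.0, 1.0]], hidden_b=[0.0, 0.0],
    out_w=[0.5, 2.0], bias=-0.1,
)
print(logit)
```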

4. Implementation

LinkedIn has open-sourced an implementation of this algorithm framework:
https://github.com/linkedin/photon-ml
It is implemented in the GAME module, which unifies and mixes different models into a principled additive model.
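The sketch below is not Photon-ML code and does not use its API. It is a single-machine toy, in plain Python, of how an additive model of this kind can be trained by block coordinate descent: each block (global, per-user, per-item) is refit in turn while the scores of the other blocks are held fixed as offsets. The logit link, the plain gradient-descent solver, the hyperparameters, and the toy data are all assumptions of this example.

```python
import math
from collections import defaultdict

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_logistic(rows, init, l2=1.0, lr=0.1, steps=200):
    """Gradient descent for L2-regularized logistic regression.

    rows: list of (features, offset, label); the offset is the score
    contributed by the other blocks, held fixed during this fit.
    """
    w = list(init)
    for _ in range(steps):
        grad = [l2 * wk for wk in w]
        for x, offset, y in rows:
            err = sigmoid(dot(x, w) + offset) - y
            for k, xk in enumerate(x):
                grad[k] += err * xk
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

def train_glmix(data, sweeps=3):
    """data: list of (user, item, x_ij, x_j, x_i, y) tuples."""
    _, _, x_ij0, x_j0, x_i0, _ = data[0]
    w = [0.0] * len(x_ij0)
    alpha = defaultdict(lambda: [0.0] * len(x_j0))  # per-user coefficients
    beta = defaultdict(lambda: [0.0] * len(x_i0))   # per-item coefficients
    for _ in range(sweeps):
        # Block 1: global coefficients, with per-user/per-item scores as offsets.
        rows = [(x_ij, dot(x_j, alpha[u]) + dot(x_i, beta[j]), y)
                for u, j, x_ij, x_j, x_i, y in data]
        w = fit_logistic(rows, w)
        # Block 2: one small, independent model per user.
        by_user = defaultdict(list)
        for u, j, x_ij, x_j, x_i, y in data:
            by_user[u].append((x_j, dot(x_ij, w) + dot(x_i, beta[j]), y))
        for u, user_rows in by_user.items():
            alpha[u] = fit_logistic(user_rows, alpha[u])
        # Block 3: one small, independent model per item.
        by_item = defaultdict(list)
        for u, j, x_ij, x_j, x_i, y in data:
            by_item[j].append((x_i, dot(x_ij, w) + dot(x_j, alpha[u]), y))
        for j, item_rows in by_item.items():
            beta[j] = fit_logistic(item_rows, beta[j])
    return w, alpha, beta

# Toy data: u1 always responds, u2 never does; x_j and x_i are intercepts only.
data = [
    ("u1", "j1", [1.0, 1.0], [1.0], [1.0], 1),
    ("u1", "j2", [1.0, 0.0], [1.0], [1.0], 1),
    ("u2", "j1", [1.0, 1.0], [1.0], [1.0], 0),
    ("u2", "j2", [1.0, 0.0], [1.0], [1.0], 0),
]
w, alpha, beta = train_glmix(data)
print(w, dict(alpha), dict(beta))
```

In this toy data the global features cannot separate the two users, so the difference ends up in the per-user coefficients: positive for u1 and negative for u2. Within a block, the per-user (or per-item) fits do not depend on each other, which is what makes them easy to run in parallel on a platform such as Spark.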

5. References

[1] X. Zhang et al., GLMix: Generalized Linear Mixed Models For Large-Scale Response Prediction, KDD 2016
[2] GAME, https://github.com/linkedin/photon-ml


