bicloud

Python iterators

(2011-05-04 10:41:55)
Tags:

python

groupby

Miscellany

Category: python
Implementing groupby
#!/usr/bin/python
from itertools import groupby
from operator import itemgetter

def summary(data, key=itemgetter(0), value=itemgetter(1)):
    """Summarise the supplied data.

       Produce a summary of the data, grouped by the given key (default: the
       first item), and giving totals of the given value (default: the second
       item).

       The key and value arguments should be functions which, given a data
       record, return the relevant value.
    """

    for k, group in groupby(data, key):
        yield (k, sum(value(row) for row in group))

if __name__ == "__main__":
    # Example: given a set of sales data for city within region,
    # produce a sales report by region
    sales = [('Scotland', 'Edinburgh', 20000),
             ('Scotland', 'Glasgow', 12500),
             ('Wales', 'Cardiff', 29700),
             ('Wales', 'Bangor', 12800),
             ('England', 'London', 90000),
             ('England', 'Manchester', 45600),
             ('England', 'Liverpool', 29700)]

    for region, total in summary(sales, key=itemgetter(0), value=itemgetter(2)):
        print("%10s: %d" % (region, total))

$ python groupby.py
  Scotland: 32500
     Wales: 42500
   England: 165300
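
Note that itertools.groupby only merges adjacent records with equal keys; the sales list above works because its rows are already ordered by region. The following is a minimal sketch, assuming the input may arrive unordered. The name summary_sorted and the sample rows are hypothetical and not part of the original recipe.

```python
from itertools import groupby
from operator import itemgetter

def summary_sorted(data, key=itemgetter(0), value=itemgetter(1)):
    """Hypothetical variant of summary(): sort by key before grouping."""
    for k, group in groupby(sorted(data, key=key), key):
        yield (k, sum(value(row) for row in group))

# Illustrative rows, deliberately not ordered by region.
unordered = [('Wales', 'Cardiff', 29700),
             ('Scotland', 'Edinburgh', 20000),
             ('Wales', 'Bangor', 12800),
             ('Scotland', 'Glasgow', 12500)]

# Without sorting, each run of equal keys becomes its own group,
# so every region shows up more than once.
naive = [(k, sum(row[2] for row in g))
         for k, g in groupby(unordered, itemgetter(0))]

# With sorting, each region appears exactly once.
totals = list(summary_sorted(unordered, value=itemgetter(2)))

print(naive)
print(totals)
```

Sorting costs O(n log n); when the data is known to be ordered by the grouping key, as in the script above, the sort can be skipped.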

Implementing multi-key groupby
#!/usr/bin/python
from itertools import groupby
from operator import itemgetter

def set_keys(*indices):
    """Returns a function that returns a tuple of key values"""
    def get_keys(seq, indices=indices):
        keys = []
        for i in indices:
            keys.append(seq[i])
        return tuple(keys)
    return get_keys
   

def summary(data, key=itemgetter(0), value=itemgetter(1)):
    """Summarise the supplied data.

       Produce a summary of the data, grouped by the given key (default: the
       first item), and giving totals of the given value (default: the second
       item).

       The key and value arguments should be functions which, given a data
       record, return the relevant value.
    """

    for k, group in groupby(data, key):
        yield (k, sum(value(row) for row in group))

if __name__ == "__main__":
    # Example: given a set of sales data for branch within city within region,
    # produce a sales report by region and city
    sales = [('Scotland', 'Edinburgh', 'Branch1', 20000),
             ('Scotland', 'Glasgow', 'Branch1', 12500),
             ('Scotland', 'Glasgow', 'Branch2', 12000),
             ('Wales', 'Cardiff', 'Branch1', 29700),
             ('Wales', 'Cardiff', 'Branch2', 30000),
             ('Wales', 'Bangor', 'Branch1', 12800),
             ('England', 'London', 'Branch1', 90000),
             ('England', 'London', 'Branch2', 80000),
             ('England', 'London', 'Branch3', 70000),
             ('England', 'Manchester', 'Branch1', 45600),
             ('England', 'Manchester', 'Branch2', 50000),
             ('England', 'Liverpool', 'Branch1', 29700),
             ('England', 'Liverpool', 'Branch2', 25000)]

    # groupby only merges adjacent rows, so sort by (region, city, ...) first
    sales.sort()
    for (region, city), total in summary(sales, key=set_keys(0,1), value=itemgetter(3)):
        print("%-10s  %-10s : %8d" % (region, city, total))

$ python mgroupby.py
England     Liverpool  :    54700
England     London     :   240000
England     Manchester :    95600
Scotland    Edinburgh  :    20000
Scotland    Glasgow    :    24500
Wales       Bangor     :    12800
Wales       Cardiff    :    59700
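
As a side note, operator.itemgetter accepts several indices and then returns a tuple, so it can stand in for set_keys when two or more key columns are used. A minimal sketch follows; the compact set_keys here is a rewrite for illustration that is assumed to behave like the one in the script above.

```python
from operator import itemgetter

def set_keys(*indices):
    """Compact rewrite of the set_keys helper, for comparison only."""
    def get_keys(seq):
        return tuple(seq[i] for i in indices)
    return get_keys

row = ('England', 'London', 'Branch1', 90000)

key_a = set_keys(0, 1)(row)    # tuple built by the helper
key_b = itemgetter(0, 1)(row)  # tuple built by the standard library

print(key_a == key_b)

# One difference: with a single index, itemgetter returns the bare item,
# while set_keys still returns a 1-tuple.
single_a = set_keys(0)(row)
single_b = itemgetter(0)(row)
print(single_a, single_b)
```

Either form works as the key argument of summary; set_keys keeps the key type uniform (always a tuple) regardless of how many columns are selected.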


Source:
http://code.activestate.com
