加载中…
个人资料
  • 博客等级:
  • 博客积分:
  • 博客访问:
  • 关注人气:
  • 获赠金笔:0支
  • 赠出金笔:0支
  • 荣誉徽章:
正文 字体大小:

STATA中collapse的妙用

(2014-07-24 00:22:18)
标签:

stata

教育

分类: 04STATA数据处理
在数据处理中,我们经常要遇到分组求和 求均值等运算,计算完毕后再一次加入到新一轮的计算中,之前我们使用这个命令并不是太多,在这里我总结了几个例子。

Stata Learning Modules
Collapsing data across observations

Sometimes you have data files that need to be collapsed to be useful to you. For example, you might have student data but you really want classroom data, or you might have weekly data but you want monthly data, etc. We will illustrate this using an example showing how you can collapse data across kids to make family level data.

Here is a file containing information about the kids in three families. There is one record per kid. Birth is the order of birth (i.e., 1 is first), age wt and sexare the child's age, weight and sex. We will use this file for showing how to collapse data across observations.

use http://www.ats.ucla.edu/stat/stata/modules/kids, clear

list

          famid    kidname      birth        age         wt        sex

  1.              Beth                         60         

  2.                Bob                         40         

  3.              Barb                         20         

  4.              Andy                         80         

  5.                Al                  6         50         

  6.               Ann                         20         

  7.              Pete                         60         

  8.               Pam                         40         

  9.              Phil                         20          

Consider the collapse command below. It collapses across all of the observations to make a single record with the average age of the kids.

collapse age

list

           age

  1.  5.111111  

The above collapse command was not very useful, but you can combine it with the by(famid) option, and then it creates one record for each family that contains the average age of the kids in the family.

use http://www.ats.ucla.edu/stat/stata/modules/kids, clear

collapse age, by(famid)

list

         famid        age

  1.                

  2.          5.333333 

  3.                 

The following collapse command does the exact same thing as above, except that the average of age is named avgage and we have explicitly told thecollapse command that we want it to compute the mean.

use http://www.ats.ucla.edu/stat/stata/modules/kids, clear

collapse (mean) avgage=age, by(famid)

list

         famid     avgage

  1.                

  2.           5.333333 

  3.                

We can request averages for more than one variable. Here we get the average for age and for wt all in the same command.

use http://www.ats.ucla.edu/stat/stata/modules/kids, clear

collapse (mean) avgage=age avgwt=wt, by(famid)

list

         famid     avgage      avgwt

  1.                        40 

  2.          5.333333         50 

  3.                        40  

This command gets the average of age and wt like the command above, and also computes numkids which is the count of the number of kids in each family (obtained by counting the number of observations with valid values of birth).

use http://www.ats.ucla.edu/stat/stata/modules/kids, clear 

collapse (mean) avgage=age avgwt=wt (count) numkids=birth, by(famid)

list

         famid     avgage      avgwt    numkids

  1.                        40         

  2.          5.333333         50         

  3.                        40          

Suppose you wanted a count of the number of boys and girls in the family. We can do that with one extra step. We will create a dummy variable that is 1 if the kid is a boy (0 if not), and a dummy variable that is 1 if the kid is a girl (and 0 if not). The sum of the boy dummy variable is the number of boys and the sum of the girl dummy variable is the number of girls.

First, let's use the kids file (and clear out the existing data).

use http://www.ats.ucla.edu/stat/stata/modules/kids, clear 

We use tabulate with the generate option to make the dummy variables.

tabulate sex, generate(sexdum)

        sex |      Freq.     Percent        Cum.

------------+-----------------------------------

          f |               44.44       44.44

          m |               55.56      100.00

------------+-----------------------------------

      Total |              100.00

We can look at the dummy variables. Sexdum1 is the dummy variable for girls. Sexdum2 is the dummy variable for boys. The sum of sexdum1 is the number of girls in the family. The sum of sexdum2 is the number of boys in the family.

list famid sex sexdum1 sexdum2

          famid        sex   sexdum1   sexdum2

  1.                              

  2.                              

  3.                              

  4.                              

  5.                              

  6.                              

  7.                              

  8.                              

  9.                               

The command below creates girls which is the number of girls in the family, and boys which is the number of boys in the family.

collapse (count) numkids=birth (sum) girls=sexdum1 boys=sexdum2, by(famid)

We can list out the data to confirm that it worked correctly.

list famid boys girls numkids

         famid      boys     girls    numkids

  1.                              

  2.                2                 

  3.                               

Summary

To create one record per family (famid) with the average of age within each family.

collapse age, by(famid)

To create one record per family (famid) with the average of age (called avgage) and average weight (called avgwt) within each family.

collapse (mean) avgage=age avgwt=wt,  by(famid)

Same as above example, but also counts the number of kids within each family calling that numkids.

collapse (mean) avgage=age  avgwt=wt (count) numkids=birth, by(famid)

Counts the number of boys and girls in each family by using tabulate to create dummy variables based on sex and then summing the dummy variables within each family.

tabulate sex, generate(sexdum)

collapse (sum) girls=sexdum1 boys=sexdum2, by(famid)

 

0

阅读 收藏 喜欢 打印举报/Report
  

新浪BLOG意见反馈留言板 欢迎批评指正

新浪简介 | About Sina | 广告服务 | 联系我们 | 招聘信息 | 网站律师 | SINA English | 产品答疑

新浪公司 版权所有