Handy Uses of collapse in Stata
(2014-07-24 00:22:18)
Tags: stata, education | Category: 04 STATA data processing
Stata Learning Modules
Collapsing data across observations
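Sometimes you have data files that need to be collapsed before they are useful to you, for example when the file has one record per kid but you want one record per family.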
Here is a file containing information about the kids in three families. There is one record per kid.
use http://www.ats.ucla.edu/stat/stata/modules/kids, clear
list
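Consider the collapse command below. With no by() option, it collapses across all of the observations to make a single record containing the average of age (mean is the default statistic for collapse).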
collapse age
list
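The above command is rarely what you want on its own. Combined with the by(famid) option, collapse creates one record per family containing the average age of the kids in that family.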
use http://www.ats.ucla.edu/stat/stata/modules/kids, clear
collapse age, by(famid)
list
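If the file at the URL is not reachable, the same idea can be tried on a small stand-in dataset. The sketch below types in made-up values with input (they are hypothetical, not the contents of the kids file) and wraps the collapse in preserve and restore, since collapse replaces the data in memory.

```stata
* Hypothetical stand-in data: one record per kid, two families
clear
input famid str4 kidname birth age wt str1 sex
1 "Beth" 1 9 60 "f"
1 "Bob"  2 6 40 "m"
1 "Barb" 3 3 20 "f"
2 "Andy" 1 8 80 "m"
2 "Al"   2 6 50 "m"
end

* preserve keeps a copy of the kid-level data so it can be brought back
preserve

* One record per family; age now holds the mean age within each famid
collapse age, by(famid)
list

* Bring back the original one-record-per-kid data
restore
list
```

Using preserve and restore is one way to avoid reloading the file before every collapse.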
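The following collapse command also averages age within each family, but it names the result avgage and states the (mean) statistic explicitly.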
use http://www.ats.ucla.edu/stat/stata/modules/kids, clear
collapse (mean) avgage=age, by(famid)
list
We can request averages for more than one variable. Here we get the average for both age and wt, all within famid.
use http://www.ats.ucla.edu/stat/stata/modules/kids, clear
collapse (mean) avgage=age avgwt=wt, by(famid)
list
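This command gets the average of age and wt within each family and also counts the number of kids in each family, calling that numkids. The count is the number of nonmissing values of birth.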
use http://www.ats.ucla.edu/stat/stata/modules/kids, clear
collapse (mean) avgage=age avgwt=wt (count) numkids=birth, by(famid)
list
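Because (count) counts nonmissing values, the variable you count matters. The sketch below uses a small made-up dataset (the values are hypothetical, not the kids file) in which one kid has a missing birth, to show how that affects numkids.

```stata
* Hypothetical data: the third kid in family 1 has a missing birth order
clear
input famid birth age
1 1 9
1 2 6
1 . 3
2 1 8
2 2 6
end

* (count) counts nonmissing values of birth, so family 1 gets numkids = 2
* even though it has three records; (mean) of age still uses all three ages
collapse (mean) avgage=age (count) numkids=birth, by(famid)
list
```

If you want a count of records, count a variable that is never missing within the groups.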
Suppose you wanted a count of the number of boys and girls in the family. We can do that with one extra step. We will create a dummy variable that is 1 if the kid is a boy (0 if not), and a dummy variable that is 1 if the kid is a girl (and 0 if not). The sum of the boy dummy variable is the number of boys, and the sum of the girl dummy variable is the number of girls.
First, let's use the kids file (and clear out the existing data).
use http://www.ats.ucla.edu/stat/stata/modules/kids, clear
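We use tabulate with the generate() option to make the dummy variables. It creates one variable per category of sex, named sexdum1 and sexdum2.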
tabulate sex, generate(sexdum)
We can look at the dummy variables.
list famid sex sexdum1 sexdum2
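The command below creates girls, which is the number of girls in the family (the sum of sexdum1), boys, which is the number of boys in the family (the sum of sexdum2), and numkids, the count of kids in the family.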
collapse (count) numkids=birth (sum) girls=sexdum1 boys=sexdum2, by(famid)
We can list out the data to confirm that it worked correctly.
list famid boys girls numkids
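The dummy variables can also be built directly with generate, which makes it explicit which dummy stands for boys and which for girls. This is a minimal sketch on made-up data, assuming sex is a string variable coded "f" and "m"; that coding is an assumption and should be checked against your own file.

```stata
* Hypothetical data; assumes sex is a string coded "f" / "m"
clear
input famid birth str1 sex
1 1 "f"
1 2 "m"
1 3 "f"
2 1 "m"
2 2 "m"
end

* A logical expression evaluates to 1 when true and 0 when false
generate byte girl = (sex == "f")
generate byte boy  = (sex == "m")

* Sum the dummies within each family and count the kids
collapse (count) numkids=birth (sum) girls=girl boys=boy, by(famid)
list famid boys girls numkids
```

With these made-up values, family 1 has 2 girls and 1 boy, and family 2 has 2 boys and no girls.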
Summary
To create one record per family (famid) with the average of age within each family.
collapse age, by(famid)
To create one record per family (famid) with the average of age (called avgage) and average weight (called avgwt) within each family.
collapse (mean) avgage=age avgwt=wt, by(famid)
Same as above example, but also counts the number of kids within each family, calling that numkids.
collapse (mean) avgage=age avgwt=wt (count) numkids=birth, by(famid)
Counts the number of boys and girls in each family by using tabulate to create dummy variables based on sex and then summing the dummy variables within each family.
tabulate sex, generate(sexdum)
collapse (sum) girls=sexdum1 boys=sexdum2, by(famid)