
299. Multiple comparisons or tests on subtables

(2014-07-15 06:14:28)
Tags: multiple-test, proportions, freq, multitest, call-execute

Category: Statistics sharing

Sometimes we need pairwise analyses of proportions across a combination of subgroups. Here is an example of performing Pearson Chi-Square tests of proportions on pairwise sub-tables. A WHERE statement in the procedure can select subsets of rows, columns, or both; here rows are selected. However, tests on arbitrary subtables are not independent of one another, and therefore do not partition the chi-square from the original table. Also, because multiple subtables produce multiple tests, adjusting for multiplicity is important.
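To make the multiplicity concern concrete, here is a short worked bound. The symbols k (number of groups), m (number of pairwise tests) and α (per-test significance level) are introduced here for illustration only. The bound uses Boole's inequality, so it holds even though the pairwise tests are not independent.

```latex
\begin{align*}
m &= \binom{k}{2} = \frac{k(k-1)}{2}, & k = 4 &\;\Rightarrow\; m = 6, \\
\Pr(\text{at least one false rejection}) &\le m\,\alpha, & \alpha = 0.05 &\;\Rightarrow\; m\,\alpha = 0.30 .
\end{align*}
```

In other words, with four groups tested pairwise at the 0.05 level and no adjustment, the family-wise error rate is only guaranteed to stay below 0.30, not 0.05.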

In the following example, we want to test the effect of age on the type of syncope or, more precisely, the association between the two. The data come from http://bbs.pinggu.org/thread-3128460-1-1.html.

 

/* Remember to apply a FORMAT in the output to make the coded data more readable. */
proc format;
      value agefmt
            1='<=20'
            2='21-40'
            3='41-60'
            4='>=60'
      ;
run;

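data x;
      /* Each data line holds an age-group code followed by the counts for types H, C and M */
      input age  @@;
      array t[3] $1 _temporary_ ('H', 'C', 'M');
      /* Write one row per age group and type, with the count in n_vvs */
      do i=1 to 3;
            type =t[i];
            input n_vvs@@;
            output;
            end;
lines;
1 14 3  21
2 10 12 29
3 37 20 50
4 14 1  8
;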

The global Chi-Sq test shows a significant association between age and the type of syncope. In contrast, the MH Chi-Sq is not significant, which suggests that the proportions of the syncope types are not linearly associated with age group. Since a linear trend is not our objective here, we go with the Pearson Chi-Sq test. Because some cells are sparse, Fisher's exact test may be preferable; for illustration purposes, however, we stay with the Chi-Sq test.

proc freq data=x order=internal;
      tables age *type/chisq fisher;
      weight n_vvs;
      format age agefmt.;
      run;

 

Statistic                      DF    Value      Prob
Chi-Square                      6    15.6447    0.0158
Likelihood Ratio Chi-Square     6    16.4677    0.0115
Mantel-Haenszel Chi-Square      1     1.4303    0.2317
Phi Coefficient                       0.2673
Contingency Coefficient               0.2582
Cramer's V                            0.1890

Fisher's Exact Test
Table Probability (P)    2.060E-08
Pr <= P                  0.0169
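The remark about sparse cells can be checked directly by printing expected counts next to the observed ones. The following is a minimal sketch that reuses the data set x and the format agefmt. defined above; the choice to suppress the percentages is only for a more compact table.

```sas
/* Show observed and expected cell counts to see which cells are sparse */
proc freq data=x order=internal;
      tables age*type / expected norow nocol nopercent;
      weight n_vvs;
      format age agefmt.;
run;
```

A common rule of thumb is to be cautious with the Chi-Sq approximation when expected counts fall below 5.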

 

We now turn to tests of proportions on pairwise subgroups. Four age groups yield 6 pairwise tests. Basically, we use a WHERE statement to subset the sample into sub-samples that each contain one pair of groups, and then perform the Chi-Sq test on each pair. There are many ways to do that; here is one that is concise to code.

%macro freqfit(ageL,ageR);
      proc freq data=x order=internal;
            /* Use WHERE to subset the sample to the two age groups being compared */
            where age =&ageL or age=&ageR;
            tables age *type/chisq fisher;
            weight n_vvs;
            format age agefmt.;
            run;
%mend freqFit;
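As a quick check before automating anything, the macro can be called by hand for a single pair. This sketch assumes the macro above has been compiled and uses the age codes from the agefmt. format (2 = '21-40', 4 = '>=60').

```sas
/* Test one pair only: age group 21-40 (code 2) against >=60 (code 4) */
%freqFit(2,4)
```

This prints the 2 x 3 sub-table and its Chi-Sq and Fisher results in the usual output window; nothing is saved to a data set at this point.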

This SQL code forms all distinct pairs of age groups.

proc sql;
      create table runonit as
      select distinct a.age as ageL, b.age as ageR from x(keep =age) a, x(keep =age) b
      where a.age < b.age
      ;
quit;
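It may be worth confirming that the self-join produced the expected number of pairs before running any tests. A small sketch, assuming the table runonit created above; the column name n_pairs is made up for this check.

```sas
/* Sanity check: four age groups should give 4*3/2 = 6 distinct pairs */
proc sql;
      select count(*) as n_pairs from runonit;
quit;
```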

data _null_;
      set runonit end=Eof;
      /* Why rename Prob to Raw_P? PROC MULTTEST expects the raw p-values of an
         INPVALUES= data set in a variable named raw_p by default. */
      if _n_ =1 then call execute('ods output chisq(persist)=chisq(where=(statistic="Chi-Square")
                                      rename=(value =ChiSq Prob=Raw_P));');
      /* The code generated by CALL EXECUTE is easy to read in the SAS log window,
         which is why I sometimes prefer it to macro looping. */
      call execute('%freqFit('||cats(ageL, ',', ageR)||')');
      if Eof then call execute('ods output clear;');
      run;
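For readers new to CALL EXECUTE, the DATA _NULL_ step above amounts to submitting roughly the following by hand. This is a sketch of the generated code rather than text copied from a log, and it assumes the rows of runonit are ordered by ageL and then ageR.

```sas
/* Keep the ODS OUTPUT request open so every PROC FREQ step appends to chisq */
ods output chisq(persist)=chisq(where=(statistic="Chi-Square")
                                rename=(value=ChiSq Prob=Raw_P));
/* One macro call per row of runonit */
%freqFit(1,2)
%freqFit(1,3)
%freqFit(1,4)
%freqFit(2,3)
%freqFit(2,4)
%freqFit(3,4)
/* Close the request so later procedures are not captured */
ods output clear;
```

Writing the calls out like this works for six pairs, but driving them from runonit scales to any number of groups without editing the code.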

 

Since more than two subgroups are involved, it is natural to consider multiple-comparison adjustments to control inferential errors. We feed the raw p-values from the Chi-Sq tests above into the MULTTEST procedure.

ods output pvalues =pvs;
proc multtest inpvalues=chisq bon holm hoc fdr;
run;
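For reference, the two Bonferroni-type adjustments requested by the BON and HOLM options can be written out as follows. The notation is introduced here for illustration: m is the number of tests (6 in this example), p_i is a raw p-value, and p_(1) <= ... <= p_(m) are the raw p-values sorted in ascending order.

```latex
\begin{align*}
\tilde{p}_i^{\,\mathrm{Bon}} &= \min\bigl(1,\; m\,p_i\bigr), \\
\tilde{p}_{(i)}^{\,\mathrm{Holm}} &= \max_{j \le i}\, \min\bigl(1,\; (m-j+1)\,p_{(j)}\bigr).
\end{align*}
```

As a worked check, the smallest raw p-value in this example is reported as 0.0014, so the Bonferroni-adjusted value is about 6 × 0.0014 = 0.0084. The reported 0.0082 differs slightly only because the raw p-value is shown rounded to four decimals.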

 

data xx;
      length pairs $32.;
      /* One-to-one merge without a BY statement: rows are matched by position */
      merge runonit chisq pvs;
      pairs =catx(' vs. ', put(ageL, agefmt.), put(ageR, agefmt.));
      keep pairs ChiSq raw_p Bonferroni;
      run;
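The data step keeps only the Bonferroni column, although HOLM, HOC and FDR were requested as well. To add the other adjusted p-values to the summary, first look up how they are named in pvs; the sketch below deliberately assumes no variable names beyond those used above.

```sas
/* List the variables PROC MULTTEST wrote to pvs, in their stored order */
proc contents data=pvs varnum;
run;

/* Print the pairwise summary built in the data step above */
proc print data=xx;
run;
```

Any of the names listed by PROC CONTENTS can then be added to the KEEP statement.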

 

Obs    pairs               ChiSq      Raw_P     Bonferroni
1      <=20 vs. 21-40       5.5666    0.0618    0.3710
2      <=20 vs. 41-60       2.5187    0.2838    1.0000
3      <=20 vs. >=60        3.3411    0.1881    1.0000
4      21-40 vs. 41-60      3.7110    0.1564    0.9383
5      21-40 vs. >=60      13.1866    0.0014    0.0082
6      41-60 vs. >=60       6.3519    0.0418    0.2505

 

Above is the summary of the pairwise Chi-Sq tests for each combination of age groups. The raw p-values flag two pairs as significant (21-40 vs. >=60 and 41-60 vs. >=60); after adjusting for multiple comparisons, only 21-40 vs. >=60 remains significant. Of note, the youngest group (<=20) does not differ significantly from any other group, even on the raw p-values.
