8. MAD (Median Absolute Deviation)

(2012-03-29 04:58:29)
Tags: mad, proc, iml, linear, regression

Category: Statistics sharing

Summary:

From Wikipedia: For a univariate data set X1, X2, ..., Xn, the MAD is defined as the median of the absolute deviations from the data's median:

$$\operatorname{MAD} = \operatorname{median}_i\left(\left|X_i - \operatorname{median}_j(X_j)\right|\right)$$

that is, starting with the residuals (deviations) from the data's median, the MAD is the median of their absolute values.
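As a worked example of the definition, take the seven-value vector used in the SAS code. With n = 7, each median is simply the 4th value after sorting:

```latex
\begin{aligned}
X &= (1, 1, 2, 2, 4, 6, 9), \qquad \operatorname{median}(X) = 2,\\
|X_i - 2| &= (1, 1, 0, 0, 2, 4, 7) \;\xrightarrow{\text{sort}}\; (0, 0, 1, 1, 2, 4, 7),\\
\operatorname{MAD} &= \operatorname{median}(0, 0, 1, 1, 2, 4, 7) = 1.
\end{aligned}
```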

The definition above is quoted from Wikipedia. Computing the MAD statistic is very straightforward in PROC IML, although many people may never have paid attention to this statistic. For the normal distribution, the scale factor K, i.e., the ratio of the standard deviation (STD) to the MAD, is well known and has been proved theoretically. The code below was written to test whether K is correct. Because K is already established in theory, the test really checks the validity of the generated random numbers. The SAS code was tested in IML/Studio 4.3. For this code there is no difference from PROC IML, so the statements can simply be placed inside PROC IML.
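A short derivation of the theoretical K, assuming X is normal with mean μ and standard deviation σ, and treating MAD as the population value (the median of |X − μ|). Φ denotes the standard normal CDF. The last line is what the SAS statement k = 1/quantile('normal', 3/4) computes:

```latex
\begin{aligned}
&X \sim N(\mu, \sigma^2) \;\Rightarrow\; \operatorname{median}(X) = \mu,\\
&P\left(|X - \mu| \le m\right) = 2\,\Phi\!\left(\frac{m}{\sigma}\right) - 1 = \frac{1}{2}
  \;\Rightarrow\; \Phi\!\left(\frac{m}{\sigma}\right) = \frac{3}{4},\\
&\operatorname{MAD} = m = \sigma\,\Phi^{-1}\!\left(\tfrac{3}{4}\right) \approx 0.6745\,\sigma,\\
&K = \frac{\sigma}{\operatorname{MAD}} = \frac{1}{\Phi^{-1}(3/4)} \approx 1.4826.
\end{aligned}
```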

 

Results:

How MAD was calculated:

<direct MAD from SAS function>
                             1

<original vector>  <median of vector>  <absolute deviation vector>  <computed MAD>
                1 *                 2 *                          1 *             1
                1                                                1
                2                                                0
                2                                                0
                4                                                2
                6                                                4
                9                                                7

Regression test if K correct:

<Scale factor K>  Estimated value  <P value:Estimated = K?>
          1.4826          1.48211                     0.724

SAS code (from IML/Studio):

*compute the MAD statistic;
c = {1, 1, 2, 2, 4, 6, 9};
mad0 = mad(c);              /* built-in MAD function; default method gives the unscaled MAD */
median0 = median(c);        /* median of the data */
c1 = abs(c - median0);      /* absolute deviations from the median */
mad2 = median(c1);          /* MAD computed directly from the definition */
print "How MAD was calculated:",,
      mad0[label="<direct MAD from SAS function>" format=best.],
      c[label="<original vector>"] '*' median0[label="<median of vector>"] '*'
      c1[label="<absolute deviation vector>"] '*' mad2[label="<computed MAD>"];

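*simulate and calculate the scale factor K;
call randseed(1234);              /* set the seed once, before the loop */
x = j(1000, 1000);                /* shape template: 1000 samples (columns) of size 1000 */
m = j(ncol(x), 2);                /* column 1: MAD, column 2: STD of each sample */
do i = 1 to ncol(x);
   _x = x[, i];
   call randgen(_x, "normal");    /* fill _x with standard normal draws */
   m[i, 1] = mad(_x);
   m[i, 2] = std(_x);
end;

k = 1/quantile('normal', 3/4);    /* theoretical K = 1/Phi^-1(3/4) */
x = m[, 1]; y = m[, 2];           /* x = MAD, y = STD */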

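start Regress;
  xpxi = inv(x`*x);                /* (X'X)^-1, a 1x1 matrix here */
  beta = xpxi * (x`*y);            /* least-squares slope, no intercept */
  yhat = x*beta;
  resid = y - yhat;
  sse = ssq(resid);                /* error sum of squares */
  n = nrow(x);
  dfe = nrow(x) - ncol(x);         /* error degrees of freedom */
  mse = sse/dfe;
  cssy = ssq(y - sum(y)/n);        /* corrected total SS of y */
  rsquare = (cssy - sse)/cssy;     /* computed but not printed */
  stdb = sqrt(vecdiag(xpxi)*mse);  /* standard error of beta */
  t = (beta - k)/stdb;             /* t statistic for H0: beta = K */
  prob = 1 - probf(t#t, 1, dfe);   /* two-sided p-value: t squared ~ F(1, dfe) under H0 */
  print "Regression test if K correct:",,
        k[label="<Scale factor K>" format=best7.5]
        beta[label="Estimated value" format=best7.5]
        prob[label="<P value:Estimated = K?>" format=pvalue6.3];
finish Regress;

run Regress;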
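In formulas, the Regress module fits a regression through the origin of each sample's STD (y_i) on its MAD (x_i), here with n = 1000 samples, and tests H0: β = K with a t statistic. F_{1,n−1} denotes the F distribution function that PROBF evaluates:

```latex
\begin{aligned}
y_i &= \beta x_i + \varepsilon_i, \qquad x_i = \operatorname{MAD}_i,\; y_i = \operatorname{STD}_i,\; i = 1, \dots, n,\\
\hat\beta &= \frac{\sum_i x_i y_i}{\sum_i x_i^2}, \qquad
\mathrm{MSE} = \frac{\sum_i (y_i - \hat\beta x_i)^2}{n - 1}, \qquad
\operatorname{se}(\hat\beta) = \sqrt{\frac{\mathrm{MSE}}{\sum_i x_i^2}},\\
t &= \frac{\hat\beta - K}{\operatorname{se}(\hat\beta)}, \qquad
p = 1 - F_{1,\,n-1}\!\left(t^2\right) = P\left(|T_{n-1}| \ge |t|\right).
\end{aligned}
```

Because the model has no intercept, only one parameter is estimated and the error degrees of freedom are n − 1.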
