Market Forecasting 1: Computing the Relationship Between the Index's Direction and the Advance/Decline Ratio

(2013-10-27 11:02:43)
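Category: Market Index
Overview:
There are articles online that use the number of advancing and declining stocks to predict whether the market will rise or fall the next day, complete with a walk-through of the analysis. It looks almost magical.
The article in question is at this link:
blog.sina.cn/dpool/blog/s/blog_7fb981940101ahoi.html
The author, 股为纸, holds a US doctorate in finance, works in software development, lives in the United States, and is very protective of intellectual property.
Many people have asked the author to share the forecasting technique and were flatly refused.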

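I cannot just watch someone else hold the secret and do nothing; I want to try forecasting the market too.
I recently asked a few mathematics professors whether the market can be predicted from data. They gave me some pointers,
and I set out to test the theory in that article.

I am learning as I go, so I am posting the SAS output here. If you know this material, please point out any basic errors in my understanding.
My interpretation of the SAS results in particular is very likely to contain mistakes.
The tool used is SAS.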
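1. Goal: measure the relationship between the index's direction on a given day and the ratio of declining to advancing stocks over the preceding days.
2. Analysis:
  Estimate how strongly the Shanghai index's direction on a given day is related to the decline/advance ratio of yesterday and of two, three, four and five days back.
3. Data definitions: the data are 10 years of Shanghai daily bars, a little over 3,300 observations.
  3.1 "Shanghai" means the Shanghai Composite Index (上证指数); the data set is stock.Stock999999_txt.
     The closing price is close, the number of declining stocks is ds_diejiashu, and the number of advancing stocks is zs_zhangjiashu.
  3.2 The day's direction is whether the Shanghai Composite rose that day, coded 1/0: 1 means up, 0 means flat or down. The variable is zdvalue.
  3.3 The decline/advance ratio is computed in two ways: first, declines divided by advances, variable zdb;
second, declines divided by (declines + advances), variable zd.
The multi-day ratios are cumulative: the three-day ratio, for example, is computed from yesterday + the day before + the day before that.
Yesterday's ratio is zdb1, the two-day ratio is zdb2, and the three-day ratio is zdb3.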
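As a rough illustration of the two ratio definitions in 3.3, here is a minimal Python sketch. The function name `breadth_ratios`, its arguments and the sample counts are hypothetical and only for illustration; the 0.1 added to each denominator mirrors the guard against division by zero used in the SAS script. The sketch follows the prose definitions (declines over advances, and declines over declines plus advances), summed over the k days before day t.

```python
def breadth_ratios(declines, advances, t, k, eps=0.1):
    """Return (zdb_k, zd_k) for day index t, using the k days before t.

    declines, advances: per-day counts of declining / advancing stocks.
    eps: small constant added to each denominator to avoid division by zero.
    """
    if t - k < 0:
        raise ValueError("not enough history before day t")
    d = sum(declines[t - k:t])   # total declines over days t-k .. t-1
    a = sum(advances[t - k:t])   # total advances over the same days
    zdb = d / (a + eps)          # declines / advances
    zd = d / (d + a + eps)       # declines / (declines + advances)
    return zdb, zd


# Made-up counts for four trading days (illustrative only, not market data).
declines = [300, 500, 200, 450]
advances = [600, 400, 700, 450]

zdb1, zd1 = breadth_ratios(declines, advances, t=3, k=1)  # yesterday only
zdb2, zd2 = breadth_ratios(declines, advances, t=3, k=2)  # last two days
print(zdb1, zd1, zdb2, zd2)
```

Unlike zdb, the zd form is bounded between 0 and 1, which keeps a day with very few advancing stocks from producing an extreme value.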
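4. Data source: the 通达信 (TongDaXin) software.
How to export: display the daily K-line chart in 通达信, then add the two advance/decline indicators (number of advancing stocks and number of declining stocks);
click Export in the File menu and save the data as a text file.
5. SAS code: see the SAS script at the end of this post.
6. SAS results: see the SAS output at the end of this post.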

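7. Analysis of the SAS results:
7.1 zdvalue against zdb1/zdb2/zdb3/zdb4.
7.1.1 R-Square    0.0007    Max-rescaled R-Square    0.0010
An R-Square of 0.0007 means the four ratios improve the fit over an intercept-only model by almost nothing. (It is a generalized R-Square computed from the likelihoods, not the share of observations "covered" by the model.)
Does this mean a logistic model is unsuitable for this data? Experts, please weigh in.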
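To make the R-Square figures less mysterious, the sketch below recomputes them from the -2 Log L values in the SAS listing, using the generalized (Cox-Snell) definition and its max-rescaled variant. The function name `generalized_r2` is hypothetical; the three input numbers are copied from the output for the zdb1 to zdb4 model.

```python
import math


def generalized_r2(neg2logl_null, neg2logl_model, n):
    """Generalized (Cox-Snell) R-Square and its max-rescaled version.

    neg2logl_null:  -2 Log L of the intercept-only model
    neg2logl_model: -2 Log L of the model with covariates
    n:              number of observations used
    """
    r2 = 1.0 - math.exp(-(neg2logl_null - neg2logl_model) / n)
    r2_max = 1.0 - math.exp(-neg2logl_null / n)  # largest value r2 can reach
    return r2, r2 / r2_max


# Values taken from the listing for the zdb1..zdb4 model.
r2, r2_rescaled = generalized_r2(4643.854, 4641.341, 3355)
print(round(r2, 4), round(r2_rescaled, 4))
```

Rounded to four decimals, these agree with the 0.0007 and 0.0010 that SAS reports: adding the ratios lowers -2 Log L by only about 2.5 on more than 3,300 observations.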
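7.1.2 Analysis of Maximum Likelihood Estimates
                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1      -255.5       754.5        0.1146        0.7349
zdb1          1    0.000156    0.000121        1.6583        0.1978
zdb2          1       113.1       357.0        0.1003        0.7514
zdb3          1       142.5       947.3        0.0226        0.8804
zdb4          0           0           .             .             .
All p-values are above 0.05, so none of the coefficients is statistically significant.
The very large estimates and standard errors for the intercept, zdb2 and zdb3, and the fact that zdb4 was dropped as a linear combination of the other variables, are what one would expect from the script's zdb2 to zdb4 formulas: their denominators sum declines rather than advances, so those variables are almost exactly 1 on every row.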

7.2 zdvalue against zdb1 (yesterday)
R-Square    0.0006    Max-rescaled R-Square    0.0009
R-Square is too small.
                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1      0.0899      0.0346        6.7432        0.0094
zdb1          1    0.000159    0.000121        1.7106        0.1909
The p-value of zdb1 is 0.1909 > 0.05, so zdb1 is not a statistically significant predictor.

logodds = 0.0899 + zdb1 * 0.000159;
odds = exp(logodds);
p = odds/(1+odds);
p is the probability that zdvalue=1. When p>0.5, predict zdvalue=1.
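The three lines above can be tried out directly. The following Python sketch plugs in the fitted intercept and slope from 7.2; the function name `prob_up` and the sample ratios are hypothetical.

```python
import math


def prob_up(zdb1, intercept=0.0899, slope=0.000159):
    """Probability that zdvalue=1 under the fitted one-variable logit."""
    logodds = intercept + slope * zdb1
    return 1.0 / (1.0 + math.exp(-logodds))  # same as odds / (1 + odds)


for ratio in (0.0, 0.5, 1.0, 2.0, 10.0):
    p = prob_up(ratio)
    print(ratio, round(p, 4), int(p > 0.5))
```

Because the intercept and the slope are both positive and the ratio cannot be negative, p stays above 0.5 for every possible input, so the p>0.5 rule predicts an up day every time. That is another way of seeing that this model carries no usable signal.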

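The expression 股为纸 arrived at is
 zdvalue=1.70510-1.26517*zdb
His analysis reads: "Using the decline/advance ratio to predict whether the market closes up or down that day, the actual predicted values turn out to be 0.64 or other such numbers.
What a letdown... But don't worry: a further round of fitting is needed, so that those fractional predictions become 1 or 0 and the fitted
values match the market's actual historical ups and downs in the sample as closely as possible, with minimum loss.
5. Step 5: tune the threshold alpha so that the predictions fit the actual values best.
It turns out that with alpha=0.37 the number of wrong predictions is smallest, 357, i.e. an error rate of 357/3117=11.4%, so this method is nearly 90% accurate.
6. Step 6: the pattern has been found, but it is a linear relationship with the market's move and is contemporaneous, so how to forecast is still an open question; an algorithm that can
predict ahead still has to be designed. Apply EXPMA (exponential moving average) smoothing to the raw decline/advance ratio, with parameters P1=3, P2=5 and P3=8.
Also, from the model, 1.70510-1.26517*zdb>0.37, one can work out that the market rises when zdb<1.055273 and falls otherwise."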
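The arithmetic in the quoted passage can be checked in a few lines. This sketch only reproduces the numbers quoted from 股为纸's article (the linear expression, alpha=0.37, and 357 errors out of 3117); the function name `zdb_cutoff` is hypothetical, and nothing here reproduces his data or fitting procedure.

```python
def zdb_cutoff(intercept, slope, alpha):
    """Solve intercept + slope * zdb = alpha for zdb."""
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return (alpha - intercept) / slope


# Quoted model: score = 1.70510 - 1.26517 * zdb, predicted "up" when score > 0.37.
cutoff = zdb_cutoff(1.70510, -1.26517, 0.37)
error_rate = 357 / 3117
print(round(cutoff, 6), round(error_rate, 4))
```

Because the slope is negative, score > alpha corresponds to zdb below the cutoff, which is the quoted zdb<1.055273. Note that this rule uses the same day's ratio, so it describes the day rather than forecasting it.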

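Two errors stand out:
1. He uses the log-odds directly as the predicted value of zdvalue, which should be wrong.
2. He then applies a second round of alpha/EXPMA fitting to the log-odds (I do not know what alpha/EXPMA is).
He then finds the value logodds=0.37 and works backwards to conclude that the market rises when zdb<1.055273.
Because he treats the log-odds directly as zdvalue, the second round of fitting cannot rescue the result: the value is misused, so what follows is not sound either.
Could this be why so many experts regard stock prediction as pseudoscience?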

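7.3 zdvalue against zd1/zd2/zd3/zd4.
7.3.1 R-Square    0.0026    Max-rescaled R-Square    0.0035
             Analysis of Maximum Likelihood Estimates
                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1     -0.0822      0.1263        0.4239        0.5150
zd1           1     -0.3802      0.1777        4.5744        0.0325
zd2           1      0.8311      0.3526        5.5550        0.0184
zd3           1     -0.4706      0.5280        0.7945        0.3727
zd4           1      0.3622      0.4895        0.5475        0.4594
An R-Square of 0.0026 again means the model barely improves on an intercept-only fit.
Does this mean a logistic model is unsuitable for this data? Experts, please weigh in.
The p-values of zd1 and zd2 are below 0.05, so these are the main variables.
Using the coefficients from this four-variable fit (strictly, the model should be refitted with only zd1 and zd2):
logodds = -0.0822 - zd1 * 0.3802 + zd2 * 0.8311;
odds = exp(logodds);
p = odds/(1+odds);
p is the probability that zdvalue=1. When p>0.5, predict zdvalue=1.
Because R-Square is so small and the model explains so little, it should be discarded.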
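For readers wondering where the Chi-Square and Pr > ChiSq columns come from, the sketch below recomputes them from the Estimate and Standard Error columns of the table in 7.3.1. It assumes the usual Wald test with one degree of freedom per coefficient; the function name `wald_test` is hypothetical. Small differences from the SAS figures are expected because the inputs here are already rounded to four decimals.

```python
import math


def wald_test(estimate, std_error):
    """Wald chi-square (1 df) and its p-value for a single coefficient."""
    chi2 = (estimate / std_error) ** 2
    # Upper tail of a chi-square with 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Estimate and standard error copied from the 7.3.1 table.
rows = [("zd1", -0.3802, 0.1777), ("zd2", 0.8311, 0.3526),
        ("zd3", -0.4706, 0.5280), ("zd4", 0.3622, 0.4895)]
for name, est, se in rows:
    chi2, p = wald_test(est, se)
    print(name, round(chi2, 2), round(p, 3))
```

A coefficient looks "significant" here only when its estimate is about two standard errors away from zero, which is the case for zd1 and zd2 but not for zd3 and zd4.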

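7.4 zdvalue against zd2 alone
 R-Square    0.0011    Max-rescaled R-Square    0.0014
             Analysis of Maximum Likelihood Estimates
                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1     -0.0716      0.0935        0.5866        0.4437
zd2           1      0.3235      0.1704        3.6053        0.0576
R-Square is too small, and the p-value of zd2 is also above 0.05.
Discard.
8. Summary of the SAS analysis
Because the R-Square values are so small, using past advance/decline ratios to predict whether the index will rise or fall the next day is unreliable.
Experts, please share your views.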
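The summary can also be checked against the overall likelihood-ratio test of each model, which asks whether all slopes together differ from zero. The sketch below recomputes that test from the -2 Log L values in the listing. The degrees of freedom (3, 1, 4, 1) are an assumption: they are taken as the number of slope parameters actually estimated in each model. The helper `chi2_sf` is a hand-written upper-tail function for integer degrees of freedom, used only to keep the example free of third-party libraries.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


# (label, -2 Log L intercept only, -2 Log L with covariates, assumed df)
models = [
    ("zdb1-zdb4", 4643.854, 4641.341, 3),  # zdb4 was set to 0, leaving 3 slopes
    ("zdb1", 4647.927, 4645.758, 1),
    ("zd1-zd4", 4643.854, 4635.127, 4),
    ("zd2", 4646.445, 4642.828, 1),
]
results = {}
for label, null, full, df in models:
    stat = null - full  # likelihood-ratio chi-square
    results[label] = (stat, chi2_sf(stat, df))
    print(label, round(stat, 3), round(results[label][1], 3))
```

None of the four p-values falls below 0.05, which is consistent with the conclusion that these ratios, as constructed, do not predict the next move of the index.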
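SAS script:

/* Build the response variable and the lagged decline/advance ratios.
   The listing that follows was produced by this code as written, so two
   mismatches with the prose definitions are flagged here but left unchanged:
   (1) the most recent lag is lag2 (two rows back); in a one-row-per-day
       layout, "yesterday" would be lag1;
   (2) the zdb2-zdb4 denominators sum ds_diejiashu (declines) where the text
       implies zs_zhangjiashu (advances), so those ratios are almost exactly 1. */
data datain_1;
set stock.Stock999999_txt;
zhangdiefu=(close-lag1(close))/lag1(close);  /* daily return of the index */
if close >  lag1(close) then zdvalue=1;       /* up day */
if close <= lag1(close) then zdvalue=0;       /* flat or down day */

/* zdb: declines / advances; the 0.1 guards against division by zero */
zdb1 = lag2(ds_diejiashu)/
 (
   lag2(zs_zhangjiashu) + 0.1
 );
zdb2 = (lag2(ds_diejiashu)+lag3(ds_diejiashu))/
 (
   lag2(ds_diejiashu)+
   lag3(ds_diejiashu) + 0.1
 );
zdb3 = (lag2(ds_diejiashu)+lag3(ds_diejiashu)+lag4(ds_diejiashu))/
 (
   lag2(ds_diejiashu)+
   lag3(ds_diejiashu)+
   lag4(ds_diejiashu) + 0.1
 );
zdb4 = (lag2(ds_diejiashu)+lag3(ds_diejiashu)+lag4(ds_diejiashu)+lag5(ds_diejiashu))/
 (
   lag2(ds_diejiashu)+
   lag3(ds_diejiashu)+
   lag4(ds_diejiashu)+
   lag5(ds_diejiashu) + 0.1
 );

/* zd: declines / (declines + advances), cumulated over 1 to 4 days */
zd1 = lag2(ds_diejiashu)/
 (
   lag2(ds_diejiashu)+lag2(zs_zhangjiashu) + 0.1
 );
zd2 = (lag2(ds_diejiashu)+lag3(ds_diejiashu))/
 (
   lag2(ds_diejiashu)+lag2(zs_zhangjiashu)+
   lag3(ds_diejiashu)+lag3(zs_zhangjiashu) + 0.1
 );
zd3 = (lag2(ds_diejiashu)+lag3(ds_diejiashu)+lag4(ds_diejiashu))/
 (
   lag2(ds_diejiashu)+lag2(zs_zhangjiashu)+
   lag3(ds_diejiashu)+lag3(zs_zhangjiashu)+
   lag4(ds_diejiashu)+lag4(zs_zhangjiashu) + 0.1
 );
zd4 = (lag2(ds_diejiashu)+lag3(ds_diejiashu)+lag4(ds_diejiashu)+lag5(ds_diejiashu))/
 (
   lag2(ds_diejiashu)+lag2(zs_zhangjiashu)+
   lag3(ds_diejiashu)+lag3(zs_zhangjiashu)+
   lag4(ds_diejiashu)+lag4(zs_zhangjiashu)+
   lag5(ds_diejiashu)+lag5(zs_zhangjiashu) + 0.1
 );
run;

/* Model 1: zdvalue on all four zdb ratios (descending models zdvalue=1) */
proc logistic data=datain_1 descending;
model zdvalue = zdb1 zdb2 zdb3 zdb4/rsq cl;
run;
/* Model 2: zdvalue on zdb1 only */
proc logistic data=datain_1 descending;
model zdvalue = zdb1/rsq cl;
run;

/* Model 3: zdvalue on all four zd ratios */
proc logistic data=datain_1 descending;
model zdvalue = zd1 zd2 zd3 zd4/rsq cl;
run;
/* Model 4: zdvalue on zd2 only */
proc logistic data=datain_1 descending;
model zdvalue = zd2/rsq cl;
run;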
SAS output:
                                                                           The SAS System                                Sunday, October 27, 2013 08:39:45 AM  76

                                                                      The LOGISTIC Procedure

                                                                        Model Information

                                                          Data Set                      WORK.DATAIN_1
                                                          Response Variable             zdvalue
                                                          Number of Response Levels     2
                                                          Model                         binary logit
                                                          Optimization Technique        Fisher's scoring


                                                              Number of Observations Read        3360
                                                              Number of Observations Used        3355


                                                                          Response Profile

                                                                 Ordered                      Total
                                                                   Value      zdvalue     Frequency

                                                                       1            1          1755
                                                                       2            0          1600

                                                                 Probability modeled is zdvalue=1.

NOTE: 5 observations were deleted due to missing values for the response or explanatory variables.


                                                                     Model Convergence Status

                                                          Convergence criterion (GCONV=1E-8) satisfied.



                                                                       Model Fit Statistics

                                                                                           Intercept
                                                                            Intercept            and
                                                              Criterion          Only     Covariates

                                                              AIC            4645.854       4649.341
                                                              SC             4651.972       4673.814
                                                              -2 Log L       4643.854       4641.341


                                                       R-Square    0.0007    Max-rescaled R-Square    0.0010


                                                              Testing Global Null Hypothesis: BETA=0

                                                      Test                 Chi-Square       DF     Pr > ChiSq

                                                      Likelihood Ratio         2.5133        3      0.4729
                                                      Score                    2.3407        3      0.5048
                                                      Wald                     2.0636        3      0.5593


NOTE: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.


                                                   zdb4 =  0.5147 * Intercept + 0.02464 * zdb2 + 0.46064 * zdb3



                                                             Analysis of Maximum Likelihood Estimates

                                                                               Standard          Wald
                                                Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

                                                Intercept     1      -255.5       754.5        0.1146        0.7349
                                                zdb1          1    0.000156    0.000121        1.6583        0.1978
                                                zdb2          1       113.1       357.0        0.1003        0.7514
                                                zdb3          1       142.5       947.3        0.0226        0.8804
                                                zdb4          0           0           .             .             .


                                                                       Odds Ratio Estimates

                                                                         Point          95% Wald
                                                            Effect    Estimate      Confidence Limits

                                                            zdb1         1.000       1.000       1.000
                                                            zdb2      >999.999      <0.001    >999.999
                                                            zdb3      >999.999      <0.001    >999.999


                                                   Association of Predicted Probabilities and Observed Responses

                                                        Percent Concordant       42.1    Somers' D    0.027
                                                        Percent Discordant       39.4    Gamma        0.033
                                                        Percent Tied             18.5    Tau-a        0.013
                                                        Pairs                 2808000    c            0.513


                                                             Wald Confidence Interval for Parameters

                                                         Parameter     Estimate     95% Confidence Limits

                                                         Intercept       -255.5      -1734.3       1223.4
                                                         zdb1          0.000156     -0.00008     0.000393
                                                         zdb2             113.1       -586.6        812.7
                                                         zdb3             142.5      -1714.1       1999.1

                                                                      The LOGISTIC Procedure

                                                                        Model Information

                                                          Data Set                      WORK.DATAIN_1
                                                          Response Variable             zdvalue
                                                          Number of Response Levels     2
                                                          Model                         binary logit
                                                          Optimization Technique        Fisher's scoring


                                                              Number of Observations Read        3360
                                                              Number of Observations Used        3358


                                                                          Response Profile

                                                                 Ordered                      Total
                                                                   Value      zdvalue     Frequency

                                                                       1            1          1757
                                                                       2            0          1601

                                                                 Probability modeled is zdvalue=1.

NOTE: 2 observations were deleted due to missing values for the response or explanatory variables.


                                                                     Model Convergence Status

                                                          Convergence criterion (GCONV=1E-8) satisfied.



                                                                       Model Fit Statistics

                                                                                           Intercept
                                                                            Intercept            and
                                                              Criterion          Only     Covariates

                                                              AIC            4649.927       4649.758
                                                              SC             4656.046       4661.996
                                                              -2 Log L       4647.927       4645.758


                                                       R-Square    0.0006    Max-rescaled R-Square    0.0009


                                                              Testing Global Null Hypothesis: BETA=0

                                                      Test                 Chi-Square       DF     Pr > ChiSq

                                                      Likelihood Ratio         2.1684        1      0.1409
                                                      Score                    1.9923        1      0.1581
                                                      Wald                     1.7106        1      0.1909


                                                             Analysis of Maximum Likelihood Estimates

                                                                               Standard          Wald
                                                Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

                                                Intercept     1      0.0899      0.0346        6.7432        0.0094
                                                zdb1          1    0.000159    0.000121        1.7106        0.1909



                                                                       Odds Ratio Estimates

                                                                         Point          95% Wald
                                                            Effect    Estimate      Confidence Limits

                                                            zdb1         1.000       1.000       1.000


                                                   Association of Predicted Probabilities and Observed Responses

                                                        Percent Concordant        3.0    Somers' D    0.002
                                                        Percent Discordant        2.8    Gamma        0.032
                                                        Percent Tied             94.2    Tau-a        0.001
                                                        Pairs                 2812957    c            0.501


                                                             Wald Confidence Interval for Parameters

                                                         Parameter     Estimate     95% Confidence Limits

                                                         Intercept       0.0899       0.0220       0.1577
                                                         zdb1          0.000159     -0.00008     0.000396

                                                                      The LOGISTIC Procedure

                                                                        Model Information

                                                          Data Set                      WORK.DATAIN_1
                                                          Response Variable             zdvalue
                                                          Number of Response Levels     2
                                                          Model                         binary logit
                                                          Optimization Technique        Fisher's scoring


                                                              Number of Observations Read        3360
                                                              Number of Observations Used        3355


                                                                          Response Profile

                                                                 Ordered                      Total
                                                                   Value      zdvalue     Frequency

                                                                       1            1          1755
                                                                       2            0          1600

                                                                 Probability modeled is zdvalue=1.

NOTE: 5 observations were deleted due to missing values for the response or explanatory variables.


                                                                     Model Convergence Status

                                                          Convergence criterion (GCONV=1E-8) satisfied.



                                                                       Model Fit Statistics

                                                                                           Intercept
                                                                            Intercept            and
                                                              Criterion          Only     Covariates

                                                              AIC            4645.854       4645.127
                                                              SC             4651.972       4675.718
                                                              -2 Log L       4643.854       4635.127


                                                       R-Square    0.0026    Max-rescaled R-Square    0.0035


                                                              Testing Global Null Hypothesis: BETA=0

                                                      Test                 Chi-Square       DF     Pr > ChiSq

                                                      Likelihood Ratio         8.7273        4      0.0683
                                                      Score                    8.7190        4      0.0685
                                                      Wald                     8.7034        4      0.0690


                                                             Analysis of Maximum Likelihood Estimates

                                                                               Standard          Wald
                                                Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

                                                Intercept     1     -0.0822      0.1263        0.4239        0.5150
                                                zd1           1     -0.3802      0.1777        4.5744        0.0325
                                                zd2           1      0.8311      0.3526        5.5550        0.0184
                                                zd3           1     -0.4706      0.5280        0.7945        0.3727
                                                zd4           1      0.3622      0.4895        0.5475        0.4594

                                                                       Odds Ratio Estimates

                                                                         Point          95% Wald
                                                            Effect    Estimate      Confidence Limits

                                                            zd1          0.684       0.483       0.969
                                                            zd2          2.296       1.150       4.582
                                                            zd3          0.625       0.222       1.758
                                                            zd4          1.436       0.550       3.749


                                                   Association of Predicted Probabilities and Observed Responses

                                                        Percent Concordant       51.9    Somers' D    0.059
                                                        Percent Discordant       46.0    Gamma        0.060
                                                        Percent Tied              2.1    Tau-a        0.030
                                                        Pairs                 2808000    c            0.530


                                                             Wald Confidence Interval for Parameters

                                                         Parameter     Estimate     95% Confidence Limits

                                                         Intercept      -0.0822      -0.3296       0.1653
                                                         zd1            -0.3802      -0.7285      -0.0318
                                                         zd2             0.8311       0.1400       1.5222
                                                         zd3            -0.4706      -1.5054       0.5642
                                                         zd4             0.3622      -0.5972       1.3216

                                                                      The LOGISTIC Procedure

                                                                        Model Information

                                                          Data Set                      WORK.DATAIN_1
                                                          Response Variable             zdvalue
                                                          Number of Response Levels     2
                                                          Model                         binary logit
                                                          Optimization Technique        Fisher's scoring


                                                              Number of Observations Read        3360
                                                              Number of Observations Used        3357


                                                                          Response Profile

                                                                 Ordered                      Total
                                                                   Value      zdvalue     Frequency

                                                                       1            1          1757
                                                                       2            0          1600

                                                                 Probability modeled is zdvalue=1.

NOTE: 3 observations were deleted due to missing values for the response or explanatory variables.


                                                                     Model Convergence Status

                                                          Convergence criterion (GCONV=1E-8) satisfied.



                                                                       Model Fit Statistics

                                                                                           Intercept
                                                                            Intercept            and
                                                              Criterion          Only     Covariates

                                                              AIC            4648.445       4646.828
                                                              SC             4654.564       4659.065
                                                              -2 Log L       4646.445       4642.828


                                                       R-Square    0.0011    Max-rescaled R-Square    0.0014


                                                              Testing Global Null Hypothesis: BETA=0

                                                      Test                 Chi-Square       DF     Pr > ChiSq

                                                      Likelihood Ratio         3.6174        1      0.0572
                                                      Score                    3.6156        1      0.0572
                                                      Wald                     3.6053        1      0.0576


                                                             Analysis of Maximum Likelihood Estimates

                                                                               Standard          Wald
                                                Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

                                                Intercept     1     -0.0716      0.0935        0.5866        0.4437
                                                zd2           1      0.3235      0.1704        3.6053        0.0576



                                                                       Odds Ratio Estimates

                                                                         Point          95% Wald
                                                            Effect    Estimate      Confidence Limits

                                                            zd2          1.382       0.990       1.930


                                                   Association of Predicted Probabilities and Observed Responses

                                                        Percent Concordant       50.4    Somers' D    0.041
                                                        Percent Discordant       46.3    Gamma        0.042
                                                        Percent Tied              3.3    Tau-a        0.020
                                                        Pairs                 2811200    c            0.520


                                                             Wald Confidence Interval for Parameters

                                                         Parameter     Estimate     95% Confidence Limits

                                                         Intercept      -0.0716      -0.2550       0.1117
                                                         zd2             0.3235      -0.0104       0.6574
