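Question: How can all the historical data at http://www.gtfund.com/webapps/etf/etf_gzsh.jsp?fundcode=511010 be scraped conveniently?
Solution: Viewing the page source shows that the file-download nodes in the HTML carry href links to the daily files. So: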
%% Source page: http://www.gtfund.com/webapps/etf/etf_gzsh.jsp?fundcode=511010
gRefresh=0; % 1 = fetch the latest data from the web, 0 = load the saved .mat files
if gRefresh==1
    % Use Wind's MATLAB interface to query the historical trading dates of 511010
    w=windmatlab;
    [w_tdays_data,w_tdays_codes,w_tdays_fields,w_tdays_times,w_tdays_errorid,w_tdays_reqid]=w.tdays('2013-03-25',datestr(now,'yyyy-mm-dd'));
    dayLength=length(w_tdays_times);
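    % Assemble the URL of the creation/redemption list (PCF) for each historical trading day.
    % Note: the year directory is hard-coded to 2013, so trading days in other years will not resolve.
    urlStartStr=repmat('http://www.gtfund.com/upload/etf/pcf511010/2013/511010',dayLength,1);
    urlInterStr=datestr(w_tdays_times,'mmdd');
    urlEndStr=repmat('.etf',dayLength,1);
    urlStr=strcat(urlStartStr,urlInterStr,urlEndStr);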
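    % Use regular expressions to extract the required fields. The file format here is
    % fairly uniform and simple, so this quick-and-dirty approach will do for now;
    % anything improper can be fixed later.
    tPre1Data=zeros(dayLength,4); % one row per trading day (previous-day figures)
    tData=zeros(dayLength,2);     % one row per trading day (current-day figures)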
    for i=1:dayLength
        try
            str=urlread(urlStr(i,:));
        catch
            fprintf('urlread failed to read data from %s\n',urlStr(i,:));
            continue
        end
        % The pattern matches both TradingDay= and PreTradingDay=
        tok=regexpi(str,'TradingDay=(\d{8})','tokens');
        if numel(tok)<2, continue; end % not a valid list file, skip this day
        temp1=tok{1};
        tData(i,1)=str2double(temp1{1});     % TradingDay
        temp=tok{2};
        tPre1Data(i,1)=str2double(temp{1});  % PreTradingDay
        % The pattern matches both EstimateCashComponent= and CashComponent=
        tok=regexpi(str,'CashComponent=(-?\d+\.\d{2})','tokens');
        temp1=tok{1};
        tData(i,2)=str2double(temp1{1});     % EstimateCashComponent
        temp=tok{2};
        tPre1Data(i,2)=str2double(temp{1});  % CashComponent
        tok=regexpi(str,'NAVperCU=(\d+\.\d{2})','tokens');
        temp=tok{1};
        tPre1Data(i,3)=str2double(temp{1});  % NAVperCU
        tok=regexpi(str,'NAV=(\d+\.\d{4})','tokens');
        temp=tok{1};
        tPre1Data(i,4)=str2double(temp{1});  % NAV
    end
    % Sort by date, newest first, then cache the results locally
    tData=sortrows(tData,-1);
    tPre1Data=sortrows(tPre1Data,-1);
    save dayData.mat tData
    save preDayData.mat tPre1Data
else
    load dayData.mat
    load preDayData.mat
end
% Show the two data sets side by side in one figure
hf=figure('units','normalized','menubar','none','position',[0.2 0.2 .6 .4]);
ht1=uitable('parent',hf,'units','normalized','position',[0 .01 .3 .98],'data',tData,'columnname',{'Date','Estimated cash component (CNY)'});
ht2=uitable('parent',hf,'units','normalized','position',[0.301 .01 .65 .98],'data',tPre1Data,'columnname',{'Date','Cash difference (CNY)','NAV per minimum creation/redemption unit (CNY)','NAV per fund share (CNY)'});
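The script above relies on the order of regex matches to tell TradingDay from PreTradingDay, and EstimateCashComponent from CashComponent. A possible alternative, shown here only as a sketch, is to anchor each field name at the start of a line so every field is matched on its own. The sample text is made up to mimic the field names used in the script; the assumption that each field sits at the beginning of its own line in the real .etf file is not confirmed by the source.

```matlab
% Made-up sample that mimics the field names searched for in the script.
% The real .etf layout (one "Name=value" per line) is an assumption.
sample = sprintf(['TradingDay=20130401\n' ...
                  'PreTradingDay=20130329\n' ...
                  'EstimateCashComponent=-1234.56\n' ...
                  'CashComponent=789.01\n' ...
                  'NAVperCU=1000123.45\n' ...
                  'NAV=100.0123\n']);

names = {'TradingDay','PreTradingDay','EstimateCashComponent', ...
         'CashComponent','NAVperCU','NAV'};
vals = nan(1, numel(names));   % NaN marks a field that was not found

for k = 1:numel(names)
    % '^' with 'lineanchors' matches at the start of every line, so
    % 'TradingDay' cannot match inside 'PreTradingDay'.
    pat = ['^' names{k} '=(-?\d+(?:\.\d+)?)'];
    tok = regexp(sample, pat, 'tokens', 'once', 'lineanchors');
    if ~isempty(tok)
        vals(k) = str2double(tok{1});
    end
end
```

With this approach a missing field leaves a NaN in vals instead of raising an indexing error, which makes incomplete files easier to spot after the loop.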
References:
A from-scratch beginner's tutorial on MATLAB regular expressions: http://blog.csdn.net/mathsoperator/article/details/7232334