[Original] Batch-Scraping Listed Companies' Financial Data with Python (Part 2)_马步水


(2016-11-26 08:47:51)

Category: Python

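The other main scraping function is shown below. You need to analyze how the data is presented on the web page and then choose a suitable scraping approach. This function scrapes the debt-to-asset ratio (资产负债率) from the table of key financial indicators.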

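# Imports this function relies on. The BeautifulSoup import assumes the bs4 package;
# with BeautifulSoup 3 it would be "from BeautifulSoup import BeautifulSoup".
import re
import urllib2
from bs4 import BeautifulSoup

# Scrape data from the web page.
# Note: ws is the Excel worksheet object, assumed to be created elsewhere in the script.
def Get_Main_Cell(url, code, count):
    headers = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6"}
    req = urllib2.Request(url, headers=headers)
    try:
        content = urllib2.urlopen(req).read()
    except Exception:
        # The request failed, so give up on this page
        return
    soup = BeautifulSoup(content)
    # First find the table that holds the debt-to-asset ratio (the page has several tables)
    tables = soup.findAll("table", {"class": "table_bg001 border_box fund_analys"})
    for table in tables:
        # Replace the Chinese label here to fetch any other financial indicator
        if table.find('td', text=re.compile(u'资产负债率')):
            for row in table.findAll("tr"):
                cells = row.findAll("td")
                if len(cells) > 0:
                    j = 1
                    lencell = len(cells)
                    years = lencell - 1  # number of years covered by the financial statements
                    if cells[0].text.find(u'资产负债率') >= 0:
                        # Found the tr row with the debt-to-asset ratio;
                        # write the numbers from its td cells to the Excel file.
                        print cells[0].text
                        while j < lencell:
                            # print cells[j].text
                            ws.write(count, j + 1, cells[j].text)
                            j = j + 1
                        return years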
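The row-matching step can be tried offline before pointing the function at a live page. The following is only a minimal sketch under several assumptions: it is written for Python 3, it swaps BeautifulSoup for the standard-library html.parser so that it runs without third-party packages, and the names RowCollector, find_indicator_row, plan_writes and SAMPLE are hypothetical helpers that do not appear in the original script. The sample markup and its numbers are made up for illustration.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td>
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def find_indicator_row(html, label):
    """Return the values of the first row whose first cell contains label, else None."""
    parser = RowCollector()
    parser.feed(html)
    for cells in parser.rows:
        if cells and label in cells[0]:
            return cells[1:]
    return None


def plan_writes(count, values):
    """List (row, column, value) triples, mirroring ws.write(count, j + 1, ...) with j starting at 1."""
    return [(count, j + 1, value) for j, value in enumerate(values, start=1)]


# Made-up sample markup; the numbers are placeholders, not real data.
SAMPLE = """
<table class="table_bg001 border_box fund_analys">
  <tr><td>流动比率(%)</td><td>1.10</td><td>1.20</td></tr>
  <tr><td>资产负债率(%)</td><td>45.10</td><td>47.32</td></tr>
</table>
"""

values = find_indicator_row(SAMPLE, "资产负债率")
print(values)
print(plan_writes(3, values))
```

Separating "find the row" from "decide which cells to write" makes it easier to check the column offset on paper: the first value lands in column 2 of row count, as in the original loop. Changing the label passed to find_indicator_row corresponds to replacing the Chinese text in the original function.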
