我搜集了国内10几个电影网站的数据,里面近几十W条记录,用文本没法存,mongodb学习成本非常低,安装、下载、运行起来不会花你5分钟时间。
# -*- coding: utf-8 -*-# by awakenjoys. my site: /list/1_0_-1_-1_1_0_0_20_0_-1_0.html m_url = str(url[1]).replace('0_20_0_-1_0.html', '') movie_url = "%s%d_20_0_-1_0.html" % (m_url, x) print movie_url movie_html = gethtml(movie_url.encode('utf-8')) #print movie_html getmovielist(movie_html) time.sleep(0.1)