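Xici Proxy (西刺代理) was an IP proxy provider in China. The service has since shut down, so I am releasing my original code for anyone who wants to study it.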
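Mirror address: https:// (the rest of the address is missing from the source, as is the beginning of the script: the imports, the AgentSpider class, and the opening of the function that starts the threads. The listing resumes inside that function.)

    # ... (start of the script missing from the source)
            url = "https:///nn/{}".format(item)
            queue.put(url)
            print("[+] Generated crawl link {}".format(url))

        # Create one spider thread per requested page, start them all, then wait for them
        for item in range(count):
            threads.append(AgentSpider(queue))
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    import ast  # needed by literal_eval below; placed here because the original import block is missing

    # Conversion function: read one record per line and build proxy dictionaries
    def ConversionAgentIP(FileName):
        result = []
        # The with-block closes the file even if a line fails to parse
        with open(FileName, "r") as fp:
            for item in fp:
                dic = {}
                # Each line is expected to be a literal sequence: ip, port, protocol
                read_line = ast.literal_eval(item.strip())
                Protocol = read_line[2].lower()
                if Protocol == "http":
                    dic[Protocol] = "http://" + read_line[0] + ":" + read_line[1]
                else:
                    dic[Protocol] = "https://" + read_line[0] + ":" + read_line[1]
                result.append(dic)
        return result

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("-p", "--page", dest="page", help="number of pages to crawl")
        parser.add_argument("-f", "--file", dest="file", help="convert the crawled results to proxy format (SpiderAddr.json)")
        args = parser.parse_args()
        if args.page:
            StartThread(int(args.page))
        elif args.file:
            dic = ConversionAgentIP(args.file)
            for item in dic:
                print(item)
        else:
            parser.print_help()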
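To make the conversion step in the listing above easier to follow, here is a minimal, self-contained sketch that runs on in-memory sample lines instead of a file. It assumes each saved record is a Python-style list of IP, port, and protocol, which is only inferred from the way the listing indexes `read_line[0]`, `read_line[1]`, and `read_line[2]`. The helper name `to_proxy_dict`, the `sample_lines` list, and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical and not part of the original script.

```python
import ast


def to_proxy_dict(line):
    """Turn one saved record (assumed layout: [ip, port, protocol]) into a proxy mapping."""
    host, port, protocol = ast.literal_eval(line.strip())[:3]
    protocol = protocol.lower()
    # Same rule as the listing: "http" keeps the http scheme, anything else gets https
    scheme = "http" if protocol == "http" else "https"
    return {protocol: "{}://{}:{}".format(scheme, host, port)}


# Hypothetical sample records; a real run would read them from the crawler's output file
sample_lines = [
    "['192.0.2.10', '8080', 'HTTP']\n",
    "['192.0.2.20', '443', 'HTTPS']\n",
]

proxies = [to_proxy_dict(line) for line in sample_lines]
for proxy in proxies:
    print(proxy)
```

Each resulting dictionary maps a scheme to a proxy URL, which is the shape the `requests` library accepts for its `proxies` argument, so an entry could be passed along as `requests.get(url, proxies=proxy)`.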