The right way to write a Python crawler: a simple way to send requests through a proxy

小君 2023-01-30 22:34:10

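When you crawl a site with Python many times in quick succession, the request rate itself gives you away. Once it gets too high, the server no longer cares what the User-Agent header says: it treats you as a crawler and rejects your requests. In that case you need a proxy. With requests, you pass a dict of proxy addresses through the proxies parameter of requests.get. The code is as follows: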

    import requests
    from lxml import etree

    url = "https://www.ip.cn"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 OPR/57.0.3098.116"
    }
    # Proxy used for HTTPS requests; swap in the commented entry to try the other one.
    pro = {
        # 'https': 'https://118.122.92.252:37901'  # China Telecom, Chengdu, Sichuan
        'https': 'https://27.17.45.90:43411'  # China Telecom, Wuhan, Hubei
    }
    try:
        response = requests.get(url, headers=headers, proxies=pro)
        html_str = response.content.decode()
        # print(html_str)
        html = etree.HTML(html_str)
        # Text that sits directly inside each <p> of the result box
        message = html.xpath("//div[@class='well']//p/text()")
        # Values wrapped in <code> inside those <p> elements
        ip = html.xpath("//div[@class='well']//p/code/text()")
        # Text of the <p> elements that are direct children of the result box
        eng = html.xpath("//div[@class='well']/p/text()")
        print(message[0], ip[0])
        print(message[1], ip[1])
        print(eng[2])
    except requests.exceptions.ProxyError as e:
        print("Proxy error:", e)
    except Exception as e:
        print("Request error:", e)
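The proxy dict in the script lists one address and keeps a second one commented out as a spare. A minimal sketch of trying several proxies in turn could look like the following. The function name fetch_with_fallback, its parameters, and the 10-second default timeout are assumptions made for illustration and are not part of the original script.

```python
import requests


def fetch_with_fallback(url, proxy_urls, headers=None, timeout=10):
    """Try each proxy in order and return (proxy_url, response) for the first success.

    Returns (None, None) if every proxy fails. The name and signature are
    hypothetical; they only illustrate the idea.
    """
    for proxy_url in proxy_urls:
        # requests chooses the proxy by the scheme of the target URL,
        # so register the same proxy for both http and https targets.
        proxies = {"http": proxy_url, "https": proxy_url}
        try:
            response = requests.get(
                url, headers=headers, proxies=proxies, timeout=timeout
            )
            # Treat 4xx/5xx answers as failures too.
            response.raise_for_status()
            return proxy_url, response
        except requests.exceptions.RequestException as exc:
            # ProxyError, timeouts, and HTTP errors all derive from RequestException.
            print(f"{proxy_url} failed: {exc!r}")
    return None, None
```

Calling fetch_with_fallback(url, [first_proxy, second_proxy], headers=headers) returns the first proxy that answered together with its response, so the parsing step can stay unchanged. The timeout matters here because a dead proxy can otherwise keep the request hanging.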

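The single-proxy script above uses the requests library to visit an IP lookup page, then parses the response with lxml's XPath support and extracts the IP address that the site sees for the visitor. If the proxy is set up correctly, the page reports the location and IP address it sees for you, which should now be the proxy's rather than your own, as follows:

(Figure 1: screenshot of the script's output.)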
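To make the three XPath queries easier to follow without a live proxy, here is a small offline sketch that runs them against a hand-written snippet. The markup and the values in SAMPLE_HTML are assumptions chosen to match how the script indexes its results; they are not copied from the real page, whose layout may differ.

```python
from lxml import etree

# Hypothetical markup: an assumed stand-in for the lookup page's result box.
SAMPLE_HTML = """
<div class="well">
  <p>Your IP: <code>203.0.113.7</code></p>
  <p>Location: <code>Example City</code></p>
  <p>GeoIP: Example City, Example Country</p>
</div>
"""

html = etree.HTML(SAMPLE_HTML)

# text() selects only text nodes that are direct children of <p>,
# so the text inside <code> is not part of this result.
message = html.xpath("//div[@class='well']//p/text()")

# The <code> contents need their own query.
ip = html.xpath("//div[@class='well']//p/code/text()")

# With a single slash, only <p> elements directly under the div match.
eng = html.xpath("//div[@class='well']/p/text()")

print(message[0], ip[0])
print(message[1], ip[1])
print(eng[2])
```

On this snippet the first and third queries select the same nodes, because every p element is a direct child of the div. They would only differ on a page where some paragraphs are nested deeper inside the box.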