[Python] Web Crawlers

Date: 2020-01-29 22:49:03

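Head element information:

<title>: the document title; a document may have only one

<base>: the default URL for relative links on the page

<style>: style (CSS) rules for the page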

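Fetching pages: the requests package

HTTP request methods:

GET: by far the most common, used for over 90% of requests

POST: sends data in the request body, for example when submitting a form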
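To make the difference between the two methods concrete, here is a minimal sketch that builds one GET and one POST request with requests but does not send them. The example.com URL and the form field names are illustrative assumptions, not part of the original notes.

```python
import requests

# Build the requests without sending them, so no network access is needed.
# The URLs and field values below are illustrative assumptions.
get_req = requests.Request(
    "GET", "https://www.baidu.com/s", params={"wd": "python"}
).prepare()
post_req = requests.Request(
    "POST", "https://example.com/login", data={"user": "alice"}
).prepare()

# GET: the parameters are encoded into the URL's query string.
print(get_req.method, get_req.url)
# POST: the parameters travel in the request body instead.
print(post_req.method, post_req.url, post_req.body)
```

In everyday use, requests.get(...) and requests.post(...) prepare and send the request in a single call; the explicit prepare() step is only used here to inspect what would be sent.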

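import requests

# Send a GET request; params is appended to the URL as a query string (?wd=...).
# timeout is in seconds; a very small value such as 0.1 will often raise a timeout error.
r = requests.get(url='https://www.baidu.com/s', params={'wd': '金正恩元帅'}, timeout=5)
# Return value: a Response object, printed as e.g. <Response [200]>
print(r)
print(type(r))
# Final URL, including the encoded query string
print(r.url)
# Encoding that requests inferred from the response headers
print(r.encoding)
# Page source, decoded to text
print(r.text)
# Response headers, returned as a dict-like (case-insensitive) object
print(r.headers)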
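A request like the one above can fail because of a timeout, a connection problem, or an error status code. The following is a minimal sketch of one way to guard against that; the helper name fetch and its default timeout are assumptions made for illustration, not something the original notes define.

```python
import requests


def fetch(url, params=None, timeout=5):
    """Return the page text, or None if the request fails (hypothetical helper)."""
    try:
        r = requests.get(url, params=params, timeout=timeout)
        r.raise_for_status()  # raise HTTPError for 4xx/5xx status codes
    except requests.exceptions.RequestException as exc:
        # Covers timeouts, connection errors, invalid URLs and bad status codes.
        print(f"request failed: {exc}")
        return None
    return r.text
```

It could be called as fetch('https://www.baidu.com/s', params={'wd': 'python'}); checking the result for None before parsing keeps a crawler from crashing on a single bad page.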

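Parsing the source: the BeautifulSoup package

Navigating the document tree

Child nodes: .contents (returns a tag's direct children as a list)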
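As a small illustration of navigating the tree, the sketch below parses a hand-written HTML string (an assumption standing in for a downloaded page) with BeautifulSoup and Python's built-in html.parser, then reads the head elements listed earlier and the child nodes of two tags.

```python
from bs4 import BeautifulSoup

# A tiny hand-written page, used here instead of a real download.
html = (
    '<html><head><title>Demo page</title>'
    '<base href="https://example.com/">'
    '<style>p { color: red; }</style></head>'
    '<body><p>Hello <b>world</b></p></body></html>'
)

soup = BeautifulSoup(html, "html.parser")

# Head elements can be reached as attributes of the soup object.
title_text = soup.title.string
base_href = soup.base["href"]

# .contents lists the direct children of a tag.
head_children = [child.name for child in soup.head.contents]
# Text nodes have no tag name, so their .name is None.
p_children = [child.name for child in soup.p.contents]

print(title_text, base_href)
print(head_children)
print(p_children)
```

Note that .contents includes text nodes as well as tags, so whitespace between tags in a real page will show up as extra string children.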

Original: https://www.cnblogs.com/cxc1357/p/10584752.html