Python web scraping: extracting the contents of a page's script tag
Time: 2019-02-11 19:55:15
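The target page contains an inline script (shown as a screenshot in the original post), and the value I want to extract from it is the number 00914. A regular expression is enough for this.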
Output (screenshot in the original post):
Source code:
import re
from urllib.request import urlopen

from bs4 import BeautifulSoup

url = "URL of the page you want to parse"
html = urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

# CSS selector: every <script> tag inside <body>
scripts = soup.select("body script")

# The script we want is the third <script> in <body>
script_text = scripts[2].get_text()
print(script_text)

# Check that this script contains the `var _url = '...';` assignment
pattern = re.compile(r"var _url = '(.*?)';$", re.MULTILINE | re.DOTALL)
match = pattern.search(script_text)

# match.string is the whole script text. Splitting it on single quotes
# puts the value we want (00914 on this page) at index 11.
s = match.string
print(s.split("'")[11])
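Picking the value by its position after splitting on quotes breaks as soon as the script gains or loses a quoted string. As a minimal sketch of an alternative, the regular expression can capture the value directly from the text of the selected script tag. The sample script body, the variable name `_code`, and the helper `extract_js_string` are all hypothetical here, since the real page's script is not reproduced in this post.

```python
import re

# Hypothetical script body; the real page's script is not shown in the post.
SAMPLE_SCRIPT = """
var _code = '00914';
var _url = '/detail/00914.html';
"""


def extract_js_string(script_text, var_name):
    """Return the single-quoted string assigned to `var_name`, or None if absent."""
    # Matches e.g.  var _code = '00914';  and captures what is between the quotes.
    pattern = re.compile(r"var\s+" + re.escape(var_name) + r"\s*=\s*'(.*?)';")
    match = pattern.search(script_text)
    return match.group(1) if match else None


code = extract_js_string(SAMPLE_SCRIPT, "_code")
print(code)
```

With this approach the lookup depends only on the variable name, not on how many other quoted strings come before it in the script.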
Original: https://www.cnblogs.com/mm20/p/10362963.html