English Version: https://today2tmr.com/en/2017/07/15/python-spider-study-note-week-twounit-fourintroduction-to-beautiful-soup
Beautiful Soup is a third-party library for parsing HTML and XML.
Installing the Beautiful Soup library
- Beautiful Soup (美丽汤)
- http://www.crummy.com/software/BeautifulSoup
pip install beautifulsoup4
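To confirm that the installation worked, one option is to import the package and parse a tiny fragment. This is only a minimal sketch: the `<p>data</p>` string is made up for illustration, and `"html.parser"` is Python's built-in parser, chosen here so that no extra parser needs to be installed.

```python
import bs4
from bs4 import BeautifulSoup

# Print the installed version of the beautifulsoup4 package.
print(bs4.__version__)

# Parse a made-up one-tag fragment with Python's built-in HTML parser.
soup = BeautifulSoup("<p>data</p>", "html.parser")

# Access the first <p> tag and print its text content.
print(soup.p.string)
```

If the import fails with `ModuleNotFoundError`, the package is probably not installed in the interpreter being used.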
- Open the page http://python123.io/ws/demo.html
- Source code:
- Via the browser
<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>
- Via the Requests library
>>> import requests
>>> r = requests.get("http://python123.io/ws/demo.html")
>>> r.text
'<html><head><title>This is a python demo page</title></head>\r\n<body>\r\n<p class="title"><b>The demo python introduces several python courses.</b></p>\r\n<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:\r\n<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>\r\n</body></html>'
>>> demo = r.text
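Once the page source is in `demo`, it can be handed to Beautiful Soup. The sketch below is one possible way to do that, not necessarily how the rest of the note proceeds. To keep it runnable offline, `demo` is filled with a copy of the page text shown above instead of a live `requests.get` call, and `"html.parser"` (Python's built-in parser) is an assumed choice.

```python
from bs4 import BeautifulSoup

# Stand-in for r.text: a copy of the demo page source shown above,
# so the example does not need network access.
demo = (
    '<html><head><title>This is a python demo page</title></head>\r\n'
    '<body>\r\n'
    '<p class="title"><b>The demo python introduces several python courses.</b></p>\r\n'
    '<p class="course">Python is a wonderful general-purpose programming language. '
    'You can learn Python from novice to professional by tracking the following courses:\r\n'
    '<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and '
    '<a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>\r\n'
    '</body></html>'
)

# Build the parse tree using Python's built-in HTML parser.
soup = BeautifulSoup(demo, "html.parser")

# Text inside the <title> tag.
title = soup.title.string

# Collect the id and href attributes of every <a> tag in document order.
links = [(a.get("id"), a.get("href")) for a in soup.find_all("a")]

print(title)
for link_id, href in links:
    print(link_id, href)
```

With a live request, `demo = r.text` from the session above would take the place of the hard-coded string.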
- Via the browser