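Goal
- Scrape the jokes posted on Qiushibaike (糗事百科)
- Show one joke each time Enter is pressed
- Ask for the number of pages to read; press 'Q' or 'q' to quit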
Approach
- Target URL:
- Fetch the pages with requests
- Parse the pages with the bs4 module
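The parsing step can be tried offline before touching the network. The following is a minimal sketch, assuming each joke sits in a `<div class="content">` element as the script's selector implies. The sample markup and the helper name `extract_jokes` are made up for illustration, and the built-in `html.parser` is used here instead of html5lib so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page structure may differ.
SAMPLE_HTML = """
<div class="article">
<div class="content">
<span>First joke.</span>
</div>
</div>
<div class="article">
<div class="content">
<span>Second joke.</span>
</div>
</div>
"""


def extract_jokes(html):
    """Return the text of every <div class="content">, with newlines removed."""
    soup = BeautifulSoup(html, 'html.parser')
    return [div.get_text().replace('\n', '').strip()
            for div in soup.find_all('div', class_='content')]


jokes = extract_jokes(SAMPLE_HTML)
print(jokes)
```

Keeping the extraction in its own function makes it easy to check the selector against a saved copy of a page whenever the site layout changes.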
Code:
import requests
from bs4 import BeautifulSoup


def get_content(pages):
    """Return a list of the jokes found on the first `pages` pages."""
    headers = {
        # Browser-like user agent; the header name must be 'User-Agent'
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/54.0.2840.87 Safari/537.36'
    }
    content_list = []
    for page in range(1, pages + 1):  # how many pages to read
        url = 'http://www.qiushibaike.com/text/page/' + str(page) + '/?s=4928950'
        response = requests.get(url, headers=headers)  # fetch the page
        html = response.text
        soup = BeautifulSoup(html, 'html5lib')  # parse the page (needs the html5lib package)
        jokes = soup.find_all('div', class_='content')
        for each in jokes:
            each_joke = each.get_text()
            joke = each_joke.replace('\n', '')  # strip newlines
            content_list.append(joke)
    return content_list  # list of jokes


if __name__ == "__main__":
    # Ask for the number of pages; 'q' or 'Q' quits right away
    answer = input("How many pages do you want to read?\nIf you want to quit, just press 'q'.\n")
    if answer.strip().lower() != 'q':
        number = int(answer)
        print()  # blank line for readability
        for paragraph in get_content(number):
            print(paragraph)
            user_input = input()
            if user_input.strip().lower() == 'q':  # 'q' or 'Q' quits
                break
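The Enter-to-continue loop can also be separated from the keyboard so that it can be exercised without typing. This is only a sketch: `show_jokes` and its `read_line` and `write` parameters are hypothetical names that do not appear in the script above, and they default to the built-in `input` and `print`.

```python
def show_jokes(jokes, read_line=input, write=print):
    """Show jokes one at a time; stop when the reader answers 'q' or 'Q'.

    Returns the number of jokes shown.
    """
    shown = 0
    for joke in jokes:
        write(joke)
        shown += 1
        if read_line().strip().lower() == 'q':
            break
    return shown


# Usage with scripted answers instead of a real keyboard:
answers = iter(['', 'Q'])  # Enter once, then quit
printed = []
count = show_jokes(['joke 1', 'joke 2', 'joke 3'],
                   read_line=lambda: next(answers),
                   write=printed.append)
```

In the script itself, calling `show_jokes(get_content(number))` would behave like the original loop while accepting both 'q' and 'Q'.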
Result:
References:
静谧's web scraping tutorial:
Reference for scraping the jokes: