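Task

Visit the famous-quotes index page of the Gushiwen classical poetry site (https://so.gushiwen.cn/mingjus/).
Scrape each famous quote and its source (including the link) and save them to a text file named poems.txt. Each quote produces one record: a header line followed by indented notes, in the following format:

Number (starting from 1, right-aligned to a width of 3): quote--source(link to the full poem)
  (indented two spaces) translation, annotations, and appreciation of the quote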
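The record format above can be sketched as a small helper. This is only an illustration: the name format_entry and its parameters are hypothetical, and the ": " separator after the number is an assumption about how the header line should look.

```python
def format_entry(rank: int, quote: str, source: str, url: str, notes: str) -> str:
    """Build one record: a header line, then the notes indented by two spaces."""
    # Right-align the number to a width of 3, e.g. "  7" or " 42".
    header = f"{rank:>3}: {quote}--{source}({url})"
    lines = [header]
    # Indent every non-blank line of the notes by two spaces.
    lines += ["  " + line.strip() for line in notes.splitlines() if line.strip()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values, not real data from the site.
    print(format_entry(1, "QUOTE", "SOURCE", "https://example.com/poem", "note 1\nnote 2"), end="")
```

Keeping the formatting in one function makes it easy to check the alignment without touching the network.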

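Environment Setup

Make sure the following Python libraries are installed:

requests
beautifulsoup4
lxml (the parser that the code passes to BeautifulSoup)

They can be installed with the following command:

pip install requests beautifulsoup4 lxml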
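Before running the scraper, it may help to confirm that the libraries can actually be imported. The sketch below uses only the standard library; the helper name missing_modules is hypothetical. Note that beautifulsoup4 is imported under the name bs4, and that lxml is checked because the script passes "lxml" to BeautifulSoup as the parser.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["requests", "bs4", "lxml"])
    print("missing:", ", ".join(missing) if missing else "none")
```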

Code

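from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup as BS

BASE_URL = "https://so.gushiwen.cn/mingjus/"


def fetch_soup(url):
    """Download a page and parse it with the lxml parser."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return BS(response.content.decode("utf-8"), "lxml")


# Fetch the famous-quotes index page; each div.cont holds one quote.
soup = fetch_soup(BASE_URL)
content = soup.select('body > div.main3 > div.left > div.sons > div.cont')

# Write to poems.txt, as the task requires; "with" closes the file even on errors.
with open("poems.txt", 'w', encoding='utf-8') as fs:
    for rank, item in enumerate(content, start=1):
        # First link: the quote itself; second link (if present): its source.
        links = item.find_all('a')
        url = urljoin(BASE_URL, links[0]['href'])
        poem = links[0].text
        if len(links) == 1:
            poet = "没有出处"  # "no source given"
        else:
            poet = "出自" + links[1].text  # "from" + source title

        # Fetch the linked page and collect its text, indented by two spaces.
        temp_soup = fetch_soup(url)
        notes = []
        for block in temp_soup.select('#sonsyuanwen > div.cont > div.contson'):
            notes += ["  " + z.strip() for z in block.text.split('\n') if z.strip()]

        # Pad the number itself to width 3, then write the header line and the notes.
        fs.write(f"{rank:>3}: {poem}--{poet}({url})\n")
        if notes:
            fs.write('\n'.join(notes) + '\n')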
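Link targets in scraped HTML can be site-relative, page-relative, or already absolute. As a minimal sketch of how the standard library resolves all three cases, the helper below wraps urllib.parse.urljoin. The name absolute_url and the example paths are hypothetical; the real href values on the site may look different.

```python
from urllib.parse import urljoin

INDEX_URL = "https://so.gushiwen.cn/mingjus/"


def absolute_url(href: str, base: str = INDEX_URL) -> str:
    """Resolve an href found on the index page against the page's own URL."""
    return urljoin(base, href)


if __name__ == "__main__":
    # Hypothetical href values, used only to show the three cases.
    print(absolute_url("/mingju/example.aspx"))    # site-relative path
    print(absolute_url("page2.aspx"))              # page-relative path
    print(absolute_url("https://example.com/a"))   # already absolute, returned unchanged
```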
