scrapy - persisting data locally from the command line

Warning
This article was last updated on 2020-11-23 17:19; its content may be outdated.
import scrapy


class FirstSpider(scrapy.Spider):
    name = 'first'

    # allowed_domains = ['www.soulchild.cn']
    start_urls = ['http://www.qiushibaike.com/text']

    def parse(self, response):
        # Select every post block on the page
        div_list = response.xpath('//div[contains(@class,"article") and contains(@class,"mb15")]')
        all_data = []
        for i in div_list:
            # Author name and post text, extracted relative to the current block
            author = i.xpath('./div[@class="author clearfix"]//h2/text()').get()
            content = ''.join(i.xpath('.//div[@class="content"]/span//text()').getall())
            res = {
                "author": author,
                "content": content,
            }
            all_data.append(res)
        # Items returned here are picked up by the feed exporter (-o)
        return all_data
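Conceptually, the feed exporter turns each returned dict into one row, with the dict keys as column headers. A minimal standalone sketch of that mapping using Python's csv module (sample data is hypothetical; this is not Scrapy's actual exporter code):

```python
import csv
import io

# Items shaped like the dicts returned by parse() (hypothetical sample data)
all_data = [
    {"author": "user_a", "content": "first joke"},
    {"author": "user_b", "content": "second joke"},
]

# One header row from the dict keys, then one row per item
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["author", "content"])
writer.writeheader()
writer.writerows(all_data)
print(buf.getvalue())
```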

Write the return value of the parse method to a local CSV file:

scrapy crawl first -o qs.csv

Supported formats: 'json', 'jsonlines', 'jl', 'csv', 'xml', 'marshal', 'pickle'
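Instead of passing -o on every run, the same export can be configured once in the project's settings.py via the FEEDS setting (available since Scrapy 2.1). A minimal sketch, assuming the same qs.csv output as above:

```python
# settings.py -- equivalent of `scrapy crawl first -o qs.csv`
# (the FEEDS setting requires Scrapy >= 2.1)
FEEDS = {
    "qs.csv": {
        "format": "csv",    # any of the supported formats listed above
        "encoding": "utf8",
    },
}
```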
