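This is my code. The basic skeleton was copied from a post by an expert on this site. After I changed the target page and the fields to scrape, it only scrapes a single record. I really can't tell where the problem is. My fundamentals are weak and I'm very much a beginner, but I still want to understand what is going wrong, so I'm asking for help here.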
import requests
from lxml import etree
import csv

# Send the request
url = 'https://cs.lianjia.com/zufang/#contentList'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
                         ' AppleWebKit/537.36 (KHTML, like Gecko) Chrome'
                         '/131.0.0.0 Safari/537.36 Edg/131.0.0.0'}
response = requests.get(url=url, headers=headers)

# Get the data
html_content = response.text
et = etree.HTML(html_content)
doc = et.xpath('//*[@id="content"]/div[1]/div[1]/div[1]/div')

# Parse the data
list_1 = []
for li in doc:
    title = li.xpath('.//p[@class="content__list--item--title"]/a/text()')[0]
    price = li.xpath('.//span[@class="content__list--item-price"]/em/text()')[0]
    position = li.xpath('.//p[@class="content__list--item--des"]/a/text()')
    peculiarity = li.xpath('.//p[@class="content__list--item--bottom oneline"]/i/text()')
    area = li.xpath('.//p[@class="content__list--item--des"]/text()[5]')
    layout = li.xpath('.//p[@class="content__list--item--des"]/text()[7]')
    if position:
        position = '-'.join(position)
    # Append inside the loop so every listing is kept;
    # appending after the loop would keep only the last one.
    list_1.append([title, price, position, peculiarity, area, layout])

# Save the data: define the column names
columns = ['title', 'price', 'position', 'peculiarity', 'area', 'layout']
# Open the file once and write the column names once, then one line per listing
with open('data31.csv', mode='w', encoding='utf-8', newline='') as file:
    csv_writer = csv.writer(file)
    csv_writer.writerow(columns)
    for p in list_1:
        csv_writer.writerow(p)

# Check the length of list_1 and print its contents
# print('List length:', len(list_1))
# print('List contents:', list_1)
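A side issue in the script above: `position`, `peculiarity`, `area` and `layout` are XPath results, that is, lists of strings, so `csv.writer` stores them as Python list text such as `['...']`, often with stray newlines and spaces. The following is a minimal sketch of one way to flatten such cells and write the header exactly once. The helper names `flatten` and `rows_to_csv` and the sample row are made up for illustration, and it writes to an in-memory buffer instead of `data31.csv` so it can run without network access.

```python
import csv
import io


def flatten(parts, sep="-"):
    """Turn an XPath text() result (a str or a list of str) into one clean string."""
    if isinstance(parts, str):
        return parts.strip()
    # Drop whitespace-only fragments and join the rest.
    return sep.join(p.strip() for p in parts if p.strip())


def rows_to_csv(rows, columns):
    """Return CSV text: one header line, then one line per row."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(columns)  # header is written once, before the loop
    for row in rows:
        writer.writerow([flatten(cell) for cell in row])
    return buffer.getvalue()


# Made-up sample row shaped like the script's output (not real listing data).
sample = [[
    "Sample title ", "2000",
    ["District", "Area"], ["tag1", "tag2"],
    ["\n 50 sqm \n"], ["\n 2 rooms \n"],
]]
columns = ["title", "price", "position", "peculiarity", "area", "layout"]
csv_text = rows_to_csv(sample, columns)
print(csv_text)
```

To write to disk instead, the same `writer` logic can be used on a file opened with `newline=''`, as in the original script.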
This is the CSV file that was scraped.
Following the structure of the original code, the output should look like this, but I still get only one record.
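A "only one row" symptom like this often comes from one of two places: the outer XPath matches a single container node instead of each listing, or the `append` call runs after the loop instead of inside it. The sketch below shows one way to check both on a tiny inline sample, so it runs without network access. The sample HTML is hypothetical and heavily simplified, the class name `content__list--item` for a listing card is an assumption about the page, and `first` and `parse_listings` are illustrative helper names.

```python
from lxml import etree

# Hypothetical, heavily simplified markup; the real page may differ.
SAMPLE_HTML = """
<div id="content">
  <div class="content__list">
    <div class="content__list--item">
      <p class="content__list--item--title"><a> Listing A </a></p>
      <span class="content__list--item-price"><em>2000</em></span>
    </div>
    <div class="content__list--item">
      <p class="content__list--item--title"><a> Listing B </a></p>
    </div>
    <div class="content__list--item">
      <p class="content__list--item--title"><a> Listing C </a></p>
      <span class="content__list--item-price"><em>3500</em></span>
    </div>
  </div>
</div>
"""


def first(node, path, default=""):
    """Return the first stripped XPath match, or a default when nothing matches."""
    found = node.xpath(path)
    return found[0].strip() if found else default


def parse_listings(html_text):
    tree = etree.HTML(html_text)
    # Select every listing card by class (assumed name) rather than by position.
    items = tree.xpath('//div[@class="content__list--item"]')
    # A count of 1 here would mean the XPath is too narrow.
    print("matched items:", len(items))
    rows = []
    for item in items:
        title = first(item, './/p[@class="content__list--item--title"]/a/text()')
        price = first(item, './/span[@class="content__list--item-price"]/em/text()')
        rows.append([title, price])  # inside the loop: one row per listing
    return rows


rows = parse_listings(SAMPLE_HTML)
print(rows)
```

Using a helper like `first` instead of `[0]` also keeps the loop from stopping with an `IndexError` when a card, such as an ad block, lacks one of the fields.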
From: https://blog.csdn.net/woshiFUPOa/article/details/144436945