Scraping real-estate listings with Python (for learning purposes only)

Date: 2022-11-23 17:35:27

Tags: real-estate listings, item, python, text, resp, self, scraping, url, find

import requests
from bs4 import BeautifulSoup
import time
import openpyxl


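def write_mysql(lst):
    """Save the scraped rows to an Excel workbook.

    Despite the name, nothing is written to MySQL; the rows go to an .xlsx file.
    Every call creates a fresh workbook and overwrites the file on disk.
    """
    wk = openpyxl.Workbook()
    sheet = wk.active
    for i in lst:
        sheet.append(i)  # each item is one row of cell values
    # The file name means "1 - real-estate transaction info".
    wk.save("1-房产交易信息.xlsx")
    # print('Saved')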


def parser_content(resp):
    """Extract one row of fields per listing from a results page."""
    html = resp.text
    bs = BeautifulSoup(html, 'html.parser')
    ul = bs.find('ul', class_='sellListContent')
    if ul is None:
        # No listing container, e.g. an error or verification page.
        return []
    li = ul.find_all('li')
    # print(li, type(li))
    # A multi-word class_ string matches the exact class attribute value.
    field_classes = ['title', 'positionInfo', 'houseInfo', 'followInfo',
                     'totalPrice totalPrice2', 'unitPrice']
    lst = []
    for item in li:
        cells = [item.find('div', class_=name) for name in field_classes]
        if any(cell is None for cell in cells):
            continue  # skip <li> entries (such as adverts) missing a field
        lst.append([cell.text for cell in cells])
    # print(lst)
    return lst


class LianJiaSpider():

    def __init__(self):
        self.url = "https://bj.lianjia.com/ershoufang/pg{0}/"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/84.0.4147.125 Safari/537.36"
        }

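    def send_request(self, url):
        # The timeout keeps a stalled connection from hanging the crawl.
        resp = requests.get(url=url, headers=self.headers, timeout=10)
        # A Response is truthy only when its status code is below 400.
        if resp:
            return resp
        return None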

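    def start(self):
        all_rows = []
        for i in range(1, 11):
            full_url = self.url.format(i)
            # print(full_url)
            resp = self.send_request(full_url)
            # print(resp.text)
            if resp is not None:
                all_rows.extend(parser_content(resp))
            time.sleep(5)  # pause between pages to avoid hammering the site
        # Save once at the end: write_mysql overwrites the file on every call,
        # so saving per page would keep only the last page.
        write_mysql(all_rows)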


if __name__ == '__main__':
    lian_jia = LianJiaSpider()
    lian_jia.start()
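
The parsing step can be checked offline, without sending any request to the site. The sketch below is only an illustration under stated assumptions: the HTML fragment is made up to mirror the class names the script looks for (`sellListContent`, `title`, `positionInfo`, `houseInfo`, `followInfo`, `totalPrice totalPrice2`, `unitPrice`), and `parse_listing` is a hypothetical helper that is not part of the original script. It assumes `beautifulsoup4` is installed, and the real page markup may differ or change over time.

```python
from bs4 import BeautifulSoup

# Made-up fragment that mimics the structure the scraper expects (an assumption,
# not a copy of the real page).
SAMPLE_HTML = """
<ul class="sellListContent">
  <li>
    <div class="title"><a>Bright two-bedroom flat</a></div>
    <div class="positionInfo">Sample Community - Sample District</div>
    <div class="houseInfo">2 rooms | 89 sqm | south</div>
    <div class="followInfo">12 followers / posted 7 days ago</div>
    <div class="totalPrice totalPrice2"><span>520</span></div>
    <div class="unitPrice"><span>58,427 per sqm</span></div>
  </li>
  <li><div class="banner">advert without listing fields</div></li>
</ul>
"""

# Same field order as the rows built by the script.
FIELD_CLASSES = ["title", "positionInfo", "houseInfo", "followInfo",
                 "totalPrice totalPrice2", "unitPrice"]


def parse_listing(html):
    """Hypothetical helper: return one list of field texts per complete listing."""
    soup = BeautifulSoup(html, "html.parser")
    ul = soup.find("ul", class_="sellListContent")
    if ul is None:
        # The listing container is missing, e.g. on an error page.
        return []
    rows = []
    for li in ul.find_all("li"):
        cells = [li.find("div", class_=name) for name in FIELD_CLASSES]
        if any(cell is None for cell in cells):
            continue  # skip entries that lack one of the expected fields
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


rows = parse_listing(SAMPLE_HTML)
print(rows)
```

Running the same helper on a page without the listing container, such as `parse_listing("<p>blocked</p>")`, gives an empty list instead of raising `AttributeError`, which is the failure the unguarded `.find(...).text` chain is prone to.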

From: https://blog.51cto.com/u_14012524/5881524
