代码如下:
"""Scrape amino-acid (AA) sequences for KEGG orthology entry K01068.

Downloads the K01068 entry page, follows every gene link found in the
fourth ``td.td41.defd`` table cell, locates each gene page's "AA seq"
button, and appends the sequence text (the page's first <pre> tag) to
``ans.txt`` in the current directory.
"""
import re

import requests
from bs4 import BeautifulSoup

# Base URL used to resolve the relative links scraped from KEGG pages.
KEGG_BASE = "https://www.kegg.jp"
# Seconds before a network request is abandoned; without a timeout a
# stalled connection would hang the script indefinitely.
TIMEOUT = 30


def visit2(url):
    """Fetch *url* and append the text of its first <pre> tag to ans.txt.

    Creates ans.txt if it does not exist.  Prints a diagnostic message
    when the HTTP request fails or no <pre> tag is present.
    """
    response = requests.get(url, timeout=TIMEOUT)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, "html.parser")
        pre_tag = soup.find("pre")
        if pre_tag:
            # Open in append mode so sequences from successive genes
            # accumulate in a single output file.
            with open("ans.txt", "a", encoding="utf-8") as file:
                file.write(pre_tag.get_text())
        else:
            print("未找到<pre></pre>标签")
    else:
        print("请求失败,状态码:", response.status_code)


def visit1(url, content=None):
    """Fetch a gene page and follow its "AA seq" button link via visit2.

    *content* is accepted for backward compatibility with existing
    callers but is not used: the freshly downloaded page text is
    searched instead (the original code overwrote it immediately).
    """
    response = requests.get(url, timeout=TIMEOUT)
    print("正在下载,url:", url)
    if response.status_code == 200:
        # The "AA seq" button stores its target inside an onclick
        # handler; extract the relative URL it navigates to.  Dots are
        # escaped so the pattern matches only the literal "location.href".
        match = re.search(
            r"onclick=\"location\.href='(.*?)';return false;\">AA seq</button>",
            response.text,
        )
        if match:
            visit2(KEGG_BASE + match.group(1))
        else:
            print("未找到匹配的文本")
    else:
        print("请求失败,状态码:", response.status_code)


if __name__ == '__main__':
    # Entry page for the KEGG orthology group being harvested.
    url = KEGG_BASE + "/entry/K01068"
    response = requests.get(url, timeout=TIMEOUT)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, "html.parser")
        # The gene cross-reference links live in the fourth table cell
        # carrying both the "td41" and "defd" classes.
        td_elems = soup.find_all("td", class_="td41 defd")
        if len(td_elems) >= 4:
            content = td_elems[3].prettify()
            # Pull every href out of the cell's serialized markup.
            links = re.findall(r'a href="(.*?)"', content)
            for link in links:
                # Skip placeholder anchors that only trigger JavaScript.
                if "javascript:void(0)" in link:
                    continue
                visit1(KEGG_BASE + link, content)
        else:
            print("未找到足够的匹配元素")
    else:
        print("请求失败,状态码:", response.status_code)
标签:www,url,jp,kegg,BeautifulSoup,content,print,td,response From: https://www.cnblogs.com/railgunRG/p/17808516.html