Fetching web page content with Node (title, summary, image)

Date: 2023-02-15 14:24:04

Tags: node, fs, console, log, title, summary, var, html, web page content

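First you need Node installed. Then import the modules below; these are required:

fs: writes files
path: builds file paths
request: sends the HTTP request (another HTTP client may be a better choice, since request has been deprecated; look one up if you want to switch)
cheerio: loads the response body into a DOM that can be queried (my own understanding)
(If you do not need to save the content to a file, you can skip the fs and path modules.)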
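The module list above mentions that a different client could replace request. As a minimal sketch of one such option, assuming only Node's built-in https module, the page could be fetched as shown here. The names fetchHtml and buildHeaders are hypothetical helpers introduced for illustration, and the sketch does not follow redirects or handle non-UTF-8 pages.

```javascript
const https = require('https');

// Hypothetical helper: the same User-Agent header this article sends.
function buildHeaders() {
    return {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
    };
}

// Hypothetical helper: resolves with the response body as a UTF-8 string.
// Assumption: the URL uses https and answers with status 200 (no redirects).
function fetchHtml(url) {
    return new Promise((resolve, reject) => {
        const req = https.get(url, { headers: buildHeaders() }, (res) => {
            if (res.statusCode !== 200) {
                res.resume(); // discard the body so the socket is released
                reject(new Error('Unexpected status code: ' + res.statusCode));
                return;
            }
            res.setEncoding('utf8');
            let body = '';
            res.on('data', (chunk) => { body += chunk; });
            res.on('end', () => resolve(body));
        });
        req.on('error', reject);
    });
}
```

It could then be called as fetchHtml(options.url).then(getdom), in place of the request(...) call.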

What I need here are the values stored in the page's meta tags.

const fs = require('fs'); // read and write files
const path = require('path'); // build file paths
const request = require('request'); // send the HTTP request
const cheerio = require('cheerio'); // load HTML into a queryable DOM
const options = {
    'url': 'https://mp.weixin.qq.com/s/P8q3CjZdH-GCtB2VHVy_qg',
    'headers': {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
    }
};

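request(options, function(error, response, body) {
    if (error) throw error; // rethrow the original error so its stack is preserved
    // console.log(body);
    // Optionally save the raw response to index.html (the name can be changed) in the project folder:
    // fs.writeFile(path.resolve(__dirname, 'index.html'), body, () => {
    //     console.log("Saved successfully") // printed once the page has been written
    // })
    getdom(body);
});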

function getdom(html) {
    var $ = cheerio.load(html);
    // From here on, select page elements just as you would with jQuery
    // var a = $('meta').slice(last, last - 8)
    // Attribute values are quoted because they contain a colon (og:title, og:image)
    var desc = $('[name="description"]').attr('content');
    var title = $('[property="og:title"]').attr('content');
    var img = $('[property="og:image"]').attr('content');
    // console.log(desc);
    // console.log(title);
    // console.log(img);
    var obj = {
        'desc': desc,
        'title': title,
        'img': img
    };
    console.log(obj);
    // Save the raw HTML (not JSON) to data.html next to this script
    fs.writeFile(path.resolve(__dirname, 'data.html'), html, (err) => {
        if (err) throw err;
        console.log("Saved successfully");
    });
}

This extracts the specified content from the web page.
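If you would rather keep only the extracted fields than the whole page, one option is to write the object out as JSON. The following is a minimal sketch under stated assumptions: saveMeta is a hypothetical helper, the sample values are placeholders standing in for what getdom extracts, and the file goes to the system temp directory purely for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: serialize the extracted fields and write them to disk.
function saveMeta(meta, filePath) {
    const json = JSON.stringify(meta, null, 2); // pretty-print with 2-space indent
    fs.writeFileSync(filePath, json, 'utf8');
    return filePath;
}

// Placeholder values, not real data from the page.
const sample = {
    desc: 'Example description',
    title: 'Example title',
    img: 'https://example.com/cover.png'
};

const target = path.join(os.tmpdir(), 'data.json');
saveMeta(sample, target);

// Read the file back to confirm that it round-trips.
console.log(JSON.parse(fs.readFileSync(target, 'utf8')).title); // prints "Example title"
```

The synchronous write keeps the sketch short; inside a request callback, the asynchronous fs.writeFile used earlier in the article would avoid blocking the event loop.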

From: https://www.cnblogs.com/qinfengfumian/p/17122616.html
