[2024.01.19] Scraping the 什么值得买 (SMZDM) rankings with Huginn

Date: 2024-01-19 14:44:07
Tags: xpath, feed, 2024.01, 19, value, scraping, div, main, id

A single command is enough to get Huginn running. I mainly use it together with RSS.

docker run -d -p 3000:3000 ghcr.io/yhdsl/huginn:latest

This time the main goal is to customize what gets scraped, filtering out items I have no use for, such as baby formula.
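As a rough illustration of the filtering idea, the sketch below mimics a negated-regex rule on the item title, which is what the `!regex` rule in the scenario's TriggerAgent expresses. It is not Huginn code: the `keep_event` helper and the sample titles are hypothetical, and the pattern is only a shortened subset of the keywords used in the scenario.

```python
import re

# Shortened subset of the exclusion keywords (coffee, phones, baby formula, phone credit).
EXCLUDE = re.compile("咖啡|手机|奶粉|话费")


def keep_event(event: dict) -> bool:
    """Return True when the title matches none of the excluded keywords."""
    return EXCLUDE.search(event.get("title", "")) is None


# Hypothetical sample events, shaped like the list scraper's output.
events = [
    {"title": "某品牌 婴儿奶粉 3段 800g"},
    {"title": "机械键盘 87键 茶轴"},
]

kept = [e for e in events if keep_event(e)]
print([e["title"] for e in kept])
```

Only events that pass the filter would move on to the next agent, so adding or removing a keyword in the pattern is all it takes to adjust the feed.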

{
  "schema_version": 1,
  "name": "什么值得买 rankings",
  "description": "Edit the keyword list in the filter agent to suit your own needs",
  "source_url": false,
  "guid": "3038bbb808e3628363d6d97ea85b50d5",
  "tag_fg_color": "#ffffff",
  "tag_bg_color": "#f33535",
  "icon": "gear",
  "exported_at": "2024-01-19T06:25:19Z",
  "agents": [
    {
      "type": "Agents::TriggerAgent",
      "name": "什么值得买 - filter data (keywords)",
      "disabled": false,
      "guid": "0635876a2d42933095f5463f1b0d95bc",
      "options": {
        "expected_receive_period_in_days": "2",
        "keep_event": "true",
        "rules": [
          {
            "type": "!regex",
            "value": "酒|咖啡|手机|收藏|窖|过期|水饺|抽纸|老抽|生抽|牛肉|鸭|话费|奶粉|生蚝|羊肉|海鲜|螺[蛳狮]粉",
            "path": "title"
          }
        ]
      },
      "keep_events_for": 259200,
      "propagate_immediately": true
    },
    {
      "type": "Agents::WebsiteAgent",
      "name": "什么值得买 - fetch item details",
      "disabled": false,
      "guid": "c693a156bc391111f3bd6ff08fb1ced2",
      "options": {
        "expected_update_period_in_days": "2",
        "url": "{{ url }}",
        "type": "html",
        "mode": "all",
        "extract": {
          "title": {
            "xpath": "//*[@id=\"feed-main\"]/div[2]/div/div[1]/h1",
            "value": "normalize-space(.) "
          },
          "content": {
            "xpath": "//*[@id=\"feed-main\"]/div[3]/article/div[1]",
            "value": "normalize-space(.) "
          },
          "photo": {
            "xpath": "//*[@id=\"feed-main\"]/div[2]/a/img",
            "value": "@src"
          },
          "worth": {
            "xpath": "//*[@id=\"rating_worthy_num\"]",
            "value": "normalize-space(.) "
          },
          "worth_percent": {
            "xpath": "//*[@id=\"rating_all_num\"]",
            "value": "normalize-space(.) "
          },
          "comment": {
            "xpath": "//*[@id=\"content\"]/div/div[1]/div[3]/a/span",
            "value": "normalize-space(.) "
          },
          "not_worth": {
            "xpath": "//*[@id=\"rating_unworthy_num\"]",
            "value": "normalize-space(.) "
          },
          "link_to_buy": {
            "xpath": "//*[@id=\"feed-main\"]/div[2]/a",
            "value": "@href"
          }
        }
      },
      "schedule": "every_12h",
      "keep_events_for": 259200,
      "propagate_immediately": true
    },
    {
      "type": "Agents::WebsiteAgent",
      "name": "什么值得买 - source list",
      "disabled": false,
      "guid": "c7c2f2cd1fff9b75c390fb6d1a3f0d54",
      "options": {
        "expected_update_period_in_days": "2",
        "url": "https://faxian.smzdm.com/h2s0t0f0c0p1/",
        "type": "html",
        "mode": "on_change",
        "extract": {
          "url": {
            "xpath": "//*[@id=\"feed-main-list\"]/li/div/div[1]/a[1]",
            "value": "@href"
          },
          "title": {
            "xpath": "//*[@id=\"feed-main-list\"]/li/div/h5/a",
            "value": "normalize-space(.)"
          },
          "img": {
            "xpath": "//*[@id=\"feed-main-list\"]/li/div/div[1]/a[1]/img",
            "value": "@src"
          },
          "price": {
            "xpath": "//*[@id=\"feed-main-list\"]/li/div/div[2]",
            "value": "normalize-space(.)"
          },
          "link": {
            "xpath": "//*[@id=\"feed-main-list\"]/li/div/div[5]/div[2]/div/div/a[1]",
            "value": "@href"
          }
        }
      },
      "schedule": "every_10m",
      "keep_events_for": 259200,
      "propagate_immediately": true
    },
    {
      "type": "Agents::DataOutputAgent",
      "name": "什么值得买 - RSS",
      "disabled": false,
      "guid": "cc405cd989e17453bb9f5aacb0ab7ab3",
      "options": {
        "secrets": [
          "smzdm"
        ],
        "expected_receive_period_in_days": 2,
        "template": {
          "title": "什么值得买 - hot ranking",
          "description": "{{content}}",
          "item": {
            "title": "{{title}}",
            "description": "<p>{{content}}</p><p>Worth it: {{worth}} Not worth it: {{not_worth}} {{worth_percent}}</p><div class=\"img_description\"></div><p></p><img src=\"{{photo}}\" referrerpolicy=\"no-referrer\"><div class=\"img_description\"></div><p><a href=\"{{link_to_buy}}\">Purchase link</a></p>",
            "link": "{{link_to_buy}}"
          },
          "link": "{{link_to_buy}}"
        },
        "ns_media": "true"
      },
      "propagate_immediately": true
    }
  ],
  "links": [
    {
      "source": 0,
      "receiver": 1
    },
    {
      "source": 1,
      "receiver": 3
    },
    {
      "source": 2,
      "receiver": 0
    }
  ],
  "control_links": [

  ]
}
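To make the `extract` blocks easier to read, here is a minimal sketch of how the list scraper's XPath rules turn a page into events. Several things are assumptions: the markup in `PAGE` is a made-up, simplified stand-in for the real listing page; Python's `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup, so `normalize-space(.)` is emulated by the hypothetical `normalize_space` helper; and the positional pairing of results reflects my understanding of how the WebsiteAgent combines extractors, not a verified description of its internals.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified stand-in for the listing page (assumption, not real markup).
PAGE = """
<html><body>
<ul id="feed-main-list">
  <li><div>
    <div><a href="https://example.com/p/1"><img src="https://example.com/1.jpg"/></a></div>
    <h5><a href="https://example.com/p/1">  Mechanical
        keyboard  </a></h5>
  </div></li>
  <li><div>
    <div><a href="https://example.com/p/2"><img src="https://example.com/2.jpg"/></a></div>
    <h5><a href="https://example.com/p/2">Desk lamp</a></h5>
  </div></li>
</ul>
</body></html>
""".strip()

# Field name -> (path, value rule), mirroring two extractors from the scenario.
EXTRACT = {
    "url": (".//*[@id='feed-main-list']/li/div/div[1]/a[1]", "@href"),
    "title": (".//*[@id='feed-main-list']/li/div/h5/a", "text"),
}


def normalize_space(text: str) -> str:
    """Emulate XPath normalize-space(): trim and collapse runs of whitespace."""
    return " ".join(text.split())


def value_of(node: ET.Element, rule: str) -> str:
    """Read an attribute ("@name") or the normalized text content of a node."""
    if rule.startswith("@"):
        return node.get(rule[1:], "")
    return normalize_space("".join(node.itertext()))


root = ET.fromstring(PAGE)

# Evaluate every extractor over the whole page, then pair the results by position.
columns = {
    name: [value_of(node, rule) for node in root.findall(path)]
    for name, (path, rule) in EXTRACT.items()
}
events = [dict(zip(columns, row)) for row in zip(*columns.values())]

for event in events:
    print(event)
```

Each resulting dictionary plays the role of one event. In the scenario, the `url` field of such an event is what the detail scraper picks up through `{{ url }}` after the keyword filter has let the event through.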

From: https://www.cnblogs.com/mokou/p/17974597
