utils

Date: 2023-06-05 11:24:56
Tags: NAME, utils, echo, item, policy, import, data

1.kills.sh 

#!/bin/sh

NAME=$1   # $1: runtime argument, the name of the file whose processes should be killed
if [ -z "$NAME" ]; then
 echo "STRING is empty"
 NAME="aa"
fi
echo $NAME
ID=`ps -ef | grep "$NAME" | grep -v "$0" | grep -v "grep" | awk '{print $2}'`
echo $ID
echo "---------------"
for id in $ID
do
kill -9 $id
echo "killed $id"
done
echo "---------------"
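The filtering that the `ps | grep | grep -v | awk` pipeline performs can be sketched in Python, which makes it easier to see which processes would be selected before anything is killed. This is only an illustration: the function name `matching_pids` and the sample `ps -ef` output are made up, and it uses plain substring matching where `grep` would treat the name as a regular expression.

```python
def matching_pids(ps_output, name, script_name):
    """Return PIDs of lines that mention name, skipping this script and grep itself."""
    pids = []
    for line in ps_output.splitlines():
        if name not in line:
            continue  # same role as: grep "$NAME"
        if script_name in line or "grep" in line:
            continue  # same role as: grep -v "$0" | grep -v "grep"
        fields = line.split()
        if len(fields) > 1 and fields[1].isdigit():
            pids.append(int(fields[1]))  # same role as: awk '{print $2}'
    return pids


# Hypothetical `ps -ef` output, for illustration only.
sample = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root      1201     1  0 10:00 ?        00:00:01 python3.6 /srv/demo_job.py
root      1302  1250  0 10:05 pts/0    00:00:00 /bin/sh ./kills.sh demo_job
root      1303  1302  0 10:05 pts/0    00:00:00 grep demo_job
root      1400     1  0 10:06 ?        00:00:00 python3.6 /srv/other.py
"""

print(matching_pids(sample, "demo_job", "./kills.sh"))  # prints [1201]
```

Because the match is a substring match on the whole command line, a short name such as the default `aa` can select unrelated processes, so it is worth printing the list before sending `kill -9`.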

2.run_py.sh

#!/bin/sh
NAME=$1  # $1: runtime argument, the name of the Python file to run (with or without .py)
NAME=${NAME%%.*}
if [ -z "$NAME" ]; then
 echo "STRING is empty"
 NAME="aa"
fi
echo $NAME
ID=`ps -ef | grep "$NAME" | grep -v "$0" | grep -v "grep" | awk '{print $2}'`
echo $ID
echo "---------------"
for id in $ID
do
kill -9 $id
echo "killed $id"
done
echo "---------------"
 
sleep 1
 
current_dir=$(cd $(dirname $0); pwd)
 
echo $current_dir
if [ ! -d "$current_dir/logs" ]; then
    echo "$current_dir/logs does not exist"
    mkdir "$current_dir/logs"
fi
 
echo "---------------"
echo "nohup /usr/bin/python3.6 $current_dir/$NAME.py > $current_dir/logs/$NAME.log 2>&1 &"
echo "---------------"
echo "tail -f $current_dir/logs/$NAME.log"
 
nohup /usr/bin/python3.6 "$current_dir/$NAME.py" > "$current_dir/logs/$NAME.log" 2>&1 &
 
echo "Started successfully"
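How the script derives the script path and log path from its argument can be sketched in Python. `${NAME%%.*}` removes the longest suffix that starts with a dot, so everything from the first dot onward is dropped. The helper names `strip_like_shell` and `plan_launch` and the directory `/opt/jobs` are hypothetical, chosen only for this illustration.

```python
import os


def strip_like_shell(name):
    """Mimic ${NAME%%.*}: drop everything from the first dot onward."""
    return name.split(".", 1)[0]


def plan_launch(arg, script_dir, default="aa"):
    """Return (name, script path, log path) the way run_py.sh would build them."""
    name = strip_like_shell(arg) or default  # an empty name falls back to the default
    script = os.path.join(script_dir, name + ".py")
    log = os.path.join(script_dir, "logs", name + ".log")
    return name, script, log


print(plan_launch("run_zc.py", "/opt/jobs")[0])    # prints run_zc
print(plan_launch("a.b.py", "/opt/jobs")[0])       # prints a
print(plan_launch("./run_zc.py", "/opt/jobs")[0])  # prints aa
```

The last call shows a pitfall: an argument that begins with `./` has its first dot at position 0, so the stripped name is empty and the script silently falls back to `aa`. Passing the bare file name avoids this.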

3.run_zc.py

# -*- coding: UTF-8 -*-
import logging
import os
import platform
import subprocess
import time

logging.basicConfig(
    level=logging.INFO,  # log level for the file; messages at this level or above are written
    format='%(asctime)s  %(filename)s  %(levelname)s : %(message)s',  # log line format
    datefmt='%Y-%m-%d %H:%M:%S',  # timestamp format
    filename=os.path.splitext(os.path.basename(__file__))[0],  # log file name (script base name, no extension)
    filemode='a',  # write mode: "w" (overwrite) or "a" (append)
)
console = logging.StreamHandler()
console.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s  %(filename)s  %(levelname)s : %(message)s')
console.setFormatter(formatter)
logger = logging.getLogger('')  # root logger; run_main() logs through this name
logger.addHandler(console)


def run_main():
    system = platform.system()
    path = os.path.abspath('.')
    if 'indo' in system:  # matches "Windows"
        spider_path = os.path.join(path, 'yational_policy.py')
        res = subprocess.call("python {}".format(spider_path), shell=True)
    else:
        spider_path = os.path.join(path, 'yational_policy.py')
        res = subprocess.call("python3.6 {}".format(spider_path), shell=True)
    logger.info(spider_path)
    logger.info(res)


if __name__ == '__main__':
    while True:  # run forever, once per minute
        run_main()
        time.sleep(60)

# nohup /usr/bin/python3.6 /home/pachong/yoyo/work/national_policy/national_policy/run_zc.py > /home/pachong/yoyo/work/national_policy/logs/run_zc.log 2>&1 &
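A variant of the run loop, as a minimal sketch: it passes the command as an argument list instead of a shell string, so a path containing spaces needs no quoting, and it uses `sys.executable` instead of a hard-coded interpreter name. The function names and the `max_runs` parameter are assumptions added so the loop can be tried out with a bounded number of runs. The demo command is a stand-in for the real spider script.

```python
import subprocess
import sys
import time


def run_once(args):
    """Run a command given as an argument list (no shell) and return its exit code."""
    return subprocess.run(args).returncode


def run_periodically(args, interval_seconds, max_runs):
    """Run the command max_runs times, sleeping between runs; return the exit codes."""
    codes = []
    for attempt in range(max_runs):
        codes.append(run_once(args))
        if attempt < max_runs - 1:
            time.sleep(interval_seconds)
    return codes


# Stand-in for the spider script: a child process that exits with code 3.
demo = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(run_periodically(demo, interval_seconds=0.1, max_runs=2))  # prints [3, 3]
```

Collecting the exit codes makes it possible to log or react to a failing run, which the original loop only records through `logger.info(res)`.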

4. Simple multithreaded crawler

import concurrent.futures
import threading
import requests
import pymysql
from yscredit_tools.MySQL import insert_update_data_mysql, select_data_mysql, insert_data_mysql
from yscredit_tools.utils import clear_dict

headers = {
    'User-Agent': 'Apifox/1.0.0 (https://www.apifox.cn)',
    'Accept': '*/*',
    'Host': 'sqzc.gd.gov.cn',
    'Connection': 'keep-alive'
}
db = pymysql.connect(host="10.1.3.29", port=3306, database="crawler_data_prd", user="root", password="root", charset='utf8', autocommit=True)
cursor = db.cursor()
lock = threading.RLock()


def get_data(i):
    print(i)
    url = "https://sqzc.gd.gov.cn/sqzc/m/cms/policy/getPolicyListPage2?pageNumber={}&pageSize=10&keywords=&publisher=&city=".format(str(i))
    response = requests.get(url, headers=headers)
    html = response.json()
    # Hold the lock while writing so only one thread inserts at a time;
    # "with" releases the lock even if an insert raises.
    with lock:
        for data in html["data"]:
            item = {}
            item["title"] = data["title"]
            item["publishDate"] = data["publishDate"]
            item["publisher"] = data["publisher"]
            item["city"] = data["city"]
            item["viewCount"] = data["viewCount"]
            item["place"] = data["place"]
            item["page"] = i
            # cur = select_data_mysql(dbhost="localhost", dbname="crawler_data_prd", tablename="policy", where='title = "{}"'.format(item["title"]))
            # if not cur.rowcount:
            #    insert_update_data_mysql(dbhost="localhost", dbname="crawler_data_prd", tablename="policy", **clear_dict(item))
            insert_data_mysql(dbhost="localhost", dbname="crawler_data_prd", tablename="policy", **clear_dict(item))
            print(item["title"])
            print("*" * 100)


with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
    results = executor.map(get_data, range(1, 3078))

# for i in range(1, 3078):  # 3078
#    print(i)
#    url = "https://sqzc.gd.gov.cn/sqzc/m/cms/policy/getPolicyListPage2?pageNumber={}&pageSize=10&keywords=&publisher=&city=".format(str(i))
#    response = requests.get(url, headers=headers)
#    html = response.json()
#    for data in html["data"]:
#       item = {}
#       item["title"] = data["title"]
#       item["publishDate"] = data["publishDate"]
#       item["publisher"] = data["publisher"]
#       item["city"] = data["city"]
#       item["viewCount"] = data["viewCount"]
#       item["place"] = data["place"]
#       insert_update_data_mysql(dbhost="localhost", dbname="crawler_data_prd", tablename="policy", **clear_dict(item))
#       print(item)
#       print("*" * 100)
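The script above calls `executor.map` but never reads the results, so an exception raised in a worker (a failed request, a missing key in the JSON) goes unnoticed. The following sketch shows one way to collect the pages that failed so they can be retried. It is not the author's code: `fake_fetch` is a hypothetical stand-in for the HTTP request, and the `saved` list stands in for the database table.

```python
import concurrent.futures
import threading

lock = threading.Lock()
saved = []  # stands in for the database table


def fake_fetch(page):
    """Hypothetical stand-in for the HTTP request; page 3 simulates a failure."""
    if page == 3:
        raise ValueError("page 3 failed")
    return [{"title": "policy-{}-{}".format(page, n), "page": page} for n in range(2)]


def crawl_page(page):
    rows = fake_fetch(page)
    with lock:  # only one thread writes at a time
        saved.extend(rows)
    return len(rows)


def crawl(pages, max_workers=4):
    """Crawl all pages and return the sorted list of pages that raised."""
    failed = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(crawl_page, p): p for p in pages}
        for future in concurrent.futures.as_completed(futures):
            try:
                future.result()  # re-raises any exception from the worker
            except Exception:
                failed.append(futures[future])
    return sorted(failed)


failed_pages = crawl(range(1, 6))
print(failed_pages, len(saved))  # prints [3] 8
```

Here the lock is held only around the write, not around the fetch, so the network requests still run in parallel.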

  

 

From: https://www.cnblogs.com/yoyo1216/p/17457324.html
