
Several ways to import data into Elasticsearch


Method 1: Add documents from the Kibana console (this method has an upper limit on data volume; for bulk imports, curl is recommended)

This method requires Kibana to be installed. After starting it, open the Dev Tools console at:

http://<Kibana host IP>:5601/app/dev_tools#/console

POST _bulk
{"index":{"_index":"test_goods","_type":"goods","_id":10001}}
{"code":"1","price":10,"name":"商品1"}
{"index":{"_index":"test_goods","_type":"goods","_id":10002}}
{"code":"2","price":20,"name":"商品2"}
{"index":{"_index":"test_goods","_type":"goods","_id":10003}}

Method 2: Bulk import with curl; 100,000+ documents take about 3 s (see the curl official download page)

// The tool used is curl.exe; the dataset is goods.json
curl -H "Content-Type: application/json" -XPOST "<ES host IP>:9200/test_goods/goods/_bulk?refresh" --data-binary "@goods.json"
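
The goods.json file uses the same bulk format as the console example: each document takes two lines (an action line followed by a source line), and the file must end with a newline. A minimal sketch with the same test_goods documents:

{"index":{"_index":"test_goods","_type":"goods","_id":10001}}
{"code":"1","price":10,"name":"商品1"}
{"index":{"_index":"test_goods","_type":"goods","_id":10002}}
{"code":"2","price":20,"name":"商品2"}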

Method 3: Custom import with Logstash

3.1 Export from MySQL and import into ES

input {
  jdbc {
    jdbc_driver_library => "./mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://xxxxxx.mysql.singapore.rds.aliyuncs.com:3306/fle_staging"
    jdbc_user => "xxxx"
    jdbc_password => "xxxx"
    # Cron-style schedule: run the query every minute
    schedule => "* * * * *"
    statement => "SELECT * FROM parcxxxnfo WHERE created_at >= :sql_last_value  order by created_at limit 200000"
    use_column_value => true
    #tracking_column_type => "numeric"
    tracking_column_type => "timestamp"
    tracking_column => "created_at"
    last_run_metadata_path => "syncpoint_table_parcel_info"
    # Handle garbled Chinese characters (encoding issues)
    codec => plain { charset => "UTF-8"}
 
    # Track by another column instead of a timestamp
    #use_column_value => true
    # The column to track
    #tracking_column => src_phone
    record_last_run => true
    # Path of the file that stores the last sql_last_value; the initial value of the tracked column must be set in this file
    #last_run_metadata_path => "mysql/station_parameter.txt"
    jdbc_default_timezone => "Asia/Shanghai"
  }
}
output {
  elasticsearch {
    hosts => ["172xxxx2.83"]
    user => ""
    password => ""
    index => "parcxxxnfo"
    # Use the pno field as the document _id so repeated runs update documents instead of duplicating them
    document_id => "%{pno}"
  }
  # Also write a copy of the events to a dated file for debugging
  file {
    path => "/tmp/%{+YYYY.MM.dd}-file.txt"
  }
}
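
A sketch of how this configuration might be run (the file name mysql-to-es.conf is illustrative, assuming it is saved under the Logstash config directory):

bin/logstash -f config/mysql-to-es.conf

Because schedule => "* * * * *" is set, the query runs every minute; the latest created_at value is persisted to the file named by last_run_metadata_path, so each run only fetches rows at or after the stored :sql_last_value.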

3.2 Import data from a file into ES via the command line

Run logstash.bat -f F:\logstash-7.13.2-windows-x86_64\logstash-7.13.2\config\logstash.conf to load the configuration file.

The configuration file is as follows:

# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.
 
# input {
#   beats {
#     port => 5044
#   }
# }
# 
# output {
#   elasticsearch {
#     hosts => ["http://localhost:9200"]
#     index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
#     #user => "elastic"
#     #password => "changeme"
#   }
# }
 
input {
  file {
    path => "F:/logstash-data-movie-latest/ml-latest/movies.csv"
    # Read the file from the beginning and track the read position in a custom sincedb file
    start_position => "beginning"
    sincedb_path => "F:/logstash-data-movie-latest/ml-latest/movies.stash.log"
  }
  beats {
    port => 5044
  }
}
filter {
  # Fields are assigned by position, so the CSV columns movieId/title/genres become id/content/genre
  csv {
    separator => ","
    columns => ["id","content","genre"]
  }

  # Turn the pipe-separated genre string into an array of genres
  mutate {
    split => {"genre" => "|"}
    remove_field => ["path","host","@timestamp","message"]
  }

  # Split "Title (Year)" on "(" so that content[0] is the title and content[1] holds the year
  mutate {
    split => ["content","("]
    add_field => {"title" => "%{[content][0]}"}
    add_field => {"year" => "%{[content][1]}"}
  }

  # Convert year to an integer, trim whitespace around the title, and drop the temporary fields
  mutate {
    convert => {
      "year" => "integer"
    }
    strip => ["title"]
    remove_field => ["path","host","@timestamp","message","content"]
  }
}
output {
  elasticsearch {
    hosts => "http://11.1.217.245:9200"
    index => "movies"
    # Use the movie id as the document _id
    document_id => "%{id}"
  }
  # Also print each event to the console for debugging
  stdout{}
}

The CSV data looks like this:

movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
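
Once Logstash has processed the file, the import can be checked with a couple of quick requests (a sketch, assuming the ES host from the config above):

curl "http://11.1.217.245:9200/movies/_count"
curl "http://11.1.217.245:9200/movies/_doc/1"

The second request fetches the document built from the first CSV row, since document_id => "%{id}" uses the movie id as the document _id.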

Original article: https://blog.csdn.net/yunzhonghefei/article/details/11835415

