Elasticsearch supports plugins, which can be installed and removed as needed. For Chinese word segmentation we will use the IK analyzer; the steps below show how to install the IK analyzer plugin in a Docker-based deployment.
Download the IK analyzer
The IK analyzer is open source on GitHub; just grab the release archive. Note: the IK analyzer version must match the ES version exactly.
Download page: https://github.com/medcl/elasticsearch-analysis-ik/releases — pick the release that matches your ES version.
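If the server has direct internet access, the archive can also be fetched there instead of uploading it from a workstation. The URL below follows the naming pattern of the releases page and assumes ES 7.1.0:

```shell
# The plugin version must match the ES image tag exactly (assumed 7.1.0 here).
ES_VERSION=7.1.0
PLUGIN_URL="https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v${ES_VERSION}/elasticsearch-analysis-ik-${ES_VERSION}.zip"
echo "$PLUGIN_URL"
# wget "$PLUGIN_URL"   # uncomment to download directly on the server
```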
Unpack on the Linux host
ES expects each plugin in its own subdirectory under plugins/, so unpack into plugin/ik rather than plugin/ itself:
cd es_kibana
mkdir plugin
rz elasticsearch-analysis-ik-7.1.0.zip
sudo unzip elasticsearch-analysis-ik-7.1.0.zip -d ./plugin/ik
Modify docker-compose.yml to mount the plugin directory into the container
version: "3.8"
volumes:
  data:
  config:
  plugin:
networks:
  es:
services:
  kibana:
    image: kibana:7.1.0
    ports:
      - "5601:5601"
    networks:
      - "es"
    volumes:
      - ./kibana.yml:/usr/share/kibana/config/kibana.yml
  elasticsearch:
    image: elasticsearch:7.1.0
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.type=single-node"
    volumes:
      - data:/usr/share/elasticsearch/data
      - config:/usr/share/elasticsearch/config
      - ./plugin:/usr/share/elasticsearch/plugins
    ports:
      - "9200:9200"
      - "9500:9300"
    networks:
      - "es"
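The bind mount above maps the host's plugin directory onto the container's plugins directory, so everything under ./plugin appears inside /usr/share/elasticsearch/plugins. Since ES 7 requires each plugin in its own subdirectory there, the host layout should look like the sketch below before starting (illustrative only; the touch stands in for the files actually unzipped from the release archive):

```shell
# Illustrative host layout; in reality these files come from unzipping the release.
mkdir -p es_kibana/plugin/ik
touch es_kibana/plugin/ik/plugin-descriptor.properties
find es_kibana -type f
# -> es_kibana/plugin/ik/plugin-descriptor.properties
```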
Restart and verify
sudo docker compose down
sudo docker compose up -d
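Once the containers are up, the plugin load can be confirmed via the _cat API or the startup logs. The output shape below is indicative; the node name will differ on your machine:

```shell
# List plugins loaded by each ES node.
curl -s localhost:9200/_cat/plugins
# A loaded IK plugin shows up as a line like: <node-name> analysis-ik 7.1.0

# The startup logs also record the load:
sudo docker compose logs elasticsearch | grep -i "loaded plugin"
```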
## Test from the Kibana console
IK provides two analyzers: ik_smart, which produces the coarsest-grained segmentation, and ik_max_word, which exhaustively emits every possible word. First, ik_smart:
POST /_analyze
{
"analyzer": "ik_smart",
"text":"中华人民共和国国歌"
}
## ik_smart output
{
"tokens" : [
{
"token" : "中华人民共和国",
"start_offset" : 0,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "国歌",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 1
}
]
}
POST /_analyze
{
"analyzer": "ik_max_word",
"text":"中华人民共和国国歌"
}
## ik_max_word output
{
"tokens" : [
{
"token" : "中华人民共和国",
"start_offset" : 0,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "中华人民",
"start_offset" : 0,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "中华",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "华人",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "人民共和国",
"start_offset" : 2,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "人民",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "共和国",
"start_offset" : 4,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "共和",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 7
},
{
"token" : "国",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 8
},
{
"token" : "国歌",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 9
}
]
}
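With both analyzers working, they can be wired into an index mapping. A common pairing is ik_max_word at index time (for recall) with ik_smart at query time (for precision). The sketch below is illustrative — the articles index and title field are made-up names:

```shell
# Hypothetical index: segment exhaustively when indexing,
# coarsely when analyzing search queries.
curl -s -X PUT "localhost:9200/articles" \
  -H 'Content-Type: application/json' \
  -d '{
    "mappings": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }'
```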
From: https://www.cnblogs.com/tenic/p/16795904.html