I. Testing the analyze commands:
1. List the installed plugins:
[lhdop@blog ~]$ curl -X GET "localhost:9200/_cat/plugins?v&s=component"
name component version
2. The standard analyzer:
[lhdop@blog ~]$ curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
"analyzer": "standard",
"text": "Text to analyze"
}
'
{
"tokens" : [
{
"token" : "text",
"start_offset" : 0,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "to",
"start_offset" : 5,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "analyze",
"start_offset" : 8,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 2
}
]
}
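The full response is verbose when only the token list matters. The following is a minimal sketch of a filter that keeps just the token values. The function name extract_tokens is made up for illustration, and it relies on the pretty-printed layout shown above (one "token" field per line), so it is a text filter rather than a real JSON parser.

```shell
# Print the value of every "token" field read from stdin, one per line.
# Assumes the ?pretty layout: each field sits on its own line as
#   "token" : "value",
extract_tokens() {
  sed -n 's/^[[:space:]]*"token"[[:space:]]*:[[:space:]]*"\(.*\)",\{0,1\}[[:space:]]*$/\1/p'
}

# Usage (needs a running node on localhost:9200):
#   curl -s -X GET "localhost:9200/_analyze?pretty" \
#     -H 'Content-Type: application/json' \
#     -d '{"analyzer": "standard", "text": "Text to analyze"}' | extract_tokens
```

Piping any of the _analyze calls in this post through the filter makes the segmentation results easier to compare at a glance.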
2. Install the smartcn analysis plugin from the command line:
[lhdop@blog bin]$ ./elasticsearch-plugin install analysis-smartcn
warning: ignoring JAVA_HOME=/usr/local/soft/jdk-17.0.11; using ES_JAVA_HOME
-> Installing analysis-smartcn
-> Downloading analysis-smartcn from elastic
[=================================================] 100%
-> Installed analysis-smartcn
-> Please restart Elasticsearch to activate any plugins installed
3. smartcn is installed under the plugins directory; list its files:
[lhdop@blog elasticsearch-8.14.2]$ ls plugins/analysis-smartcn/
analysis-smartcn-8.14.2.jar lucene-analysis-smartcn-9.10.0.jar plugin-descriptor.properties
After installation, Elasticsearch must be restarted for the plugin to take effect.
Stop:
[root@blog ~]# kill 260903
Start:
[root@blog ~]# /usr/local/soft/elasticsearch-8.14.2/bin/elasticsearch -d
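Looking up the PID by hand before each restart can be avoided by letting Elasticsearch write it to a file with the -p option. The sketch below assumes the installation path used in this post; the PID file location and the function names are illustrative choices, and the commands are meant to be run as the account that owns the Elasticsearch installation.

```shell
# Paths are assumptions based on the layout used in this post.
ES_HOME="${ES_HOME:-/usr/local/soft/elasticsearch-8.14.2}"
PID_FILE="${PID_FILE:-$ES_HOME/es.pid}"

# Start as a daemon and record the PID in $PID_FILE.
start_es() {
  "$ES_HOME/bin/elasticsearch" -d -p "$PID_FILE"
}

# Stop the process whose PID was recorded at startup.
stop_es() {
  [ -f "$PID_FILE" ] || { echo "no pid file at $PID_FILE" >&2; return 1; }
  kill "$(cat "$PID_FILE")"
}

# Usage after installing a plugin:
#   stop_es && start_es
```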
4. Try the smartcn analyzer. The result is not ideal: it splits '海鲜味' (seafood flavor) into two tokens, '海' and '鲜味'.
[lhdop@blog elasticsearch-8.14.2]$ curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
"analyzer": "smartcn",
"text": "这是一碗海鲜味方便面"
}
'
{
"tokens" : [
{
"token" : "这",
"start_offset" : 0,
"end_offset" : 1,
"type" : "word",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "word",
"position" : 1
},
{
"token" : "一",
"start_offset" : 2,
"end_offset" : 3,
"type" : "word",
"position" : 2
},
{
"token" : "碗",
"start_offset" : 3,
"end_offset" : 4,
"type" : "word",
"position" : 3
},
{
"token" : "海",
"start_offset" : 4,
"end_offset" : 5,
"type" : "word",
"position" : 4
},
{
"token" : "鲜味",
"start_offset" : 5,
"end_offset" : 7,
"type" : "word",
"position" : 5
},
{
"token" : "方便面",
"start_offset" : 7,
"end_offset" : 10,
"type" : "word",
"position" : 6
}
]
}
5. List the installed plugins again; the smartcn plugin now appears:
[lhdop@blog elasticsearch-8.14.2]$ curl -X GET "localhost:9200/_cat/plugins?v&s=component"
name component version
iZ2zejc9t0hf6pnw6sewrxZ analysis-smartcn 8.14.2
II. Installing the ik analysis plugin
1. GitHub releases:
https://github.com/infinilabs/analysis-ik/releases
2. Official release site:
https://release.infinilabs.com/analysis-ik/stable/
3. Check the local Elasticsearch version:
[lhdop@blog ~]$ /usr/local/soft/elasticsearch-8.14.2/bin/elasticsearch --version
warning: ignoring JAVA_HOME=/usr/local/soft/jdk-17.0.11; using ES_JAVA_HOME
Version: 8.14.2, Build: tar/2afe7caceec8a26ff53817e5ed88235e90592a1b/2024-07-01T22:06:58.515911606Z, JVM: 17.0.11
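4. Install the ik build that matches the local Elasticsearch version:
Note: the ik version must match the Elasticsearch version exactly; otherwise installation or startup may fail.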
[lhdop@blog elasticsearch-8.14.2]$ bin/elasticsearch-plugin install https://get.infini.cloud/elasticsearch/analysis-ik/8.14.2
warning: ignoring JAVA_HOME=/usr/local/soft/jdk-17.0.11; using ES_JAVA_HOME
-> Installing https://get.infini.cloud/elasticsearch/analysis-ik/8.14.2
-> Downloading https://get.infini.cloud/elasticsearch/analysis-ik/8.14.2
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: plugin requires additional permissions @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.net.SocketPermission * connect,resolve
See https://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.
Continue with installation? [y/N]y
-> Installed analysis-ik
-> Please restart Elasticsearch to activate any plugins installed
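Because the ik version has to match the Elasticsearch version, the download URL can be derived from the version banner instead of being typed by hand. This is a minimal sketch: the function names are hypothetical, and it assumes that other versions follow the same URL pattern as the 8.14.2 address used above, which is not verified here.

```shell
# Extract "8.14.2" from a banner line such as
#   Version: 8.14.2, Build: tar/..., JVM: 17.0.11
es_version() {
  sed -n 's/^Version: \([0-9][0-9.]*\),.*/\1/p'
}

# Build the ik download URL for a given version.
# Assumption: every version uses the same path layout as 8.14.2.
ik_plugin_url() {
  printf 'https://get.infini.cloud/elasticsearch/analysis-ik/%s\n' "$1"
}

# Usage, from the Elasticsearch home directory:
#   v=$(bin/elasticsearch --version 2>/dev/null | es_version)
#   bin/elasticsearch-plugin install "$(ik_plugin_url "$v")"
```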
5. Restart the service:
Stop:
[root@blog ~]# kill 264687
Start:
[root@blog ~]# /usr/local/soft/elasticsearch-8.14.2/bin/elasticsearch -d
6. After installation, list the plugins:
[lhdop@blog elasticsearch-8.14.2]$ ./bin/elasticsearch-plugin list
warning: ignoring JAVA_HOME=/usr/local/soft/jdk-17.0.11; using ES_JAVA_HOME
analysis-ik
analysis-smartcn
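In a setup script, the same listing can be checked automatically rather than by eye. A minimal sketch, with has_plugin as a made-up helper name, that reads the output of elasticsearch-plugin list on stdin and succeeds only if the given plugin name appears as a whole line:

```shell
# Succeed if the plugin name given as $1 appears as a complete line on stdin.
has_plugin() {
  grep -qxF -- "$1"
}

# Usage, from the Elasticsearch home directory:
#   ./bin/elasticsearch-plugin list 2>/dev/null | has_plugin analysis-ik \
#     && echo "analysis-ik is installed"
```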
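III. Testing the results
1. The two ik analyzers
The ik plugin provides two analyzers for Chinese text, ik_smart and ik_max_word:
ik_smart - coarse-grained segmentation
ik_max_word - fine-grained segmentation that enumerates as many candidate terms as possible, so it produces more tokens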
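One commonly used way to combine the two is to index a field with ik_max_word and search it with ik_smart. The sketch below only builds the request body for creating such an index. The helper name ik_mapping_body, the index name demo_articles, and the field name content are all hypothetical and do not come from this post.

```shell
# Print an index-creation body whose text field (name given as $1) is
# indexed with ik_max_word and searched with ik_smart.
ik_mapping_body() {
  field="$1"
  cat <<EOF
{
  "mappings": {
    "properties": {
      "$field": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
EOF
}

# Usage (needs a running node with the ik plugin installed):
#   curl -X PUT "localhost:9200/demo_articles?pretty" \
#     -H 'Content-Type: application/json' \
#     -d "$(ik_mapping_body content)"
```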
2. Test the ik_smart analyzer:
[lhdop@blog elasticsearch-8.14.2]$ curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
"analyzer": "ik_smart",
"text": "这是一碗海鲜味方便面"
}
'
{
"tokens" : [
{
"token" : "这是",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "一碗",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "海",
"start_offset" : 4,
"end_offset" : 5,
"type" : "CN_CHAR",
"position" : 2
},
{
"token" : "鲜味",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "方便面",
"start_offset" : 7,
"end_offset" : 10,
"type" : "CN_WORD",
"position" : 4
}
]
}
3. Test the ik_max_word analyzer:
[lhdop@blog elasticsearch-8.14.2]$ curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
"analyzer": "ik_max_word",
"text": "这是一碗海鲜味方便面"
}'
{
"tokens" : [
{
"token" : "这是",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "一碗",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "一",
"start_offset" : 2,
"end_offset" : 3,
"type" : "TYPE_CNUM",
"position" : 2
},
{
"token" : "碗",
"start_offset" : 3,
"end_offset" : 4,
"type" : "COUNT",
"position" : 3
},
{
"token" : "海鲜",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "鲜味",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "方便面",
"start_offset" : 7,
"end_offset" : 10,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "方便",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 7
},
{
"token" : "面",
"start_offset" : 9,
"end_offset" : 10,
"type" : "CN_CHAR",
"position" : 8
}
]
}
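To compare analyzers on the same sentence without retyping the request each time, the calls can be wrapped in a small loop. This is a sketch only: the function names are illustrative, and analyze_body does no JSON escaping, so it assumes the text contains no double quotes or backslashes.

```shell
# Build the JSON body for _analyze from an analyzer name and a text.
# Assumption: the text has no double quotes or backslashes (no escaping is done).
analyze_body() {
  printf '{"analyzer": "%s", "text": "%s"}' "$1" "$2"
}

# Send the same text through each listed analyzer and print the raw responses.
compare_analyzers() {
  text="$1"; shift
  for analyzer in "$@"; do
    echo "== $analyzer =="
    curl -s -X GET "localhost:9200/_analyze?pretty" \
      -H 'Content-Type: application/json' \
      -d "$(analyze_body "$analyzer" "$text")"
  done
}

# Usage (needs a running node with both plugins installed):
#   compare_analyzers "这是一碗海鲜味方便面" smartcn ik_smart ik_max_word
```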
IV. Elasticsearch version used
[lhdop@blog ~]$ /usr/local/soft/elasticsearch-8.14.2/bin/elasticsearch --version
warning: ignoring JAVA_HOME=/usr/local/soft/jdk-17.0.11; using ES_JAVA_HOME
Version: 8.14.2, Build: tar/2afe7caceec8a26ff53817e5ed88235e90592a1b/2024-07-01T22:06:58.515911606Z, JVM: 17.0.11
From: https://www.cnblogs.com/architectforest/p/18295458