Setting up a custom analyzer in Elasticsearch

Date: 2023-07-18 22:46:07

Tags: custom, database, Elasticsearch, analyzer, MySQL, elasticsearch, tokenization

To use analysis data (tokenization rules, stopwords, and so on) defined in a MySQL database with Elasticsearch, follow these steps:

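Import the analysis data from MySQL into Elasticsearch:
- Extract the analysis data from the MySQL database, including tokenization rules, stopwords, and similar entries.
- Convert the data into a format that Elasticsearch can consume, such as JSON.
- Use an Elasticsearch API (for example the Bulk API) to load the data into an Elasticsearch index.
Create a custom analyzer that uses the rules from MySQL:
- Define a custom analyzer in Elasticsearch from a suitable tokenizer and token filters.
- In the analyzer configuration, plug in the rules taken from MySQL, for example through a custom character filter, tokenizer, or token filter.
Apply the custom analyzer to a field of the Elasticsearch index:
- In the index mapping, set the analyzer on every field that should use it.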
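The conversion step above depends on how the rows are stored in MySQL. As a minimal sketch, assume a hypothetical table whose rows are (group_id, word) pairs, where words sharing a group_id are synonyms of each other. The class name `SynonymRules` and the row layout are illustrative assumptions, not part of the original article. The output uses the comma-separated Solr format that the Elasticsearch `synonym` token filter accepts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SynonymRules {
    /**
     * Groups (groupId, word) rows into Solr-format synonym rules such as "car, automobile".
     * The two-column row layout is an assumption about the MySQL table.
     */
    public static List<String> fromRows(List<String[]> rows) {
        // LinkedHashMap keeps the groups in the order they first appear
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1].trim());
        }
        List<String> rules = new ArrayList<>();
        for (List<String> words : groups.values()) {
            if (words.size() > 1) { // a group with a single word defines no synonym
                rules.add(String.join(", ", words));
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"1", "car"},
            new String[] {"1", "automobile"},
            new String[] {"2", "laptop"},
            new String[] {"2", "notebook"},
            new String[] {"3", "orphan"});
        // Prints one rule per line: "car, automobile" then "laptop, notebook"
        fromRows(rows).forEach(System.out::println);
    }
}
```

In a real setup the rows would come from a JDBC query; the grouping logic stays the same.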

The following example shows how to bring analysis data from MySQL into Elasticsearch and create a custom analyzer that is applied to an index field:

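Extract the analysis data from MySQL:
- Connect to the MySQL database and run a query that returns the analysis data.
- Save the data in a format that Elasticsearch can consume, such as JSON.
Import the analysis data into Elasticsearch:
- Use an Elasticsearch API (for example the Bulk API) to load the data into an Elasticsearch index.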
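For the Bulk API step, the request body is newline-delimited JSON: an action line followed by a document line for every entry, with a newline after the last line as well. The sketch below only builds that body and assumes each document has already been serialized to a JSON object. The class name `BulkBodyBuilder`, the index name `analysis_terms`, and the `word` field are hypothetical.

```java
import java.util.List;

public class BulkBodyBuilder {
    /**
     * Builds an NDJSON body for POST /_bulk: one "index" action line plus one
     * document line per entry. Each element of jsonDocs must already be a
     * single-line JSON object; the index name is assumed to be a valid
     * Elasticsearch index name (so it needs no escaping).
     */
    public static String build(String index, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            body.append(doc).append("\n"); // the Bulk API requires every line, including the last, to end with \n
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String body = build("analysis_terms",
            List.of("{\"word\":\"the\"}", "{\"word\":\"a\"}"));
        System.out.print(body);
    }
}
```

The resulting string would be sent with the `Content-Type: application/x-ndjson` header.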

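// Example: create an Elasticsearch index whose analyzer uses stopwords and synonyms loaded from MySQL
// (Java High Level REST Client 7.x; the statements belong inside a method that declares "throws IOException")

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.common.xcontent.XContentType; // newer 7.x releases moved this to org.elasticsearch.xcontent

// 1. Extract the analysis data from MySQL
List<String> stopwords = new ArrayList<>();
List<String> synonyms = new ArrayList<>();
// Run the queries and store the results in the stopwords and synonyms lists

// List.toString() produces [a, b], which is not valid JSON, so quote each entry.
// Entries that contain " or \ would additionally need JSON escaping.
String stopwordsJson = stopwords.stream()
    .map(w -> "\"" + w + "\"").collect(Collectors.joining(", ", "[", "]"));
String synonymsJson = synonyms.stream()
    .map(s -> "\"" + s + "\"").collect(Collectors.joining(", ", "[", "]"));

// 2. Connect to Elasticsearch
RestHighLevelClient client = new RestHighLevelClient(
    RestClient.builder(new HttpHost("localhost", 9200, "http")));

// Index body: analysis settings plus a mapping that uses the analyzer.
// Analysis settings cannot be sent through the put-mapping API, so they are supplied when the index is created.
String indexJson = "{\n" +
    "  \"settings\": {\n" +
    "    \"analysis\": {\n" +
    "      \"filter\": {\n" +
    "        \"my_stopwords_filter\": {\n" +
    "          \"type\": \"stop\",\n" +
    "          \"stopwords\": " + stopwordsJson + "\n" +
    "        },\n" +
    "        \"my_synonyms_filter\": {\n" +
    "          \"type\": \"synonym\",\n" +
    "          \"synonyms\": " + synonymsJson + "\n" +
    "        }\n" +
    "      },\n" +
    "      \"analyzer\": {\n" +
    "        \"my_custom_analyzer\": {\n" +
    "          \"type\": \"custom\",\n" +
    "          \"tokenizer\": \"standard\",\n" +
    "          \"filter\": [\n" +
    "            \"lowercase\",\n" +
    "            \"my_stopwords_filter\",\n" +
    "            \"my_synonyms_filter\"\n" +
    "          ]\n" +
    "        }\n" +
    "      }\n" +
    "    }\n" +
    "  },\n" +
    "  \"mappings\": {\n" +
    "    \"properties\": {\n" +
    // "content" is a placeholder field name; replace it with your own field
    "      \"content\": { \"type\": \"text\", \"analyzer\": \"my_custom_analyzer\" }\n" +
    "    }\n" +
    "  }\n" +
    "}";

// Create the index with the settings and the mapping in a single request
CreateIndexRequest createIndexRequest = new CreateIndexRequest("my_index")
    .source(indexJson, XContentType.JSON);
client.indices().create(createIndexRequest, RequestOptions.DEFAULT);

client.close();

In the example above, the stopwords and synonyms are read from MySQL into two string lists (stopwords and synonyms) and serialized as JSON arrays. A single JSON string (indexJson) then carries both the custom analyzer configuration and a mapping that assigns the analyzer to a field, and the index is created from that string in one request.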

Note that the analyzer configuration in the example is deliberately simple; adjust it to your own tokenization rules and requirements.
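One adjustment that is usually needed: when a Java list is embedded in hand-built JSON, each entry has to be quoted and escaped, because `List.toString()` yields `[a, b]` rather than a JSON array. The helper below is a minimal sketch of that step; the class name `JsonArrays` is hypothetical, and a JSON library such as Jackson would be a more robust choice in production code.

```java
import java.util.List;
import java.util.stream.Collectors;

public class JsonArrays {
    /** Quotes one value as a JSON string, escaping backslashes, double quotes, newlines, carriage returns, and tabs. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\") // backslashes first, so later escapes are not doubled
                       .replace("\"", "\\\"")
                       .replace("\n", "\\n")
                       .replace("\r", "\\r")
                       .replace("\t", "\\t") + "\"";
    }

    /** Renders a list of strings as a JSON array of strings. */
    public static String toJsonArray(List<String> values) {
        return values.stream()
            .map(JsonArrays::quote)
            .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(toJsonArray(List.of("the", "a")));        // ["the", "a"]
        System.out.println(toJsonArray(List.of("car, automobile"))); // ["car, automobile"]
    }
}
```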

I hope this example helps! If you have further questions, feel free to ask.

From: https://www.cnblogs.com/wanglichaoya/p/17564326.html
