Tags: 7.17, startup, ml, gradle, source code, elasticsearch, data, es

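This article shows how to start Elasticsearch (ES) from source and how to locate the implementation class that handles a given request.

1. Prepare the environment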

JDK: 17

ES: 7.17

IDEA: 2024.1

Gradle: 8.7

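Install the JDK and IDEA.
Download the ES source code (I downloaded the 7.17.8 code from GitHub):
https://github.com/elastic/elasticsearch or: https://gitee.com/mirrors/elasticsearch
Download Gradle (this step can be skipped)

The point is to make the Gradle wrapper use a local file by default; otherwise the download is slow.

1. Open <elasticsearch source>\gradle\wrapper\gradle-wrapper.properties: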
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-7.5.1-all.zip
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
distributionSha256Sum=db9c8211ed63f61f60292c69e80d89196f9eb36665e369e7f00ac4cc841c2219
2. Download https://services.gradle.org/distributions/gradle-7.5.1-all.zip
3. Place gradle-7.5.1-all.zip in elasticsearch\gradle\wrapper
4. Edit gradle-wrapper.properties so that distributionUrl points at the local file:
distributionUrl=gradle-7.5.1-all.zip
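
A side note on the `https\://` in the properties file: the backslash is only the escape that the Java properties format uses for a colon, not part of the URL. The minimal sketch below (the class name WrapperProps is made up for illustration) loads the line with java.util.Properties and derives the zip file name that step 3 puts next to the properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class WrapperProps {
    /** Reads distributionUrl from the text of a gradle-wrapper.properties file. */
    static String distributionUrl(String propertiesText) {
        Properties props = new Properties();
        try {
            // Properties.load unescapes "\:" to ":" while parsing.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("distributionUrl");
    }

    /** Returns the last path segment, i.e. the zip file name. */
    static String fileName(String url) {
        return url.substring(url.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String text = "distributionUrl=https\\://services.gradle.org/distributions/gradle-7.5.1-all.zip\n";
        String url = distributionUrl(text);
        System.out.println(url);           // the URL without the backslash
        System.out.println(fileName(url)); // the value to use as the relative distributionUrl
    }
}
```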

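Change the global Gradle repository address
Create a file named init.gradle under USER_HOME/.gradle/ (create it manually if it does not exist), enter the content below, and save.
It switches Gradle's remote repositories to the Aliyun mirrors.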

allprojects{
    repositories {
        def ALIYUN_REPOSITORY_URL = 'https://maven.aliyun.com/repository/public/'
        def ALIYUN_GRADLE_PLUGIN_URL = 'https://maven.aliyun.com/repository/gradle-plugin/'
        all { ArtifactRepository repo ->
            if(repo instanceof MavenArtifactRepository){
                def url = repo.url.toString()
                if (url.startsWith('https://repo1.maven.org/maven2/')) {
                    project.logger.lifecycle "Repository ${repo.url} replaced by $ALIYUN_REPOSITORY_URL."
                    remove repo
                }
                if (url.startsWith('https://jcenter.bintray.com/')) {
                    project.logger.lifecycle "Repository ${repo.url} replaced by $ALIYUN_REPOSITORY_URL."
                    remove repo
                }
                if (url.startsWith('https://plugins.gradle.org/m2/')) {
                    project.logger.lifecycle "Repository ${repo.url} replaced by $ALIYUN_GRADLE_PLUGIN_URL."
                    remove repo
                }
            }
        }
        maven { url ALIYUN_REPOSITORY_URL }
        maven { url ALIYUN_GRADLE_PLUGIN_URL }
    }
}

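2. Running in IDEA

1. Environment setup

Import the source project into IDEA

File -> Open -> select the ES root directory to import it.

In Project Structure, set the project SDK; here I chose the default JDK 17 bundled with IDEA.

Configure the Gradle build environment

In Preferences, search for gradle: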

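2. Start the build

Compile the source

After the import, a "load gradle project" prompt pops up in the lower-right corner of IDEA. If you missed it, open the Gradle tool window and click Reload manually.

After clicking, expect to wait a while; the build is time-consuming.

You do not need to mark the subprojects as Gradle projects yourself. I did that at first, but "reload all projects" loads the subprojects automatically.

Build the distribution package

Action: depending on your operating system, run the build task of the matching no-jdk-*-tar project to build the Elasticsearch distribution.

When the build completes, the corresponding xxx-tar directory contains a build directory with the output files.

Why build: the directory distribution/archives/no-jdk-darwin-aarch64-tar/build/install/elasticsearch-7.17.8-SNAPSHOT contains many modules. Elasticsearch is modular, so every change to code under modules requires a rebuild, even if you only added a comment. Otherwise the line numbers will not match when debugging in IDEA.

During the build, a resource download failed:

The error message: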
Could not determine the dependencies of task ':x-pack:plugin:ml:bundlePlugin'.
> Could not resolve all task dependencies for configuration ':x-pack:plugin:ml:nativeBundle'.
   > Could not resolve org.elasticsearch.ml:ml-cpp:7.17.8-SNAPSHOT.
     Required by:
         project :x-pack:plugin:ml
      > Could not resolve org.elasticsearch.ml:ml-cpp:7.17.8-SNAPSHOT.
         > Could not get resource 'https://artifacts-snapshot.elastic.co/ml-cpp/7.17.8-SNAPSHOT/downloads/ml-cpp/ml-cpp-7.17.8-SNAPSHOT.zip'.
            > Could not HEAD 'https://artifacts-snapshot.elastic.co/ml-cpp/7.17.8-SNAPSHOT/downloads/ml-cpp/ml-cpp-7.17.8-SNAPSHOT.zip'.
               > Connect to 127.0.0.1:33210 [/127.0.0.1] failed: Connection refused

Fix: see https://github.com/elastic/elasticsearch/issues/48350

Modify the file elasticsearch-7.17.8/x-pack/plugin/ml/build.gradle:

Final download URL:

https://prelert-artifacts.s3.amazonaws.com/maven/org/elasticsearch/ml/ml-cpp/7.17.8-SNAPSHOT/ml-cpp-7.17.8-SNAPSHOT.zip

PS: if the download fails, you may need a proxy, or you can download the file yourself and change this build file so that it takes the localRepo path.

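3. Starting from source

0. Source overview

The ES Java source is roughly 2.33 million lines, so you can imagine how hard it is to understand all of it.

ES is modular. server is the main server-side program; the transport-netty4 module is Elasticsearch's Netty-based network layer, which serves the familiar ports 9200 and 9300.

The program entry point is: server/src/main/java/org/elasticsearch/bootstrap/Elasticsearch.java

Incoming client requests are handled in the package: server/src/main/java/org/elasticsearch/action

1. File changes

Modify the main startup class:

In the server project, add the following at the start of org.elasticsearch.bootstrap.Elasticsearch#main(java.lang.String[]):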

        String esHome = "/Users/xxx/app/xm/es_source/elasticsearch-7.17.8/distribution/archives/no-jdk-darwin-aarch64-tar/build/install/elasticsearch-7.17.8-SNAPSHOT"; // base path of the distribution you built
        System.setProperty("es.path.home", esHome); // Elasticsearch home (root) directory
        System.setProperty("es.path.conf", esHome + "/config");  // Elasticsearch config directory
        System.setProperty("log4j2.disable.jmx", "true"); // disable log4j2 JMX monitoring to avoid errors
        System.setProperty("java.security.policy", esHome + "/config/java.policy"); // Java security policy file
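
Since every one of these properties hangs off the same esHome, a typo in the path only shows up later as a confusing startup error. A minimal sketch of a helper that derives the four properties from one path and fails early if the distribution has not been built yet; the class name DevBootstrapProps is hypothetical and not part of the ES source.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class DevBootstrapProps {
    /** Sets the system properties used in this article, all derived from a single esHome path. */
    static void apply(String esHome) {
        Path home = Paths.get(esHome);
        Path conf = home.resolve("config");
        if (!Files.isDirectory(conf)) {
            // The config directory only exists after the no-jdk-*-tar build has run.
            throw new IllegalStateException("config directory not found, build the distribution first: " + conf);
        }
        System.setProperty("es.path.home", home.toString());
        System.setProperty("es.path.conf", conf.toString());
        System.setProperty("log4j2.disable.jmx", "true");
        System.setProperty("java.security.policy", conf.resolve("java.policy").toString());
    }
}
```

With such a helper, the block at the top of main would shrink to a single DevBootstrapProps.apply(esHome) call.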

Add the following to distribution/archives/no-jdk-darwin-aarch64-tar/build/install/elasticsearch-7.17.8-SNAPSHOT/config/elasticsearch.yml:

node.name: node-1 # ES node name
xpack.security.enabled: false # disable the X-Pack security features to simplify testing
ingest.geoip.downloader.enabled: false # turn off GeoIP database updates for now

If a disk watermark warning appears after startup:

1. Problem:
[node-1] high disk watermark [90%] exceeded on [eo6zdEm8RWWOodoaSMXNXw][node-1][/Users/xxx/Desktop/es_file/es-7.17.8/0/data/nodes/0] free: 18.9gb[8.3%], shards will be relocated away from this node; currently relocating away shards totalling [0] bytes; the node is expected to continue to exceed the high disk watermark when these relocations are complete

2. Fix: also add this line to the same elasticsearch.yml:
cluster.routing.allocation.disk.threshold_enabled: false
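
The numbers in the warning add up as follows: 8.3% free means about 91.7% used, which is above the 90% high watermark. A small sketch of that arithmetic, assuming the watermark is expressed as a used-space percentage; this is only an illustration, not the allocation decider code in ES.

```java
final class DiskWatermark {
    /** True when used space (100 - freePercent) is above the high watermark, given as a used-space percentage. */
    static boolean exceedsHigh(double freePercent, double highWatermarkUsedPercent) {
        double usedPercent = 100.0 - freePercent;
        return usedPercent > highWatermarkUsedPercent;
    }

    public static void main(String[] args) {
        // Values taken from the log line above: free 8.3%, high watermark 90%.
        System.out.println(exceedsHigh(8.3, 90.0));
    }
}
```

Disabling the threshold is fine for a local debugging node; freeing disk space makes the warning go away as well.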

In distribution/archives/no-jdk-darwin-aarch64-tar/build/install/elasticsearch-7.17.8-SNAPSHOT/config, create a new file java.policy:

grant {
    permission java.security.AllPermission;
};

In server/src/main/resources/org/elasticsearch/bootstrap/security.policy, delete the codeBase-related entries:

2. Start

Run the main class in the server module: org.elasticsearch.bootstrap.Elasticsearch#main(java.lang.String[])

You will see the startup logs:

Access port 9200:

xxxx % curl localhost:9200/
{
  "name" : "node-1",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "V3cJUOHbQA2ZqeHZO67JdA",
  "version" : {
    "number" : "7.17.8",
    "build_flavor" : "unknown",
    "build_type" : "unknown",
    "build_hash" : "unknown",
    "build_date" : "unknown",
    "build_snapshot" : true,
    "lucene_version" : "8.11.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
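
The same check can be made from Java, which is handy if you want to script the smoke test. A minimal sketch using the JDK's built-in java.net.http.HttpClient (Java 11+); the class name EsPing is hypothetical, and fetch only works while the node started above is listening on the given port.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class EsPing {
    /** Builds the equivalent of: curl host:port/ */
    static HttpRequest rootRequest(String host, int port) {
        return HttpRequest.newBuilder(URI.create("http://" + host + ":" + port + "/")).GET().build();
    }

    /** Sends the request and returns "status body"; requires a running node. */
    static String fetch(HttpRequest request) {
        try {
            HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() + " " + response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling EsPing.fetch(EsPing.rootRequest("localhost", 9200)) should return the same JSON as the curl call above, prefixed with the HTTP status code.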

org.elasticsearch.cli.EnvironmentAwareCommand#execute(org.elasticsearch.cli.Terminal, joptsimple.OptionSet) shows that settings can be passed to ES in two ways:

The first is to set system properties in code, such as es.path.data. Add to org.elasticsearch.bootstrap.Elasticsearch#main(java.lang.String[]):

        // Set the data directory and the log directory
        System.setProperty("es.path.data", "/Users/xxx/Desktop/es_file/es-7.17.8/0/data"); // Elasticsearch data directory
        System.setProperty("es.path.logs", "/Users/xxx/Desktop/es_file/es-7.17.8/0/logs");  // Elasticsearch log directory

The second is to pass program arguments of the form -Epath.logs=xxx:

-Epath.data=/Users/xxx/Desktop/es_file/es-7.17.8/0/data -Epath.logs=/Users/xxx/Desktop/es_file/es-7.17.8/0/logs
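
To make the relationship between the two ways concrete, here is a simplified sketch of the precedence as I read that method: -E arguments are collected first, and the es.path.* system properties are used only as a fallback for settings that were not given. The class SettingsSketch and its methods are illustrative names, and the real code parses options with joptsimple rather than by hand.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

final class SettingsSketch {
    /** Collects -Ekey=value program arguments into a map; other arguments are ignored. */
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("-E")) {
                continue;
            }
            int eq = arg.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("expected -Ekey=value but got " + arg);
            }
            settings.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        return settings;
    }

    /** Uses the system property only when the setting was not already given with -E. */
    static void putIfMissing(Map<String, String> settings, String setting, String sysProp, Properties sys) {
        String value = sys.getProperty(sysProp);
        if (value != null) {
            settings.putIfAbsent(setting, value);
        }
    }

    /** Combines both sources; -E arguments win over system properties. */
    static Map<String, String> resolve(String[] args, Properties sys) {
        Map<String, String> settings = parseArgs(args);
        putIfMissing(settings, "path.data", "es.path.data", sys);
        putIfMissing(settings, "path.home", "es.path.home", sys);
        putIfMissing(settings, "path.logs", "es.path.logs", sys);
        return settings;
    }
}
```

Under this reading, hard-coding es.path.data in main and passing -Epath.data at the same time is harmless: the argument takes effect.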

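3. Debugging: create and view an index, insert data

0. How it works

Elasticsearch exposes a RESTful API, which in the source corresponds to the action package of the server project.
Each API call is dispatched to the matching TransportXXXAction implementation class, which carries out the actual logic. Every TransportXXXAction has to be registered in ActionModule.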
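
The register-then-dispatch idea can be shown in a few lines. This is a toy registry, not the real ActionModule API: ActionRegistrySketch, its methods, and the string-in, string-out handlers are all made up for illustration, and the action name in the usage note is an example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class ActionRegistrySketch {
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    /** Registers a handler under an action name; duplicate names are rejected. */
    void register(String actionName, Function<String, String> handler) {
        if (handlers.putIfAbsent(actionName, handler) != null) {
            throw new IllegalArgumentException("already registered: " + actionName);
        }
    }

    /** Looks up the handler for the action name and runs it on the request. */
    String execute(String actionName, String request) {
        Function<String, String> handler = handlers.get(actionName);
        if (handler == null) {
            throw new IllegalArgumentException("no handler registered for " + actionName);
        }
        return handler.apply(request);
    }
}
```

For example, after register("indices:admin/create", req -> "created " + req), a call to execute("indices:admin/create", "qlq_user") runs that handler. When debugging ES, the practical consequence is that the registrations in ActionModule are a good index for finding which Transport class serves an API.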

1. Create an index

The corresponding class is: TransportCreateIndexAction

Set a breakpoint:

Call:

curl -X PUT -H 'Content-Type:application/json' -d '{"mappings":{"properties":{"name":{"type":"keyword"},"age":{"type":"long"},"address":{"type":"text","analyzer":"standard"},"location":{"type":"geo_point"},"birth_date":{"type":"date"},"birth_date_value":{"type":"long"},"likes":{"type":"keyword"},"well_person":{"type":"boolean"},"salary":{"type":"integer_range"},"school":{"type":"wildcard"},"feature":{"type":"nested","properties":{"height":{"type":"double"},"weight":{"type":"double"}}}}}}' localhost:9200/qlq_user

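Execution stops at your breakpoint, which confirms that it works.

2. View indices

For a fixed URL, you can search the source for the URI path to find the handler.

Corresponding class: org.elasticsearch.rest.action.cat.RestIndicesAction#doCatRequest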

xxx % curl localhost:9200/_cat/indices
yellow open qlq_user OwnK3cMUT2-L7Rog062oHA 1 1 0 0 226b 226b
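
Searching for the path works because each REST handler declares the method and path templates it serves. A toy version of that lookup is sketched below; RouteLookupSketch is a made-up class, the two routes for RestIndicesAction are how I understand them rather than copied from the source, and the matching is far simpler than what ES does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RouteLookupSketch {
    private final Map<String, String> routes = new LinkedHashMap<>();

    /** Registers a handler name for an HTTP method and a path template such as /_cat/indices/{index}. */
    void add(String method, String template, String handler) {
        routes.put(method + " " + template, handler);
    }

    /** Returns the first handler whose method and template match, or null if none does. */
    String find(String method, String path) {
        for (Map.Entry<String, String> e : routes.entrySet()) {
            String[] key = e.getKey().split(" ", 2);
            if (key[0].equals(method) && matches(key[1], path)) {
                return e.getValue();
            }
        }
        return null;
    }

    /** Compares segment by segment; a {name} segment matches any non-empty value. */
    static boolean matches(String template, String path) {
        String[] t = template.split("/");
        String[] p = path.split("/");
        if (t.length != p.length) {
            return false;
        }
        for (int i = 0; i < t.length; i++) {
            boolean placeholder = t[i].startsWith("{") && t[i].endsWith("}");
            if (placeholder ? p[i].isEmpty() : !t[i].equals(p[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RouteLookupSketch table = new RouteLookupSketch();
        table.add("GET", "/_cat/indices", "RestIndicesAction");
        table.add("GET", "/_cat/indices/{index}", "RestIndicesAction");
        System.out.println(table.find("GET", "/_cat/indices"));
        System.out.println(table.find("GET", "/_cat/indices/qlq_user"));
    }
}
```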

3. Add a document

Corresponding method: org.elasticsearch.action.bulk.TransportShardBulkAction#dispatchedShardOperationOnPrimary

Request:

curl -X POST -H 'Content-Type:application/json' -d '{"name":"张三","school":"Beijing Xicheng Middle School","age":30,"address":"北京市朝阳区","location":{"lat":39.9075,"lon":116.39723},"birth_date":"1990-01-01","birth_date_value":631120800000,"likes":["读书","旅行"],"feature":[{"height":175.5,"weight":70.0}],"salary":{"gte":5000,"lte":10000},"well_person":true}' localhost:9200/qlq_user/_doc/

4. Query documents

1. Count

Entry point: org.elasticsearch.action.search.TransportSearchAction#executeRequest

Test:

xxx % curl localhost:9200/qlq_user/_count
{"count":1,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}

2. Search

Entry point: org.elasticsearch.action.search.TransportSearchAction#executeRequest

Test:

xxx % curl -X GET -H 'Content-Type:application/json' -d '{"query":{"term":{"likes":{"value":"读书"}}}}' localhost:9200/qlq_user/_search

{"took":4,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":0.3616575,"hits":[{"_index":"qlq_user","_type":"_doc","_id":"Q38ee48BpAUI2PvOZWk9","_score":0.3616575,"_source":{"name":"张三","school":"Beijing Xicheng Middle School","age":30,"address":"北京市朝阳区","location":{"lat":39.9075,"lon":116.39723},"birth_date":"1990-01-01","birth_date_value":631120800000,"likes":["读书","旅行"],"feature":[{"height":175.5,"weight":70.0}],"salary":{"gte":5000,"lte":10000},"well_person":true}}]}}

5. Delete the index

Entry point:

org.elasticsearch.action.admin.indices.delete.TransportDeleteIndexAction#doExecute

Test:

xxx % curl -X DELETE localhost:9200/qlq_user
{"acknowledged":true}
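
When you repeat this walkthrough many times while stepping through breakpoints, it can be convenient to keep the requests in code. A minimal sketch that builds the create, index, count, and delete requests with java.net.http (Java 11+); DebugSessionRequests is a hypothetical class, the bodies are shortened stand-ins for the full JSON used above, and nothing is sent until you pass a request to an HttpClient.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

final class DebugSessionRequests {
    static final String BASE = "http://localhost:9200";

    /** Builds a request with a JSON content type; a null body means no request body. */
    static HttpRequest json(String method, String path, String body) {
        HttpRequest.BodyPublisher publisher = body == null
            ? HttpRequest.BodyPublishers.noBody()
            : HttpRequest.BodyPublishers.ofString(body);
        return HttpRequest.newBuilder(URI.create(BASE + path))
            .header("Content-Type", "application/json")
            .method(method, publisher)
            .build();
    }

    /** The requests of this section in order: create index, add document, count, delete index. */
    static List<HttpRequest> session(String index) {
        return List.of(
            json("PUT", "/" + index, "{\"mappings\":{\"properties\":{\"name\":{\"type\":\"keyword\"}}}}"),
            json("POST", "/" + index + "/_doc/", "{\"name\":\"test\"}"),
            json("GET", "/" + index + "/_count", null),
            json("DELETE", "/" + index, null));
    }
}
```

If the requests are sent back to back, the count may not yet include the new document, because a document only becomes searchable after the next index refresh.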

4. Errors

Gradle JVM argument error

Error message:

Unrecognized option: --add-exports
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

-----------------------
Check the JVM arguments defined for the gradle process in:
 - gradle.properties in project root directory

Cause: I initially used JDK 8, which is too old to recognize these JVM arguments.

Fix: switch to a newer JDK; I use 17.
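
The --add-exports option belongs to the module system introduced in Java 9, which is why a JDK 8 launcher rejects it. If you are not sure which JDK a run configuration actually uses, a small check like the following sketch can be run with it; JdkCheck is a made-up name, and it deliberately parses java.specification.version so that it also works on Java 8.

```java
final class JdkCheck {
    /** Parses a java.specification.version value: "1.8" gives 8, "11" gives 11, "17" gives 17. */
    static int featureVersion(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    public static void main(String[] args) {
        int feature = featureVersion(System.getProperty("java.specification.version"));
        System.out.println("Running on Java " + feature);
        if (feature < 9) {
            System.out.println("--add-exports is not recognized before Java 9");
        }
    }
}
```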

Error when building the tar

Error:
Could not determine the dependencies of task ':x-pack:plugin:ml:bundlePlugin'.
> Could not resolve all task dependencies for configuration ':x-pack:plugin:ml:nativeBundle'.
   > Could not resolve org.elasticsearch.ml:ml-cpp:7.17.8-SNAPSHOT.
     Required by:
         project :x-pack:plugin:ml
      > Could not resolve org.elasticsearch.ml:ml-cpp:7.17.8-SNAPSHOT.
         > Could not get resource 'https://artifacts-snapshot.elastic.co/ml-cpp/7.17.8-SNAPSHOT/downloads/ml-cpp/ml-cpp-7.17.8-SNAPSHOT.zip'.
            > Could not HEAD 'https://artifacts-snapshot.elastic.co/ml-cpp/7.17.8-SNAPSHOT/downloads/ml-cpp/ml-cpp-7.17.8-SNAPSHOT.zip'.
               > Connect to 127.0.0.1:33210 [/127.0.0.1] failed: Connection refused
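Fix: the same as for the ml-cpp download failure in section 2: modify x-pack/plugin/ml/build.gradle as described in https://github.com/elastic/elasticsearch/issues/48350.

5. Starting a cluster from source

Start three nodes. With the original shell script, the startup commands look like this: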

sh elasticsearch -Ehttp.port=9200 -Epath.data=/Users/qiao-zhi/app/software/elk/data/0 -Epath.logs=/Users/qiao-zhi/app/software/elk/log/0 -Enode.roles=data 
sh elasticsearch -Ehttp.port=9201 -Epath.data=/Users/qiao-zhi/app/software/elk/data/1 -Epath.logs=/Users/qiao-zhi/app/software/elk/log/1 -Enode.roles=master 
sh elasticsearch -Ehttp.port=9202 -Epath.data=/Users/qiao-zhi/app/software/elk/data/2 -Epath.logs=/Users/qiao-zhi/app/software/elk/log/2

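In Elasticsearch#main, the hard-coded data and log directories are commented out, because each node now gets its own paths through arguments:

        // Set the data directory and the log directory
//        System.setProperty("es.path.data", "/Users/xxx/Desktop/es_file/es-7.17.8/0/data"); // Elasticsearch data directory
//        System.setProperty("es.path.logs", "/Users/xxx/Desktop/es_file/es-7.17.8/0/logs");  // Elasticsearch log directory

Startup arguments for each node (set as program arguments of the run configuration, with parallel runs allowed):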

-Ehttp.port=9201 -Enode.name=node1 -Epath.data=/Users/xxx/Desktop/es_file/es-7.17.8/1/data -Epath.logs=/Users/xxx/Desktop/es_file/es-7.17.8/1/log -Enode.roles=master

-Ehttp.port=9200 -Enode.name=node2 -Epath.data=/Users/xxx/Desktop/es_file/es-7.17.8/0/data -Epath.logs=/Users/xxx/Desktop/es_file/es-7.17.8/0/log -Enode.roles=data

-Ehttp.port=9202 -Enode.name=node3 -Epath.data=/Users/xxx/Desktop/es_file/es-7.17.8/2/data -Epath.logs=/Users/xxx/Desktop/es_file/es-7.17.8/2/log
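
The three argument lines differ only in port, node name, directory index, and roles, so they can be generated instead of typed. A minimal sketch; NodeArgs and its parameters are illustrative names, and the output is meant to be pasted into the program arguments field of each run configuration.

```java
final class NodeArgs {
    /** Builds the -E argument line for one node; pass null for roles to keep the default roles. */
    static String forNode(String baseDir, int dirIndex, int httpPort, String nodeName, String roles) {
        StringBuilder sb = new StringBuilder();
        sb.append("-Ehttp.port=").append(httpPort);
        sb.append(" -Enode.name=").append(nodeName);
        sb.append(" -Epath.data=").append(baseDir).append('/').append(dirIndex).append("/data");
        sb.append(" -Epath.logs=").append(baseDir).append('/').append(dirIndex).append("/log");
        if (roles != null) {
            sb.append(" -Enode.roles=").append(roles);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String base = "/Users/xxx/Desktop/es_file/es-7.17.8";
        System.out.println(forNode(base, 1, 9201, "node1", "master"));
        System.out.println(forNode(base, 0, 9200, "node2", "data"));
        System.out.println(forNode(base, 2, 9202, "node3", null));
    }
}
```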

After startup, check the cluster information:

GET /_cat/nodes?v
---
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role   master name
127.0.0.1            3          99  35    3.53                  d           -      node2
127.0.0.1            3          99  35    3.53                  cdfhilmrstw -      node3
127.0.0.1            4          99  35    3.53                  m           *      node1
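
The node.role column packs each role into one letter, which is why node3 (started without -Enode.roles) shows the long string cdfhilmrstw. A small decoder sketch, using the abbreviations from the 7.17 cat nodes documentation as I remember them; NodeRoles is a made-up class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class NodeRoles {
    private static final Map<Character, String> ROLES = Map.ofEntries(
        Map.entry('c', "data_cold"),
        Map.entry('d', "data"),
        Map.entry('f', "data_frozen"),
        Map.entry('h', "data_hot"),
        Map.entry('i', "ingest"),
        Map.entry('l', "ml"),
        Map.entry('m', "master"),
        Map.entry('r', "remote_cluster_client"),
        Map.entry('s', "data_content"),
        Map.entry('t', "transform"),
        Map.entry('v', "voting_only"),
        Map.entry('w', "data_warm"));

    /** Expands a node.role value such as "dm" into role names; "-" means coordinating only. */
    static List<String> decode(String letters) {
        List<String> out = new ArrayList<>();
        if (letters.equals("-")) {
            out.add("coordinating only");
            return out;
        }
        for (char ch : letters.toCharArray()) {
            String role = ROLES.get(ch);
            out.add(role != null ? role : "unknown(" + ch + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(decode("d"));
        System.out.println(decode("m"));
        System.out.println(decode("cdfhilmrstw"));
    }
}
```

Read this way, the table matches the arguments: node2 is data-only, node1 is master-only and is the elected master (the * in the master column), and node3 keeps the default roles.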

References

https://www.iocoder.cn/Elasticsearch/build-debugging-environment/

From： https://www.cnblogs.com/qlqwjy/p/18246591
