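Elasticsearch is an open-source distributed search engine that can quickly store, search, and analyze large volumes of text data. Built on Apache Lucene, it is widely used for log analytics, full-text search, recommendation systems, and similar workloads. This article walks through integrating Elasticsearch into a Java project: configuring it, indexing documents, querying data, and performing other common operations.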
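1. Prerequisites
Before you start working with Elasticsearch, make sure the following tools are installed:
- Elasticsearch: download and install it by following the official documentation.
- JDK: make sure Java 8 or later is installed. You can check your Java version with the java -version command.
- Maven: Maven is a build tool for Java; you can install it from its official website.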
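1.1 Starting Elasticsearch
- Download and extract Elasticsearch.
- Open a terminal in the extracted directory and start Elasticsearch with:
bin/elasticsearch
By default, Elasticsearch listens on localhost:9200. Once it has started, you should be able to reach it from a browser or with curl:
curl -X GET "localhost:9200/"
The response should look similar to this: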
{
  "name" : "your-node-name",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "some-uuid",
  "version" : {
    "number" : "7.x.x",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "somehash",
    "build_date" : "2020-10-01T12:34:56.789Z",
    "build_snapshot" : false,
    "lucene_version" : "8.x.x",
    "minimum_wire_compatibility_version" : "7.x.x",
    "minimum_index_compatibility_version" : "7.x.x"
  },
  "tagline" : "You Know, for Search"
}
Elasticsearch is now up and running.
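The version.number field in this response is worth noting, because the client library version chosen later should match the server. The following is a minimal sketch, using only the JDK, of pulling that field out of the response text. The class name VersionProbe and the method extractVersion are illustrative assumptions, and the regular expression is a shortcut; a real project would more likely parse the response with a JSON library such as Jackson.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of any Elasticsearch API): reads version.number
// from the JSON text returned by GET localhost:9200/
class VersionProbe {
    // Matches the first "number" field, e.g. "number" : "7.10.0"
    private static final Pattern NUMBER =
            Pattern.compile("\"number\"\\s*:\\s*\"([^\"]+)\"");

    // Returns the version string, or null if the field is not present
    static String extractVersion(String rootResponseJson) {
        Matcher m = NUMBER.matcher(rootResponseJson);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample text shaped like the response shown above (values are made up)
        String sample = "{ \"name\" : \"your-node-name\", \"version\" : { \"number\" : \"7.10.0\" } }";
        System.out.println("Server version: " + extractVersion(sample));
    }
}
```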
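2. Adding Dependencies
Elasticsearch provides several official clients; the most common are RestHighLevelClient and the Elasticsearch Java API Client. This article uses RestHighLevelClient. Note that RestHighLevelClient has been deprecated since Elasticsearch 7.15 in favor of the Java API Client, so the examples here target 7.x clusters.
First, add the following dependencies to your Maven pom.xml file: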
<dependencies>
    <!-- Elasticsearch Rest High Level Client -->
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>7.10.0</version> <!-- Pick the version that matches your local Elasticsearch -->
    </dependency>
    <!-- Elasticsearch Core -->
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>7.10.0</version>
    </dependency>
    <!-- Jackson for JSON processing -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.10.3</version>
    </dependency>
    <!-- Log4j2 for logging. Releases before 2.17.1 have known vulnerabilities such as Log4Shell, so prefer a newer release in real projects -->
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.13.3</version>
    </dependency>
</dependencies>
3. Creating the Elasticsearch Client
To use Elasticsearch from a Java project, you first create a RestHighLevelClient, which handles all communication with Elasticsearch.
3.1 Creating the Client
import java.io.IOException;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class ElasticsearchClient {

    public static RestHighLevelClient createClient() {
        // Connect to a single local node over HTTP
        return new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")));
    }

    public static void main(String[] args) {
        RestHighLevelClient client = createClient();
        System.out.println("Elasticsearch client created successfully!");
        // Don't forget to close the client
        try {
            client.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
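- RestClient.builder(new HttpHost("localhost", 9200, "http")): builds the connection to Elasticsearch from an HttpHost that specifies the server address and port.
- client.close(): closes the client connection and releases its resources.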
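Because RestHighLevelClient implements Closeable, it can also be managed with try-with-resources, which closes the client even when an operation throws. The sketch below demonstrates that behavior with a stand-in class so it runs without a cluster; DemoClient and its ping method are hypothetical and only mimic a client whose call fails.

```java
import java.io.Closeable;
import java.io.IOException;

class TryWithResourcesDemo {
    // Hypothetical stand-in for RestHighLevelClient, which also implements Closeable
    static class DemoClient implements Closeable {
        boolean closed = false;

        // Simulates a request that fails
        void ping() throws IOException {
            throw new IOException("simulated connection failure");
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Returns true if the client was closed even though ping() failed
    static boolean closesOnFailure() {
        DemoClient observed = new DemoClient();
        try (DemoClient client = observed) {
            client.ping();
        } catch (IOException e) {
            // The resource has already been closed by the time this block runs
            return observed.closed;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Closed after failure: " + closesOnFailure());
    }
}
```

With the real client, the same shape would be try (RestHighLevelClient client = ElasticsearchClient.createClient()) { ... }, which removes the need for an explicit client.close() call.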
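4. Creating an Index and Adding Documents
4.1 Creating an Index
In Elasticsearch, data is stored as documents, and each document belongs to an index (roughly comparable to a table in a database). We start by creating an index to hold our data.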
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.common.xcontent.XContentType;

public class ElasticsearchExample {

    public static void createIndex(RestHighLevelClient client) throws Exception {
        CreateIndexRequest request = new CreateIndexRequest("my_index");
        // Index settings (shards, replicas) and typeless field mappings
        String jsonString = "{\n" +
                "  \"settings\": {\n" +
                "    \"number_of_shards\": 3,\n" +
                "    \"number_of_replicas\": 2\n" +
                "  },\n" +
                "  \"mappings\": {\n" +
                "    \"properties\": {\n" +
                "      \"name\": {\"type\": \"text\"},\n" +
                "      \"age\": {\"type\": \"integer\"},\n" +
                "      \"joined\": {\"type\": \"date\"}\n" +
                "    }\n" +
                "  }\n" +
                "}";
        request.source(jsonString, XContentType.JSON);
        client.indices().create(request, RequestOptions.DEFAULT);
        System.out.println("Index 'my_index' created successfully.");
    }

    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = ElasticsearchClient.createClient();
        createIndex(client);
        client.close();
    }
}
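- CreateIndexRequest is used to create the index.
- The request.source method takes the index settings (such as the number of shards and replicas) and the mapping definition (field types).
- client.indices().create() creates the index.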
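The settings in this example request 3 primary shards with 2 replicas each. As a rough worked example of what that implies, the sketch below does the arithmetic; the class and method names are illustrative and not part of any Elasticsearch API.

```java
// Illustrative arithmetic only; ShardMath is a hypothetical helper
class ShardMath {
    // Every primary shard has `replicas` additional copies,
    // so the index holds primaries * (1 + replicas) shard copies in total
    static int totalShardCopies(int primaries, int replicas) {
        return primaries * (1 + replicas);
    }

    // Two copies of the same shard are never placed on the same node,
    // so all copies can only be assigned with at least replicas + 1 nodes
    static int minNodesForAllCopies(int replicas) {
        return replicas + 1;
    }

    public static void main(String[] args) {
        System.out.println("Shard copies: " + totalShardCopies(3, 2));
        System.out.println("Nodes needed: " + minNodesForAllCopies(2));
    }
}
```

On the single local node started earlier, the replica copies therefore stay unassigned and the cluster health shows yellow. That is fine for experimenting; setting number_of_replicas to 0 avoids it on a one-node setup.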
4.2 Adding a Document
Next, we can add a document to the index.
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;

import java.util.HashMap;
import java.util.Map;

public class ElasticsearchExample {

    public static void addDocument(RestHighLevelClient client) throws Exception {
        // Field names and values of the document to index
        Map<String, Object> jsonMap = new HashMap<>();
        jsonMap.put("name", "John Doe");
        jsonMap.put("age", 29);
        jsonMap.put("joined", "2024-11-06");
        IndexRequest request = new IndexRequest("my_index")
                .id("1") // Document ID (optional)
                .source(jsonMap);
        client.index(request, RequestOptions.DEFAULT);
        System.out.println("Document added successfully.");
    }

    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = ElasticsearchClient.createClient();
        addDocument(client);
        client.close();
    }
}
- IndexRequest builds the document to be inserted.
- request.source takes a Map containing the fields and values to index.
- client.index(request, RequestOptions.DEFAULT) performs the indexing operation.
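The joined field is mapped as a date, so its value has to be in a format the mapping accepts; the default date format accepts ISO-8601 dates such as 2024-11-06. One way to avoid hand-written date strings is to start from a LocalDate. This is a minimal sketch in which DocumentFields and buildDocument are illustrative names, and the resulting map is what would be passed to request.source.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that assembles the field map for a document
class DocumentFields {
    static Map<String, Object> buildDocument(String name, int age, LocalDate joined) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", name);
        doc.put("age", age);
        // LocalDate.toString() produces an ISO-8601 date (yyyy-MM-dd)
        doc.put("joined", joined.toString());
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = buildDocument("John Doe", 29, LocalDate.of(2024, 11, 6));
        System.out.println("joined = " + doc.get("joined"));
    }
}
```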
5. Querying Documents
5.1 Basic Query
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class ElasticsearchExample {

    public static void searchDocument(RestHighLevelClient client) throws Exception {
        // Create the search request and name the index to query
        SearchRequest searchRequest = new SearchRequest("my_index");
        // Build the query with SearchSourceBuilder
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.matchQuery("name", "John")); // Documents whose name field contains "John"
        searchRequest.source(sourceBuilder);
        // Execute the query
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        // Print the results
        System.out.println("Search results: ");
        for (SearchHit hit : response.getHits()) {
            System.out.println(hit.getSourceAsString()); // The document as JSON
        }
    }

    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = ElasticsearchClient.createClient();
        searchDocument(client);
        client.close();
    }
}
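In the code above:
- SearchRequest specifies the index to query, here "my_index".
- SearchSourceBuilder builds the query body; we use matchQuery to find documents whose name field contains "John".
- client.search() executes the query and returns a SearchResponse.
- response.getHits() returns the query results; iterating over the SearchHit objects prints the content of each matching document.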
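It can help to see the request body that a match query corresponds to in Elasticsearch's JSON query DSL, for example when reproducing a search with curl. The sketch below builds the short form of that body with plain string handling. MatchQueryJson and its methods are illustrative names, and the escaping covers only quotes and backslashes, so treat it as a demonstration rather than a JSON encoder.

```java
// Hypothetical helper that renders a match query in the JSON query DSL
class MatchQueryJson {
    // Escapes backslashes and double quotes only; control characters are not handled
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Short form of a match query: {"query":{"match":{"<field>":"<text>"}}}
    static String matchBody(String field, String text) {
        return "{\"query\":{\"match\":{\"" + escape(field) + "\":\"" + escape(text) + "\"}}}";
    }

    public static void main(String[] args) {
        System.out.println(matchBody("name", "John"));
    }
}
```

The resulting body can be sent to the my_index/_search endpoint with a Content-Type: application/json header to run the same search without the Java client.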
5.2 Advanced Queries: Range Query
Elasticsearch supports many query types, including range queries. The following example finds all documents with an age greater than 25:
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class ElasticsearchExample {

    public static void searchRange(RestHighLevelClient client) throws Exception {
        SearchRequest searchRequest = new SearchRequest("my_index");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.rangeQuery("age").gt(25)); // Documents with age > 25
        searchRequest.source(sourceBuilder);
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println("Range search results: ");
        for (SearchHit hit : response.getHits()) {
            System.out.println(hit.getSourceAsString());
        }
    }

    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = ElasticsearchClient.createClient();
        searchRange(client);
        client.close();
    }
}
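In this example:
- rangeQuery("age").gt(25) finds all documents whose age field is greater than 25.
- The lt(), gte(), and lte() methods set the other kinds of bounds: less than, greater than or equal to, and less than or equal to.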
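The difference between the exclusive bounds (gt, lt) and the inclusive ones (gte, lte) is easy to get wrong at the boundary value. The following plain-Java sketch mirrors those semantics locally so the boundary cases can be checked without a cluster; RangeBounds and inRange are hypothetical names and nothing here calls Elasticsearch.

```java
// Local illustration of range-bound semantics; not an Elasticsearch API
class RangeBounds {
    // A null bound means "not set", like leaving that method uncalled on the builder
    static boolean inRange(int value, Integer gt, Integer gte, Integer lt, Integer lte) {
        if (gt != null && !(value > gt)) return false;    // exclusive lower bound
        if (gte != null && !(value >= gte)) return false; // inclusive lower bound
        if (lt != null && !(value < lt)) return false;    // exclusive upper bound
        if (lte != null && !(value <= lte)) return false; // inclusive upper bound
        return true;
    }

    public static void main(String[] args) {
        // gt(25): the value 25 itself is excluded
        System.out.println("gt(25) matches 25:  " + inRange(25, 25, null, null, null));
        // gte(25): the value 25 is included
        System.out.println("gte(25) matches 25: " + inRange(25, null, 25, null, null));
    }
}
```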
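5.3 Aggregation Queries
Elasticsearch also supports aggregations, which compute statistics such as the average or sum of a field. The following simple example computes the average of the age field across all documents.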
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.metrics.Avg;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class ElasticsearchExample {

    public static void aggregationExample(RestHighLevelClient client) throws Exception {
        SearchRequest searchRequest = new SearchRequest("my_index");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // Aggregation: compute the average of the age field
        sourceBuilder.aggregation(AggregationBuilders.avg("avg_age").field("age"));
        // Only the aggregation result is needed, so return no hits
        sourceBuilder.size(0);
        searchRequest.source(sourceBuilder);
        // Execute the query and get the response
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        // Read the aggregation result by the name it was registered under
        Avg avgAggregation = response.getAggregations().get("avg_age");
        double avgAge = avgAggregation.getValue();
        System.out.println("Average age: " + avgAge);
    }

    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = ElasticsearchClient.createClient();
        aggregationExample(client);
        client.close();
    }
}
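In this example:
- AggregationBuilders.avg("avg_age").field("age") creates an aggregation that computes the average of the age field across all documents.
- response.getAggregations() returns the aggregation results, from which we extract the computed average.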
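For comparison, this is roughly the request body such an average aggregation corresponds to in the JSON query DSL. The sketch builds it as a string; AvgAggregationJson and avgBody are illustrative names, and the helper assumes the aggregation and field names contain no characters that need JSON escaping.

```java
// Hypothetical helper that renders an avg aggregation in the JSON query DSL
class AvgAggregationJson {
    // "size": 0 asks for no search hits, only the aggregation result.
    // Assumes aggName and field need no JSON escaping.
    static String avgBody(String aggName, String field) {
        return "{\"size\":0,\"aggs\":{\"" + aggName + "\":{\"avg\":{\"field\":\"" + field + "\"}}}}";
    }

    public static void main(String[] args) {
        System.out.println(avgBody("avg_age", "age"));
    }
}
```

In the Java client, the counterpart of "size": 0 is sourceBuilder.size(0), which is worth setting whenever only the aggregated value is of interest.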
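6. Updating and Deleting Documents
6.1 Updating a Document
You can use UpdateRequest to update an existing document. Here we show how to update a single field of a document: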
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

public class ElasticsearchExample {

    public static void updateDocument(RestHighLevelClient client) throws Exception {
        UpdateRequest updateRequest = new UpdateRequest("my_index", "1"); // Index and document ID
        // doc() takes the partial document itself; wrapping it in a "doc" object
        // would add a field literally named "doc" instead of changing age
        String jsonString = "{ \"age\": 30 }"; // Change age to 30
        updateRequest.doc(jsonString, XContentType.JSON);
        client.update(updateRequest, RequestOptions.DEFAULT);
        System.out.println("Document updated successfully.");
    }

    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = ElasticsearchClient.createClient();
        updateDocument(client);
        client.close();
    }
}
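- UpdateRequest specifies the index and the ID of the document to update.
- The updateRequest.doc() method takes a partial document containing only the fields to change and their new values.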
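A partial update merges the supplied fields into the stored document rather than replacing it, so fields that are not mentioned keep their values. For a flat document like the one in this article, the effect can be pictured with an ordinary map merge. The sketch below is a local illustration under that assumption; PartialUpdateDemo and applyPartial are hypothetical names, and nested objects (which Elasticsearch merges recursively) are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Local illustration of partial-update semantics for flat documents
class PartialUpdateDemo {
    // Fields in `partial` overwrite or extend those in `existing`; all others are kept
    static Map<String, Object> applyPartial(Map<String, Object> existing, Map<String, Object> partial) {
        Map<String, Object> merged = new HashMap<>(existing);
        merged.putAll(partial);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("name", "John Doe");
        stored.put("age", 29);
        stored.put("joined", "2024-11-06");

        Map<String, Object> partial = new HashMap<>();
        partial.put("age", 30);

        Map<String, Object> result = applyPartial(stored, partial);
        System.out.println("name = " + result.get("name") + ", age = " + result.get("age"));
    }
}
```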
6.2 Deleting a Document
You can also use DeleteRequest to delete a document:
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;

public class ElasticsearchExample {

    public static void deleteDocument(RestHighLevelClient client) throws Exception {
        DeleteRequest deleteRequest = new DeleteRequest("my_index", "1"); // Index and ID of the document to delete
        client.delete(deleteRequest, RequestOptions.DEFAULT);
        System.out.println("Document deleted successfully.");
    }

    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = ElasticsearchClient.createClient();
        deleteDocument(client);
        client.close();
    }
}
- DeleteRequest deletes the document with the specified ID from the given index.