What's new in 8.7?
https://www.elastic.co/guide/en/elasticsearch/reference/8.7/release-highlights.html (other versions: 8.6 | 8.5 | 8.4 | 8.3 | 8.2 | 8.1 | 8.0)
Time series (TSDS) GA
Time Series Data Stream (TSDS) is a feature for optimizing Elasticsearch indices for time series data. This involves sorting the indices to achieve better compression and using synthetic _source to reduce index size. As a result, TSDS indices are significantly smaller than non-time_series indices that contain the same data. TSDS is particularly useful for managing time series data with high volume.
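As a rough illustration of the settings involved, the sketch below builds the body of an index template for a TSDS. The index pattern and the field names (`host`, `cpu_usage`) are hypothetical and chosen only for this example; the body would be sent with a `PUT _index_template/<name>` request.

```python
import json

# Hypothetical index pattern and field names, for illustration only.
tsds_template = {
    "index_patterns": ["metrics-hostcpu-*"],
    "data_stream": {},  # TSDS indices are backing indices of a data stream
    "template": {
        "settings": {
            "index.mode": "time_series",      # enables time series mode
            "index.routing_path": ["host"],   # dimension(s) used for routing
        },
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                # A dimension identifies the time series a document belongs to.
                "host": {"type": "keyword", "time_series_dimension": True},
                # A metric is the measured value; "gauge" can go up or down.
                "cpu_usage": {"type": "double", "time_series_metric": "gauge"},
            }
        },
    },
}

print(json.dumps(tsds_template, indent=2))
```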
Downsampling GA
Downsampling is a feature that reduces the number of stored documents in Elasticsearch time series indices, resulting in smaller indices and improved query latency. This optimization is achieved by pre-aggregating time series indices, using the time_series index schema to identify the time series. Downsampling is configured as an action in ILM, making it a useful tool for managing large volumes of time series data in Elasticsearch.
Because the data is pre-aggregated, queries have less to compute at search time, which improves query latency. Because downsampling also reduces the number of indexed documents, the index needs less storage, which matters most in large deployments.
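The idea of pre-aggregation can be sketched in a few lines of plain Python. This is a conceptual model only, not how Elasticsearch implements downsampling: the document shape (`host`, `ts`, `cpu`) is hypothetical, and the summary fields (min, max, sum, value_count) are an assumption about what a downsampled gauge typically keeps.

```python
from collections import defaultdict


def downsample(docs, interval_s):
    """Collapse raw samples into one summary document per (series, time bucket)."""
    buckets = defaultdict(list)
    for doc in docs:
        # Round the timestamp down to the start of its fixed interval.
        bucket_start = doc["ts"] - doc["ts"] % interval_s
        buckets[(doc["host"], bucket_start)].append(doc["cpu"])
    return [
        {
            "host": host,
            "ts": start,
            "cpu": {
                "min": min(values),
                "max": max(values),
                "sum": sum(values),
                "value_count": len(values),
            },
        }
        for (host, start), values in sorted(buckets.items())
    ]


# Four raw samples from two hypothetical hosts, timestamps in seconds.
raw = [
    {"host": "a", "ts": 0, "cpu": 1.0},
    {"host": "a", "ts": 30, "cpu": 3.0},
    {"host": "a", "ts": 60, "cpu": 5.0},
    {"host": "b", "ts": 10, "cpu": 2.0},
]
rollup = downsample(raw, interval_s=60)
print(len(raw), "raw documents ->", len(rollup), "downsampled documents")
```

Queries such as an average over a time range can then be answered from the stored `sum` and `value_count` without touching every raw sample.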
Geohex aggregations on both geo_point and geo_shape fields
Previously Elasticsearch 8.1.0 expanded geo_grid aggregation support from rectangular tiles (geotile and geohash) to include hexagonal tiles, but for geo_point only. Elasticsearch 8.7.0 now supports Geohex aggregations over geo_shape as well, which fulfils the long-standing need to perform hexagonal aggregations on spatial data.
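In 2018 Uber announced that it had open sourced its H3 library, which tiles the planet with hexagons, to better analyze its traffic and regional pricing models. Analysis with hexagonal tiles has become increasingly popular because each tile covers a very similar geographic area, and because the distance between tile centers is very similar in all directions and consistent across the map. These benefits are now available to all Elasticsearch users.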
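A minimal sketch of a search body that uses the `geohex_grid` aggregation follows. The field name `geometry` and the aggregation name `hexagons` are hypothetical; the field could be mapped as either geo_point or geo_shape.

```python
import json

# Hypothetical field and aggregation names, for illustration only.
geohex_request = {
    "size": 0,  # return only the aggregation buckets, no hits
    "aggs": {
        "hexagons": {
            "geohex_grid": {
                "field": "geometry",
                "precision": 4,  # H3 resolution; higher values mean smaller cells
            }
        }
    },
}

print(json.dumps(geohex_request, indent=2))
```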
Allow more than one KNN search clause
Some vector search scenarios require relevance ranking using a few kNN clauses, e.g. when ranking based on several fields, each with its own vector, or when a document includes a vector for the image and another vector for the text. The user may want to obtain relevance ranking based on a combination of all of these kNN clauses.
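For the image-plus-text case described above, a search body with two kNN clauses might look like the sketch below. The field names, the tiny three-dimensional vectors, and the boost values are made up for the example; real vectors would come from an embedding model and match the dimensions of the mapped dense_vector fields.

```python
import json

# Hypothetical fields, vectors, and boosts, for illustration only.
knn_search = {
    "knn": [
        {
            "field": "image_vector",
            "query_vector": [0.12, -0.40, 0.88],
            "k": 5,
            "num_candidates": 50,
            "boost": 0.4,
        },
        {
            "field": "title_vector",
            "query_vector": [0.05, 0.31, -0.27],
            "k": 5,
            "num_candidates": 50,
            "boost": 0.6,
        },
    ],
    "size": 5,
}

print(json.dumps(knn_search, indent=2))
```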
Make natural language processing GA
From 8.7, NLP model management, model allocation, and support for inference against third-party models are generally available. (The new text_embedding extension to knn search is still in technical preview.)
Speed up ingest geoip processors
The geoip ingest processor is significantly faster.
Previous versions of the geoip library needed special permission to execute databinding code, requiring an expensive permissions check and AccessController.doPrivileged call. The current version of the geoip library no longer requires that, however, so the expensive code has been removed, resulting in better performance for the ingest geoip processor.
Speed up ingest set and append processors
The set and append ingest processors that use mustache templates are significantly faster.
Improved downsampling performance
Several improvements were made to the performance of downsampling. All hashmap lookups were removed. Also metrics/label producers were modified so that they extract the doc_values directly from the leaves. This allows for extra optimizations for cases such as labels/counters that do not extract doc_values unless they are consumed. Those changes yielded a 3x-4x performance improvement of the downsampling operation, as measured by our benchmarks.
The Health API is now generally available
Elasticsearch introduces a new Health API designed to report the health of the cluster. The new API provides both a high level overview of the cluster health, and a very detailed report that can include a precise diagnosis and a resolution.
Improved performance for get, mget and indexing with explicit `_id`s
The false positive rate for the bloom filter on the _id field was reduced from ~10% to ~1%, reducing the I/O load if a term is not present in a segment. This improves performance when retrieving documents by _id, which happens when performing get or mget requests, or when issuing _bulk requests that provide explicit `_id`s.
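A back-of-the-envelope calculation shows why the lower false positive rate matters. The sketch assumes a hypothetical shard with 20 segments, an `_id` that lives in exactly one of them, and independent bloom filters per segment; none of these numbers come from the release notes.

```python
def expected_wasted_lookups(segments, false_positive_rate):
    """Expected number of segments probed even though they do not hold the _id.

    Assumes the _id exists in exactly one segment and that each of the other
    segments independently reports a false positive at the given rate.
    """
    return (segments - 1) * false_positive_rate


SEGMENTS = 20  # hypothetical segment count
before = expected_wasted_lookups(SEGMENTS, 0.10)
after = expected_wasted_lookups(SEGMENTS, 0.01)
print(f"wasted lookups per get: {before:.2f} before, {after:.2f} after")
```

Under these assumptions a lookup by `_id` wastes roughly ten times fewer term dictionary reads.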
Speed up ingest processing with multiple pipelines
Processing documents with both a request/default and a final pipeline is significantly faster.
Rather than marshalling a document from and to JSON once per pipeline, a document is now marshalled from JSON once before any pipelines execute and then back to JSON once after all pipelines have executed.
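The difference can be modelled with a toy example. This is a conceptual sketch in Python, not Elasticsearch code: the two "pipelines" are hypothetical functions that each add a field, and the counters simply record how many JSON conversions each strategy performs.

```python
import json


def run_per_pipeline(raw, pipelines):
    """Earlier behaviour as described: parse and re-serialize around every pipeline."""
    conversions = 0
    for pipeline in pipelines:
        doc = json.loads(raw)
        conversions += 1
        doc = pipeline(doc)
        raw = json.dumps(doc)
        conversions += 1
    return raw, conversions


def run_once(raw, pipelines):
    """New behaviour as described: parse once, run every pipeline, serialize once."""
    doc = json.loads(raw)
    for pipeline in pipelines:
        doc = pipeline(doc)
    return json.dumps(doc), 2


def default_pipeline(doc):
    # Hypothetical default pipeline: tags the document.
    doc["env"] = "prod"
    return doc


def final_pipeline(doc):
    # Hypothetical final pipeline: marks the document as processed.
    doc["processed"] = True
    return doc


source = '{"message": "hello"}'
old_result, old_conversions = run_per_pipeline(source, [default_pipeline, final_pipeline])
new_result, new_conversions = run_once(source, [default_pipeline, final_pipeline])
print(old_conversions, "conversions before,", new_conversions, "after")
```

Both strategies produce the same document; only the number of conversions changes, and the saving grows with the number of pipelines.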
Support geo_grid ingest processor
The geo_grid ingest processor supports creating indexable geometries from geohash, geotile and H3 cells.
There already exists a circle ingest processor that creates a polygon from a point and radius definition. This concept is useful when there is a need to use spatial operations that work with indexable geometries on geometric objects that are not defined spatially (or at least not indexable by Lucene). In this case, the string 4/8/5 does not have spatial meaning until we interpret it as the address of a rectangular geotile and save the bounding box defining its border for further use. Likewise, we can interpret geohash strings like u0 as a tile, and H3 strings like 811fbffffffffff as a hexagonal cell, saving the cell border as a polygon.
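To make the 4/8/5 example concrete, the sketch below computes the bounding box of a geotile address with the standard Web Mercator tile formulas, reading the address as zoom/x/y. It illustrates the interpretation step only and is not the processor's own code; the function name `geotile_bounds` is invented for this example.

```python
import math


def geotile_bounds(address):
    """Return the bounding box, in degrees, of a "zoom/x/y" geotile address."""
    zoom, x, y = (int(part) for part in address.split("/"))
    tiles_per_axis = 2 ** zoom

    def longitude(column):
        # Columns divide the 360 degrees of longitude evenly.
        return column / tiles_per_axis * 360.0 - 180.0

    def latitude(row):
        # Rows are evenly spaced in Web Mercator, so invert the projection.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / tiles_per_axis))))

    return {
        "west": longitude(x),
        "east": longitude(x + 1),
        "north": latitude(y),
        "south": latitude(y + 1),
    }


bounds = geotile_bounds("4/8/5")
print(bounds)
```

The four values describe the rectangle that the string 4/8/5 stands for, which is the kind of geometry that can then be indexed.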
Make frequent_item_sets aggregation GA
The frequent_item_sets aggregation has been moved from technical preview to general availability.
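A minimal sketch of a search body that uses the aggregation is shown below. The aggregation name, the two field names, and the threshold values are hypothetical and would need to be adapted to the data being analyzed.

```python
import json

# Hypothetical field names and thresholds, for illustration only.
frequent_item_sets_request = {
    "size": 0,  # return only the aggregation result, no hits
    "aggs": {
        "common_combinations": {
            "frequent_item_sets": {
                "minimum_set_size": 2,    # ignore item sets with fewer than 2 items
                "minimum_support": 0.05,  # item set must occur in at least 5% of documents
                "fields": [
                    {"field": "category.keyword"},
                    {"field": "city.keyword"},
                ],
                "size": 5,  # return at most 5 item sets
            }
        }
    },
}

print(json.dumps(frequent_item_sets_request, indent=2))
```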
Release time_series and rate (on counter fields) aggregations as tech preview
The time_series aggregation and the rate aggregation (on counter fields) are now available without the time series feature flag. This change makes these aggregations available as tech preview.
Currently there is no documentation about the time_series aggregation. This will be added in a followup change.
From: https://www.cnblogs.com/zuoyang/p/17366742.html