
splunk indexing process


Terminology:

Event: Events are records of activity in log files, stored in Splunk indexes. Put simply, a single record (one line) in a processed log or call-detail file is one Event.
Source type: identifies the format of the data. Put simply, a log in one particular format can be defined as one source type. Splunk ships with more than 500 predefined source types for data of known formats, including Apache logs, the logs of common operating systems, logs from Cisco and other network devices, and so on.
Index: The index is the repository for Splunk Enterprise data. Splunk transforms incoming data into events, which it stores in indexes. The term has two senses: it names the physical storage of the data, and it also names a processing action, as in "Splunk indexes your data". That process produces two types of data:
The raw data in compressed form (rawdata)
Indexes that point to the raw data, plus some metadata files (index files)
Indexer: An indexer is a Splunk Enterprise instance that indexes data. This is both the usual notion of an indexer and the name of the specific "Indexer" module in Splunk; it is a kind of Splunk Enterprise instance.
Bucket: An index organizes the two types of data it stores into directories by age; these directories are called buckets.
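
As a rough illustration of age-based bucket organization, here is a minimal Python sketch. It assumes the conventional warm/cold bucket directory naming db_<newestTime>_<oldestTime>_<localId> (epoch seconds); the age threshold is a made-up example, not a Splunk default.

```python
import time

def classify_bucket(dirname: str, warm_age_secs: int = 7 * 86400) -> str:
    """Classify a bucket directory by the age of its newest event.

    Assumes the warm/cold naming scheme db_<newestTime>_<oldestTime>_<localId>
    (times in epoch seconds). The 7-day cutoff is illustrative only.
    """
    _, newest, _oldest, _local_id = dirname.split("_")
    age = time.time() - int(newest)
    return "warm" if age < warm_age_secs else "cold"

# A bucket whose newest event is one hour old is still "warm" here:
now = int(time.time())
print(classify_bucket(f"db_{now - 3600}_{now - 7200}_42"))  # -> warm
```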

Responsibilities of each role (see the diagrams later in this article):

Search Head: front-end search;
Deployment Server: in effect a configuration management center, providing unified management of the other nodes;

Forwarder: collects, pre-processes, and forwards data to the indexers (consumes data and forwards it on to indexers); together they form a mechanism similar to Flume's Agent-and-Collector setup. Its actions include (see the sketch after this list):
· Tagging of metadata (source, sourcetype, and host)
· Configurable buffering
· Data compression
· SSL security
· Use of any available network ports
· Running scripted inputs locally
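
The list above maps fairly directly onto code. The following Python sketch mimics the tagging, compression, and network steps of a forwarder; the destination host/port and the gzipped-JSON wire format are assumptions for illustration (a real forwarder speaks Splunk's own forwarding protocol, typically on port 9997).

```python
import gzip
import json
import socket

def forward_block(events, metadata, dest=("indexer.example.com", 9997)):
    """Tag raw events with metadata, compress the block, and ship it over
    TCP. The wire format here (gzipped JSON) is illustrative only; a real
    forwarder uses Splunk's proprietary forwarding protocol."""
    tagged = [{"event": e, **metadata} for e in events]   # metadata tagging
    payload = gzip.compress(json.dumps(tagged).encode())  # data compression
    with socket.create_connection(dest) as sock:          # any available port
        sock.sendall(payload)

# Hypothetical usage:
# forward_block(["GET /index.html 200"],
#               {"source": "/var/log/access.log",
#                "sourcetype": "access_combined", "host": "web01"})
```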

Note: Forwarders can transmit three types of data: raw, unparsed, and parsed. Which types a forwarder can send depends on the forwarder type and how it is configured. Universal forwarders and light forwarders can send raw or unparsed data; heavy forwarders can send raw or parsed data.

Indexer: responsible for "indexing" the data, i.e. the indexing process, also called event processing; this includes (see the sketch after this list):
· Separating the datastream into individual, searchable events. (line breaking)
· Creating or identifying timestamps. (timestamp recognition)
· Extracting fields such as host, source, and sourcetype. (handling of the standard external fields)
· Performing user-defined actions on the incoming data, such as identifying custom fields, masking sensitive data, writing new or modified keys, applying breaking rules for multi-line events, filtering unwanted events, and routing events to specified indexes or servers.
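
To make these steps concrete, here is a toy Python version of event processing under one simplifying assumption: a line that begins with a timestamp starts a new event, and any other line continues the previous one. Splunk's real linebreaking and timestamp rules are configurable and far richer.

```python
import re
from datetime import datetime

# Simplifying assumption: a leading "YYYY-MM-DD hh:mm:ss" starts a new event.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

def event_processing(stream, host, source, sourcetype):
    """Break a datastream into events, identify timestamps, and attach
    the default fields host/source/sourcetype."""
    events = []
    for line in stream.splitlines():
        m = TS.match(line)
        if m or not events:  # new event (or first line without a timestamp)
            events.append({
                "_time": datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
                         if m else None,  # would be created if missing
                "_raw": line, "host": host,
                "source": source, "sourcetype": sourcetype,
            })
        else:                # continuation line of a multi-line event
            events[-1]["_raw"] += "\n" + line
    return events

sample = ("2023-06-02 17:36:46 ERROR boom\n"
          "  Traceback ...\n"
          "2023-06-02 17:36:47 INFO recovered")
print(len(event_processing(sample, "web01", "/var/log/app.log", "app")))  # -> 2
```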

Parts of an indexer cluster (distributed deployment)

An indexer cluster is a group of Splunk Enterprise instances, or nodes, that, working in concert, provide a redundant indexing and searching capability. Each cluster has three types of nodes:

The master node manages the cluster. It coordinates the replicating activities of the peer nodes and tells the search head where to find data. It also helps manage the configuration of peer nodes and orchestrates remedial activities if a peer goes down.

The peer nodes receive and index incoming data, just like non-clustered, stand-alone indexers. Unlike stand-alone indexers, however, peer nodes also replicate data from other nodes in the cluster. A peer node can index its own incoming data while simultaneously storing copies of data from other nodes. You must have at least as many peer nodes as the replication factor. That is, to support a replication factor of 3, you need a minimum of three peer nodes.
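
The peer-count constraint is easy to see in a toy placement routine. The sketch below is not Splunk's actual assignment logic (the master's placement decisions are more nuanced); it only shows why a replication factor of N needs at least N peers.

```python
import itertools

def place_copies(bucket, origin, peers, replication_factor=3):
    """Keep the bucket on the peer that indexed it and copy it to
    replication_factor - 1 other peers, round-robin. Illustrative only."""
    assert len(peers) >= replication_factor, \
        "need at least as many peer nodes as the replication factor"
    others = itertools.cycle(p for p in peers if p != origin)
    return [origin] + [next(others) for _ in range(replication_factor - 1)]

print(place_copies("db_1685697406_1685693806_1", "peer1",
                   ["peer1", "peer2", "peer3"]))
# -> ['peer1', 'peer2', 'peer3']
```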

The search head runs searches across the set of peer nodes. You must use a search head to manage searches across indexer clusters. That is, it dispatches search requests to the indexer nodes and then merges the results (see the sketch below).
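
A minimal sketch of the search head's role, under the assumption that each peer exposes some remote search call (stubbed out here): fan the query out to every peer in parallel, then merge the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

def search_peer(peer, query):
    """Stub for a remote call into one peer's search process."""
    return [{"peer": peer, "_time": 1685697406, "_raw": f"match for {query!r}"}]

def run_search(peers, query):
    """Scatter-gather: dispatch the search to all peer nodes, then merge
    (here, by time, newest first) before returning to the user."""
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        partials = pool.map(lambda p: search_peer(p, query), peers)
    merged = [event for part in partials for event in part]
    return sorted(merged, key=lambda e: e["_time"], reverse=True)

print(len(run_search(["peer1", "peer2", "peer3"], "error")))  # -> 3
```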

For most purposes, it is recommended that you use forwarders to get data into the cluster.

Here is a diagram of a basic, single-site indexer cluster, containing three peer nodes and supporting a replication factor of 3:

[Figure: a basic single-site indexer cluster with three peer nodes and a replication factor of 3]

This diagram shows a simple deployment, similar to a small-scale non-clustered deployment, with some forwarders sending load-balanced data to a group of indexers (peer nodes), and the indexers sending search results to a search head. There are two additions that you don't find in a non-clustered deployment:

  •  The indexers are streaming copies of their data to other indexers.
  •  The master node, while it doesn't participate in any data streaming, coordinates a range of activities involving the search peers and the search head.

How indexing works

Splunk Enterprise can index any type of time-series data (data with timestamps). When Splunk Enterprise indexes data, it breaks it into events, based on the timestamps.

Event processing

Event processing occurs in two stages, parsing and indexing. All data that comes into Splunk Enterprise enters through the parsing pipeline as large (10,000 bytes) chunks. During parsing, Splunk Enterprise breaks these chunks into events which it hands off to the indexing pipeline, where final processing occurs.

While parsing, Splunk Enterprise performs a number of actions, including:

  •  Extracting a set of default fields for each event, including host, source, and sourcetype.
  •  Configuring character set encoding.
  •  Identifying line termination using linebreaking rules. While many events are short and only take up a line or two, others can be long.
  •  Identifying timestamps or creating them if they don't exist. At the same time that it processes timestamps, Splunk identifies event boundaries.
  •  Splunk can be set up to mask sensitive event data (such as credit card or social security numbers) at this stage. It can also be configured to apply custom metadata to incoming events. (See the sketch below.)
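
In Splunk itself, masking is configured (for example, with SEDCMD-style substitutions in props.conf) rather than coded by hand; the Python below is only a sketch of the idea, using an assumed 16-digit card-number pattern.

```python
import re

# Illustrative pattern: 16-digit card numbers written as 4-4-4-4 groups.
CARD = re.compile(r"\b(\d{4})(?:[ -]?\d{4}){2}[ -]?(\d{4})\b")

def mask_sensitive(raw: str) -> str:
    """Redact the middle eight digits of a card number at parse time,
    keeping the first and last four for traceability."""
    return CARD.sub(r"\1-XXXX-XXXX-\2", raw)

print(mask_sensitive("charge card=4111 1111 1111 1111 amount=42"))
# -> charge card=4111-XXXX-XXXX-1111 amount=42
```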

In the indexing pipeline, Splunk Enterprise performs additional processing, including:

  •  Breaking all events into segments that can then be searched upon. You can determine the level of segmentation, which affects indexing and searching speed, search capability, and efficiency of disk compression. (See the sketch after this list.)
  •  Building the index data structures.
  •  Writing the raw data and index files to disk, where post-indexing compression occurs.
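
The indexing-stage steps can be sketched as a naive inverted index: split each event into segments and record, per segment, which events contain it. Real tsidx files and segmentation rules are far more sophisticated; this only shows the shape of the idea.

```python
import re
from collections import defaultdict

def build_index(events):
    """Segment each event (naively, on non-word characters) and build an
    inverted index mapping each segment to the ids of events containing it.
    Disk layout, compression, and real segmenter rules are omitted."""
    inverted = defaultdict(set)
    for event_id, raw in enumerate(events):
        for segment in set(re.split(r"\W+", raw.lower())):
            if segment:
                inverted[segment].add(event_id)
    return inverted

idx = build_index(["ERROR disk full on web01", "INFO disk ok on web02"])
print(sorted(idx["disk"]))   # -> [0, 1]  both events contain "disk"
print(sorted(idx["error"]))  # -> [0]
```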

The breakdown between parsing and indexing pipelines is of relevance mainly when deploying forwarders. Heavy forwarders can parse data and then forward the parsed data on to indexers for final indexing. Some source types - those that reference structured data - require configuration on the forwarder prior to indexing. See "Extract data from files with headers".

For more information about events and what happens to them during the indexing process, see the chapter "Configure event processing" in the Getting Data In Manual.

Note: Indexing is an I/O-intensive process.

This diagram shows the main processes inherent in indexing:

[Figure: the main processes involved in indexing]

Note: This diagram represents a simplified view of the indexing architecture. It provides a functional view of the architecture and does not fully describe Splunk Enterprise internals. In particular, the parsing pipeline actually consists of three pipelines: parsing, merging, and typing, which together handle the parsing function. The distinction can matter during troubleshooting, but does not generally affect how you configure or deploy Splunk Enterprise.

How indexer acknowledgment works

In brief, indexer acknowledgment works like this: The forwarder sends data continuously to the receiving peer, in blocks of approximately 64kB. The forwarder maintains a copy of each block in memory until it gets an acknowledgment from the peer. While waiting, it continues to send more data blocks.

If all goes well, the receiving peer:

1. receives the block of data, parses and indexes it, and writes the data (raw data and index data) to the file system.

2. streams copies of the raw data to each of its target peers.

3. sends an acknowledgment back to the forwarder.

The acknowledgment assures the forwarder that the data was successfully written to the cluster. Upon receiving the acknowledgment, the forwarder releases the block from memory.

If the forwarder does not receive the acknowledgment, that means there was a failure along the way. Either the receiving peer went down or that peer was unable to contact its set of target peers. The forwarder then automatically resends the block of data. If the forwarder is using load-balancing, it sends the block to another receiving node in the load-balanced group. If the forwarder is not set up for load-balancing, it attempts to resend data to the same node as before.
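
The acknowledgment-and-resend behaviour can be condensed into a small simulation. Everything below is a sketch: send_block stands in for "ship one ~64 kB block and wait for the peer's acknowledgment", and random failures stand in for a peer going down (a real forwarder also keeps sending further blocks while waiting, which this serial loop omits).

```python
import itertools
import random

def send_block(peer, block):
    """Stand-in for sending one block and waiting for the indexer
    acknowledgment; fails randomly to simulate peer outages."""
    return random.random() > 0.3  # True means "acknowledgment received"

def forward_with_ack(blocks, load_balanced_peers):
    """Keep each block 'in memory' until some peer acknowledges it; on a
    missing acknowledgment, resend to the next peer in the group."""
    peers = itertools.cycle(load_balanced_peers)
    for block in blocks:                    # block is retained here...
        while not send_block(next(peers), block):
            pass                            # ...and resent until acked
    # every block acknowledged: safe to release them all

forward_with_ack([b"x" * 65536] * 3, ["peer1:9997", "peer2:9997"])
```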

Important: To ensure end-to-end data fidelity, you must explicitly enable indexer acknowledgment for each forwarder that's sending data to the cluster, as described earlier in this topic. If end-to-end data fidelity is not a requirement for your deployment, you can skip this step.

For more information on how indexer acknowledgment works, read "Protect against loss of in-flight data" in the Forwarding Data manual.

 


  • 通俗易举例说明面向对象和面向过程有什么区别
    一. 面向对象1. 概念可以说,在面向对象的编程规范中,“一切皆对象”,对象就是面向对象编程的核心。我们把现实世界中的一个个对象或物体,抽象地体现在编程世界中。就好比我们想驾驶一辆奥迪A6,A6就是一个对象,制造商给A6赋予了各种功能和特性,我们就可以调用这个对象完成一系列操控。所......