Source
AVRO Source
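1. The AVRO Source listens on a specified port and receives data that has been serialized with AVRO.
2. Combined with the AVRO Sink, it can be used to build multi-tier flows with fan-in and fan-out.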
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# Configure the AVRO Source
a1.sources.s1.type = avro
# Hostname or IP address to listen on
a1.sources.s1.bind = hadoop01
# Port to listen on
a1.sources.s1.port = 8888
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
Spooling Directory Source
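1. The Spooling Directory Source watches a specified directory. When a new file appears in that directory, the source automatically collects the contents of the new file.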
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# Configure the Spooling Directory Source
a1.sources.s1.type = spooldir
# Directory to watch
a1.sources.s1.spoolDir = /opt/flume_data
# Suffix appended to files that have been fully collected
a1.sources.s1.fileSuffix = .finished
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
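Because the Spooling Directory Source picks up a file as soon as it appears, the producer should hand over only complete files, and the Flume user guide warns against modifying a file after it has been placed in the directory or reusing a file name. The following is a minimal Python sketch of one way to do that, not part of Flume itself. The function name drop_into_spool_dir and the separate staging_dir are hypothetical, and the sketch assumes the staging directory is on the same filesystem as the spooling directory so that the final move is atomic.

```python
import os
import tempfile


def drop_into_spool_dir(content, file_name, spool_dir, staging_dir,
                        finished_suffix=".finished"):
    """Write content in staging_dir, then move the finished file into spool_dir.

    staging_dir is a hypothetical helper directory outside the watched
    directory; finished_suffix should match a1.sources.s1.fileSuffix.
    """
    target = os.path.join(spool_dir, file_name)
    # Refuse to reuse a name that is pending or has already been collected.
    if os.path.exists(target) or os.path.exists(target + finished_suffix):
        raise FileExistsError(target)

    # Write the complete content outside the watched directory first.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(content)
    # mkstemp creates the file readable by its owner only; widen the
    # permissions in case the Flume agent runs as a different user.
    os.chmod(tmp_path, 0o644)

    # os.replace is atomic when both paths are on the same filesystem,
    # so the source never observes a half-written file.
    os.replace(tmp_path, target)
    return target
```

With the configuration above, a call such as drop_into_spool_dir("line 1\n", "app-001.log", "/opt/flume_data", "/opt/flume_staging") would hand one file to the agent; /opt/flume_staging is an assumed path.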
Taildir Source
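1. The Exec Source can be used to watch a single file and the Spooling Directory Source to watch a directory, whereas the Taildir Source watches one or more groups of files.
2. The Taildir Source is not supported on Windows.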
a1.sources = s1
a1.channels = c1
a1.sinks = k1
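# Configure the Taildir Source
a1.sources.s1.type = TAILDIR
# Names of the file groups to watch
a1.sources.s1.filegroups = f1 f2
# Files to watch in each group (the file-name part is a regular expression)
# Watch the files in the directory whose names contain "txt"
a1.sources.s1.filegroups.f1 = /opt/flume_data/.*txt.*
# Watch the files in the directory whose names contain "log"
a1.sources.s1.filegroups.f2 = /opt/flume_data/.*log.*
# Location of the position file, which records how far each file has been read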
a1.sources.s1.positionFile = /opt/flume_data/taildir_position.json
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
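The file-group patterns above are regular expressions rather than shell globs, so it can help to check which file names they actually select. The sketch below uses Python's re module as a stand-in for Flume's matcher and assumes the pattern must match the whole file name (the Flume user guide states that the regular expression applies to the file name only, not to the directory part). The helper files_in_group and the sample file names are hypothetical.

```python
import re
from typing import List


def files_in_group(pattern: str, file_names: List[str]) -> List[str]:
    """Return the file names that the pattern matches in full."""
    regex = re.compile(pattern)
    return [name for name in file_names if regex.fullmatch(name)]


# Hypothetical contents of /opt/flume_data
names = ["app.txt", "app.txt.1", "notes_txt_old.csv",
         "server.log", "catalog.csv", "taildir_position.json"]

print(files_in_group(r".*txt.*", names))
# ['app.txt', 'app.txt.1', 'notes_txt_old.csv']
print(files_in_group(r".*log.*", names))
# ['server.log', 'catalog.csv']
```

Note that catalog.csv lands in group f2 only because its name contains "log". If only a given extension should be collected, a tighter pattern is needed, and the position file should keep a name that none of the groups match.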
HTTP Source
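1. The HTTP Source listens for HTTP requests and collects the request content as log events.
2. The HTTP Source accepts only GET and POST requests. GET handling is unreliable (the Flume user guide recommends it for experimentation only), so in practice the source is used only for POST requests.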
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# Configure the HTTP Source
a1.sources.s1.type = http
# Port to listen on
a1.sources.s1.port = 8080
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
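To try the HTTP Source out, a client has to POST events in the format the handler expects. The configuration above does not set a handler, so the sketch below assumes Flume's default JSON handler, which takes a JSON array of events, each with a "headers" map and a "body" string. The function names build_payload and post_events are hypothetical, and the target URL is an assumption about where the agent runs.

```python
import json
import urllib.request


def build_payload(bodies, headers=None):
    """Encode a list of body strings as the JSON array of events."""
    events = [{"headers": dict(headers or {}), "body": body} for body in bodies]
    return json.dumps(events).encode("utf-8")


def post_events(url, bodies, headers=None):
    """POST the events to the HTTP Source and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_payload(bodies, headers),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Assuming the agent runs on a host named hadoop01, post_events("http://hadoop01:8080", ["hello flume"], {"origin": "demo"}) would send a single event, which the logger sink should then print.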
Custom Source
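1. In Flume, custom Sources come in two kinds:
Passive (event-driven) Source, i.e. EventDrivenSource: the user must define their own thread to fetch the data and wrap it into events.
Active (pollable) Source, i.e. PollableSource: Flume provides the thread that fetches the data, so the user only needs to decide how to wrap it into events.
2. In practice, a custom Source also needs to read properties from the agent's configuration file, so it should implement the Configurable interface as well.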
From: https://blog.csdn.net/m0_63130425/article/details/139191839