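Collecting data from a specified network port and printing it to the console
On the official Flume website, open the user guide, go to "Setting up an agent", and find the section "A simple example".
The key to using Flume is writing the agent configuration file:
1. Configure the source
2. Configure the channel
3. Configure the sink
4. Wire the three components together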
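Contents
- Collecting data from a specified network port and printing it to the console
- (1) Example: write an example.conf configuration file and place it in Flume's conf directory
- (2) Start the agent (see "Starting an agent" in the official user guide)
- (3) Test with telnet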
(1) Example: write an example.conf configuration file and place it in Flume's conf directory
# example.conf: A single-node Flume configuration
# Name the components on this agent
# Note: a1 is the agent name, r1 the source name, k1 the sink name, and c1 the channel name
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
# See the NetCat TCP Source section of the official user guide
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop000
a1.sources.r1.port = 44444
# Describe the sink
# See the Logger Sink section of the official user guide
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
# See the Memory Channel section of the official user guide
a1.channels.c1.type = memory
#a1.channels.c1.capacity = 1000
#a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
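# source -> channel -> sink: a source can write to several channels (hence the plural "channels"), while a sink reads from exactly one channel (hence the singular "channel")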
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
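The wiring in the last two lines of the configuration is easy to get wrong, so it can help to check it before starting the agent. The following is a minimal sketch of such a check. `parse_properties` and `check_wiring` are hypothetical helpers written for this article, not part of Flume, and the parser only handles plain `key = value` lines like the ones above.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props: Dict[str, str], agent: str) -> List[str]:
    """Return a list of wiring problems; an empty list means none were found."""
    declared = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for source in props.get(f"{agent}.sources", "").split():
        # A source may list several channels, separated by spaces.
        used = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not used:
            problems.append(f"source {source}: no channels configured")
        problems += [
            f"source {source}: unknown channel {c}" for c in used if c not in declared
        ]
    for sink in props.get(f"{agent}.sinks", "").split():
        # A sink names exactly one channel.
        channel = props.get(f"{agent}.sinks.{sink}.channel", "")
        if channel not in declared:
            problems.append(f"sink {sink}: unknown or missing channel {channel!r}")
    return problems


# The example.conf from this section, reduced to its active lines.
EXAMPLE = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop000
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""
```

Calling `check_wiring(parse_properties(EXAMPLE), "a1")` returns an empty list for this configuration. If the sink pointed at an undeclared channel such as c2, the list would contain one entry naming sink k1.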
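(2) Start the agent (see "Starting an agent" in the official user guide)
The flume-ng agent command starts an agent. Its options:
-n, short for --name: the agent name (required); it must match the agent name used in the configuration file, a1 in this example
-c, short for --conf: the directory that holds the configuration files
-f, short for --conf-file: the path of the specific configuration file to use
-Dflume.root.logger=INFO,console: print log output to the console
$ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template
For example: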
flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/example.conf \
-Dflume.root.logger=INFO,console
Starting the agent failed with the following error:
2021-04-04 14:54:28,745 (lifecycleSupervisor-1-5) [ERROR - org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:253)] Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:r1,state:IDLE} } - Exception follows.
org.apache.flume.FlumeException: java.net.BindException: Cannot assign requested address
at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167)
... 9 more
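Fix:
The IP address mapped to hadoop000 was incorrect. "Cannot assign requested address" means the address the hostname resolves to is not assigned to any network interface on this machine, so the source cannot bind to it. Correct the hostname-to-IP mapping (typically in /etc/hosts) and restart the agent.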
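To confirm the diagnosis before restarting Flume, the same resolve-and-bind step can be tried by hand. This is a minimal sketch using only Python's standard `socket` module. `check_bind` is a hypothetical helper name, and the script has to run on the same machine as the agent, while the agent is stopped.

```python
import socket
from typing import Tuple


def check_bind(host: str, port: int) -> Tuple[bool, str]:
    """Resolve host, then try to bind a TCP socket to (resolved IP, port)."""
    try:
        ip = socket.gethostbyname(host)
    except OSError as exc:
        # Covers socket.gaierror: the name does not resolve at all.
        return False, f"cannot resolve {host!r}: {exc}"
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((ip, port))
        except OSError as exc:
            # A non-local IP fails here with "Cannot assign requested address".
            return False, f"{host!r} resolves to {ip}, but bind failed: {exc}"
    return True, f"{host!r} resolves to {ip} and port {port} can be bound"
```

Calling `check_bind("hadoop000", 44444)` returns a flag and a message. If the message shows an IP that does not belong to this machine, the hostname mapping is the problem rather than the Flume configuration.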
(3) Test with telnet
Use Git Bash or another terminal to connect remotely to the virtual machine, then run:
telnet hadoop000 44444
Type any text and press Enter.
Flume receives the input and prints it to the console where the agent is running.
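If telnet is not installed, the same test can be scripted. The sketch below is one possible stand-in, not part of Flume: `send_line` and `send_lines` are hypothetical helper names, and it assumes the NetCat source is left at its default behaviour of acknowledging each received line with a short reply such as OK.

```python
import socket
from typing import Iterable, List


def send_line(sock: socket.socket, text: str) -> str:
    """Send one newline-terminated line and return the peer's reply line."""
    sock.sendall(text.encode("utf-8") + b"\n")
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = sock.recv(1024)
        if not chunk:  # peer closed the connection
            break
        reply += chunk
    return reply.decode("utf-8").strip()


def send_lines(host: str, port: int, lines: Iterable[str]) -> List[str]:
    """Connect to host:port, send each line in turn, and collect the replies."""
    with socket.create_connection((host, port), timeout=5) as sock:
        return [send_line(sock, line) for line in lines]
```

With the agent running, `send_lines("hadoop000", 44444, ["hello", "flume"])` sends two lines, and each should then show up as a separate entry in the agent's console output.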
Each entry printed there is an Event, the basic unit of data transfer in Flume:
Event = optional headers + byte array
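The shape of an Event can be pictured with a small conceptual model. This is only an illustration of "optional headers + byte array" in Python; Flume's real Event is a Java interface, and `event_from_text` is a hypothetical helper for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Event:
    """Conceptual model of a Flume event: optional headers plus a byte-array body."""
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


def event_from_text(text: str, headers: Optional[Dict[str, str]] = None) -> Event:
    """Wrap a piece of text in an Event, encoding it to bytes as UTF-8."""
    return Event(body=text.encode("utf-8"), headers=dict(headers or {}))
```

A line typed into telnet would correspond to an Event whose headers are empty and whose body holds the bytes of that line. Headers only appear when a source or interceptor adds them.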