1. Downloading and Installing Flume

This walkthrough uses Ubuntu Linux. Open a terminal on Ubuntu and run:

wget https://archive.apache.org/dist/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz

Once the download finishes, extract the archive in the same directory:

tar -zxvf apache-flume-1.9.0-bin.tar.gz

2. Log Collection with netcat

2.1 Configuration File

Go to the conf directory under the Flume directory, create a file named example.conf, and enter the following:

# Name the components on this agent
# Multiple sources can be defined, e.g. r1 r2 r3 …
# (keep comments on their own line: a trailing # becomes part of the value)
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Configure the sink
a1.sinks.k1.type = logger

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
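
The wiring in this configuration (every source lists its channels, every sink names exactly one channel, and all of them are declared on the agent) can be checked before Flume is started. The following is only a minimal Python sketch under stated assumptions: `parse_props` and `check_wiring` are hypothetical helper names, not part of Flume, and the parser handles just the simple `key = value` lines and `#` comment lines used in this post.

```python
def parse_props(text):
    """Parse simple 'key = value' lines; skip blank lines and # comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent="a1"):
    """Return a list of wiring problems; an empty list means none were found."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())
    for src in props.get(f"{agent}.sources", "").split():
        bound = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not bound:
            problems.append(f"source {src} has no channels")
        for ch in bound:
            if ch not in channels:
                problems.append(f"source {src} uses undeclared channel {ch}")
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel", "")
        if not ch:
            problems.append(f"sink {sink} has no channel")
        elif ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch}")
    return problems


# The netcat-to-logger configuration from this section.
EXAMPLE = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""
```

Calling `check_wiring(parse_props(EXAMPLE))` returns an empty list for this configuration. A typo such as binding the sink to `c2` would show up as one entry in the returned list.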

2.2 Starting and Testing

After saving the file, return to the Flume root directory and run:

./bin/flume-ng agent --conf ./conf --conf-file ./conf/example.conf --name a1 -Dflume.root.logger=INFO,console

On Windows, the command may instead be:

.\bin\flume-ng agent --conf .\conf --conf-file .\conf\example.conf --name a1 -property flume.root.logger=INFO,console

After making sure telnet is enabled on the host, run:

telnet localhost 44444

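Press Enter to connect. Then type any text and press Enter again: the netcat source answers each line with 'OK', and the terminal running Flume logs the received event, which shows that the test has succeeded.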
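
The same check can be scripted instead of typed into telnet. The sketch below is a rough Python equivalent, assuming the agent listens on localhost:44444 as configured above and that the netcat source acknowledges every line with one reply line (its default behaviour). `send_lines` is a hypothetical helper name.

```python
import socket


def send_lines(lines, host="localhost", port=44444, timeout=5.0):
    """Send each line to a netcat source and collect one reply line per line sent."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        reader = conn.makefile("r", encoding="utf-8")
        for line in lines:
            # The netcat source treats each newline-terminated line as one event.
            conn.sendall((line + "\n").encode("utf-8"))
            replies.append(reader.readline().strip())
    return replies
```

With the agent from this section running, `send_lines(["hello flume"])` would be expected to return one acknowledgement per line sent, while the Flume terminal logs the corresponding events.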

3. Collecting File Data to a Specified Location

3.1 Configuration File

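Back in the conf directory, create example1.conf with the following configuration. Remember to change the path to one that exists on your machine.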

# Name the components on this agent
# Multiple sources can be defined, e.g. r1 r2 r3 …
# (keep comments on their own line: a trailing # becomes part of the value)
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Configure the sink
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/ldyer/
# 0 disables time-based rolling, so all events go into a single file
a1.sinks.k1.sink.rollInterval = 0
a1.sinks.k1.sink.file.name.timeFormat = yyyyMMddHH

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3.2 Starting and Testing

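Run the same command as before (remember to change the configuration file name to example1.conf) to start Flume, then telnet to the port, type a few messages, and exit. A file now appears in the directory configured above. Open it: it contains the messages just entered, so the test has succeeded.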

4. Collecting File Data to a Specified Location (Static)

4.1 Configuration File

As before, create example2.conf and enter the following configuration (note that spoolDir must be a directory, not a file):

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/ldyer/log
# a1.sources.r1.fileHeader = true
# a1.sources.r1.interceptors = i1
# a1.sources.r1.interceptors.i1.type = timestamp

# Configure the sink
a1.sinks.k1.type = logger

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

4.2 Starting and Testing

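Start Flume, then work in the configured directory, for example by placing a new file into it. Flume reads each new file, prints its lines as events in the terminal, and marks the file as processed by renaming it with a .COMPLETED suffix. Seeing the events in the terminal means the test has succeeded.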
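
The spooling directory source expects a file to be complete and unchanged from the moment it appears in the directory, so it is safer to write the file elsewhere first and then move it in with a single rename. The following Python sketch illustrates that idea. `drop_into_spool` is a hypothetical helper, and it assumes the staging directory (by default the parent of the spool directory) is on the same filesystem, so the rename is atomic.

```python
import os
import tempfile


def drop_into_spool(spool_dir, name, text, staging_dir=None):
    """Write text to a temporary file outside spool_dir, then rename it into place."""
    if staging_dir is None:
        # Assumption: the parent directory shares a filesystem with spool_dir.
        staging_dir = os.path.dirname(os.path.abspath(spool_dir))
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    final_path = os.path.join(spool_dir, name)
    # The file appears in the spool directory only once it is fully written.
    os.replace(tmp_path, final_path)
    return final_path
```

For the configuration above, a call such as `drop_into_spool("/home/ldyer/log", "test1.log", "line one\nline two\n")` would place a finished file into the watched directory. Using a new file name for each drop avoids clashing with files Flume has already processed.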

5. Collecting File Data to a Specified Location (Dynamic)

5.1 Configuration File

Create example3.conf with the following content:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/ldyer/log/ldy.txt

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

5.2 Starting and Testing

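The test steps are the same as above: start Flume with example3.conf, then append lines to /home/ldyer/log/ldy.txt. Each appended line appears as an event in the terminal.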
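
To generate test input for the exec source, lines can be appended to the tailed file from a small script. The following is a minimal Python sketch. `append_lines` is a hypothetical helper, and the path in the usage note is the one from the configuration above.

```python
def append_lines(path, lines):
    """Append each line to the file at path and return how many lines were written."""
    # Mode "a" creates the file if it is missing and never truncates existing content,
    # which is what a process following the file with tail -F expects.
    with open(path, "a", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return len(lines)
```

For example, `append_lines("/home/ldyer/log/ldy.txt", ["first event", "second event"])` adds two lines to the file that `tail -F` is following.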

6. Distributed Log Collection with Multiple Agents over Avro

6.1 Agent1

Create Agent1.conf with the following content:

# Agent1 collects the data sent over telnet
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Configure the sink
a1.sinks.k1.type = avro
# In a real deployment this is the address of the target machine,
# not localhost or 127.0.0.1
a1.sinks.k1.hostname = localhost
# Port on the target machine
a1.sinks.k1.port = 55555
a1.sinks.k1.batch-size = 1

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start it with:

./bin/flume-ng agent --conf ./conf --conf-file ./conf/Agent1.conf --name a1

6.2 Agent2

Create Agent2.conf with the following content:

# Name the components on Agent2
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Configure the source
# The avro source acts as a receiving service
a1.sources.r1.type = avro
# Address of this node; in a real cluster use its IP address
# (localhost is used here for a single-machine test)
a1.sources.r1.bind = localhost
a1.sources.r1.port = 55555

# Configure the sink
a1.sinks.k1.type = logger

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Then start it with the following command (it differs from Agent1's because Agent2 has to display the content forwarded by Agent1):

./bin/flume-ng agent --conf ./conf --conf-file ./conf/Agent2.conf --name a1 -Dflume.root.logger=INFO,console
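
Before sending test data through the two-agent chain, it can help to confirm that both listeners are up: Agent2's Avro source on port 55555 and Agent1's netcat source on port 44444. The Python sketch below only tests whether a TCP connection can be opened. `port_is_open` and `check_agents` are hypothetical helpers, and the ports are the ones configured in this section.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_agents(endpoints):
    """Map each label to whether its (host, port) endpoint accepts connections."""
    return {label: port_is_open(host, port) for label, (host, port) in endpoints.items()}
```

A call such as `check_agents({"Agent2 avro source": ("localhost", 55555), "Agent1 netcat source": ("localhost", 44444)})` reports which of the two agents is reachable. Once both are, lines typed into `telnet localhost 44444` should show up in Agent2's terminal.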