
A Hadoop-pcap-hive based IP packet analyzer

This post shares how I used the RIPE-NCC hadoop-pcap open-source project to analyze raw packet data and find the IP addresses with the most connections.
It should be a useful starting point for building Hadoop-based packet-analysis systems (e.g. DDoS detection).

1. PCAP header information
- PCAP file format: the file starts with a single Global Header, followed by repeated [Packet Header - Packet Data] pairs.

  [Global Header]
  [Packet Header][Packet Data]
  [Packet Header][Packet Data]
  [Packet Header][Packet Data]
  ...

The details of each field are described in the pcap file-format documentation.
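As a sketch of this layout, the 24-byte Global Header can be decoded with Python's struct module. This is a minimal illustration of the format, not the hadoop-pcap reader itself:

```python
import struct

def parse_global_header(buf):
    """Parse the 24-byte pcap global header and return its key fields."""
    magic = struct.unpack_from("<I", buf)[0]
    # 0xa1b2c3d4 -> file written little-endian, 0xd4c3b2a1 -> big-endian
    if magic == 0xa1b2c3d4:
        endian = "<"
    elif magic == 0xd4c3b2a1:
        endian = ">"
    else:
        raise ValueError("not a pcap file")
    (magic, major, minor, thiszone, sigfigs,
     snaplen, network) = struct.unpack_from(endian + "IHHiIII", buf)
    return {"version": (major, minor), "snaplen": snaplen, "linktype": network}

# A minimal header as tcpdump would write it (little-endian, snaplen 1600, Ethernet)
hdr = struct.pack("<IHHiIII", 0xa1b2c3d4, 2, 4, 0, 0, 1600, 1)
print(parse_global_header(hdr))
# prints {'version': (2, 4), 'snaplen': 1600, 'linktype': 1}
```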


2. Creating a PCAP file with tcpdump
[mimul]/logs> tcpdump -vvv -s 1600 -X "ip host 192.168.1.102" -w a.pcap
[mimul]/logs> tcpdump -ttttnnr a.pcap  # pretty-print the capture to verify it


3. Copying the PCAP file into Hadoop
[mimul]/logs> hadoop fs -mkdir /pcaps
[mimul]/logs> hadoop fs -put a.pcap /pcaps/

[mimul]/logs> hadoop fs -ls /pcaps
Found 1 items
-rw-r--r-- 1 k2 supergroup 12385195 2012-02-27 16:37 /pcaps/a.pcap


4. Recompiling the RIPE-NCC/hadoop-pcap open source
- git clone the RIPE-NCC/hadoop-pcap repository.
- Modify part of the source to match hadoop-0.20.205.0 and hive-0.8.0, then recompile.
- This produces the hadoop-pcap-serde-0.2-SNAPSHOT-jar-with-dependencies.jar library.

5. Importing the data into Hive
# add the library
hive> ADD JAR hadoop-pcap-serde-0.2-SNAPSHOT-jar-with-dependencies.jar;
# split the input into 100 MB chunks (104857600 = 100 × 1024 × 1024 bytes)
hive> SET hive.input.format=
  org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
hive> SET mapred.max.split.size=104857600;

# create the table
hive> SET net.ripe.hadoop.pcap.io.reader.class=
  net.ripe.hadoop.pcap.DnsPcapReader;
hive> CREATE EXTERNAL TABLE pcaps (ts bigint,
                      protocol string,
                      src string,
                      src_port int,
                      dst string,
                      dst_port int,
                      len int,
                      ttl int,
                      dns_queryid int,
                      dns_flags string,
                      dns_opcode string,
                      dns_rcode string,
                      dns_question string,
                      dns_answer array<string>,
                      dns_authority array<string>,
                      dns_additional array<string>)
ROW FORMAT SERDE 'net.ripe.hadoop.pcap.serde.PcapDeserializer'
STORED AS INPUTFORMAT 'net.ripe.hadoop.pcap.io.PcapInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs:///pcaps/';
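Under the hood, a pcap reader like the one behind PcapInputFormat walks the [Packet Header - Packet Data] pairs sequentially: each 16-byte per-packet header carries the timestamp and the captured length (incl_len), which says how many data bytes follow before the next record. A rough sketch of that loop (not the actual hadoop-pcap code):

```python
import struct

def iter_packets(buf):
    """Yield (ts_sec, ts_usec, data) for each [Packet Header][Packet Data]
    record after the 24-byte global header (little-endian pcap assumed)."""
    off = 24  # skip the global header
    while off + 16 <= len(buf):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack_from("<IIII", buf, off)
        off += 16
        yield ts_sec, ts_usec, buf[off:off + incl_len]
        off += incl_len  # jump over the packet data to the next header

# Build a toy capture: global header + two tiny packets
gh = struct.pack("<IHHiIII", 0xa1b2c3d4, 2, 4, 0, 0, 1600, 1)
p1 = struct.pack("<IIII", 1330329900, 0, 3, 3) + b"abc"
p2 = struct.pack("<IIII", 1330329901, 0, 2, 2) + b"de"
print([data for _, _, data in iter_packets(gh + p1 + p2)])
# prints [b'abc', b'de']
```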


6. Running a query that counts connections per source IP
- With some tuning of this query, it should be possible to pick out malicious clients.
hive> SELECT src, COUNT(src) FROM pcaps GROUP BY src;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
Starting Job = job_201202141631_0003, Tracking URL = 
  http://kth:50030/jobdetails.jsp?jobid=job_201202141631_0003
Kill Command = /database/server/hadoop-0.20.205.0/bin/../bin/hadoop job  
 -Dmapred.job.tracker=kth:9001 -kill job_201202141631_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2012-02-27 19:05:00,050 Stage-1 map = 0%,  reduce = 0%
2012-02-27 19:05:06,119 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.38 sec
...
2012-02-27 19:05:17,325 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.12 sec
...
MapReduce Total cumulative CPU time: 6 seconds 120 msec
Ended Job = job_201202141631_0003
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   Accumulative CPU: 6.12 sec   
 HDFS Read: 12385376 HDFS Write: 745 SUCESS
Total MapReduce CPU Time Spent: 6 seconds 120 msec
OK
1.202.218.8     6
1.234.2.193     22751
1.234.2.209     920
109.230.216.60  123
110.70.10.151   178
110.9.88.16     242
111.91.137.34   9
111.91.139.50   334
111.91.139.66   10
112.171.126.99  335
112.172.131.177 36
116.125.143.78  14
119.147.75.137  5
123.125.71.114  6
124.215.250.217 5
150.70.75.37    88
157.55.16.86    6
157.55.18.22    7
159.253.132.100 1
175.196.79.162  351
180.76.5.188    6
199.59.148.87   5
203.215.201.193 14
209.200.154.254 1
209.85.238.40   28
210.217.175.248 326
211.115.97.47   365
211.210.117.3   294
211.212.39.221  8
211.242.223.51  234
211.37.183.105  25963
211.41.205.50   8
211.45.150.101  2
220.181.108.174 6
223.33.130.133  374
61.42.211.5     379
65.52.108.66    7
65.52.110.200   10
66.249.67.72    73
66.249.68.74    58
67.170.236.235  18
67.228.172.188  1
78.140.130.236  110
Time taken: 33.717 seconds
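The GROUP BY above is just a per-source count. Outside Hive, the same aggregation can be sketched locally with Python's collections.Counter; the packet tuples here are made up for illustration:

```python
from collections import Counter

# Hypothetical parsed packets: (src, dst, dst_port) as the Hive table exposes them
packets = [
    ("211.37.183.105", "192.168.1.102", 80),
    ("1.234.2.193",    "192.168.1.102", 80),
    ("211.37.183.105", "192.168.1.102", 443),
    ("211.37.183.105", "192.168.1.102", 80),
]

# Equivalent of: SELECT src, COUNT(src) FROM pcaps GROUP BY src
counts = Counter(src for src, _, _ in packets)
for src, n in counts.most_common():
    print(src, n)
# prints:
# 211.37.183.105 3
# 1.234.2.193 1
```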

As a next step, I plan to build a flume-plugin-pcap and wire up the whole pipeline from log collection through to Hive.



Re: A Hadoop-pcap-hive based IP packet analyzer

Wow, what an intriguing post. An article I'll have to dig into seriously myself.

