The trickiest part of working with fluentd is parsing logs with regular expressions.

If the pattern fails to match, log lines can be dropped, so it has to be written carefully.


The following URL, hosted on Heroku, lets you test regular expressions for fluentd.


http://fluentular.herokuapp.com/




For example, the following log line


192.168.6.118 - - [25/May/2014:23:44:21 +0000]  GET /images/templates/main/favicon_16_24_32.ico HTTP/1.1 "200" 1811 "-" "test-domain.com" test-domain.com "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36" "-" upstream_response_time 0.028 upstream_addr 192.168.6.7:8017 msec 1401061461.752 request_time 0.028



can be parsed with the following regex.


^(?<remote_addr>[^ ]*) - (?<remote_user>[^ ]*) \[(?<time>[^\]]*)\]\s+(?<request_type>[^ ]*) (?<request_url>[^ ]*) (?<request_http_protocol>[^ ]*) "(?<status>[^"]*)" (?<body_bytes_sent>[^ ]*) "(?<http_referer>[^"]*)" "(?<http_host>[^"]*)" (?<host>[^ ]*) "(?<http_user_agent>[^"]*)" "(?<http_x_forwarded_for>[^"]*)" upstream_response_time (?<upstream_response_time>[^ ]*) upstream_addr (?<upstream_addr>[^ ]*) msec (?<msec>[^ ]*) request_time (?<request_time>[^ ]*)
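Since fluentd is written in Ruby, the regex can also be checked locally with plain Ruby before it goes into a config. The following is only a minimal sketch under a few assumptions: the sample line is used without string escaping, the group after `msec` is named `msec` (a Ruby group name cannot contain a space), and `parse_line` is a hypothetical helper, not part of fluentd.

```ruby
# Sample access-log line from the text (assumed to be the raw, unescaped line).
LINE = '192.168.6.118 - - [25/May/2014:23:44:21 +0000]  GET /images/templates/main/favicon_16_24_32.ico HTTP/1.1 "200" 1811 "-" "test-domain.com" test-domain.com "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36" "-" upstream_response_time 0.028 upstream_addr 192.168.6.7:8017 msec 1401061461.752 request_time 0.028'

# Same regex as above, with the msec group named "msec" (assumption).
PATTERN = /^(?<remote_addr>[^ ]*) - (?<remote_user>[^ ]*) \[(?<time>[^\]]*)\]\s+(?<request_type>[^ ]*) (?<request_url>[^ ]*) (?<request_http_protocol>[^ ]*) "(?<status>[^"]*)" (?<body_bytes_sent>[^ ]*) "(?<http_referer>[^"]*)" "(?<http_host>[^"]*)" (?<host>[^ ]*) "(?<http_user_agent>[^"]*)" "(?<http_x_forwarded_for>[^"]*)" upstream_response_time (?<upstream_response_time>[^ ]*) upstream_addr (?<upstream_addr>[^ ]*) msec (?<msec>[^ ]*) request_time (?<request_time>[^ ]*)/

# Hypothetical helper: returns a Hash of group name => captured text,
# or nil when the line does not match the pattern.
def parse_line(pattern, line)
  m = pattern.match(line)
  m.nil? ? nil : m.named_captures
end

record = parse_line(PATTERN, LINE)
record.each { |name, value| puts "#{name}: #{value}" }

# A line that does not fit the pattern yields nil; this is the case
# in which a log line would be lost.
unmatched = parse_line(PATTERN, "not an access log line")
puts "unmatched: #{unmatched.inspect}"
```

Running a handful of real log lines through a check like this, including odd ones, is a cheap way to find lines that would not match before deploying the config.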



The test can also be opened directly as a GET URL.


http://fluentular.herokuapp.com/parse?regexp=%5E%28%3F%3Cremote_addr%3E%5B%5E+%5D*%29+-+%28%3F%3Cremote_user%3E%5B%5E+%5D*%29+%5C%5B%28%3F%3Ctime%3E%5B%5E%5C%5D%5D*%29%5C%5D%5Cs%2B%28%3F%3Crequest_type%3E%5B%5E+%5D*%29+%28%3F%3Crequest_url%3E%5B%5E+%5D*%29+%28%3F%3Crequest_http_protocol%3E%5B%5E+%5D*%29+%22%28%3F%3Cstatus%3E%5B%5E%22%5D*%29%22+%28%3F%3Cbody_bytes_sent%3E%5B%5E+%5D*%29+%22%28%3F%3Chttp_referer%3E%5B%5E%22%5D*%29%22+%22%28%3F%3Chttp_host%3E%5B%5E%22%5D*%29%22+%28%3F%3Chost%3E%5B%5E+%5D*%29+%22%28%3F%3Chttp_user_agent%3E%5B%5E%22%5D*%29%22+%22%28%3F%3Chttp_x_forwarded_for%3E%5B%5E%22%5D*%29%22+upstream_response_time+%28%3F%3Cupstream_response_time%3E%5B%5E+%5D*%29+upstream_addr+%28%3F%3Cupstream_addr%3E%5B%5E+%5D*%29+msec+%28%3F%3Cmsec+request_time%3E%5B%5E+%5D*%29+request_time+%28%3F%3Crequest_time%3E%5B%5E+%5D*%29&input=%22192.168.6.118+-+-+%5B25%2FMay%2F2014%3A23%3A44%3A21+%2B0000%5D++GET+%2Fimages%2Ftemplates%2Fmain%2Ffavicon_16_24_32.ico+HTTP%2F1.1+%5C%22200%5C%22+1811+%5C%22-%5C%22+%5C%22test-domain.com%5C%22+test-domain.com+%5C%22Mozilla%2F5.0+%28Windows+NT+6.3%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F35.0.1916.114+Safari%2F537.36%5C%22+%5C%22-%5C%22+upstream_response_time+0.028+upstream_addr+192.168.6.7%3A8017+msec+1401061461.752+request_time+0.028%22&time_format=





The same regex then goes into the format parameter of a tail source in the fluentd configuration.


<source> 
  type tail 
  path /var/log/foo/bar.log 
  pos_file /var/log/td-agent/foo-bar.log.pos 
  tag foo.bar 
  format /^(?<remote_addr>[^ ]*) - (?<remote_user>[^ ]*) \[(?<time>[^\]]*)\]\s+(?<request_type>[^ ]*) (?<request_url>[^ ]*) (?<request_http_protocol>[^ ]*) "(?<status>[^"]*)" (?<body_bytes_sent>[^ ]*) "(?<http_referer>[^"]*)" "(?<http_host>[^"]*)" (?<host>[^ ]*) "(?<http_user_agent>[^"]*)" "(?<http_x_forwarded_for>[^"]*)" upstream_response_time (?<upstream_response_time>[^ ]*) upstream_addr (?<upstream_addr>[^ ]*) msec (?<msec>[^ ]*) request_time (?<request_time>[^ ]*)/
</source>
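The regex captures a `time` group, and fluentd takes the event time from that field; how the string is interpreted is controlled by the `time_format` parameter, which is left empty in the test URL and is not set in the config above. As a sketch only, the strptime format below appears to fit the timestamp in the sample line. Adding `time_format %d/%b/%Y:%H:%M:%S %z` to the source block is an assumption, not something the original config does.

```ruby
require "time"

# Assumed strptime format for timestamps such as "25/May/2014:23:44:21 +0000".
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Value of the "time" capture for the sample log line.
captured = "25/May/2014:23:44:21 +0000"

event_time = Time.strptime(captured, TIME_FORMAT)

puts event_time.utc.iso8601
# Epoch seconds; these should agree with the integer part of the
# msec field (1401061461.752) in the same log line.
puts event_time.to_i
```

Comparing the parsed value with the `msec` field of the same line is a quick sanity check that the format string reads the day, month, and time zone correctly.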



Posted by 김용환