ALB Log Analysis with AWS Athena

Let analsysis ALB Logfiles with Athena.

First THX to Rob Witoff https://medium.com/@robwitoff/athena-alb-log-analysis-b874d0958909 for your excellent example.

Next THX is to AWS Support that helped me to figure out some issues.

 

I tested below CREATE TABLE query with your sample data, and using below schema helped to read your alb logs data successfully:

 


CREATE EXTERNAL TABLE IF NOT EXISTS logs.web_alb
( type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code string,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1','input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:\-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" (\"[^\"]*\") ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) (.*)' )
LOCATION 's3://<your_bucket>/'

Please note that in your sample data, there was no value for year, month and day as provided in the PARTITIONED BY clause in your create table query: PARTITIONED BY(year string, month string, day string) Hence I had to remove the PARTITIONED BY clause from your query. Can you try creating table using above schema and see if that helps in reading alb logs.

 

 

 

  1. We must activate enable access logs in the properties from ALB.

 

athena_elb_create_s3_bucket

Enable access logs, choose a name of your bucket and say create this location for me. After this Athena have access to the log files also.

You can generate some data with benchmark tools of the website or you wait some minutes to have some data to analysis.

Then we go to Athena

now it will be a little bit tricky

we must create a db from hand, say add table

athena_create_db

 

fill it out, database is logs

Table name is web_alb

put your s3 url into it

s3://com.mywebsite.httplog2/AWSLogs/595264310722/elasticloadbalancing/eu-central-1/2017/05/

athena_create_db_2

Say next and choose Apache Web Logs

 

athena_create_db_3

add one table with name test, we will delete it later

athena_create_db_4

 

athena_create_db_5

Do nothing here and click, create table

 

Now we have created a Database with Name logs and a table with Name web_alb

athena_create_db_6.jpg

now we just drop the table web_alb

DROP TABLE IF EXISTS web_alb PURGE;

 

athena_drob_table

so then we fill in the correct table with all columns that we need

 

 


CREATE EXTERNAL TABLE IF NOT EXISTS logs.web_alb (
 type string,
 time string, 
 elb string, 
 client_ip string, 
 client_port int, 
 target_ip string, 
 target_port int, 
 request_processing_time double, 
 target_processing_time double, 
 response_processing_time double, 
 elb_status_code string, 
 target_status_code string, 
 received_bytes bigint, 
 sent_bytes bigint, 
 request_verb string, 
 request_url string, 
 request_proto string, 
 user_agent string, 
 ssl_cipher string, 
 ssl_protocol string,
 target_group_arn string,
 trace_id string) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
 'serialization.format' = '1','input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:\-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" (\"[^\"]*\") ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) (.*)' ) 
LOCATION 's3://com.mysite.httplog2/AWSLogs/595264310722/elasticloadbalancing/eu-central-1/2017/05/'

 

athena_create_db_7

 

 

we will fire up our first select

SELECT * FROM logs.web_alb LIMIT 100;

athena_first_select

That´s was all

Example for most visitor sorted by most visit with user agent

SELECT user_agent, client_ip, COUNT(*) as count
FROM logs.web_alb
GROUP BY user_agent, client_ip
ORDER BY COUNT(*) DESC

 

 

 

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s