Querying EKS Logs from S3 using Fluent Bit and Amazon Athena

This document outlines the end-to-end process for shipping logs from an Amazon EKS cluster to S3 using Fluent Bit and making them easily searchable with standard SQL using AWS Glue and Amazon Athena.

The primary goal is to create a serverless, cost-effective, and powerful log analysis platform that avoids the need to manually download and search through thousands of small log files.


Part 1: The Logging Pipeline (Fluent Bit to S3)

We start by configuring Fluent Bit to send all container logs to a designated S3 bucket.

Fluent Bit Configuration (fluent-bit-values.yaml)

The setup uses the official Fluent Bit Helm chart, whose default pipeline already includes the Kubernetes filter that enriches each record with the kubernetes metadata block our table schema relies on later. The most important section to customize is the outputs configuration, which defines the S3 destination and the log file structure.

config:
  outputs: |
    [OUTPUT]
        Name                s3
        Match               *
        bucket              eks-logs-archive-production-497836541334
        region              ap-south-1
        total_file_size     50M
        upload_timeout      5m
        s3_key_format       /prod-logs/%Y/%m/%d/%H/logs-$UUID.json
        # Note: If your logs are gzipped, use .json.gz and set 'compression gzip'
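
To roll this out, the values file can be applied with the official Helm chart. A minimal sketch, assuming the chart comes from the fluent repository and Fluent Bit runs in a logging namespace (the namespace is an assumption; use your own):

# Add the official Fluent Bit chart repository (one-time)
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update

# Install or upgrade Fluent Bit with the values file above
helm upgrade --install fluent-bit fluent/fluent-bit \
  --namespace logging --create-namespace \
  -f fluent-bit-values.yaml

Note that the Fluent Bit pods also need s3:PutObject permission on the bucket, typically granted through IRSA.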

Key Configuration Explained

  • s3_key_format: /prod-logs/%Y/%m/%d/%H/...: This is the most critical setting. It organizes logs into a folder structure based on the UTC timestamp of when the log was processed.
  • Structure: /prod-logs/YEAR/MONTH/DAY/HOUR/
  • Important Note: This format (/2025/10/04/02/) is not Hive-style (/year=2025/month=10/...). This distinction is why we use Partition Projection in Athena, which is a more modern and efficient approach.
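
Once Fluent Bit is running, you can confirm that objects are landing under the expected hourly prefixes with the AWS CLI. A quick check using the bucket from this setup (adjust the date path to the current UTC hour):

# List the hourly log objects for a given UTC date/hour
aws s3 ls s3://eks-logs-archive-production-497836541334/prod-logs/2025/10/04/02/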

Part 2: Making Logs Searchable with Glue & Athena

With logs flowing to S3, we now set up the AWS services to query them.

Step 1: Create an AWS Glue Crawler (Initial Discovery)

The crawler's only job is to perform a one-time scan of your data to give us a starting schema.

  1. Navigate to AWS Glue in the console.
  2. Create a Crawler: Give it a name (e.g., eks-logs-initial-crawler).
  3. Data Source: Point it to your S3 log location: s3://eks-logs-archive-production-497836541334/prod-logs/.
  4. IAM Role: Allow Glue to create a new role with permissions to access the S3 bucket.
  5. Output: Create a new database named eks_log_database.
  6. Run the crawler once. It will create a table named prod_logs. This auto-generated schema will be incorrect, but it provides the template we need to fix.
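
The same one-time run can be triggered and monitored from the CLI if you prefer. A sketch assuming the crawler name used above:

# Kick off the one-time discovery run
aws glue start-crawler --name eks-logs-initial-crawler

# Poll until the state returns to READY, which means the run finished
aws glue get-crawler --name eks-logs-initial-crawler \
  --query 'Crawler.State' --output text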

Step 2: The Final Table Schema (table-definition.json)

The automatically created schema is too rigid. We must update it to correctly handle Kubernetes metadata and the S3 folder structure. The most reliable way is to define the correct schema in a JSON file and update the table using the AWS CLI.

Below is the final, corrected JSON definition. Save this content as table-definition.json.

{
    "Name": "prod_logs",
    "Description": "EKS logs with partition projection enabled",
    "Retention": 0,
    "StorageDescriptor": {
        "Columns": [
            { "Name": "date", "Type": "string" },
            { "Name": "log", "Type": "string" },
            { "Name": "kubernetes", "Type": "struct<pod_name:string,namespace_name:string,pod_id:string,labels:map<string,string>,annotations:map<string,string>,host:string,pod_ip:string,container_name:string,docker_id:string,container_hash:string,container_image:string>" }
        ],
        "Location": "s3://eks-logs-archive-production-497836541334/prod-logs/",
        "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
        "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "Compressed": false,
        "SerdeInfo": {
            "SerializationLibrary": "org.apache.hive.hcatalog.data.JsonSerDe"
        },
        "Parameters": {
            "compressionType": "none",
            "classification": "json"
        }
    },
    "PartitionKeys": [
        { "Name": "partition_0", "Type": "string" },
        { "Name": "partition_1", "Type": "string" },
        { "Name": "partition_2", "Type": "string" },
        { "Name": "partition_3", "Type": "string" }
    ],
    "TableType": "EXTERNAL_TABLE",
    "Parameters": {
        "projection.enabled": "true",
        "projection.partition_0.type": "integer",
        "projection.partition_0.range": "2024,2030",
        "projection.partition_1.type": "integer",
        "projection.partition_1.range": "01,12",
        "projection.partition_1.digits": "2",
        "projection.partition_2.type": "integer",
        "projection.partition_2.range": "01,31",
        "projection.partition_2.digits": "2",
        "projection.partition_3.type": "integer",
        "projection.partition_3.range": "00,23",
        "projection.partition_3.digits": "2",
        "storage.location.template": "s3://eks-logs-archive-production-497836541334/prod-logs/${partition_0}/${partition_1}/${partition_2}/${partition_3}"
    }
}

Key table-definition.json Sections Explained

  • kubernetes column: The Type is defined as a struct where labels and annotations are map<string,string>. This allows you to query any label or annotation without errors.
  • SerdeInfo: We use the robust org.apache.hive.hcatalog.data.JsonSerDe to correctly parse the JSON log files.
  • compressionType: none: This tells Athena not to expect GZIP compression. If your logs are gzipped (e.g., logs.json.gz), you should set this to gzip and Compressed to true.
  • PartitionKeys: We define four partition keys. While named generically (partition_0, etc.), they correspond to Year, Month, Day, and Hour.
  • Partition Projection (Parameters block): This is the most important part. It tells Athena how to calculate S3 paths without a crawler.
    • projection.enabled: true: Turns the feature on.
    • projection.*.type/range/digits: Defines the type and possible values for each partition part (year, month, day, hour).
    • storage.location.template: This template tells Athena exactly how to build the S3 folder path from the values in your query's WHERE clause.
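
For a concrete sense of the mapping: a WHERE clause of partition_0 = '2025' AND partition_1 = '10' AND partition_2 = '03' AND partition_3 = '20' makes Athena read only one hourly prefix, which you can sanity-check against S3 directly:

# The prefix Athena derives from the storage.location.template for
# year=2025, month=10, day=03, hour=20 (UTC)
aws s3 ls s3://eks-logs-archive-production-497836541334/prod-logs/2025/10/03/20/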

Step 3: Apply the Schema Update

Run the following AWS CLI command to update your Glue table with the correct definition from the file.

aws glue update-table --database-name eks_log_database --table-input file://table-definition.json

With this, your setup is complete. You will never need to run the crawler or execute MSCK REPAIR TABLE again; partition projection resolves the partitions at query time.
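
You can read the table back to confirm the projection parameters landed. A quick check via the CLI:

# Verify that partition projection is enabled on the table
aws glue get-table --database-name eks_log_database --name prod_logs \
  --query 'Table.Parameters."projection.enabled"' --output text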


Part 3: Top 10 Sample Athena Queries 🔍

You can now run SQL queries in the Athena console. Remember that all partitions are based on UTC time.
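
Because the partition values are UTC, it helps to grab the current values from a shell before writing a WHERE clause. A small sketch:

# Print the current UTC values for partition_0 through partition_3
date -u '+partition_0=%Y partition_1=%m partition_2=%d partition_3=%H'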

1. Get Recent Logs from a Specific Pod

SELECT "date", log
FROM "prod_logs"
WHERE kubernetes.pod_name = 'my-app-pod-xyz123'
AND partition_0 = '2025' AND partition_1 = '10' AND partition_2 = '03'
ORDER BY "date" DESC LIMIT 100;

2. Get Logs from a Namespace in a Specific Hour

SELECT "date", kubernetes.pod_name, log
FROM "prod_logs"
WHERE kubernetes.namespace_name = 'ai-assistant'
AND partition_0 = '2025' -- Year
AND partition_1 = '10'   -- Month
AND partition_2 = '03'   -- Day (UTC)
AND partition_3 = '20'   -- Hour (UTC, 8 PM)
ORDER BY "date" DESC;

3. Search for a String in Logs (Grep)

SELECT "date", kubernetes.pod_name, log
FROM "prod_logs"
WHERE log LIKE '%bc8cf23b-d34b-417f-8124-2dd3c39449fa%'
AND partition_0 = '2025' AND partition_1 = '10' AND partition_2 = '03'
LIMIT 100;

4. Filter by a Kubernetes Label

SELECT "date", kubernetes.pod_name, log
FROM "prod_logs"
WHERE kubernetes.labels['app'] = 'tracemypods-askapi'
AND partition_0 = '2025' AND partition_1 = '10' AND partition_2 = '03'
ORDER BY "date" DESC LIMIT 100;

5. Find All "WARN" or "ERROR" Logs

SELECT "date", kubernetes.pod_name, log
FROM "prod_logs"
WHERE (log LIKE '%"level":"WARN"%' OR log LIKE '%"level":"ERROR"%')
AND partition_0 = '2025' AND partition_1 = '10' AND partition_2 = '03'
ORDER BY "date" DESC;

6. Count Logs by Pod Name

SELECT kubernetes.pod_name, COUNT(*) as log_count
FROM "prod_logs"
WHERE partition_0 = '2025' AND partition_1 = '10' AND partition_2 = '03'
GROUP BY kubernetes.pod_name
ORDER BY log_count DESC;

7. Count Logs by Namespace

SELECT kubernetes.namespace_name, COUNT(*) as log_count
FROM "prod_logs"
WHERE partition_0 = '2025' AND partition_1 = '10'
GROUP BY kubernetes.namespace_name
ORDER BY log_count DESC;

8. Find Logs from a Specific Container

SELECT "date", log
FROM "prod_logs"
WHERE kubernetes.container_name = 'my-specific-container'
AND partition_0 = '2025' AND partition_1 = '10' AND partition_2 = '03'
ORDER BY "date" DESC LIMIT 100;

9. Filter by a Kubernetes Annotation

SELECT "date", kubernetes.pod_name
FROM "prod_logs"
WHERE kubernetes.annotations['prometheus.io/scrape'] = 'true'
AND partition_0 = '2025' AND partition_1 = '10' AND partition_2 = '03'
LIMIT 100;

10. A Complex Combined Query

This finds a specific error message for an application within a namespace during a specific 1-hour window.

SELECT "date", kubernetes.pod_name, log
FROM "prod_logs"
WHERE
    kubernetes.namespace_name = 'ai-assistant'
    AND kubernetes.labels['app'] = 'tracemypods-deliverapi'
    AND log LIKE '%The group is rebalancing, so a rejoin is needed%'
    AND partition_0 = '2025'
    AND partition_1 = '10'
    AND partition_2 = '03'
    AND partition_3 = '20'
ORDER BY "date" DESC;
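
These queries can also run headlessly through the AWS CLI, which is useful for scripts and scheduled jobs. A minimal sketch, assuming you have a results bucket for Athena output (the my-athena-results-bucket location below is a placeholder):

# Submit a query and capture its execution ID
QUERY_ID=$(aws athena start-query-execution \
  --query-string "SELECT COUNT(*) FROM prod_logs WHERE partition_0 = '2025' AND partition_1 = '10' AND partition_2 = '03'" \
  --query-execution-context Database=eks_log_database \
  --result-configuration OutputLocation=s3://my-athena-results-bucket/ \
  --query 'QueryExecutionId' --output text)

# In real scripts, poll aws athena get-query-execution for SUCCEEDED first
aws athena get-query-results --query-execution-id "$QUERY_ID"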

------------------------------------------

Appendix: Earlier Console-Based Setup (Extra Notes)

Step 1: Create an AWS Glue Crawler

The crawler will scan your S3 files to automatically figure out the data structure.

  1. Go to the AWS Glue service in the AWS Console.
  2. In the left menu, select Crawlers and click Create crawler.
  3. Name your crawler: eks-logs-json-crawler.
  4. Data sources:
    • Click Add a data source.
    • Data source: S3.
    • S3 path: Enter the path to your main log folder, s3://eks-logs-archive-production-497836541334/prod-logs/.
    • Ensure Crawl all sub-folders is selected. Click Add an S3 data source.
  5. IAM Role:
    • Choose Create an IAM role.
    • Give it a name, like Glue-EKS-Logs-Reader-Role.
  6. Output and scheduling:
    • Click Add database. Name your database eks_log_database.
    • For scheduling, leave the Frequency as On demand.
  7. Review the details and click Create crawler.

Step 2: Run the Crawler and Modify the Table (Critical Step)

This is the most important step where we tell Glue that your files are not actually compressed.

  1. Select your new crawler from the list and click Run crawler. Wait for it to complete.
  2. Once it's finished, navigate to Databases in the left menu, click on eks_log_database, and then open its Tables list.
  3. You'll see a new table named prod_logs. Click on it.
  4. Click the Actions button and select Edit table.
  5. Scroll down to the Table properties section at the bottom.
  6. You will likely see a property with a key of compressionType and a value of gzip. The crawler added this because your files end in .gz.
  7. You must change the value from gzip to none.
  8. Click Save.

Your system is now correctly configured to read the logs as uncompressed JSON, and you are ready to query.
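
To confirm the override without clicking back through the console, the property can be read back via the CLI. A sketch using the names above:

# Confirm that compressionType is now "none"
aws glue get-table --database-name eks_log_database --name prod_logs \
  --query 'Table.StorageDescriptor.Parameters.compressionType' --output text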


Step 3: Query Your Logs in Amazon Athena 🔍

This is where you can start finding the exact logs you need.

  1. Go to the Amazon Athena service in the AWS Console.
  2. Set the Database to eks_log_database. You should see your prod_logs table.

Basic Queries

Let's start by querying the top-level fields and the kubernetes object.

To find the 10 most recent logs from a specific pod:

SELECT
    "date",
    kubernetes.namespace_name,
    log
FROM
    "prod_logs"
WHERE
    kubernetes.pod_name = 'tracemypods-deliverapi-7645b8957-srfdc'
ORDER BY
    "date" DESC
LIMIT 10;

Advanced Query: Searching Inside the log Field

Your log field is a string that contains another JSON object inside it. To search within this, you need to first extract the JSON part of the string and then query it. We'll use the regexp_extract and json_extract_scalar functions.

To find all "WARN" level logs from the 'ai-assistant' namespace:

This query looks complex, but it's very powerful. It first pulls out the clean JSON from the log string and then searches for the log level within it.

SELECT
    "date",
    kubernetes.pod_name,
    -- Step 2: Extract the 'message' field from the clean JSON
    json_extract_scalar(log_json, '$.message') AS log_message
FROM
    (
        -- Step 1: Extract the JSON part from the raw log string
        SELECT
            "date",
            kubernetes,
            regexp_extract(log, '(\{.+\})', 1) AS log_json
        FROM
            "prod_logs"
    )
WHERE
    -- Step 3: Filter by fields inside the nested JSON and the kubernetes object
    json_extract_scalar(log_json, '$.level') = 'WARN'
    AND kubernetes.namespace_name = 'ai-assistant'
ORDER BY
    "date" DESC;

