Thankfully, the Amazon Kinesis Agent can not only stream your log data to a central location for analysis, but can also be configured to append host information by leveraging the Instance Metadata Service (IMDS).
In this post, I will show you how to implement the dataProcessingOptions within the Kinesis-Agent to enrich your logs with information from the Instance Metadata Service. By implementing this solution, you can simplify log ingestion logic and reduce up-stream data transformation activities for effective log analysis.
Architecture
The following diagram represents the solution we will be implementing:

The solution involves installing and configuring the Kinesis Agent on each of the EC2 instances, which is then configured to monitor specific folders and stream new log data (matching a configured regex pattern) into AWS via Kinesis Firehose.
Prerequisites
To implement the solution as described above, you will need access to an AWS account, existing EC2 instances (I'm using Amazon Linux 2) and the necessary permissions to create the following resources:
- S3 Buckets
- Kinesis Firehose
- EC2 Instance Profiles
- IAM Roles & Policies
Implementation
To implement the solution described above (excluding EC2 instances), you can use either a predefined CloudFormation template or the AWS CLI.
Option 1: CloudFormation
If you've selected the CloudFormation option, please complete all steps below:
· Create a new stack in your AWS account using the template provided
· Complete all steps in the Install and configure the Kinesis Agent sections
· Add additional permissions to the EC2 Instance Profile to append our logs with Instance Metadata
Note: All sample resources created in this blog use the AWS [AccountId] as part of the resource naming convention. This ensures all resources are globally unique. If you are copying the samples below, remember to replace [AccountId] within the sample policies and cli commands.
Creating the Stack using Cloudformation
- Download the CloudFormation template (kinesis-IMDS/kinesis-imds-template.yaml, main branch) from the GitHub repository
- Open the AWS Console and navigate to CloudFormation
- Create a new stack selecting ‘Upload a template file’ as the template source
- Select the template file
- Give the stack a name, e.g. kinesis-logs-imds-demo
- Under Parameters, type either 'Yes' or 'No' (the default is Yes) to indicate whether you would like the template to create the Instance Profile for you
- Accept all remaining defaults and acknowledge the message about required IAM permissions and creating resources
- Submit the stack
Note: After the stack completes, skip down to the Installing the Kinesis Agent section and continue from there.
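If you prefer to launch the stack from the command line, the same deployment can be sketched with the AWS CLI. Note that the parameter name CreateInstanceProfile below is an assumption; confirm the actual parameter name in the template's Parameters section before running.

```shell
# Create the stack from the downloaded template. The parameter name
# (CreateInstanceProfile) is hypothetical; check the template first.
aws cloudformation create-stack \
  --stack-name kinesis-logs-imds-demo \
  --template-body file://kinesis-imds-template.yaml \
  --parameters ParameterKey=CreateInstanceProfile,ParameterValue=Yes \
  --capabilities CAPABILITY_IAM

# Block until stack creation finishes
aws cloudformation wait stack-create-complete \
  --stack-name kinesis-logs-imds-demo
```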
Option 2: AWS CLI
If you've selected the AWS CLI option, please complete all steps below:
· Create an S3 bucket to store our logs
· Create a Kinesis Firehose Endpoint to deliver streamed logs
· Install and configure the Kinesis Agent used to stream our logs
· Add additional permissions to the EC2 Instance Profile to append our logs with Instance Metadata
Note: All sample resources created in this blog use the AWS [AccountId] as part of the resource naming convention. This ensures all resources are globally unique. If you are copying the samples below, remember to replace [AccountId] within the sample policies and cli commands.
Creating the S3 Bucket
In our sample architecture we need a single S3 bucket to store our processed log files.
· To create this S3 bucket, execute the following AWS CLI command, replacing the bucket name and region values as required.
aws s3api create-bucket --bucket [AccountId]-kinesis-logs --region [Region]
Note: If the bucket is to reside in a region outside us-east-1, remember to include the --create-bucket-configuration LocationConstraint parameter with the command.
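For example, creating the bucket outside us-east-1 might look like the following (a sketch; the account ID and region shown are placeholders, so substitute your own values):

```shell
# Create the log bucket in ap-southeast-2; buckets outside
# us-east-1 require an explicit LocationConstraint.
aws s3api create-bucket \
  --bucket 123456789012-kinesis-logs \
  --region ap-southeast-2 \
  --create-bucket-configuration LocationConstraint=ap-southeast-2
```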
Creating an IAM Role & Policies for Firehose
The next step is to create an IAM policy and role which we will assign to the Kinesis Firehose Delivery Stream. This permits Firehose to create the log files in our S3 Bucket.
Let’s begin by creating the policy document.
· Copy the following JSON into a new file on your computer called iam-policy.json, replacing [AccountId] with your AWS account ID.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListBucket",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::[AccountId]-kinesis-logs",
"arn:aws:s3:::[AccountId]-kinesis-logs/*"
]
}
]
}
· Next, execute the following AWS CLI command to create a new IAM policy using the policy document we created above.
Remember to copy the IAM Policy ARN from the output of this command as we will need that later to attach the policy to the Role.
aws iam create-policy --policy-name kinesis-firehose-to-s3 --policy-document file://iam-policy.json
To create the required IAM role, we first need to create an assume-role policy document. This allows the Firehose service to assume the role and use it.
· Copy the following JSON into a new file on your computer called iam-assume-policy.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "firehose.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
· Execute the command below to create the new role and assign the trust relationship
aws iam create-role --role-name kinesis-agent-logs-s3 --assume-role-policy-document file://iam-assume-policy.json
· Finally, attach the policy to the IAM role using the following command
aws iam attach-role-policy --role-name kinesis-agent-logs-s3 --policy-arn arn:aws:iam::[AccountId]:policy/kinesis-firehose-to-s3
Note: If you plan on separating your Kinesis Data Firehose and Amazon S3 destination into different accounts, please see Cross-Account Delivery to an Amazon S3 Destination
Creating the Kinesis Firehose Endpoint
Now that we have the prerequisite resources deployed into the account, we can create the Kinesis Firehose stream using the following command
aws firehose create-delivery-stream --delivery-stream-name kinesis-log-stream --delivery-stream-type DirectPut --extended-s3-destination-configuration '{"BucketARN": "arn:aws:s3:::[AccountId]-kinesis-logs", "RoleARN": "arn:aws:iam::[AccountId]:role/kinesis-agent-logs-s3"}'
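You can confirm the delivery stream was created successfully with describe-delivery-stream; the JMESPath query below is just one way to pull out the status, which should read ACTIVE once provisioning completes:

```shell
# Check the status of the new delivery stream
aws firehose describe-delivery-stream \
  --delivery-stream-name kinesis-log-stream \
  --query 'DeliveryStreamDescription.DeliveryStreamStatus' \
  --output text
```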
Installing the Kinesis Agent
The following section assumes you are using Amazon Linux 2 instances. If you are using other instance types, your commands may need to be adjusted.
1. To install the Kinesis Agent v2.0.0 on an Amazon Linux 2 (AL2) instance, run the following command from the command line of your EC2 instance
sudo yum install -y aws-kinesis-agent
If you would like to install the agent from another location, such as GitHub or an Amazon S3 repository, please refer to the link below
https://docs.aws.amazon.com/firehose/latest/dev/writing-with-agents.html#download-install
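Once installed, it's worth verifying the package and making sure the agent starts on boot. The commands below assume the Amazon Linux 2 service tooling (service/chkconfig):

```shell
# Confirm the agent package is installed
rpm -q aws-kinesis-agent

# Start the agent now and enable it at boot
sudo service aws-kinesis-agent start
sudo chkconfig aws-kinesis-agent on
```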
Configuring the Kinesis Agent
Before any logs are streamed to Kinesis, we need to configure the agent to monitor for log file changes and define where to send logs once they are processed. In my example, I'm running httpd on my EC2 instance, so the logs I am interested in are located in /var/log/httpd.
To configure the Kinesis Agent to send these logs to my new Firehose endpoint, edit /etc/aws-kinesis/agent.json and paste the following configuration.
{
"firehose.endpoint": "firehose.ap-southeast-2.amazonaws.com",
"sts.endpoint": "sts.ap-southeast-2.amazonaws.com",
"flows": [
{
"filePattern": "/var/log/httpd/*",
"deliveryStream": "kinesis-log-stream"
}
]
}
As you can see, the configuration file is made up of two sections. The first details the global options, which tell the agent where to deliver files. In my case I'm using the ap-southeast-2 region, so I have included the firehose.endpoint and sts.endpoint parameters accordingly.
The second section (contained in the flows object), defines the monitoring configuration on the EC2 instance.
The filePattern parameter defines the folder to monitor whereas the deliveryStream parameter tells the Kinesis agent which Firehose delivery Stream to use. You will notice in my configuration I have used the stream name we defined earlier.
1. To ensure the Amazon Kinesis agent user has access to the logs in the monitored folder, let's grant access using the following command:
sudo setfacl -m u:aws-kinesis-agent-user:rwx /var/log/httpd
2. Finally, let's restart the Amazon Kinesis agent so that it starts streaming access logs to the Firehose stream, using the following command:
sudo service aws-kinesis-agent restart
If everything was configured correctly, you should start to see files appear in the S3 bucket within 5-10 minutes.
In my solution I have two EC2 instances, so let's repeat the Installing and Configuring the Kinesis Agent steps on the other EC2 instance and confirm we are now seeing multiple files delivered to the S3 bucket, like the image below.
If you do not see log files appear in your S3 bucket within 10 minutes, you can review the Amazon Kinesis agent logs at /var/log/aws-kinesis-agent/aws-kinesis-agent.log for more details.
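Rather than refreshing the console, you can also list the delivered objects from the command line (replace the bucket name with your own):

```shell
# List objects Firehose has delivered so far; by default objects
# are written under a YYYY/MM/DD/HH/ prefix based on delivery time.
aws s3 ls s3://123456789012-kinesis-logs --recursive
```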
Adding Instance Metadata to our logs
In the last section, we installed and configured the Kinesis Agent to stream our log files to a Kinesis endpoint and then into S3. However, at the moment these files do not contain any host-identifiable information. In this section we will introduce the EC2 Instance Metadata Service to enrich this data, making these log files more usable.
To add the instance metadata information to your log files, you need to do two things. First, add permissions to your Instance Profile to access the instance metadata; second, update your Kinesis Agent configuration to include the EC2 metadata option.
Adding additional permissions
To add instance metadata, the Kinesis Agent makes API calls to the EC2 service rather than sending an HTTP request to the IMDS endpoint at http://169.254.169.254/. To ensure your instance can access this information, let's add the following policy statement to your Instance Profile configuration:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"kinesis:PutRecord",
"kinesis:PutRecords"
],
"Resource": "arn:aws:kinesis:*:[AccountId]:stream/kinesis-log-stream"
},
{
"Sid": "EC2MetadataAccess",
"Effect": "Allow",
"Action": [
"ec2:DescribeInstances",
"ec2:DescribeInstanceAttribute",
"ec2:DescribeInstanceTypes",
"ec2:DescribeInstanceStatus",
"ec2:DescribeTags"
],
"Resource": "*"
}
]
}
Updating the Kinesis Agent configuration
The second step is to update our Kinesis-Agent configuration file with the “dataProcessingOptions” parameter. This should be appended to the existing configurations within this file.
Add the following configuration to the agent.json file, which is located by default in /etc/aws-kinesis/
"dataProcessingOptions": [
{
"optionName": "ADDEC2METADATA",
"logFormat": "COMMONAPACHELOG"
}
]
As shown above, we are adding a new configuration object to our agent configuration called dataProcessingOptions and setting two additional parameters. The first is optionName, which lets the Kinesis agent know we want to include instance metadata ("ADDEC2METADATA"), and the second is logFormat, which in this example is set to COMMONAPACHELOG.
For more information about agents and processing options, see Use the agent to Preprocess Data.
Finally restart the Kinesis Agent by issuing the following command
sudo service aws-kinesis-agent restart
Upon restarting your Kinesis Agent service, you should see the following log entry in the /var/log/aws-kinesis-agent/aws-kinesis-agent.log file
(main) com.amazon.kinesis.streaming.agent.processing.processors.AddEC2MetadataConverter [INFO] Refreshing EC2 metadata
Note: The Kinesis Agent appends all available EC2 metadata to the log records, so if you don't want to include all of these parameters, you must add additional optional parameters to specify the metadata to include. In my example I want all metadata.
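To give a sense of the result, an enriched record delivered to S3 looks roughly like the following. This is an illustrative sketch only; the exact field names and values depend on the instance and the agent version:

```json
{
  "host": "10.0.1.25",
  "datetime": "12/Oct/2022:01:30:00 +0000",
  "request": "GET /index.html HTTP/1.1",
  "response": "200",
  "bytes": "3104",
  "instanceId": "i-0abcd1234efgh5678",
  "instanceType": "t3.micro",
  "availabilityZone": "ap-southeast-2a",
  "privateIp": "10.0.1.25"
}
```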
With everything now configured and running, your log files delivered to S3 should now contain the additional EC2Metadata information.
Summary
To summarize, using the EC2 metadata service with your existing Kinesis Agent configuration is an effective way to enrich your application log data with important instance-related information as it is streamed into AWS. The example in this post demonstrates the simplicity of configuring a data pre-processing step to include instance metadata, which will improve log file analysis and reduce post-ingestion data transformation steps when aggregating application logs across your organization.
Author
Rick is a Senior Media and Entertainment Solutions Architect based in Sydney, Australia. Rick spends his time working with Australia's largest Media and Publication customers, helping them bring News, Sports and Drama to your home. Rick is a dedicated Father and Husband and in his spare time builds GenAI applications, plays a few instruments (poorly) and dabbles with DJ'ing and video editing.


