
Create Custom Metrics for AWS Glue Jobs


LEVEL: INTERMEDIATE

 

As you know, CloudWatch lets you publish custom metrics from your applications. These are metrics that are not provided by the AWS services themselves.

Traditionally, applications published custom metrics by calling CloudWatch’s PutMetricData API, most commonly through the AWS SDK for the language of your choice.
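
For comparison with what follows, here is a minimal sketch of the traditional approach. It assumes a client object exposing `put_metric_data`, such as the one returned by `boto3.client("cloudwatch")`; the namespace `Custom/GlueJobs`, the metric name `JobSuccess`, and both function names are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone


def build_metric_datum(job_name: str, succeeded: bool) -> dict:
    """Build one MetricData entry in the shape PutMetricData expects."""
    return {
        "MetricName": "JobSuccess",  # hypothetical metric name
        "Dimensions": [{"Name": "JobName", "Value": job_name}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": 1.0 if succeeded else 0.0,
        "Unit": "Count",
    }


def publish_job_metric(cloudwatch_client, job_name: str, succeeded: bool) -> dict:
    """Send the datum through any client that exposes put_metric_data."""
    datum = build_metric_datum(job_name, succeeded)
    cloudwatch_client.put_metric_data(
        Namespace="Custom/GlueJobs",  # hypothetical namespace
        MetricData=[datum],
    )
    return datum
```

With boto3 installed and credentials configured, a call could look like `publish_job_metric(boto3.client("cloudwatch"), "my-etl-job", True)`. Each call is a synchronous API request, which is the overhead EMF avoids.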

With the new CloudWatch Embedded Metric Format (EMF), you can instead embed the custom metrics in the logs that your application sends to CloudWatch, and CloudWatch automatically extracts them from the log data. You can then graph these metrics in the CloudWatch console and set alarms on them, just like out-of-the-box metrics.
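
To make the format concrete, the sketch below builds one EMF log line: a JSON object whose `_aws` key declares the namespace, dimension keys, and metric definitions, while the metric and dimension values sit at the top level of the same object. The function name `emf_log_line`, the namespace `Custom/Example`, and the metric `ProcessedRecords` are assumptions made for this example.

```python
import json
import time


def emf_log_line(namespace, metric_name, value, unit="Count", dimensions=None):
    """Return a JSON string in CloudWatch Embedded Metric Format."""
    dimensions = dimensions or {}
    record = {
        "_aws": {
            # Milliseconds since the Unix epoch.
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set listing the dimension keys used below.
                    "Dimensions": [list(dimensions.keys())],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        # The metric value lives at the top level, keyed by its name.
        metric_name: value,
    }
    # Dimension values are also top-level members and must be strings.
    record.update(dimensions)
    return json.dumps(record)


# Writing this line to a log stream that CloudWatch ingests is enough;
# no PutMetricData call is involved.
print(emf_log_line("Custom/Example", "ProcessedRecords", 42,
                   dimensions={"Service": "demo"}))
```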

This works anywhere you publish CloudWatch logs from: EC2 instances, on-prem VMs, Docker/Kubernetes containers in ECS/EKS, Lambda functions, and so on.


In this post, we focus on custom metrics for AWS Glue job executions. The final aim is to create a CloudWatch alarm that identifies whether a Glue job execution succeeded or not. The proposed solution is shown in the following diagram.






Infrastructure

We will create the infrastructure and permissions needed with Terraform.


resource "aws_cloudwatch_event_rule" "custom_glue_job_metrics" {
  name        = "CustomGlueJobMetrics"
  description = "Create custom metrics from glue job events"

  is_enabled = true

  event_pattern = jsonencode(
    {
      "source": [
        "aws.glue"
      ],
      "detail-type": [
        "Glue Job State Change"
      ]
    }
  )
}
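
To illustrate which events this rule selects, here is a simplified model of the matching, not EventBridge's full algorithm: it only compares top-level string fields against the lists of allowed values. The sample event is a trimmed-down, hypothetical Glue notification (the job name is invented) and the `matches` helper is likewise illustrative.

```python
# Same pattern as in the Terraform rule.
PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
}

# Hypothetical, trimmed-down event as it could arrive from AWS Glue.
SAMPLE_EVENT = {
    "version": "0",
    "source": "aws.glue",
    "detail-type": "Glue Job State Change",
    "detail": {
        "jobName": "my-etl-job",
        "state": "SUCCEEDED",
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Every pattern key must be present in the event with one of the allowed values."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


print(matches(PATTERN, SAMPLE_EVENT))
```

Because the pattern does not filter on `detail.state`, every state change reaches the target, and telling success from failure is left to the Lambda function.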



resource "aws_cloudwatch_event_target" "custom_glue_job_metrics" {
  target_id = "CustomGlueJobMetrics"
  rule      = aws_cloudwatch_event_rule.custom_glue_job_metrics.name
  arn       = aws_lambda_function.custom_glue_job_metrics.arn

  retry_policy {
    maximum_event_age_in_seconds = 3600
    maximum_retry_attempts       = 0
  }
}

resource "aws_lambda_function" "custom_glue_job_metrics" {
  function_name = "CustomGlueJobMetrics"

  filename         = "python/handler.zip"
  source_code_hash = filebase64sha256("python/handler.zip")
  role             = aws_iam_role.custom_glue_job_metrics.arn
  handler          = "handler.handler"
  runtime          = "python3.9"
  timeout          = 90

  tracing_config {
    mode = "PassThrough"
  }
}
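
The setting `handler = "handler.handler"` points Lambda at a function named `handler` in a module `handler.py` inside the zip. A minimal sketch of what such a module could look like follows, assuming the event's `detail` carries `jobName` and `state` fields; the namespace and the metric names `Success` and `Failure` are hypothetical, not taken from the deployed code.

```python
import json
import time

NAMESPACE = "Custom/GlueJobs"  # hypothetical namespace


def handler(event, context):
    """Turn one Glue job state change event into an EMF log line."""
    detail = event.get("detail", {})
    job_name = detail.get("jobName", "unknown")
    state = detail.get("state", "UNKNOWN")
    succeeded = state == "SUCCEEDED"

    # Lambda forwards stdout to CloudWatch Logs, where the metrics are extracted.
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": NAMESPACE,
                "Dimensions": [["JobName"]],
                "Metrics": [
                    {"Name": "Success", "Unit": "Count"},
                    {"Name": "Failure", "Unit": "Count"},
                ],
            }],
        },
        "JobName": job_name,
        "Success": 1 if succeeded else 0,
        "Failure": 0 if succeeded else 1,
        "State": state,  # plain log field, not declared as a metric
    }
    print(json.dumps(record))
    return {"jobName": job_name, "state": state, "success": succeeded}
```

Treating every state other than `SUCCEEDED` as a failure is a design choice of this sketch; a stricter version could count timeouts and manual stops separately.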

resource "aws_lambda_permission" "allow_cloudwatch" {
  statement_id  = "AllowExecutionFromCloudWatch"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.custom_glue_job_metrics.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.custom_glue_job_metrics.arn
}



resource "aws_iam_role" "custom_glue_job_metrics" {
  name = "CustomGlueJobMetrics"

  assume_role_policy = jsonencode(
    {
      Version : "2012-10-17",
      Statement : [
        {
          Effect : "Allow",
          Principal : {
            Service : "lambda.amazonaws.com"
          },
          Action : "sts:AssumeRole"
        }
      ]
  })
}

resource "aws_iam_role_policy" "custom_glue_job_metrics" {
  name = "CustomGlueJobMetrics"
  role = aws_iam_role.custom_glue_job_metrics.id

  policy = jsonencode({
    Version : "2012-10-17",
    Statement : [
      {
        Effect : "Allow",
        Action : [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ],
        Resource : "arn:aws:logs:*:*:*"