Monday, June 30, 2025

Introducing AWS Glue Data Catalog usage metrics for API usage


We’re excited to announce AWS Glue Data Catalog usage metrics. Usage metrics is a new feature that provides native integration with Amazon CloudWatch. It gives you immediate visibility into your AWS Glue Data Catalog API usage patterns and trends.

AWS Glue Data Catalog is a centralized repository that stores metadata about your organization’s datasets. Its unified interface acts as an index, so you can store and query information about your data sources, including their location, formats, schemas, and runtime metrics.

As you scale your lakehouse architecture on Amazon Web Services (AWS) and maintain reliable data operations, observability and monitoring become essential to understanding and optimizing Data Catalog API usage.

With Data Catalog usage metrics in CloudWatch, you can do the following:

  • Monitor API call patterns at 1-minute intervals
  • Proactively request service quota increases for API rate limits
  • Enable the CloudWatch built-in anomaly detection feature to identify abnormalities in your API usage
  • Understand lakehouse usage across more than 50 APIs

In this post, we show how to access these metrics, provide a step-by-step walkthrough, and set up meaningful alarms.

Access Data Catalog usage metrics in the Amazon CloudWatch console

To access Data Catalog usage metrics, complete the following steps:

  1. Open the Amazon CloudWatch console.
  2. Under Metrics, choose All metrics.
  3. In the search bar, enter Glue and press Enter.
  4. Choose Usage > By AWS Resource, as shown in the following screenshot.

  5. The Metrics section opens and displays the different catalog usage metrics. You can select from these to create dashboards and alarms, as shown in the following screenshot.
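You can also find the same metrics programmatically. The following minimal Python sketch builds the arguments for the CloudWatch ListMetrics API. It assumes the dimension values from the dimensions table in this post (Service set to AWS Glue, Type set to API) and only prints the request. The commented boto3 lines show how you might send it, assuming boto3 and AWS credentials are available.

```python
# Minimal sketch: kwargs for CloudWatch ListMetrics that locate the Data Catalog
# usage metrics. Assumption: the Service value "AWS Glue" is copied from the
# dimensions table in this post; confirm it against what your console shows.

DATA_CATALOG_SERVICE = "AWS Glue"  # assumption, see comment above


def build_list_metrics_params(api_name=None):
    """Return keyword arguments for boto3's cloudwatch.list_metrics()."""
    dimensions = [
        {"Name": "Service", "Value": DATA_CATALOG_SERVICE},
        {"Name": "Type", "Value": "API"},
    ]
    if api_name is not None:
        # Optionally narrow the search to one API, for example "GetTables".
        dimensions.append({"Name": "Resource", "Value": api_name})
    return {
        "Namespace": "AWS/Usage",
        "MetricName": "CallCount",
        "Dimensions": dimensions,
    }


print(build_list_metrics_params("GetTables"))

# With AWS credentials configured, the same kwargs could be sent with boto3:
#   import boto3
#   paginator = boto3.client("cloudwatch").get_paginator("list_metrics")
#   for page in paginator.paginate(**build_list_metrics_params()):
#       for metric in page["Metrics"]:
#           print(metric["Dimensions"])
```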

Monitor CallCount metrics

Each Amazon CloudWatch metric for Data Catalog is of type API and is named CallCount. Each API call on that specific resource (for example, the GetConnection API) is logged as one count. These metrics integrate into your existing CloudWatch dashboards, or you can use them to create new ones. For proactive monitoring, you can configure custom alarms that trigger automatically when API usage exceeds your defined thresholds. This helps you stay within service limits.
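The following sketch simulates the counting model locally. Every successful call adds one, so bucketing call timestamps by minute gives the value that the Sum statistic would report for each 1-minute period. The timestamps are hypothetical. This is an illustration of the counting semantics, not how CloudWatch computes the metric internally.

```python
from collections import Counter
from datetime import datetime


def call_count_per_minute(call_times):
    """Group successful API call timestamps into 1-minute buckets.

    Each call contributes 1, so each bucket's value corresponds to the Sum
    statistic of CallCount for that 1-minute period.
    """
    buckets = Counter(t.replace(second=0, microsecond=0) for t in call_times)
    return dict(sorted(buckets.items()))


# Hypothetical GetConnection calls: three during 12:00 and one during 12:01.
calls = [
    datetime(2025, 6, 30, 12, 0, 5),
    datetime(2025, 6, 30, 12, 0, 40),
    datetime(2025, 6, 30, 12, 0, 59),
    datetime(2025, 6, 30, 12, 1, 10),
]
for minute, count in call_count_per_minute(calls).items():
    print(minute.strftime("%H:%M"), count)  # prints "12:00 3", then "12:01 1"
```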

On the Graphed metrics tab, you can apply additional customizations to match your monitoring needs. In the Details column, you can create alarms and enable anomaly detection to identify unusual patterns.
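You can also capture these customizations as a dashboard definition. The following sketch builds a JSON dashboard body with a single metric widget. The widget graphs CallCount per API at a 60-second period with the Sum statistic, in the body format that the CloudWatch PutDashboard API accepts. The region, widget layout, and title are placeholder assumptions. The dimension values follow the dimensions table in this post.

```python
import json


def build_dashboard_body(api_names, region="us-east-1"):
    """Return a CloudWatch dashboard body (JSON string) graphing CallCount per API.

    Assumptions: region, layout, and title are placeholders; dimension values
    follow the dimensions table in this post.
    """
    # Each entry: namespace, metric name, then dimension name/value pairs.
    metrics = [
        ["AWS/Usage", "CallCount",
         "Service", "AWS Glue", "Type", "API",
         "Resource", name, "Class", "None"]
        for name in api_names
    ]
    body = {
        "widgets": [
            {
                "type": "metric",
                "x": 0, "y": 0, "width": 12, "height": 6,
                "properties": {
                    "title": "Data Catalog API call count",
                    "view": "timeSeries",
                    "metrics": metrics,
                    "stat": "Sum",   # total calls per period
                    "period": 60,    # metrics are emitted at 1-minute intervals
                    "region": region,
                },
            }
        ]
    }
    return json.dumps(body)


dashboard_body = build_dashboard_body(["GetTables", "GetPartitions"])
# With boto3 (assumption: credentials configured):
#   boto3.client("cloudwatch").put_dashboard(
#       DashboardName="glue-data-catalog-usage", DashboardBody=dashboard_body)
```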

To support effective API monitoring, CallCount metrics count only successful API calls. This gives you more precise tracking and helps you troubleshoot different types of API behavior. The following screenshot shows the AWS Glue usage metrics view for the GetTables API.

In the Statistics column, you can view your API usage beyond the default Sum, Min, and Max statistics. You can now choose from a wide variety of statistical methods to analyze your usage patterns, as shown in the following screenshot.
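Programmatically, you can request several statistics for the same metric in one GetMetricData call by adding one query per statistic. This sketch builds such a request for Sum, Maximum, and p99 over 1-hour periods for the last 24 hours. Those choices are illustrative assumptions. With a 1-hour period, Maximum and p99 summarize the per-minute datapoints within each hour.

```python
from datetime import datetime, timedelta, timezone


def usage_metric(api_name):
    """CallCount metric for one API (dimension values per this post's table)."""
    return {
        "Namespace": "AWS/Usage",
        "MetricName": "CallCount",
        "Dimensions": [
            {"Name": "Service", "Value": "AWS Glue"},  # assumption: verify in console
            {"Name": "Type", "Value": "API"},
            {"Name": "Resource", "Value": api_name},
            {"Name": "Class", "Value": "None"},
        ],
    }


def build_get_metric_data_params(api_name, stats=("Sum", "Maximum", "p99"),
                                 period=3600, hours=24):
    """Kwargs for boto3's cloudwatch.get_metric_data(): one query per statistic."""
    end = datetime.now(timezone.utc)
    queries = [
        {
            "Id": f"m{i}",  # query IDs must start with a lowercase letter
            "Label": f"{api_name} {stat}",
            "MetricStat": {
                "Metric": usage_metric(api_name),
                "Period": period,
                "Stat": stat,
            },
            "ReturnData": True,
        }
        for i, stat in enumerate(stats)
    ]
    return {
        "MetricDataQueries": queries,
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
    }


params = build_get_metric_data_params("GetTables")
# boto3.client("cloudwatch").get_metric_data(**params)  # requires credentials
```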

Metrics and dimensions for Data Catalog usage metrics

Data Catalog usage metrics use the AWS/Usage namespace and provide CallCount metrics. These metrics are published with the dimensions Service, Resource, Type, and Class.

The CallCount metric doesn’t have a specified unit. The most useful statistic for the metric is Sum, which represents the total operation count for the 1-minute period. Note that the metric value is emitted at 1-minute intervals. Reducing the period further (for example, to 1 second) won’t change the emission interval.
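Quotas are expressed per second, but the metric is emitted per minute, so a quick conversion helps relate the two. The following sketch turns a 1-minute Sum into an average requests-per-second figure and a fraction of a quota. The 6,000-call and 200-requests-per-second numbers are hypothetical; use the limit that actually applies to your account. A per-minute average can hide short bursts. A minute that averages well under the quota may still contain individual seconds that exceed it.

```python
def average_requests_per_second(call_count_sum, period_seconds=60):
    """Convert a CallCount Sum over one period into an average request rate."""
    return call_count_sum / period_seconds


def quota_utilization(call_count_sum, quota_rps, period_seconds=60):
    """Fraction of a per-second quota used, on average, during the period."""
    return average_requests_per_second(call_count_sum, period_seconds) / quota_rps


# Hypothetical numbers: 6,000 GetTables calls in one minute against an
# assumed quota of 200 requests per second.
print(average_requests_per_second(6000))  # 100.0
print(quota_utilization(6000, 200))       # 0.5
```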

Metrics

  • CallCount: The number of specified operations performed in your account.

Dimensions

  • Service (value: AWS Glue): The name of the AWS service containing the resource. For Data Catalog usage metrics, the value for this dimension is AWS Glue.
  • Type (value: API): The type of resource being tracked. Currently, when the Service dimension is AWS Glue, the only valid value for Type is API.
  • Resource (value: <API name>): The name of the API operation. Valid values include the following:

    GetCatalogs, GetCatalog, GetDatabases, GetDatabase, GetTables, GetTable, GetTableVersion, GetTableVersions, SearchTables, GetPartitionIndexes, GetColumnStatisticsForTable, GetPartition, GetPartitions, BatchGetPartition, GetColumnStatisticsForPartition, GetConnection, GetConnections, GetUserDefinedFunction, GetUserDefinedFunctions, GetCatalogImportStatus, GetTableOptimizer, BatchGetTableOptimizer, ListTableOptimizerRuns, CreateCatalog, CreateDatabase, CreateTable, CreatePartitionIndex, CreatePartition, BatchCreatePartition, CreateConnection, CreateUserDefinedFunction, CreateTableOptimizer, UpdateCatalog, UpdateDatabase, UpdateTable, UpdateColumnStatisticsForTable, UpdatePartition, BatchUpdatePartition, UpdateColumnStatisticsForPartition, UpdateConnection, UpdateUserDefinedFunction, UpdateTableOptimizer, DeleteCatalog, DeleteDatabase, DeleteTable, BatchDeleteTable, DeleteTableVersion, DeletePartitionIndex, DeleteColumnStatisticsForTable, DeletePartition, BatchDeletePartition, DeleteColumnStatisticsForPartition, DeleteConnection, BatchDeleteConnection, DeleteUserDefinedFunction, DeleteTableOptimizer, TestConnection, ImportCatalogToGlue

  • Class (value: None): The class of resource being tracked. Data Catalog usage metrics use this dimension with a value of None.
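If you build requests yourself, you can validate API names against the Resource values listed in this post before creating metrics or alarms. The following sketch keeps that list as a set and returns the four dimensions for a given API. The Class dimension value is the string "None", not Python's None. The Service value is copied from the dimensions table and should be confirmed in your console.

```python
# Resource values as listed in this post, kept as one comma-separated string.
RESOURCE_VALUES = """
GetCatalogs, GetCatalog, GetDatabases, GetDatabase, GetTables, GetTable,
GetTableVersion, GetTableVersions, SearchTables, GetPartitionIndexes,
GetColumnStatisticsForTable, GetPartition, GetPartitions, BatchGetPartition,
GetColumnStatisticsForPartition, GetConnection, GetConnections,
GetUserDefinedFunction, GetUserDefinedFunctions, GetCatalogImportStatus,
GetTableOptimizer, BatchGetTableOptimizer, ListTableOptimizerRuns,
CreateCatalog, CreateDatabase, CreateTable, CreatePartitionIndex,
CreatePartition, BatchCreatePartition, CreateConnection,
CreateUserDefinedFunction, CreateTableOptimizer, UpdateCatalog, UpdateDatabase,
UpdateTable, UpdateColumnStatisticsForTable, UpdatePartition,
BatchUpdatePartition, UpdateColumnStatisticsForPartition, UpdateConnection,
UpdateUserDefinedFunction, UpdateTableOptimizer, DeleteCatalog, DeleteDatabase,
DeleteTable, BatchDeleteTable, DeleteTableVersion, DeletePartitionIndex,
DeleteColumnStatisticsForTable, DeletePartition, BatchDeletePartition,
DeleteColumnStatisticsForPartition, DeleteConnection, BatchDeleteConnection,
DeleteUserDefinedFunction, DeleteTableOptimizer, TestConnection,
ImportCatalogToGlue
"""

VALID_RESOURCES = frozenset(
    name.strip() for name in RESOURCE_VALUES.split(",") if name.strip()
)


def usage_dimensions(api_name):
    """Return the four dimensions for one Data Catalog API's CallCount metric."""
    if api_name not in VALID_RESOURCES:
        raise ValueError(f"{api_name!r} is not a listed Data Catalog usage Resource")
    return [
        {"Name": "Service", "Value": "AWS Glue"},  # assumption: per the table
        {"Name": "Type", "Value": "API"},
        {"Name": "Resource", "Value": api_name},
        # Class is the literal string "None", not Python's None.
        {"Name": "Class", "Value": "None"},
    ]


print(usage_dimensions("GetTables"))
```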

Set up CloudWatch alarms for Data Catalog usage metrics

To handle atypical usage patterns, Data Catalog enforces rate limits on client calls, measured in requests per second. You can create CloudWatch alarms on the CallCount metric so that you can request limit increases proactively. To configure a CloudWatch alarm for this purpose, complete the following steps:

  1. On the CloudWatch metrics console, select one of the available metrics, as shown in the following screenshot. In this example, we select the resource GetTables. You can select multiple metrics to fit your use case.

  2. Choose Graphed metrics.
  3. Choose Sum as the statistic.
  4. Set Period to 1 minute.

  5. Choose Details, then choose Create alarm.

  6. For Threshold type, choose Anomaly detection. You can also choose Static, depending on your requirements, once you’ve determined a specific threshold value.
  7. Set the Anomaly detection threshold to 2 (the default). The threshold value determines the normal range of values for the metric. A higher value produces a thicker band of normal values. For more information, refer to How CloudWatch anomaly detection works.
  8. Choose Next.
  9. For Send a notification to the following SNS topic, choose Create new topic.
  10. For Create a new topic, enter your Amazon Simple Notification Service (Amazon SNS) topic name.
  11. For Email endpoints that will receive the notification, enter your email address. In this example, we create a new SNS topic. However, you can use your existing SNS topics or other options such as AWS Lambda or an Auto Scaling action.
  12. Choose Create topic.

  13. Scroll down and choose Next.
  14. Enter an alarm name and description, and choose Next.
  15. Review all the details you entered and choose Create alarm, as shown in the following screenshot.
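The console steps above can also be expressed as a single PutMetricAlarm request. The following sketch builds the arguments for boto3's put_metric_alarm. It uses an ANOMALY_DETECTION_BAND expression of width 2, the Sum statistic, and a 1-minute period, matching the walkthrough. The alarm name, description, single evaluation period, and SNS topic ARN are hypothetical; adjust them to your environment.

```python
def build_anomaly_alarm_params(api_name, topic_arn, band_width=2, period=60,
                               evaluation_periods=1):
    """Kwargs for boto3's cloudwatch.put_metric_alarm() with an anomaly band."""
    metric = {
        "Namespace": "AWS/Usage",
        "MetricName": "CallCount",
        "Dimensions": [
            {"Name": "Service", "Value": "AWS Glue"},  # assumption: per this post's table
            {"Name": "Type", "Value": "API"},
            {"Name": "Resource", "Value": api_name},
            {"Name": "Class", "Value": "None"},
        ],
    }
    return {
        "AlarmName": f"glue-catalog-{api_name}-callcount-anomaly",  # hypothetical name
        "AlarmDescription": f"Unusually high {api_name} call volume",
        # Alarm only when usage rises above the band, not when it drops below.
        "ComparisonOperator": "GreaterThanUpperThreshold",
        "EvaluationPeriods": evaluation_periods,
        "ThresholdMetricId": "ad1",  # the band expression below is the threshold
        "AlarmActions": [topic_arn],
        "Metrics": [
            {
                "Id": "m1",
                "MetricStat": {"Metric": metric, "Period": period, "Stat": "Sum"},
                "ReturnData": True,
            },
            {
                "Id": "ad1",
                # Band width 2 matches the default used in the console steps.
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
                "Label": "CallCount (expected)",
                "ReturnData": True,
            },
        ],
    }


params = build_anomaly_alarm_params(
    "GetTables", "arn:aws:sns:us-east-1:111122223333:glue-usage-alerts"  # placeholder ARN
)
# boto3.client("cloudwatch").put_metric_alarm(**params)  # requires credentials
```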

By following these steps, you’ve configured a CloudWatch alarm that uses anomaly detection to monitor your Data Catalog usage with the threshold that you set. The alarm triggers when the CallCount metric exceeds the calculated threshold. It then sends notifications to your specified SNS topic and email endpoints.
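If you manage notifications outside the console, two SNS API calls can create the SNS topic and email subscription from the walkthrough. The following sketch wraps CreateTopic and Subscribe. It accepts any client object with those two methods, so a boto3 SNS client works. A small recording stub is included so the example runs offline. The topic name and email address are placeholders.

```python
def create_alert_topic(sns_client, topic_name, email_address):
    """Create (or reuse) an SNS topic and subscribe an email address to it.

    sns_client is expected to behave like boto3.client("sns").
    """
    # CreateTopic returns the existing topic's ARN if you already own a topic
    # with this name, so repeated runs don't create duplicates.
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    # Email subscriptions stay pending until the recipient confirms them.
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email_address)
    return topic_arn


class RecordingSNS:
    """Offline stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def create_topic(self, Name):
        self.calls.append(("create_topic", Name))
        return {"TopicArn": f"arn:aws:sns:us-east-1:111122223333:{Name}"}

    def subscribe(self, TopicArn, Protocol, Endpoint):
        self.calls.append(("subscribe", TopicArn, Protocol, Endpoint))
        return {"SubscriptionArn": "pending confirmation"}


stub = RecordingSNS()
arn = create_alert_topic(stub, "glue-usage-alerts", "ops@example.com")
print(arn)
# With boto3: create_alert_topic(boto3.client("sns"), "glue-usage-alerts", "you@example.com")
```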

This proactive monitoring approach helps you avoid API rate limit issues and keeps your Data Catalog operations running smoothly. For more information about using CloudWatch alarms, refer to Using Amazon CloudWatch alarms.
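If you choose a Static threshold instead, one option is to derive the value from the per-second quota. The following sketch converts a hypothetical 100-requests-per-second quota into a per-minute Sum threshold at 80% headroom (100 × 60 × 0.8 = 4,800 calls). It then builds matching put_metric_alarm arguments. The quota, headroom, alarm name, and missing-data treatment are assumptions. Because the threshold applies to a 1-minute total, it won't catch bursts shorter than a minute.

```python
def static_threshold_per_minute(quota_rps, headroom=0.8):
    """Per-minute CallCount Sum equal to `headroom` of a per-second quota."""
    return quota_rps * 60 * headroom


def build_static_alarm_params(api_name, quota_rps, topic_arn, headroom=0.8):
    """Kwargs for boto3's cloudwatch.put_metric_alarm() with a static threshold."""
    return {
        "AlarmName": f"glue-catalog-{api_name}-callcount-static",  # hypothetical name
        "Namespace": "AWS/Usage",
        "MetricName": "CallCount",
        "Dimensions": [
            {"Name": "Service", "Value": "AWS Glue"},  # assumption: per this post's table
            {"Name": "Type", "Value": "API"},
            {"Name": "Resource", "Value": api_name},
            {"Name": "Class", "Value": "None"},
        ],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": static_threshold_per_minute(quota_rps, headroom),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Assumption: a minute with no calls may produce no datapoint; treat as OK.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }


params = build_static_alarm_params(
    "GetTables", 100, "arn:aws:sns:us-east-1:111122223333:glue-usage-alerts"
)
print(params["Threshold"])
# boto3.client("cloudwatch").put_metric_alarm(**params)  # requires credentials
```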

Conclusion

AWS Glue Data Catalog usage metrics are an effective enhancement to your data infrastructure monitoring capabilities. They address the growing need for detailed observability through Amazon CloudWatch in modern data architectures built on top of Data Catalog. You now have access to more granular statistics. These go beyond simple maximum and average request metrics to comprehensive performance indicators, including p99 percentiles. The metrics are emitted at 1-minute intervals, providing visibility into your Data Catalog operations. Organizations can now identify bottlenecks before they affect operations and plan capacity efficiently based on detailed usage patterns.

Native support for CloudWatch anomaly detection and flexible alarm configurations helps with everything from building monitoring dashboards to setting up alerts. This makes it straightforward to proactively monitor your lakehouse deployment and catch abnormalities in your lakehouse usage early. For more information, refer to Monitoring Data Catalog usage metrics in Amazon CloudWatch in the AWS Glue documentation. We recommend trying these metrics as part of your modern monitoring and observability strategy. We encourage you to share your feedback with us.

Special thanks to everyone who contributed to this launch: Vineet Sunkavalli, Shubham Bansal, Mike Kloss, Zarius Dubash.


About the authors

David Zhang is an Analytics Solutions Architect who specializes in designing and implementing large-scale data infrastructure, ETL processes, and comprehensive data management systems. He helps customers modernize data platforms on Amazon Web Services (AWS). David is also an active speaker at AWS events and a contributor to technical content and open source projects. He enjoys playing volleyball, tennis, and basketball in his free time.

Noritaka Sekiyama is a Principal Big Data Architect with Amazon Web Services (AWS) Analytics services. He’s responsible for building software artifacts to help customers. In his spare time, he enjoys cycling on his road bike.

Sandeep Adwankar is a Senior Product Manager at AWS. Based in the California Bay Area, he works with customers around the globe to translate business and technical requirements into products that help customers improve how they manage, secure, and access data.

Abhay Joshi is a Software Development Engineer at AWS Glue and AWS Lake Formation. He’s passionate about building fault-tolerant and reliable distributed systems at scale.
