Scaling data governance with Amazon DataZone: Covestro success story

07 November 2025

Covestro Deutschland AG, headquartered in Leverkusen, Germany, is a global leader in high-performance polymer materials and components. Covestro has established itself as a key player in the chemical industry, with 48 production sites worldwide, €14.2 billion in 2024 sales, and 17,500 employees. Covestro’s core business focuses on developing innovative, sustainable solutions for products used in various aspects of daily life. The company offers materials for the mobility, building and living, and electrical and electronics sectors, in addition to sports and leisure, health, and the chemical industry. The company’s products, such as polycarbonates, polyurethanes, coatings, adhesives, and specialty elastomers, are important components in the automotive, construction, electronics, and medical device industries.

To support this global operation and diverse product portfolio, Covestro adopted a robust data management solution. In this post, we show you how Covestro transformed its data architecture by implementing Amazon DataZone and AWS Serverless Data Lake Framework (SDLF), transitioning from a centralized data lake to a data mesh architecture. Through this strategic shift, teams can share and consume data while maintaining high quality standards through a consolidated data marketplace and business metadata glossary. The result: streamlined data access, better data quality, and stronger governance that producer and consumer teams can rely on to run data and analytics workloads at scale, supporting more than 1,000 data pipelines and a 70% reduction in time-to-market.

Business and data challenges

Prior to their transformation, Covestro operated with a centralized data lake managed by a single data platform team that handled the data engineering tasks. This centralized approach created several challenges: bottlenecks in project delivery because of limited engineering resources, difficult prioritization of use cases, and inefficient data sharing processes. The setup often resulted in unnecessary data duplication, which in turn slowed down time-to-market for new analytics initiatives, increased costs, and limited the ability of business units to act quickly on insights.

The lack of visibility into data assets created significant operational challenges:

Teams couldn’t find existing datasets, often recreating data already stored elsewhere
No clear understanding of data lineage or quality metrics
Difficulty in identifying who owned specific data assets or who to contact for access
Absence of metadata and documentation about available datasets
Departments shared little information about how they were using data

These visibility issues, combined with the lack of unified access controls, led to:

Siloed data initiatives across departments
Reduced trust in data quality
Inefficient use of resources
Delayed project timelines
Missed opportunities for cross-functional collaboration and insights

A strategic solution: Why Amazon DataZone and SDLF?

The challenges Covestro faced reflect deeper structural limitations of centralized data architectures. As Covestro scaled, central data teams often became bottlenecks, and lack of domain context led to fragmented quality, inconsistent standards, and poor collaboration. Instead of centralizing control, a data mesh gives ownership to the teams who generate and understand the data, while keeping the governance and interoperability consistent across the organization. This makes it well-suited for Covestro’s environment, which requires agility, scalability, and cross-team collaboration.

AWS Serverless Data Lake Framework (SDLF) is a solution to these challenges, providing a robust foundation for data mesh architectures. Traditional data lake implementations often centralize data ownership and governance, but with the flexible design of SDLF, organizations can build decentralized data domains that align with modern data mesh principles. The framework provides domain-oriented teams with the infrastructure, security controls, and operational patterns needed to own and manage their data products independently, while maintaining consistent governance across the organization. Through its modular architecture and infrastructure as code templates, SDLF accelerates the creation of domain-specific data products, so that Covestro’s teams can deploy standardized yet customizable data pipelines. This approach supports the key pillars of data mesh: domain-oriented decentralization, data as a product, self-serve infrastructure, and federated governance, providing Covestro with a practical path to overcome the limitations of traditional centralized architectures.

Amazon DataZone enhances the data mesh implementation through a unified experience for discovering and accessing data across decentralized domains. As a data management service, Amazon DataZone helps organizations catalog, discover, share, and govern data across organizational boundaries. It provides a central governance layer where organizations can establish data sharing agreements, manage access controls, and enable self-service data access while supporting security and compliance. While teams can use SDLF to build and operate domain-specific data products, Amazon DataZone complements it with a searchable catalog enriched with metadata, business context, and usage policies, making data products easier to find, trust, and reuse.

Through the sharing capabilities of Amazon DataZone, domain teams can share their data products with other domains while maintaining granular access controls and governance policies, enabling cross-domain collaboration and data reuse. This integration means that domain teams can publish their SDLF-managed datasets to an Amazon DataZone catalog, so authorized users across the organization can discover and access them. Through the governance capabilities built into Amazon DataZone, organizations can implement standardized data sharing workflows, check data quality, and enforce consistent access controls across their distributed data ecosystem, strengthening their data mesh architecture with robust data governance and democratization capabilities.

Together, SDLF and Amazon DataZone provide Covestro with a comprehensive solution for implementing a modern data mesh architecture, enabling autonomous data domains to operate with consistent governance, seamless data sharing, and enterprise-wide data discovery.

Solution architecture and implementation

The following architecture illustrates the high-level design of the data mesh solution. The implementation is built on AWS services to create a robust, scalable, and governed data mesh that serves multiple business domains across the Covestro organization.

Data domain foundation: Serverless Data Lake Framework

A key pillar of the implementation is the Serverless Data Lake Framework (SDLF), which provides the foundational infrastructure and security needed to support data mesh strategies. SDLF delivers the core building blocks for data domains, such as Amazon S3 storage layers, built-in encryption with AWS KMS, IAM-based access control, and infrastructure as code (IaC) automation. By using these components, Covestro can deploy decentralized, domain-owned data products rapidly while maintaining consistent governance across the enterprise.

The framework uses Amazon Simple Storage Service (Amazon S3) as the primary data storage layer, delivering virtually unlimited scalability and eleven nines of durability for diverse data assets. The proposed S3 bucket architecture follows AWS Well-Architected principles, implementing a multi-tiered structure with distinct raw, staging, and analytics data zones. This layered approach helps different business domains maintain data sovereignty: each domain owns and controls its data, while keeping accessibility patterns organization-wide.

Security is a fundamental aspect of Covestro’s data mesh implementation. SDLF automatically implements encryption at rest and in transit across data storage and processing components. AWS Key Management Service (AWS KMS) provides centralized key management, while carefully crafted AWS Identity and Access Management (IAM) roles enable resource isolation.

Data processing with AWS Glue

AWS Glue serves as the cornerstone of the data processing and transformation capabilities, offering serverless extract, transform, and load (ETL) services that automatically scale based on workload demands.

Covestro’s pre-existing centralized data lake was fed by more than 1,000 ingestion data pipelines interacting with a variety of source systems. To support the migration of existing ingestion and processing pipelines, Covestro developed reusable blueprints that incorporated the development and security standards defined for the data mesh. Covestro introduced standardized patterns that teams can deploy across multiple domains while providing the flexibility needed for domain-specific requirements. These blueprints support diverse source systems, from traditional databases like Oracle, SQL Server, and MySQL to modern software as a service (SaaS) applications such as SAP C4C.

They also developed specialized blueprints for processing, standardizing, and cleaning ingested raw data. These blueprints store processed data in Apache Iceberg format, automatically saving metadata in the AWS Glue Data Catalog and providing built-in capabilities to handle schema evolution seamlessly.

Covestro relies on SDLF to quickly configure and deploy the blueprints as AWS Glue jobs inside the domain. With SDLF, teams deploy a data pipeline through a YAML configuration file, and the orchestration and management mechanisms of SDLF handle the rest. The solution includes comprehensive monitoring capabilities built on Amazon DynamoDB, providing real-time visibility into data pipeline health and performance metrics. When teams deploy a pipeline through SDLF, the system automatically integrates it with the monitoring setup.
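To make the configuration-driven idea concrete, here is a minimal Python sketch of what validating a pipeline definition before deployment could look like. It is not SDLF’s actual schema: the field names, the blueprint names in KNOWN_BLUEPRINTS, and the job naming rule are all hypothetical, chosen only to illustrate how a small declarative record can select a reusable blueprint.

```python
from dataclasses import dataclass

# Hypothetical blueprint names; the article does not list the real ones.
KNOWN_BLUEPRINTS = {"jdbc_ingestion", "saas_ingestion", "iceberg_standardization"}

# Zone names taken from the raw/staging/analytics layout described above.
KNOWN_ZONES = {"raw", "staging", "analytics"}


@dataclass(frozen=True)
class PipelineConfig:
    """Illustrative stand-in for the contents of a pipeline YAML file."""
    domain: str
    pipeline_name: str
    blueprint: str
    source_system: str
    target_zone: str = "staging"


def validate(config: PipelineConfig) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if config.blueprint not in KNOWN_BLUEPRINTS:
        problems.append(f"unknown blueprint: {config.blueprint}")
    if config.target_zone not in KNOWN_ZONES:
        problems.append(f"unknown zone: {config.target_zone}")
    # Allow only letters, digits, and underscores in the pipeline name.
    if not config.pipeline_name.replace("_", "").isalnum():
        problems.append(f"invalid pipeline name: {config.pipeline_name}")
    return problems


def glue_job_name(config: PipelineConfig) -> str:
    """Derive a deterministic job name so repeated deployments match."""
    return f"{config.domain}-{config.pipeline_name}-{config.blueprint}"
```

In a setup like this, a domain team would edit only the declarative record, and shared tooling would reject invalid definitions before any infrastructure is created.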

Data quality with AWS Glue Data Quality

To achieve data reliability across domains, Covestro extended the capabilities of SDLF to incorporate AWS Glue Data Quality into data processing pipelines. This integration enables automated data quality checks as part of the standard data processing workflow. Because of the configuration-driven design of SDLF, data producers can implement quality controls either using recommended rules, which are automatically generated through data profiling, or applying their own domain-specific rules.
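AWS Glue Data Quality rules are written in the Data Quality Definition Language (DQDL), where a ruleset has the form Rules = [ ... ]. The sketch below shows one way recommended and domain-specific rules could be merged into a single ruleset string. How Covestro’s SDLF extension actually combines them is not described in this post, so the build_ruleset helper and the de-duplication behavior are assumptions made for illustration.

```python
def build_ruleset(recommended, custom):
    """Merge recommended and domain-specific rules into one DQDL ruleset string.

    Rules are kept in order, and exact duplicates are dropped (an assumption
    about how the two sources might be reconciled).
    """
    seen = set()
    merged = []
    for rule in list(recommended) + list(custom):
        rule = rule.strip()
        if rule and rule not in seen:
            seen.add(rule)
            merged.append(rule)
    if not merged:
        raise ValueError("a DQDL ruleset needs at least one rule")
    body = ",\n    ".join(merged)
    return f"Rules = [\n    {body}\n]"


if __name__ == "__main__":
    # Example rules use standard DQDL rule types; column names are made up.
    print(build_ruleset(
        recommended=['IsComplete "order_id"', "RowCount > 0"],
        custom=['ColumnValues "quantity" > 0'],
    ))
```

The resulting string is the kind of ruleset a Glue job can evaluate against a dataset, which keeps rule authoring in configuration rather than in pipeline code.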

The integration provides data teams with the flexibility to define quality expectations while maintaining consistency in how quality checks are implemented at the pipeline level. The solution logs quality evaluation results, providing visibility into the data quality metrics for each data product. These elements are illustrated in the following figure.

Enterprise-ready access control with AWS Lake Formation

AWS Lake Formation integration with the Data Catalog supports the security and access control layer that makes the data mesh implementation enterprise-ready. Through Lake Formation, Covestro implemented fine-grained access controls that respect domain boundaries while enabling controlled cross-domain data sharing.

The service’s integration with IAM means that Covestro can implement role-based access patterns that align with their organizational structure, so users can access the data they need while keeping appropriate security boundaries.

Data democratization with Amazon DataZone

Amazon DataZone functions as the heart of the data mesh implementation. Deployed in a dedicated AWS account, it provides the data governance, discovery, and sharing capabilities that were missing in the previous centralized approach. DataZone offers a unified, searchable catalog enriched with business context, automated access controls, and standardized sharing workflows that enable true data democratization across the organization.

Through Amazon DataZone, Covestro established a comprehensive data catalog that helps business users across different domains to discover, understand, and request access to data assets without requiring deep technical expertise. The business glossary functionality supports consistent data definitions across domains, eliminating the confusion that often arises when different teams use different terminology for the same concepts.

Data product owners can use the integration of Amazon DataZone with AWS Lake Formation to grant or revoke cross-domain access to data, streamlining the data sharing process while supporting security and compliance requirements.

Managing cross-domain data pipeline dependencies

When implementing Covestro’s data mesh architecture on AWS, one of the most significant challenges was orchestrating data pipelines across multiple domains. The core question to address was “How can Data Domain A determine when a required dataset from Data Domain B has been refreshed and is ready for consumption?”

In a data mesh architecture, domains maintain ownership of their data products while enabling consumption by other domains. This distributed model creates complex dependency chains where downstream pipelines must wait for upstream data products to complete processing before execution can begin.

To address this cross-domain dependency coordination, Covestro extended SDLF with a custom dependency checker component that operates through both shared and domain-specific elements.

The shared components consist of two centralized Amazon DynamoDB tables located in a hub AWS account: one collecting successful pipeline execution logs from the domains, and another aggregating pipeline dependencies across the entire data mesh.

The domains deploy local components such as a dependency-tracking Amazon DynamoDB table and an AWS Step Functions state machine. The state machine checks prerequisites using centralized execution logs and integrates seamlessly as the first step in every SDLF-deployed pipeline, without additional configuration. The following diagram shows the process described.
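The prerequisite check can be pictured with a small, self-contained Python sketch. Plain dictionaries stand in for the dependency table and the central execution log, and the freshness rule used here (an upstream pipeline must have succeeded after the downstream pipeline’s own last run) is an assumption; the post does not state the exact criterion Covestro applies. The function name and pipeline identifiers are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional


def unmet_dependencies(
    pipeline: str,
    dependencies: Dict[str, List[str]],
    last_success: Dict[str, datetime],
    last_own_run: Optional[datetime],
) -> List[str]:
    """Return the upstream pipelines that still block `pipeline`.

    dependencies: downstream pipeline -> upstream pipelines it waits for.
    last_success: upstream pipeline -> time of its latest successful run.
    last_own_run: when `pipeline` last ran, or None if it never has.
    """
    unmet = []
    for upstream in dependencies.get(pipeline, []):
        finished = last_success.get(upstream)
        if finished is None:
            # The upstream pipeline has never completed successfully.
            unmet.append(upstream)
        elif last_own_run is not None and finished <= last_own_run:
            # No fresh upstream data since this pipeline last consumed it.
            unmet.append(upstream)
    return unmet


if __name__ == "__main__":
    deps = {"domain_b.sales_report": ["domain_a.orders", "domain_a.customers"]}
    log = {"domain_a.orders": datetime(2025, 1, 2, 6, 0)}
    blocked_by = unmet_dependencies(
        "domain_b.sales_report", deps, log, datetime(2025, 1, 1, 8, 0)
    )
    print(blocked_by)
```

A state machine step built on this idea would proceed when the returned list is empty and otherwise wait and retry.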

To prevent circular dependencies that could create locks in the distributed orchestration system, Covestro implemented a sophisticated detection mechanism using Amazon Neptune. DynamoDB Streams automatically replicate dependency changes from domain tables to the central registry, triggering an AWS Lambda function that uses the Gremlin graph traversal language (using pygremlin) to construct, update, and analyze a directed acyclic graph (DAG) of the pipeline relationships. Native Gremlin functions detect circular dependencies, and automated notifications are sent, as illustrated in the following diagram. This process continuously updates the graph to reflect any new pipeline dependencies or changes across the data mesh.
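The underlying check, whether the dependency graph still forms a DAG, can be illustrated without a graph database. The sketch below swaps in a plain-Python depth-first search over an in-memory dictionary; it is not the Neptune and Gremlin implementation described above, and the find_cycle name and pipeline identifiers are hypothetical.

```python
from typing import Dict, List, Optional

# Node states for the depth-first search.
UNVISITED, IN_PROGRESS, DONE = 0, 1, 2


def find_cycle(dependencies: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one circular dependency as a list of pipelines, or None.

    dependencies maps each pipeline to the pipelines it depends on. The
    returned list starts and ends with the same pipeline, e.g. [a, b, a].
    Uses recursion, so it suits graphs of modest depth.
    """
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for upstream in dependencies.get(node, []):
            status = state.get(upstream, UNVISITED)
            if status == IN_PROGRESS:
                # Reached a node already on the current path: a cycle.
                return path[path.index(upstream):] + [upstream]
            if status == UNVISITED:
                found = visit(upstream)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in dependencies:
        if state.get(node, UNVISITED) == UNVISITED:
            found = visit(node)
            if found:
                return found
    return None


if __name__ == "__main__":
    graph = {"a.orders": ["b.prices"], "b.prices": ["a.orders"]}
    print(find_cycle(graph))
```

Running a check like this whenever a dependency is added means a cycle can be reported to the owning teams before any pipeline ends up waiting on itself.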

Operational excellence through infrastructure as code

Infrastructure as code (IaC) practices using AWS CloudFormation and the AWS Cloud Development Kit (AWS CDK) significantly improve the operational efficiency of the data mesh implementation. The infrastructure code is version-controlled in GitHub repositories, providing full traceability and collaboration capabilities for data engineering teams. This approach relies on a dedicated deployment account that uses AWS CodePipeline to orchestrate consistent deployments across multiple data mesh domains.

The centralized deployment model helps ensure that infrastructure changes follow a standardized continuous integration and deployment (CI/CD) process, where code commits trigger automated pipelines that validate, test, and deploy infrastructure components to the appropriate domain accounts. Each data domain resides in its own separate set of AWS accounts (dev, qa, prod), and the centralized deployment pipeline respects these boundaries while enabling controlled infrastructure provisioning.

IaC enables the data mesh to scale horizontally when onboarding new domains, supporting the maintenance of consistent security, governance, and operational standards across the entire environment. Covestro provisions new domains quickly using proven templates, accelerating time-to-value for business teams.

Business impact and technical outcomes

The implementation of the data mesh architecture using Amazon DataZone and SDLF has delivered significant measurable benefits across Covestro’s organization:

Accelerated data pipeline development

70% reduction in time-to-market for new data products through standardized blueprints
Successful migration of more than 1,000 data pipelines to the new architecture
Automated pipeline creation without manual coding requirements
Standardized approach and sharing across domains

Enhanced data governance and quality

Comprehensive business glossary implementation that supports consistent terminology
Automated data quality checks integrated into pipelines
End-to-end data lineage visibility across domains
Standardized metadata management through Apache Iceberg integration

Improved data discovery and access

Self-service data discovery portal through Amazon DataZone
Streamlined cross-domain data sharing with appropriate security controls
Reduced data duplication through improved visibility of existing assets
Efficient management of cross-domain pipeline dependencies

Operational efficiency

Reduced central data team bottlenecks through domain-oriented ownership
Reduced operational overhead through automated deployment processes
Improved resource utilization through elimination of redundant data processing
Enhanced monitoring and troubleshooting capabilities

The new infrastructure has fundamentally transformed how Covestro’s teams interact with data, enabling business domains to operate autonomously while upholding enterprise-wide standards for quality and governance. This has created a more agile, efficient, and collaborative data ecosystem that supports both current needs and future growth.

What’s next

As Covestro’s data platform continues to evolve, the focus is now on helping domain teams effectively build data products for cross-domain analytics. In parallel, Covestro is actively working to improve data transparency with data lineage in Amazon DataZone through OpenLineage to support more comprehensive data traceability across a diverse set of processing tools and formats.

Conclusion

In this post, we showed you how Covestro transformed its data architecture, transitioning from a centralized data lake to a data mesh architecture, and how this foundation will prove invaluable in supporting their journey toward becoming a more data-driven organization. Their experience demonstrates how modern data architectures, when properly implemented with the right tools and frameworks, can transform business operations and unlock new opportunities for innovation.

This implementation serves as a blueprint for other enterprises looking to modernize their data infrastructure while maintaining security, governance, and scalability. It shows that with careful planning and the right technology choices, organizations can successfully transition from centralized to distributed data architectures without compromising on control or quality.

For more on Amazon DataZone, see the Getting Started guide. To learn about the SDLF, see Deploy and manage a serverless data lake on the AWS Cloud by using infrastructure as code.
