Wednesday, November 19, 2025

Cross-account lakehouse governance with Amazon S3 Tables and SageMaker Catalog


Organizations increasingly face challenges when analyzing data stored across multiple AWS accounts and storage formats. Data teams often need to query both traditional Amazon Simple Storage Service (Amazon S3) objects and Apache Iceberg tables, leading to costly data duplication, potential inconsistencies, and complex permission management across accounts.

To address these challenges, you can combine Amazon S3 Tables, which provides native Apache Iceberg support within S3, with Amazon SageMaker Catalog for unified data governance. This solution helps secure cross-account data access without duplicating datasets or compromising security controls.

In this post, we walk you through a practical solution for secure, efficient cross-account data sharing and analysis. You’ll learn how to set up cross-account access to S3 Tables using federated catalogs in Amazon SageMaker, perform unified queries across accounts with Amazon Athena in Amazon SageMaker Unified Studio, and implement fine-grained access controls at the column level using AWS Lake Formation.

This post helps you establish proper governance and security controls for S3 Tables in a multi-account environment, enabling secure and efficient cross-account data access.

Solution overview

We walk you through implementing a three-account lakehouse governance architecture where you can securely share data. As shown in the following diagram, Account A serves as your data producer with S3 Tables, Account B acts as your central governance hub with SageMaker Catalog, and Account C represents your data consumers. We’ll demonstrate step by step how to configure cross-account access and implement governance controls so users can discover and query data from both S3 tables and traditional S3 buckets.

Prerequisites and setup

In this post, we focus on how to do the cross-account setup and how to onboard S3 Tables. All three accounts are in the same AWS Region. To implement this solution, you need three individual accounts (A, B, C). The setup in the accounts should look like the following:

  • Account A (Producer): Create an Amazon S3 table in this account.
  • Account B (Central governance and producer): This is another account where you have data in Amazon S3 buckets cataloged through the AWS Glue Data Catalog. You’ll onboard these into the domain portal.
  • Account C (Consumer account): Identify an account where you have users who query data using Athena to follow along.

The following are the high-level implementation steps for this solution:

Step 1: Configure cross-account association for governance.
Step 2: Create three project profiles in Account B pointing to tables in Accounts A, B, and C.
Step 3: Create three projects.
Step 4: Set up permissions for projects in AWS Lake Formation.
Step 5: In Account B, create a data source to connect the S3 table from Account A and the Glue catalog tables from Account B.
Step 6: Publish and subscribe to the asset.
Step 7: Query S3 table (Account A) and S3 (Account B) data together in the SQL editor (Account C).

Step 1

A. Configure cross-account association for governance

In this section, we associate Accounts A and C with the governance account, Account B.

  1. Open the SageMaker Unified Studio console in Account B.
  2. Navigate to Domains, select your domain, then choose the Account associations tab.
  3. Choose Request association and enter the account IDs for Account A and Account C.
  4. Submit the association request and verify that the accounts appear with “Requested” status.

B. Enable blueprints in your domain in Accounts A, B, and C

The LakeHouseDatabase blueprint enables SageMaker Unified Studio to securely manage, query, and share data from S3, Redshift, and other sources using open standards. In this step, you enable it in Accounts A, B, and C to support unified data access and collaboration.

  1. In Account A, in the SageMaker console, navigate to your domain and select the Blueprints tab.
  2. Select the LakeHouseDatabase blueprint and choose Enable.
  3. Keeping the Permissions and resources section at the default settings, choose Enable Blueprint.
  4. Back on the blueprints screen, select the Tooling blueprint and choose Enable.
  5. Keeping the Permissions and resources section at the default settings, configure the Networking section with the desired VPC and subnet configurations.
  6. Choose Enable Blueprint.
  7. Repeat Step 1.B to enable the same blueprints in Account B, to make S3 data publishable, and in Account C, so users can query the data using Athena.

Step 2: Create project profiles in Account B

Use the documentation to create three project profiles in Account B using the ‘LakeHouseDatabase’ blueprint, with one profile configured for each of Accounts A, B, and C. For this post, we use the following naming convention:

  • datalake-project-profile-s3tables (for Account A)
  • datalake-project-profile (for Account B)
  • datalake-project-profile-consumer (for Account C)

Step 3: Create three projects for Accounts A, B, and C

  1. Using the documentation, create one project in each account. For this post, we use the following naming convention:
    • ‘producer-s3tables’ – This is configured for Account A
    • ‘producer-s3’ – This is configured for Account B
    • ‘consumer’ – This is configured for Account C
  2. After creating each project, locate and make note of the Project role ARN listed under Project details on the project overview page.

Step 4: Set up permissions for projects in AWS Lake Formation

In Account A, onboard the S3 table in SageMaker Lakehouse and grant permissions to the project role:

  1. In the AWS Lake Formation console, choose Permissions, choose Data permissions, and then choose Grant.
  2. Choose Principals, select IAM users and roles, then select the role generated by the project producer-s3tables in Step 3.
  3. In LF-Tags or catalog resources, choose Named Data Catalog resources, then select the S3 table catalog from the Catalogs list.
  4. In Catalog permissions, configure the catalog permissions and grantable permissions. Choose Grant to apply the selected permissions.

In Account A, repeat these steps to grant permissions on the database:

  1. In the AWS Lake Formation console, choose Permissions, choose Data permissions, and then choose Grant.
  2. Choose Principals, select IAM users and roles, then select the role generated by the project producer-s3tables in Step 3.
  3. In LF-Tags or catalog resources, choose Named Data Catalog resources, then choose both the S3 table catalog and the database from their respective dropdown lists.
  4. Configure database permissions and grantable permissions. Choose Grant to apply the selected permissions.

In Account A, repeat these steps to grant permissions on the table in the database:

  1. In the AWS Lake Formation console, choose Permissions, choose Data permissions, and then choose Grant.
  2. Choose Principals, select IAM users and roles, then select the role generated by the project producer-s3tables in Step 3.
  3. In LF-Tags or catalog resources, choose Named Data Catalog resources, then choose the S3 table catalog, database, and S3 table from their respective dropdown lists.
  4. Configure table permissions and grantable permissions. Choose Grant to apply the selected permissions.

Repeat Step 4 in Account B to onboard S3 to SageMaker Lakehouse and grant the required permissions to the role created by your project for Account B.
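
If you prefer to script the grants instead of using the console, the catalog, database, and table grants in Step 4 can be expressed as three requests to the Lake Formation GrantPermissions API. The following is a minimal sketch, not the method used in this post. It only builds the request payloads so you can review them first. The role ARN, catalog ID, database name, and permission lists are hypothetical placeholders (this post does not list the exact permissions), so substitute the values you chose in the console.

```python
from typing import Any, Dict, List, Optional


def build_grant_request(
    principal_arn: str,
    catalog_id: str,
    permissions: List[str],
    grantable: Optional[List[str]] = None,
    database: Optional[str] = None,
    table: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a Lake Formation GrantPermissions call.

    The resource level (catalog, database, or table) is chosen from
    whichever of `database` and `table` are supplied.
    """
    if table is not None and database is None:
        raise ValueError("a table grant also needs its database name")

    if table is not None:
        resource = {
            "Table": {
                "CatalogId": catalog_id,
                "DatabaseName": database,
                "Name": table,
            }
        }
    elif database is not None:
        resource = {"Database": {"CatalogId": catalog_id, "Name": database}}
    else:
        # Assumption: the catalog is addressed by its ID in the Catalog resource.
        resource = {"Catalog": {"Id": catalog_id}}

    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": list(permissions),
        "PermissionsWithGrantOption": list(grantable or []),
    }


# Hypothetical placeholders: replace with your project role ARN and catalog ID.
ROLE_ARN = "arn:aws:iam::111122223333:role/example-project-role"
CATALOG_ID = "111122223333:s3tablescatalog/example-table-bucket"

# Illustrative permission lists only; use the ones you selected in the console.
grant_requests = [
    build_grant_request(ROLE_ARN, CATALOG_ID, ["DESCRIBE"]),
    build_grant_request(ROLE_ARN, CATALOG_ID, ["DESCRIBE"], database="example_db"),
    build_grant_request(
        ROLE_ARN,
        CATALOG_ID,
        ["SELECT", "DESCRIBE"],
        database="example_db",
        table="daily_sales_by_customer",
    ),
]

# To apply the grants (requires boto3 and Lake Formation admin credentials):
# import boto3
# lf = boto3.client("lakeformation")
# for request in grant_requests:
#     lf.grant_permissions(**request)
```

Keeping the payload construction separate from the API call makes it easier to review the three grants side by side before anything changes in Account A.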

Step 5: Create a data source and onboard the S3 table from Account A and Glue catalog tables from Account B

To enable unified access and cross-account analytics with data lineage tracking, you’ll connect your SageMaker Unified Studio project to S3 tables from both accounts:

  1. Navigate to your project in SageMaker Unified Studio, select Data sources under the Project catalog section, and choose Create data source.
  2. Enter a name and description, and select AWS Glue as the Data source type. Under Data selection, specify the S3 table catalog name.
  3. In this post, we keep the Publishing setting and Metadata settings at the default configuration.
  4. Choose Run on demand as the run preference to manually initiate data source runs.
  5. Configure any optional connection settings, such as importing data lineage or setting up data quality options. Review your configuration and create the data source.
  6. Once created, run the data source to import the Glue assets into your project’s inventory.
  7. Add an asset filter to restrict consumer access: on the Asset filters tab, choose Add asset filter.
  8. Select Column as the filter type, choose the columns for consumer access, and create the asset filter.
  9. Select the assets created and choose Publish assets to the SageMaker Unified Studio catalog to make them discoverable by other users.
  10. Use the documentation to add the Glue catalog as a data source for S3.
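
The column filter from steps 7 and 8 can also be described as an API payload. The following sketch assumes the Amazon DataZone CreateAssetFilter operation, which takes a columnConfiguration with includedColumnNames, as we understand that API. It is not taken from this post. The domain ID, asset ID, filter name, and all column names other than customer_id are hypothetical placeholders.

```python
from typing import Any, Dict, List


def build_column_filter_request(
    domain_id: str,
    asset_id: str,
    filter_name: str,
    included_columns: List[str],
) -> Dict[str, Any]:
    """Build keyword arguments for a column-level asset filter.

    Only the listed columns stay visible to subscribers approved
    with this filter.
    """
    if not included_columns:
        raise ValueError("a column filter needs at least one column")
    return {
        "domainIdentifier": domain_id,
        "assetIdentifier": asset_id,
        "name": filter_name,
        "configuration": {
            "columnConfiguration": {
                "includedColumnNames": list(included_columns),
            }
        },
    }


# Hypothetical identifiers and columns, for illustration only.
filter_request = build_column_filter_request(
    domain_id="dzd_exampledomain",
    asset_id="exampleassetid",
    filter_name="consumer-columns",
    included_columns=["customer_id", "sale_date", "total_sales"],
)

# To create the filter (requires boto3 and access to the domain):
# import boto3
# datazone = boto3.client("datazone")
# datazone.create_asset_filter(**filter_request)
```

An allow list of columns is the safer default here: a column added to the table later stays hidden from consumers until someone adds it to the filter.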

Step 6: Subscribe to the asset from the consumer account (Account C)

In Account C, enable the consumer teams to discover, request, and subscribe to these assets for secure, governed data sharing and collaboration across projects.

  1. In SageMaker Unified Studio, select the consumer project.
  2. Use the Discover menu (top navigation) and go to Catalog.
  3. Browse or search for the published asset (S3 tables from Account A).
  4. Select the desired asset (S3 tables from Account A) and choose Subscribe.
  5. In the subscription pop-up:
    1. Choose the target project for asset access.
    2. Provide a short justification for the access request.
  6. Submit the subscription request.
  7. Repeat Step 6 to enable the consumer (Account C) teams to discover assets in Account B.

Approve or reject a subscription request

  1. In Account A, open the SageMaker Unified Studio portal.
  2. Under Project catalog, Subscription requests, on the Incoming requests tab, locate and view the subscription request.
  3. Review the requester and justification.
  4. Choose the option to approve with row and column filters. For this post, we use the filter that we created earlier.
  5. Repeat Step 6 to enable the consumer (Account C) teams to discover assets in Account B.

Step 7: Analyze S3 table and S3 data together in the query editor

Account C (consumer) now has full access to the customer data in S3 from Account B, and to the daily_sales_by_customer data in S3 tables from Account A with restricted columns. Both datasets contain a common column, Customer_id.

To generate combined insights, assets from Account A and Account B can be queried and joined on Customer_id.

  1. In SageMaker Unified Studio (consumer project in Account C), go to the Build section and select Query Editor.
  2. Run the following SQL query to join the assets from Account B and Account A on the common column Customer_id, enabling unified cross-account analytics.
    SELECT
        c.c_last_name,
        c.c_first_name,
        d.*
    FROM "awsdatacatalog"."glue_db_cqmfkub9co3rqh"."customer" c
    JOIN "awsdatacatalog"."glue_db_cqmfkub9co3rqh"."daily_sales_by_customer" d
        ON c.c_customer_id = d.customer_id
    LIMIT 10;

This approach allows combining filtered, governed data from multiple accounts into a single query for comprehensive insights.
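
To check the shape of the join before running it in Athena, you can reproduce it locally. The following sketch uses Python’s built-in sqlite3 module as a stand-in for Athena. The two local tables mirror the names in the query, but the sample rows and the sale_date and total_sales columns are invented for illustration, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3


def run_join_demo():
    """Run a local stand-in for the cross-account join and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Stand-in for the customer data shared from Account B.
        CREATE TABLE customer (
            c_customer_id TEXT,
            c_first_name  TEXT,
            c_last_name   TEXT
        );
        -- Stand-in for the S3 table from Account A; it holds only the
        -- columns a column filter would expose (hypothetical columns).
        CREATE TABLE daily_sales_by_customer (
            customer_id TEXT,
            sale_date   TEXT,
            total_sales REAL
        );
        INSERT INTO customer VALUES
            ('C001', 'Ana', 'Silva'),
            ('C002', 'Ben', 'Okafor'),
            ('C003', 'Chen', 'Li');
        INSERT INTO daily_sales_by_customer VALUES
            ('C001', '2025-01-01', 120.5),
            ('C002', '2025-01-01', 75.0),
            ('C999', '2025-01-02', 10.0);
        """
    )
    rows = conn.execute(
        """
        SELECT c.c_last_name, c.c_first_name, d.*
        FROM customer c
        JOIN daily_sales_by_customer d
            ON c.c_customer_id = d.customer_id
        ORDER BY d.customer_id
        LIMIT 10
        """
    ).fetchall()
    conn.close()
    return rows


for row in run_join_demo():
    print(row)
```

Because this is an inner join, customers without sales and sales rows without a matching customer drop out of the result, which is the same behavior to expect from the Athena query.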

Clean up

To avoid ongoing charges, clean up the resources created during this walkthrough. Complete these steps in the specified order to facilitate proper resource deletion. You might need to add the respective delete permissions for databases, table buckets, and tables if your IAM user or role doesn’t already have them.

  1. Delete any created IAM roles or policies.
  2. Delete all the projects you created in the SageMaker Unified Studio domain.
  3. Delete the SageMaker Unified Studio domain you created.
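
Steps 2 and 3 can be scripted so that the ordering is enforced. The following is a minimal sketch, assuming the Amazon DataZone API operations list_projects, delete_project, and delete_domain as exposed by boto3. The domain ID is a hypothetical placeholder. Project deletion can fail while a project still owns data sources or subscriptions, and this sketch does not handle those cases or wait for deletions to finish.

```python
from typing import Any, List


def delete_projects_then_domain(datazone: Any, domain_id: str) -> List[str]:
    """Delete every project in the domain, then the domain itself.

    `datazone` is expected to behave like boto3.client("datazone").
    Returns the IDs of the projects that were deleted.
    """
    # Collect all project IDs first so that deleting does not
    # interfere with pagination.
    project_ids: List[str] = []
    next_token = None
    while True:
        kwargs = {"domainIdentifier": domain_id}
        if next_token:
            kwargs["nextToken"] = next_token
        page = datazone.list_projects(**kwargs)
        project_ids.extend(item["id"] for item in page.get("items", []))
        next_token = page.get("nextToken")
        if not next_token:
            break

    # Projects go first; the domain is deleted last.
    for project_id in project_ids:
        datazone.delete_project(domainIdentifier=domain_id, identifier=project_id)
    datazone.delete_domain(identifier=domain_id)
    return project_ids


# Usage (requires boto3 and credentials for Account B):
# import boto3
# delete_projects_then_domain(boto3.client("datazone"), "dzd_exampledomain")
```

Passing the client in as an argument keeps the function easy to dry-run against a stub before pointing it at a real domain.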

Conclusion

In this post, we explored how Amazon SageMaker Catalog integrates with S3 Tables to provide comprehensive data governance in cross-account environments. We demonstrated how data publishers can onboard S3 Tables to SageMaker Lakehouse while data consumers can efficiently search, request access, and use approved datasets for analytics and AI development.

The integration between SageMaker Catalog, S3 Tables, and AWS Lake Formation creates a unified governance framework that eliminates data silos while maintaining strong security controls. Through automated subscription workflows and fine-grained access permissions, organizations can implement self-service data access without compromising compliance or data quality.


About the authors

Sneha Rao

Sneha is a Solutions Architect at AWS who helps strategic enterprise customers design architectures on the cloud. She’s passionate about creating inclusive learning experiences that make complex technologies approachable and impactful. Outside of work, Sneha enjoys painting, exploring local coffee shops, and going on outdoor adventures with her Cavapoo, Taz.

Deepmala Agarwal

Deepmala is passionate about helping customers build out scalable, distributed, and data-driven solutions on AWS. When not at work, Deepmala likes spending time with family, walking, listening to music, watching movies, and cooking!

Viral Thakkar

Viral is a Software Engineer at AWS, working on Amazon DataZone with a primary focus on distributed systems and data governance, and with deep expertise in building large-scale data analytics and pipelining solutions. He’s passionate about tackling complex distributed systems challenges while also creating tools and automated scripts that simplify day-to-day workflows and improve productivity.

Santhosh Padmanabhan

Santhosh is a Software Development Manager at AWS, leading the Amazon DataZone engineering team. His team designs, builds, and operates services focusing on data, machine learning, and AI governance. With deep expertise in building distributed data systems at scale, Santhosh plays a key role in advancing AWS’s data governance capabilities.

Abbas Makhdum

Abbas is Head of Product Marketing for Amazon SageMaker Catalog at AWS, where he leads go-to-market strategy and launches for data and AI governance solutions. With deep expertise across data, AI, and analytics, Abbas has also authored a book on data governance with O’Reilly. He’s passionate about helping organizations unlock business value by making data and AI more accessible, transparent, and governed.
