
Unifying Governance: A Guide to SageMaker Unified Studio and Atlan Integration

Data & AI Insights Collective, Dec 22, 2025

In 2025, data fragmentation remains a silent killer of enterprise AI initiatives. Technical teams build sophisticated models in Amazon SageMaker Unified Studio while business stakeholders manage governance and compliance in platforms like Atlan. When these two worlds don't communicate, metadata drifts, trust erodes, and AI projects stall.

This guide explores the technical bridge between these environments, enabling you to synchronize metadata, unify business glossaries, and maintain a single source of truth across your entire data stack. By the end of this post, you will understand how to deploy a bidirectional integration that ensures every asset in your SageMaker environment is governed, discoverable, and trusted within Atlan.

The High Cost of Fragmented Metadata

Enterprises today operate in a hybrid reality. Data scientists and ML engineers require the high-performance environment of Amazon SageMaker Unified Studio to iterate on models and process massive datasets. Simultaneously, data stewards and business analysts use Atlan to define business terms, track lineage, and ensure regulatory compliance.

Without a unified bridge, you face several critical risks:

  • Documentation Drift: A dataset's description in the SageMaker Catalog may not match its definition in the corporate glossary.
  • Discovery Barriers: Technical assets created in SageMaker remain invisible to business users who need them for reporting or decision-making.
  • Governance Inconsistency: Security classifications or sensitivity labels applied in one platform fail to propagate to the other, creating compliance gaps.

Integrating the two platforms addresses these risks by keeping their metadata continuously synchronized. Whether a user is looking at a data product in SageMaker or a glossary term in Atlan, they see the same description, classification, and business context.

Comparing Governance Capabilities

Before diving into the setup, it is helpful to understand how these two platforms complement each other. While there is overlap, their primary focus areas differ, making integration essential for a complete governance strategy.

Feature | Amazon SageMaker Unified Studio | Atlan Metadata Workspace
Primary Audience | Data Scientists, ML Engineers, Developers | Data Stewards, Analysts, Business Leaders
Asset Focus | Projects, Notebooks, Models, Data Products | Tables, Columns, BI Dashboards, Glossaries
Governance Style | Technical & Operational Governance | Business & Collaborative Governance
Metadata Type | Technical (Schema, Lineage, Runtime) | Active (Social, Business Context, Usage)
Key Strength | Deep integration with AWS compute and AI | Cross-platform visibility and collaboration

Phase 1: The Integration Architecture

The integration follows a phased approach. Phase 1 focuses on establishing secure, scalable, and reliable metadata synchronization, using standard APIs from both platforms to exchange glossary terms, asset descriptions, and classifications.

Key Capabilities

  1. Secure IAM Handshake: The connection uses AWS Identity and Access Management (IAM) roles to ensure least-privilege access. No long-lived credentials are exchanged; instead, Atlan assumes a specific role in your AWS account.
  2. Bidirectional Sync: Changes made in SageMaker can flow to Atlan, and business definitions refined in Atlan can flow back to SageMaker.
  3. Preservation of Hierarchy: The integration respects parent-child relationships in your glossaries, ensuring your taxonomy remains intact across environments.
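To make the hierarchy point concrete, here is a minimal sketch of one way a sync job could order glossary terms so that every parent is created before its children. The function name and the simple name-to-parent mapping are illustrative assumptions, not the connector's actual data model.

```python
from collections import defaultdict, deque


def order_terms_parent_first(terms):
    """Return term names ordered so every parent precedes its children.

    `terms` maps a term name to its parent's name (None for a root term).
    This mapping is a hypothetical shape used only for illustration.
    """
    children = defaultdict(list)
    roots = []
    for name, parent in terms.items():
        if parent is None:
            roots.append(name)
        elif parent not in terms:
            raise ValueError(f"term {name!r} references unknown parent {parent!r}")
        else:
            children[parent].append(name)

    ordered = []
    queue = deque(sorted(roots))
    while queue:
        name = queue.popleft()
        ordered.append(name)
        # Children are queued only after their parent has been emitted.
        queue.extend(sorted(children[name]))

    # Terms that were never reached must sit on a cycle.
    if len(ordered) != len(terms):
        raise ValueError("glossary hierarchy contains a cycle")
    return ordered


glossary = {
    "Customer": None,
    "Customer ID": "Customer",
    "Email": "Customer",
    "Hashed Email": "Email",
}
creation_order = order_terms_parent_first(glossary)
```

Creating terms in this order means a child never arrives in the target catalog before the parent it points to, which is one simple way to keep a taxonomy intact during a copy.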

Technical Setup: Preparing the AWS Environment

The first step in this integration is establishing trust between your Atlan tenant and your AWS account. This is handled via a CloudFormation template that creates the necessary IAM infrastructure.

1. Obtain the Atlan Node Instance ARN

Before starting the AWS configuration, you must contact your Atlan administrator to get the Amazon Resource Name (ARN) of the Atlan Account Node Instance IAM role. This ARN acts as the "trusted entity" that will be allowed to assume a role in your environment.

2. Deploy the CloudFormation Stack

You will use a YAML template to provision the IAM role and associated policies. This role grants Atlan the specific permissions required to publish metadata to the SageMaker Catalog and read existing assets.

# Representative snippet of the IAM Trust Policy
Statement:
  - Effect: Allow
    Principal:
      AWS: "arn:aws:iam::ATLAN_ACCOUNT_ID:role/AtlanNodeInstanceRole"
    Action: "sts:AssumeRole"
    Condition:
      StringEquals:
        "sts:ExternalId": "YOUR_UNIQUE_EXTERNAL_ID"
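For readers who want to review the trust relationship outside the template, the same statement can be rendered as a JSON policy document. The sketch below uses only the Python standard library; the helper name is hypothetical, and the account ID and external ID passed in are made-up placeholder values.

```python
import json


def build_trust_policy(atlan_role_arn, external_id):
    """Build an IAM trust policy that lets one principal assume the role.

    Mirrors the representative statement above: a single Allow on
    sts:AssumeRole, restricted by an sts:ExternalId condition.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": atlan_role_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_trust_policy(
    "arn:aws:iam::111122223333:role/AtlanNodeInstanceRole",
    "example-external-id",
)
policy_json = json.dumps(policy, indent=2)
```

The external ID condition matters because it stops a third party who learns the role ARN from tricking the trusted principal into assuming the role on their behalf.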

When deploying the stack, you must provide three critical parameters:

  • AtlanNodeInstanceRoleArn: The ARN you obtained from your Atlan admin.
  • SMUSDomainId: The unique ID of your SageMaker Unified Studio domain.
  • SMUSProjectsToSync: The specific project IDs you want to include in the synchronization.
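If you deploy the stack from a script rather than the console, the three parameters can be assembled as a list of `ParameterKey`/`ParameterValue` pairs, which is the shape the CloudFormation `CreateStack` API expects. This is a sketch under assumptions: the helper name is hypothetical, and joining project IDs with commas is a guess about how the template reads `SMUSProjectsToSync`, so check the template's parameter type before relying on it.

```python
def build_stack_parameters(atlan_role_arn, domain_id, project_ids):
    """Assemble the three stack parameters as ParameterKey/ParameterValue pairs.

    Assumption: SMUSProjectsToSync accepts a comma-separated string of IDs.
    """
    if not project_ids:
        raise ValueError("at least one project ID is required")
    values = {
        "AtlanNodeInstanceRoleArn": atlan_role_arn,
        "SMUSDomainId": domain_id,
        "SMUSProjectsToSync": ",".join(project_ids),
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


# Placeholder identifiers for illustration only.
parameters = build_stack_parameters(
    "arn:aws:iam::111122223333:role/AtlanNodeInstanceRole",
    "dzd_exampledomain",
    ["proj-a", "proj-b"],
)
```

Rejecting an empty project list up front avoids deploying a role that syncs nothing, or, depending on how the template treats an empty value, more than you intended.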

3. Capture the Output ARN

Once the stack status reaches CREATE_COMPLETE, go to the Outputs tab. Copy the ARN of the newly created IAM role. You will need this for the Atlan-side configuration.
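The same lookup can be scripted. The sketch below reads the role ARN out of a response shaped like the CloudFormation `DescribeStacks` result (`Stacks`, `StackStatus`, `Outputs`, `OutputKey`, `OutputValue`). The output key `AtlanRoleArn` is an assumed name; use whatever key your template actually declares. The sample response is hand-written for illustration, not real output.

```python
def role_arn_from_stack(response, output_key="AtlanRoleArn"):
    """Return the role ARN from a DescribeStacks-style response dict.

    `output_key` is a hypothetical name; match it to your template's Outputs.
    """
    stacks = response.get("Stacks", [])
    if not stacks:
        raise ValueError("response contains no stacks")
    stack = stacks[0]
    status = stack.get("StackStatus")
    if status != "CREATE_COMPLETE":
        raise RuntimeError(f"stack is not ready (status: {status})")
    for output in stack.get("Outputs", []):
        if output.get("OutputKey") == output_key:
            return output["OutputValue"]
    raise KeyError(output_key)


# Hand-written sample in the DescribeStacks response shape.
sample_response = {
    "Stacks": [
        {
            "StackStatus": "CREATE_COMPLETE",
            "Outputs": [
                {
                    "OutputKey": "AtlanRoleArn",
                    "OutputValue": "arn:aws:iam::123456789012:role/AtlanSmusSyncRole",
                }
            ],
        }
    ]
}
role_arn = role_arn_from_stack(sample_response)
```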

Technical Setup: Configuring the Atlan Workflow

With the AWS infrastructure in place, you can now configure the Atlan SageMaker Unified Studio connector. This process defines how often metadata is synced and which users have administrative control over the connection.

Step-by-Step Atlan Configuration

  1. Launch the Connector: In your Atlan tenant, navigate to the Marketplace and search for "AWS SageMaker Unified Studio."
  2. Authentication: Provide the IAM Role ARN you copied from the CloudFormation output. Select your specific AWS Region (e.g., us-east-1).
  3. Connection Management: Assign Connection Admins. These users will have the authority to edit metadata filters, manage persona-based policies, and troubleshoot the sync.
  4. Glossary Mapping: Choose a "Top-level glossary" in Atlan to act as the container for your SageMaker assets. This is where the SageMaker projects, domains, and data products will appear.
  5. Preflight Checks: Always run the Quick test before the first full synchronization. This validates that the IAM role has the correct permissions to access the SageMaker APIs.
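Before pasting values into the connector form, a quick local format check can catch copy-paste mistakes such as an unreplaced placeholder account ID. This sketch only validates the shape of the strings; it does not contact AWS and is not a substitute for Atlan's Quick test. The function name and the regular expressions are illustrative assumptions.

```python
import re

# IAM role ARN: partition, 12-digit account ID, then a role name or path.
ROLE_ARN_RE = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/[\w+=,.@/-]+$"
)
# Region names such as us-east-1 or ap-southeast-2.
REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d$")


def preflight_problems(role_arn, region):
    """Return a list of human-readable problems; empty means the format looks fine."""
    problems = []
    if not ROLE_ARN_RE.match(role_arn):
        problems.append(f"role ARN does not look like an IAM role ARN: {role_arn!r}")
    if not REGION_RE.match(region):
        problems.append(f"region does not look like an AWS region name: {region!r}")
    return problems


ok = preflight_problems(
    "arn:aws:iam::123456789012:role/AtlanSmusSyncRole", "us-east-1"
)
bad = preflight_problems(
    "arn:aws:iam::ATLAN_ACCOUNT_ID:role/AtlanNodeInstanceRole", "US-EAST-1"
)
```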

Understanding the Bidirectional Workflow

The real value of this integration lies in its bidirectional nature. Rather than a one-way export, metadata moves in both directions between the two platforms.

Technical Metadata Flow (SageMaker to Atlan)

When a data scientist creates a new data product or a project in SageMaker Unified Studio, the metadata (including column descriptions and schema) is captured by the Atlan connector. This information is then available in Atlan for business users to discover. This eliminates the need for manual documentation in two places.

Business Context Flow (Atlan to SageMaker)

Conversely, when a data steward adds a business definition or a sensitivity tag to a term in Atlan, that context is pushed back into the SageMaker Catalog. When the data scientist opens their project in Unified Studio, they see the updated business context directly within their technical environment.
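One way to reason about the two flows is field ownership: technical fields are authoritative in SageMaker, business fields in Atlan. The sketch below models that idea with plain dictionaries. The field names and the ownership rule are assumptions made for illustration; the article does not describe how the connector resolves conflicts internally.

```python
# Hypothetical field names, split by which platform is treated as the owner.
TECHNICAL_FIELDS = ("schema", "column_descriptions")
BUSINESS_FIELDS = ("business_definition", "sensitivity_tag")


def reconcile(sagemaker_record, atlan_record):
    """Merge two views of one asset and compute what each side is missing.

    Returns (merged, to_atlan, to_sagemaker), where the last two hold only
    the fields whose value differs from what that platform already has.
    """
    merged = {}
    for field in TECHNICAL_FIELDS:
        if field in sagemaker_record:
            merged[field] = sagemaker_record[field]
    for field in BUSINESS_FIELDS:
        if field in atlan_record:
            merged[field] = atlan_record[field]

    to_atlan = {f: v for f, v in merged.items() if atlan_record.get(f) != v}
    to_sagemaker = {f: v for f, v in merged.items() if sagemaker_record.get(f) != v}
    return merged, to_atlan, to_sagemaker


sagemaker_view = {"schema": ["id", "email"], "sensitivity_tag": None}
atlan_view = {"business_definition": "Primary customer contact", "sensitivity_tag": "PII"}
merged, to_atlan, to_sagemaker = reconcile(sagemaker_view, atlan_view)
```

In this example the schema flows to Atlan, while the business definition and the PII tag flow back to SageMaker, matching the two directions described above.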

Best Practices for 2025 Metadata Governance

To get the most out of this integration, you should implement these advanced strategies:

  • Automate the Sync Schedule: While on-demand sync is useful for testing, production environments should use a scheduled workflow (e.g., every 4-6 hours) to ensure documentation never falls behind.
  • Leverage Metadata Forms: Use SageMaker Unified Studio's metadata forms to capture custom attributes that are specific to your industry (e.g., "Model Fairness Score" or "PII Category"). These will propagate to Atlan seamlessly.
  • Implement Least Privilege: Periodically audit the IAM role created by the CloudFormation stack. Ensure that only the necessary SageMaker projects are included in the SMUSProjectsToSync parameter.
  • Monitor Sync Logs: Atlan provides detailed logs for every workflow run. Monitor these for failed asset ingestions, which often indicate permission changes or schema drift in the underlying data sources.
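As an illustration of the log-monitoring practice, the sketch below groups failed ingestions by reason so that a spike in permission errors stands out from routine noise. The record format (`asset`, `status`, `reason`) is invented for this example; Atlan's actual log schema may differ, so adapt the field names to what your workflow runs emit.

```python
from collections import Counter


def summarize_failures(run_records):
    """Count failed ingestions per reason in a list of run records.

    Each record is assumed to be a dict with 'status' and, for failures,
    a 'reason' key. This shape is hypothetical.
    """
    failures = [r for r in run_records if r.get("status") == "FAILED"]
    by_reason = Counter(r.get("reason", "unknown") for r in failures)
    return {
        "total": len(run_records),
        "failed": len(failures),
        "by_reason": dict(by_reason),
    }


# Made-up records for illustration only.
records = [
    {"asset": "sales.orders", "status": "SUCCEEDED"},
    {"asset": "sales.customers", "status": "FAILED", "reason": "AccessDenied"},
    {"asset": "ml.features", "status": "FAILED", "reason": "AccessDenied"},
    {"asset": "ml.labels", "status": "FAILED", "reason": "SchemaMismatch"},
]
summary = summarize_failures(records)
```

Several failures sharing one permission-related reason usually point at a change to the IAM role or the project list rather than at individual assets.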

Tecyfy Takeaway

Unifying governance between Amazon SageMaker Unified Studio and Atlan is no longer an optional luxury; it is a requirement for scalable AI. By bridging these two platforms, you empower your technical teams to build with speed while giving your business teams the visibility they need to maintain trust.

Actionable Next Steps:

  1. Audit your current metadata: Identify which SageMaker projects are high-priority for business discovery.
  2. Deploy the IAM Role: Use the CloudFormation template to establish the secure handshake between AWS and Atlan.
  3. Set up a Pilot Sync: Connect a single SageMaker project to Atlan and verify that glossary terms and descriptions flow correctly in both directions.
  4. Scale to the Enterprise: Once the pilot is successful, expand the integration to all production domains to create a truly unified data catalog.
