Databricks January 2026: GPT-5.1 Codex and Agent Skills
A collaborative team of Data Engineers, Data Analysts, Data Scientists, AI researchers, and industry experts delivering concise insights and the latest trends in data and AI.
Overview of the January 2026 Updates
Databricks has started 2026 by significantly narrowing the gap between raw data engineering and production-grade AI. The January 2026 release cycle introduces specialized OpenAI models, matures the "Agent Bricks" ecosystem, and refines the ingestion layer through Lakeflow Connect. For those of you managing complex data estates, the most notable shift isn't just the inclusion of new models, but the move toward more efficient, incremental data handling and modular AI assistant capabilities.
In this technical breakdown, you will find details on the new GPT-5.1 Codex models, the general availability of the Knowledge Assistant, and the architectural shifts in Databricks Runtime 18.0. Whether you are a data engineer looking to optimize Salesforce ingestion or a machine learning engineer building specialized AI agents, these updates change how you interact with the Data Intelligence Platform.
OpenAI GPT-5.1 Codex: Specialized Code Intelligence
Mosaic AI Model Serving now supports the latest specialized models from OpenAI: GPT-5.1 Codex Max and GPT-5.1 Codex Mini. These are not general-purpose LLMs; they are purpose-built for the software development lifecycle. By hosting these models directly within the Databricks environment, the platform provides a more integrated experience for developers using Foundation Model APIs.
Max vs. Mini: Choosing the Right Model
The choice between Codex Max and Codex Mini depends on your specific use case and latency requirements.
- GPT-5.1 Codex Max: This model is designed for high-reasoning tasks. If you are performing large-scale refactoring of legacy ETL pipelines or generating complex unit tests for Spark applications, Max provides the necessary depth. It excels at understanding long-range dependencies in codebases.
- GPT-5.1 Codex Mini: This is the optimized, low-latency version. It is ideal for real-time code completion (IDE integration) or simple documentation generation where speed is more critical than deep architectural reasoning.
Access and Compliance
You can access these models via the Foundation Model APIs using pay-per-token billing. This is a significant shift from the older provisioned-throughput requirement, allowing you to scale costs directly with usage. However, it is important to remember that while Databricks hosts the models, you remain responsible for adhering to OpenAI’s Acceptable Use Policy. This is particularly relevant when generating security-sensitive code or handling regulated data.
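Because the Foundation Model APIs expose an OpenAI-compatible interface, routing between the two Codex variants can be as simple as swapping the model name in the request. The sketch below is illustrative only: the endpoint names `databricks-gpt-5-1-codex-max` and `databricks-gpt-5-1-codex-mini` are assumptions, so check the Serving page in your workspace for the exact identifiers.

```python
def build_refactor_request(code: str, deep_reasoning: bool) -> dict:
    """Assemble a chat-completion payload, routing heavy refactoring
    work to Codex Max and quick completions to Codex Mini.
    The endpoint names below are assumed for illustration."""
    model = (
        "databricks-gpt-5-1-codex-max"
        if deep_reasoning
        else "databricks-gpt-5-1-codex-mini"
    )
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a Spark refactoring assistant."},
            {"role": "user", "content": f"Refactor this ETL code:\n{code}"},
        ],
    }

# With the `openai` package installed, the payload can be sent through the
# workspace's OpenAI-compatible serving endpoint, e.g.:
#   client = OpenAI(
#       api_key=os.environ["DATABRICKS_TOKEN"],
#       base_url="https://<workspace-url>/serving-endpoints",
#   )
#   response = client.chat.completions.create(**build_refactor_request(src, True))
```

Keeping the routing decision in one helper makes it easy to A/B the two models on the same prompts before committing to one for a pipeline.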
Extending the Assistant with Agent Skills
One of the most practical additions this month is the ability to create custom skills for the Databricks Assistant. In the past, the Assistant was limited to the context provided by Databricks. Now, you can extend its capabilities to handle domain-specific tasks unique to your organization.
The Open Agent Skills Standard
Databricks has adopted the Agent Skills standard. This is an open specification that allows you to define tools and functions that an AI agent can call. When you create a skill, you are essentially giving the Assistant a new "tool" in its toolbox. For example, you could create a skill that allows the Assistant to:
- Query a specific internal API to check data lineage.
- Trigger a specific Airflow DAG or Databricks Job.
- Format data according to a proprietary internal schema.
These skills are automatically loaded in "agent mode" when the Assistant detects that a user's request aligns with the skill’s definition. This modularity means you don't have to build a new chatbot from scratch; you simply enhance the one already integrated into your workspace.
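Under the Agent Skills standard, a skill is packaged as a `SKILL.md` file: YAML frontmatter that tells the agent when the skill applies, followed by markdown instructions. The sketch below is a minimal illustration; the skill name, file layout, and the lineage endpoint path are all hypothetical.

```markdown
---
name: lineage-lookup
description: >
  Use when the user asks where a table's data comes from or which
  upstream jobs populate it.
---

# Lineage lookup

When the user asks about the upstream sources of a table:

1. Call the internal lineage endpoint (e.g. `GET /api/lineage?table=<name>`).
2. Summarize the upstream tables and the jobs that write to them.
3. If the table is not found, ask the user to confirm the catalog and schema.
```

Because the frontmatter `description` is what the Assistant matches against in agent mode, it pays to phrase it in terms of the user requests that should trigger the skill rather than what the skill does internally.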
Agent Bricks: Knowledge Assistant Goes GA
Agent Bricks is the Databricks framework for operationalizing AI agents. The Knowledge Assistant, a key component of this framework, is now generally available (GA) in select US regions. This tool is designed to solve the common problem of running Retrieval-Augmented Generation (RAG) at scale.
Production-Grade RAG
Most developers can build a basic RAG demo, but moving it to production is difficult because of citation accuracy requirements and data-freshness concerns. The Knowledge Assistant simplifies this by providing:
- High-quality responses with citations: Every answer includes links back to the source documents stored in your Lakehouse.
- Streamlined operationalization: It handles the vector database synchronization and embedding logic behind the scenes.
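Because every answer carries citations, downstream applications can audit responses programmatically. The response schema below (a top-level `citations` list with a `source` field per entry) is an assumption for illustration; consult the Agent Bricks documentation for the actual shape.

```python
def extract_citations(response: dict) -> list[str]:
    """Collect the source-document paths from a Knowledge Assistant
    response so answers can be audited or logged.
    The response schema here is assumed, not official."""
    return [c["source"] for c in response.get("citations", [])]


sample = {
    "answer": "Revenue is recognized at delivery.",
    "citations": [
        {"source": "/Volumes/finance/policies/rev_rec.pdf", "page": 4},
    ],
}
print(extract_citations(sample))  # ['/Volumes/finance/policies/rev_rec.pdf']
```

Logging the citation paths alongside each answer gives you a simple audit trail for spot-checking response quality in production.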
Currently, this GA status applies to workspaces without Enhanced Security and Compliance (ESC) features. If your workspace requires ESC, Databricks has indicated that GA for those environments is coming soon. This is a critical distinction for those in highly regulated industries like finance or healthcare.
Lakeflow Connect: Efficient Data Ingestion
Data ingestion is often the most expensive and brittle part of the data pipeline. The January updates to Lakeflow Connect focus on two areas: incremental ingestion for Salesforce and row-level filtering.
Incremental Salesforce Formula Fields
Salesforce formula fields are notoriously difficult to ingest because they are calculated at runtime within Salesforce. Traditionally, Lakeflow Connect had to take a full snapshot of these fields during every run, which is computationally expensive and slow.
Databricks has introduced a Beta feature that allows for incremental ingestion of formula fields. By enabling this, you only ingest the changes rather than the entire table. For large Salesforce objects (like Opportunity or Account), this can lead to a massive reduction in both API calls to Salesforce and processing time within Databricks.
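Conceptually, the Beta toggle is a per-table setting inside the ingestion pipeline's table configuration. The sketch below is hypothetical: the connection name and especially the flag name `include_formula_fields_incremental` are placeholders, so check the Lakeflow Connect documentation for the real Beta syntax before enabling it.

```python
# Illustrative Lakeflow Connect ingestion spec for a Salesforce object.
# All names below are assumed for the sketch, not official API fields.
pipeline_spec = {
    "ingestion_definition": {
        "connection_name": "salesforce_prod",  # assumed Unity Catalog connection
        "objects": [
            {
                "table": {
                    "source_table": "Opportunity",
                    "destination_catalog": "main",
                    "destination_schema": "crm_raw",
                    "table_configuration": {
                        # Beta: ingest formula-field changes incrementally
                        # instead of snapshotting the whole object each run.
                        "include_formula_fields_incremental": True,
                    },
                }
            }
        ],
    }
}
```

The key point is the granularity: the toggle lives per table, so you can pilot incremental formula-field ingestion on one large object such as `Opportunity` before rolling it out across the connector.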
Row Filtering for Managed Connectors
Another significant performance booster is the introduction of row filtering for Google Analytics, Salesforce, and ServiceNow connectors. This functions like a SQL WHERE clause at the source.
| Feature | Previous Behavior | New Behavior (Beta) |
|---|---|---|
| Data Volume | Ingests all records from the source table. | Ingests only records matching specific criteria. |
| Cost | High egress and storage costs for irrelevant data. | Reduced costs by filtering data before it hits the Lakehouse. |
| Privacy | Requires post-ingestion masking for sensitive rows. | Prevents sensitive rows from ever being ingested. |
This is particularly useful for multi-tenant environments where you may only want to ingest data related to a specific region or department.
Infrastructure and Runtime 18.0
On the infrastructure side, Databricks Runtime 18.0 is now GA. While the full release notes for 18.0 contain a long list of library updates, the move to GA signifies that this version is now recommended for production workloads. It typically includes the latest optimizations for Photon, the vectorized execution engine, and updated versions of Apache Spark.
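Upgrading a job usually comes down to pinning the new runtime in the cluster spec. The version string below is an assumed identifier (the exact suffix, including the Scala version, varies), so list the real values for your workspace, for example with `databricks clusters spark-versions`, before pinning.

```python
# Hypothetical jobs-cluster sketch pinning DBR 18.0.
new_cluster = {
    "spark_version": "18.0.x-scala2.13",  # assumed identifier for DBR 18.0
    "node_type_id": "i3.xlarge",          # example AWS node type
    "num_workers": 4,
    "runtime_engine": "PHOTON",           # enable the vectorized engine
}
```

Pinning an explicit version (rather than "latest") makes the 18.0 rollout an auditable change you can stage per job and roll back if a library bump breaks a workload.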
Unified Lakebase Interface
Databricks is also consolidating its UI. The new Lakebase App (accessible via the app switcher) now hosts both Lakebase Provisioned and Lakebase Autoscaling management. Previously, you had to navigate to the Compute tab to manage these resources. This shift suggests that Databricks is moving toward a more "app-centric" interface, where specific administrative tasks are siloed into dedicated tools rather than buried in general compute settings.
Tecyfy Takeaway
The January 2026 updates show Databricks doubling down on two fronts: specialized AI and ingestion efficiency. To make the most of these changes, you should consider the following actions:
- Audit Ingestion Pipelines: If you use Salesforce or ServiceNow, test the new row filtering and incremental formula features in a staging environment to identify potential cost savings.
- Experiment with Assistant Skills: Identify repetitive domain-specific tasks your team performs and build an "Agent Skill" using the open standard to automate them within the Databricks UI.
- Evaluate Codex for DevEx: If your team is struggling with legacy code migration, test GPT-5.1 Codex Max via the Foundation Model API; its specialized training in code logic often outperforms general-purpose models for complex refactoring.
- Transition to Runtime 18.0: Start planning the upgrade of your production jobs to DBR 18.0 to take advantage of the latest performance patches and security updates.
