A Declarative Approach to Enterprise Data Integration

📌 This post isn’t a hands-on walkthrough, but a reflection on integration challenges I’ve seen in enterprise settings and a closer look at a promising alternative: declarative data integration with Orbital and Taxi.

In many enterprises today, integration work isn’t a one-time effort—it’s an ongoing part of the infrastructure. There are databases and SaaS platforms to read from, APIs to call, and downstream systems to populate: CRMs, marketing automation platforms (MAPs), data lakes, and dashboards.

The glue that connects these systems tends to accumulate over time. It starts with a cron job or a microservice. Over time, it becomes a web of pipelines that are hard to evolve, monitor, or reason about. And when something changes upstream—a new schema, a paginated API—it’s back to patching scripts, adjusting schedules, and redeploying services.

Areas of Strain in Many Enterprises

Here are a few patterns that come up repeatedly:

1. Multi-source data for internal and external use

An organization has a mix of internal systems (order databases, customer records) and external vendors (shipment APIs, CRMs, ad platforms). Each team needs a slightly different view of this data, whether for reporting, enrichment, or integration with third-party systems.

Over time, logic for joins and transformations gets hardcoded into scheduled ETL jobs. And because each destination requires data in a slightly different shape, duplication inevitably occurs.

2. Vendor APIs that round-trip data

Internal teams send data to vendor platforms (loyalty, marketing, billing), and vendors return metrics or enrichments via REST APIs. Integration teams are responsible for both sides.

However, every vendor has its own quirks: pagination schemes, rate limits, authentication flows, and field-naming conventions. Teams end up maintaining a patchwork of fetchers, retries, and custom field mappings to keep data flowing smoothly.

3. Stale dashboards and blocked analysts

Sales and BI teams often wait on nightly pipelines to populate dashboards that could be live—if only the data weren't trapped in disconnected systems.

Even when the data exists, it’s fragmented. An analyst might need product metadata from the ERP, usage data from an API, and customer records from the warehouse. Getting all of it means waiting on multiple jobs to run or asking an engineer to wire it together.


What Orbital & Taxi Bring to the Table

Rather than hand-rolling glue code or batch pipelines, Orbital and Taxi provide a declarative, on-demand integration fabric.

Core building blocks:

  • Connector definitions (YAML): Describe authentication, paging rules, retries, and JSON ↔ model mappings for REST, SQL, streaming, or message queues.
  • Taxi schemas: Define models with semantic types and metadata (e.g., CustomerId, OrderDate) across disparate systems in a shared, evolvable contract.
  • Federated queries: Declare the structure of the result you want—joins, filters, reshaping—without hardcoding where or how the data comes from.
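To make the schema idea concrete, here is a minimal Taxi sketch. The type and field names are illustrative, not drawn from any real system:

```taxi
// Semantic types: a primitive representation plus business meaning.
type CustomerId inherits String
type CustomerName inherits String
type OrderId inherits String
type OrderDate inherits Date
type OrderTotal inherits Decimal

// A model describes how one system's data maps onto those shared types.
model Order {
  id : OrderId
  customer : CustomerId
  placedAt : OrderDate
  total : OrderTotal
}
```

Because fields are declared against semantic types rather than raw strings and numbers, two systems that both expose a `CustomerId` can be joined without hand-written mapping code.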

Under the hood, Orbital:

  • Plans and parallelizes calls across sources, running SQL queries and API calls concurrently where possible.
  • Handles paging, retries, back-off, and circuit-breaking for external APIs.
  • Caches responses using in-memory or Redis/Hazelcast, honoring TTLs and Cache-Control headers.
  • Streams data using Server-Sent Events (SSE) or WebSockets—returning the first rows to the client while the rest is still loading.
  • Traces execution across steps, allowing you to view latency per connector, the causes of failures, and retry attempts.

This shifts integration from an imperative, operationally heavy model to a declarative, observable one. A schema change upstream means updating a model, not rewriting and redeploying a service.


Three Common Scenarios

1. Multi-Source Reporting

Traditional: Three ETL pipelines load into a staging warehouse. A fourth job joins them and populates a dashboard.
With Orbital: A federated query fetches ERP, web, and API data in parallel, joins in memory, and returns JSON on demand.
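As a sketch of what that federated query could look like in TaxiQL (the type names, including `ShipmentStatus`, are invented for illustration):

```taxi
// Declare the result shape you want; Orbital plans which
// sources (ERP, web analytics, shipment API) to call, and
// runs the calls in parallel where possible.
find { Order[] } as {
  orderId : OrderId
  customer : CustomerName      // resolved from the CRM
  shipmentStatus : ShipmentStatus  // resolved from the vendor API
  total : OrderTotal
}[]
```

Note what’s absent: no connection strings, no join keys, no pagination loops. Those live in the connector definitions and schemas, not in the query.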

2. Vendor API Orchestration

Traditional: Dozens of vendor-specific jobs, each with its own fetch loop and retry logic.
With Orbital: One declarative query pushes and fetches across all vendors in a single step, with error-handling and back-pressure built in.

3. Self-Service BI

Traditional: Analysts request changes; engineers tweak jobs or pipelines. Dashboards refresh nightly at best.
With Orbital: Analysts hit a pre-defined endpoint that returns live results, with schema tweaks handled through Taxi.


Why This Matters for Ongoing Ops

You still need to understand your systems and the data they generate. This isn’t “no-code,” and it’s not magic. But by moving integration into a declarative layer, you reduce the cost of change and the weight of infrastructure.

| Operational Concern | Traditional Approach | With Orbital + Taxi |
| --- | --- | --- |
| Data freshness | Cron-based jobs (hourly/daily) | On-demand query execution |
| Schema changes | Requires code updates | Update a Taxi model or connector |
| Error handling | Custom retry/backoff logic | Built-in across connectors |
| Monitoring | App-level metrics per service | Centralized trace & observability |
| New destinations | New service or pipeline | Reuse existing query with new shape |

Operational Expense Savings

By replacing scheduled, code-driven pipelines with a declarative engine, many teams have seen:

| Cost Driver | Legacy Approach | Declarative Engine | OPEX Impact |
| --- | --- | --- | --- |
| Engineering hours | 1–2 days per new pipeline | Minutes to define schemas + query | 70–90% reduction |
| On-call + alerts | Dozens of microservices to monitor | One service with built-in health checks | 50–75% fewer incidents |
| Cloud compute + storage | VMs per job, staging tables | Shared runtime, minimal staging | 30–60% lower spend |
| Support + maintenance | Repeated logic across repos | Single engine upgrade path | 40–80% less effort |

How Orbital + Taxi Compare to Other Tools

Orbital and Taxi aren’t the only tools in the integration space, but they bring a distinct approach. Where others focus on pipelines, orchestration, or schema sync, Orbital and Taxi focus on real-time, federated access and semantic modeling across diverse systems.

Here’s how they compare to other related tools and platforms:

| Tool / Stack | Key Focus | Compared to Orbital + Taxi |
| --- | --- | --- |
| Airbyte, Meltano | Batch ETL pipelines | Focused on syncing into warehouses, not real-time federation |
| Apollo, Hasura, StepZen | Unified API access across services | Mostly GraphQL-specific; less general-purpose across protocols |
| Dagster, Prefect, Temporal | Orchestration and workflow management | Great for reliability and retries, but you write the glue logic |
| dbt | SQL-based modeling inside the warehouse | Works post-ingestion; can’t query APIs or live systems directly |
| FINOS Legend, Kensho | Semantic modeling for regulated industries | Focuses on semantics, less on dynamic query execution |
| Retool, Baserow, NocoDB | Low-code access to data | Oriented toward UI and CRUD, not full integration pipelines |

Orbital + Taxi stand out by offering:

  • Live access across REST, SQL, gRPC, and streams
  • Declarative query planning with retries, caching, and streaming
  • Semantic schemas that survive schema drift and connect systems cleanly

This makes them especially suited for enterprise integration scenarios where freshness, flexibility, and schema governance all matter.

Final Thought

There’s no shortage of ways to move data around. What’s more challenging is maintaining clarity and control as complexity increases. Orbital and Taxi won’t replace thoughtful design, but they provide teams with a cleaner way to describe the data they need, letting a shared engine handle the rest.

For organizations navigating multiple systems, vendor APIs, and integration sprawl, this model provides an opportunity to simplify without compromising flexibility.

It’s not about getting rid of infrastructure. It’s about making less of it yours to carry.


Getting Started

  1. Inventory your current jobs or microservices.
  2. Model each source with a connector definition and Taxi schema.
  3. Compose your federated queries—start with your highest-value report or integration.
  4. Deploy the open-source engine (via Docker or Kubernetes) and point teams at the new endpoints.
  5. Evaluate latency and freshness improvements, then roll into production.
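Step 2 might look something like the following connector definition. The keys shown here are purely hypothetical, chosen to illustrate the kind of information a connector captures; Orbital’s actual configuration format differs, so consult its documentation for real key names:

```yaml
# Hypothetical connector definition (illustrative keys only).
connectors:
  - name: shipment-api
    type: rest
    baseUrl: https://api.example-vendor.com/v2
    auth:
      kind: oauth2-client-credentials
      tokenUrl: https://auth.example-vendor.com/token
    paging:
      style: cursor        # follow the vendor's next-page cursor
      pageSize: 200
    retry:
      maxAttempts: 3
      backoff: exponential
```

The point is less the exact syntax than the division of labor: pagination, auth, and retries are declared once per source, then every query against that source inherits them.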

By shifting from “write more code” to “declare your data,” teams recover engineering time, simplify operations, and deliver fresher insights—without overhauling every repository or owning dozens of tiny services.

Further Reading