A data pipeline takes data from a source system (SaaS app, database, event stream, file drop) and moves it through a series of stages (ingest, land, transform, model, serve, consume) until it arrives somewhere a person or system can use it, such as a dashboard, an ML model, or an API.
Orchestration ties the stages together; observability catches breakage before it hits the dashboard. This page walks through each.
The Seven Stages
| Stage | Job |
| --- | --- |
| Source | Where the data lives (SaaS, DB, event, file) |
| Ingest | Move data into the platform |
| Landing / raw | Store source-faithful copy |
| Transform | Clean, deduplicate, type-cast |
| Model | Join into business-meaningful tables |
| Serve | Make available to consumers |
| Consume | BI, ML, applications, exports |
Source Systems
Salesforce, HubSpot, Shopify, Stripe, NetSuite, application databases, event streams, file drops, third-party data feeds.
Most businesses have a dozen or more. Source schemas change over time (columns are added, dropped, or retyped), so pipelines need to be resilient to schema evolution.
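One way to make schema evolution visible is to compare each incoming record against the columns the pipeline expects. The sketch below is a minimal illustration, not a production approach: `diff_schema`, `infer_schema`, and the sample columns are hypothetical names, and types are inferred from Python values rather than read from the source system's metadata.

```python
def infer_schema(record: dict) -> dict[str, str]:
    """Map each column to the name of its Python type (a rough stand-in for a real schema)."""
    return {key: type(value).__name__ for key, value in record.items()}


def diff_schema(expected: dict[str, str], observed: dict[str, str]) -> dict[str, list[str]]:
    """Report columns that were added, removed, or changed type."""
    shared = set(expected) & set(observed)
    return {
        "added": sorted(set(observed) - set(expected)),
        "removed": sorted(set(expected) - set(observed)),
        "retyped": sorted(col for col in shared if expected[col] != observed[col]),
    }


# Hypothetical expected schema and a record whose shape has drifted.
expected = {"id": "int", "email": "str", "amount": "float"}
record = {"id": 7, "email": "a@example.com", "amount": "19.99", "currency": "USD"}

drift = diff_schema(expected, infer_schema(record))
print(drift)
```

Here the new `currency` column shows up under `added` and `amount` under `retyped`, which a pipeline could log or alert on instead of failing silently.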
Ingestion
Connectors (Fivetran, Airbyte, Stitch), CDC streams (Debezium), event ingestion (Kafka, Kinesis), or custom Python. Choose by source-system type, latency requirement, and operational maturity.
Landing / Raw Zone
A source-faithful copy of the data: same shape as it arrived, minimal transformation. The landing zone is your insurance: if downstream transforms break, you can rebuild from raw without re-ingesting from source. For how this fits the broader architecture, see What is Data Warehousing.
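As a rough sketch of the idea, the snippet below appends records to a dated JSON Lines file without altering them, then replays them later. The file layout, the `_source` and `_loaded_at` metadata fields, and the `land_raw` and `replay` helpers are all assumptions made for illustration. A real landing zone would more likely be object storage or a raw schema in the warehouse.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def land_raw(records, source, root, loaded_at=None):
    """Append records unchanged to <root>/<source>/<date>.jsonl and return the path."""
    loaded_at = loaded_at or datetime.now(timezone.utc)
    target = Path(root) / source / f"{loaded_at:%Y-%m-%d}.jsonl"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        for record in records:
            envelope = {
                "_source": source,                    # where the record came from
                "_loaded_at": loaded_at.isoformat(),  # when it was landed
                "payload": record,                    # stored exactly as received
            }
            handle.write(json.dumps(envelope) + "\n")
    return target


def replay(path):
    """Yield the original payloads, e.g. to rebuild downstream tables."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            yield json.loads(line)["payload"]
```

Because the payload is never modified on the way in, `replay` returns what the source sent, and that is what makes rebuilding from raw possible.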
Transformation
Clean (handle nulls, normalize formats), deduplicate, type-cast, apply business rules. Traditionally this was done in scripts; in 2026, it is predominantly done with dbt or similar in-warehouse SQL transformations.
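The same three steps can be sketched in plain Python to show what they do, whatever tool runs them. This is an illustrative example under stated assumptions: the column names (`id`, `email`, `amount`, `updated_at`) are hypothetical, and "keep the most recently updated row per id" is just one possible deduplication rule.

```python
from decimal import Decimal, InvalidOperation


def clean_row(raw):
    """Normalize formats and cast types; return None for rows without an id."""
    if raw.get("id") is None:
        return None
    email = (raw.get("email") or "").strip().lower() or None
    try:
        amount = Decimal(str(raw["amount"])) if raw.get("amount") not in (None, "") else None
    except InvalidOperation:
        amount = None  # unparseable amounts become nulls instead of crashing the run
    return {
        "id": int(raw["id"]),
        "email": email,
        "amount": amount,
        "updated_at": raw.get("updated_at", ""),
    }


def transform(rows):
    """Clean every row, then keep the most recently updated row per id."""
    latest = {}
    for raw in rows:
        row = clean_row(raw)
        if row is None:
            continue
        current = latest.get(row["id"])
        # ISO-8601 date strings sort chronologically, so string comparison is enough here.
        if current is None or row["updated_at"] >= current["updated_at"]:
            latest[row["id"]] = row
    return [latest[key] for key in sorted(latest)]


raw_rows = [
    {"id": "1", "email": " Ada@Example.com ", "amount": "19.99", "updated_at": "2026-01-01"},
    {"id": "1", "email": "ada@example.com", "amount": "24.50", "updated_at": "2026-01-03"},
    {"id": None, "email": "x@example.com", "amount": "5", "updated_at": "2026-01-02"},
    {"id": "2", "email": "", "amount": "n/a", "updated_at": "2026-01-02"},
]
cleaned = transform(raw_rows)
```

In a dbt project the equivalent logic would typically live in a SQL staging model rather than in Python.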
Modeling
Join cleaned tables into business-meaningful entities: customer, order, product, subscription. Apply dimensional modeling (Kimball), wide tables, or Data Vault depending on use case.
The modeled layer is what analysts and ML workloads actually query. See What Data Warehousing Allows Organizations to Achieve for why this layer matters.
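To make the join step concrete, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for a warehouse. The `stg_customers`, `stg_orders`, and `customer_summary` tables are hypothetical, and the result is a simple wide table, not a full Kimball or Data Vault design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cleaned (staging) tables, as they might come out of the transform stage.
    CREATE TABLE stg_customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stg_orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO stg_customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO stg_orders VALUES (10, 1, 20.0), (11, 1, 30.0), (12, 2, 5.0);

    -- Modeled entity: one row per customer with order metrics attached.
    -- LEFT JOIN keeps customers who have not ordered yet.
    CREATE TABLE customer_summary AS
    SELECT c.customer_id,
           c.name,
           COUNT(o.order_id)          AS order_count,
           COALESCE(SUM(o.amount), 0) AS lifetime_value
    FROM stg_customers AS c
    LEFT JOIN stg_orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name;
""")

summary = conn.execute(
    "SELECT customer_id, name, order_count, lifetime_value "
    "FROM customer_summary ORDER BY customer_id"
).fetchall()
```

Analysts can then query `customer_summary` directly instead of repeating the join in every dashboard.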
Serving
Expose modeled tables via the warehouse for BI; via a feature store for ML; via reverse-ETL back into operational systems; via APIs for applications. Serving depends on the consumer.
Consumption
BI dashboards (Looker, Power BI, Tableau, Mode), ML training and inference, application embedded analytics, executive reports, data exports. The end of the pipeline is where the business actually uses the data.
Orchestration and Observability
Orchestration (Airflow, Dagster, Prefect) schedules and chains the stages, handles dependencies, and reruns failures.
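Underneath the tooling, an orchestrator resolves a dependency graph and retries failed steps. The following is a minimal sketch of that idea using the standard-library `graphlib` module. It is not how Airflow, Dagster, or Prefect are implemented, and `run_pipeline`, the task names, and the retry count are illustrative assumptions.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, max_attempts=2):
    """Run tasks in dependency order, retrying failures; return the order executed."""
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Stop the run so downstream stages never see partial data.
                    raise RuntimeError(f"{name} failed after {attempt} attempts") from exc
    return completed


# Each key depends on the stages in its set.
dependencies = {
    "land": {"ingest"},
    "transform": {"land"},
    "model": {"transform"},
    "serve": {"model"},
}

attempts = {"transform": 0}


def flaky_transform():
    """Fail on the first call to simulate a transient warehouse error."""
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise ConnectionError("warehouse unavailable")


tasks = {name: (lambda: None) for name in ("ingest", "land", "model", "serve")}
tasks["transform"] = flaky_transform

order = run_pipeline(tasks, dependencies)
```

Real orchestrators add scheduling, backfills, parallelism, and alerting on top of this core loop.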
Observability (Monte Carlo, Great Expectations, Soda, custom dbt tests) catches data quality and freshness issues before consumers do. A solid data governance framework is what makes this stick long-term.
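Two of the most common checks, freshness and null rate, are simple enough to sketch by hand. The thresholds (24 hours, 5%), the `customer_id` column, and the function names below are hypothetical. Tools such as Great Expectations or dbt tests express similar rules declaratively.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(last_loaded_at, max_age, now=None):
    """Return True if the table was loaded within max_age."""
    now = now or datetime.now(timezone.utc)
    return (now - last_loaded_at) <= max_age


def null_rate(rows, column):
    """Fraction of rows where the column is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def run_checks(rows, last_loaded_at, now=None):
    """Return a list of human-readable failures; an empty list means all checks passed."""
    failures = []
    if not check_freshness(last_loaded_at, timedelta(hours=24), now):
        failures.append("orders is stale (older than 24h)")
    if null_rate(rows, "customer_id") > 0.05:
        failures.append("customer_id null rate above 5%")
    return failures


now = datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc)
stale_load = datetime(2026, 1, 1, 6, 0, tzinfo=timezone.utc)  # 30 hours before `now`
rows = [{"customer_id": 1}, {"customer_id": None}]

failures = run_checks(rows, stale_load, now)
```

Running checks like these after each load, and alerting on a non-empty result, is what lets the data team hear about a problem before a dashboard user does.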
Without both, pipelines silently fail and dashboards lie. Centric builds reliable data pipelines through its data engineering and warehousing service.
Frequently Asked Questions
How does a data pipeline work?
Through seven stages (source, ingest, land, transform, model, serve, consume), tied together by orchestration and watched by observability.
What is a landing zone?
A source-faithful copy of incoming data. It’s the insurance layer that lets you rebuild downstream tables without re-ingesting from source.
What tools are used at each stage?
Ingestion: Fivetran / Airbyte / custom. Transform / Model: dbt + SQL. Orchestration: Airflow / Dagster. Observability: Monte Carlo / Great Expectations / dbt tests. (Tooling matters less than discipline.)
How often should pipelines run?
It depends on the use case: daily for most BI; intra-day or near-real-time for operational dashboards and ML; streaming for transactional, fraud, and monitoring workloads. See What is a Data Pipeline for a deeper breakdown.
Conclusion
A working data pipeline is invisible to users. A broken one is screaming. Building a pipeline whose failures are visible takes deliberate engineering: orchestration, observability, modeled layers, and the landing-zone insurance behind it all.
The pipeline is what determines whether your analytics is honest; invest in it like it matters, because it does. At Centric, we build pipelines that are reliable by design, not by luck.
