The enterprise data landscape has fragmented in ways that make connectivity both more important and more complex than at any previous point. The average enterprise now operates 200-plus data sources: cloud applications, on-premises databases, streaming platforms, third-party data providers, IoT sensors, and legacy systems that were never designed to share data. The question is not whether to connect them — the value of cross-source analytics is not in dispute. The question is how to connect them without creating an integration layer that costs more to maintain than the insights it produces are worth.
The connector debt problem is real. A large retail organization recently counted 340 active data pipelines. Of those, 87 had no documented owner, 130 had not been modified in over 18 months and were of uncertain current relevance, and 45 were documented as "working, but we're not sure what they're used for." The data engineering team was spending 60% of its time on connector maintenance rather than building new analytical capability. The connectors had become the product instead of the means to the product.
Five years ago, building custom connectors was a defensible engineering decision. The commercial connector market was fragmented, coverage was uneven, and many enterprise systems lacked well-documented APIs that third-party connectors could target reliably. Today, the calculus has shifted decisively.
Fivetran, Airbyte, and Stitch have collectively built and maintained connectors to hundreds of sources. These connectors handle the unglamorous but critical details that custom connectors frequently get wrong: API version changes, rate limiting, incremental sync strategies, schema evolution detection, and error recovery. A commercial Salesforce connector keeps pace with the API updates Salesforce releases multiple times per year; a custom connector does so only if someone on the data engineering team remembers to check and update it.
The economic argument for building is strongest only when: the data source has no commercial connector available; the source has access patterns that commercial connectors cannot support; or compliance requirements mandate that no third-party system handles the data in transit. Outside these specific situations, buying connector infrastructure and redirecting the engineering time to data modeling and analytics is consistently the better trade.
Not all connectors carry equal operational risk. The first step in a principled connector strategy is classifying connectors by their impact profile, which determines how much investment in reliability and monitoring is warranted.
Tier 1: Business-Critical. These connectors feed data that directly affects financial reporting, customer-facing decisions, or operational processes with revenue impact. Salesforce opportunity data feeding revenue forecasting, transaction data from the payment processing system, inventory data feeding supply chain decisions. These connectors need: sub-hour SLA for data delivery, 24/7 monitoring with on-call alerting, quarterly review of connector health and coverage, and documented failover procedures.
Tier 2: Analytical. These connectors provide data used in analytical reporting and decision support, where delays of hours are tolerable but day-level gaps affect reporting quality. Marketing attribution data, product analytics events, customer support tickets. These need: SLA of a few hours for data delivery, business-hours monitoring, monthly health review, and documented recovery procedures but not on-call escalation.
Tier 3: Exploratory. These connectors bring in data used for ad-hoc analysis, research, and experimental use cases. They may have gaps of days without business impact. These need: best-effort monitoring, quarterly review to determine if they are still being used, and a sunset policy for connectors that have not been queried in 90 days.
Classifying all connectors by tier and enforcing different maintenance standards per tier reduces total engineering overhead substantially. The error is treating all connectors with the same (uniformly high) maintenance standard, which is exhausting, or with the same (uniformly low) standard, which leads to silent failures in business-critical pipelines.
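The tiering above can be encoded as a small policy table so that monitoring and review tooling enforce it automatically. A minimal sketch in Python; the class and field names are illustrative, and the thresholds simply translate the tiers described above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TierPolicy:
    delivery_sla_minutes: int   # maximum acceptable data delay
    on_call: bool               # page someone outside business hours?
    review_cadence_days: int    # how often connector health is reviewed

# Tier 1: sub-hour SLA, on-call, quarterly review.
# Tier 2: a few hours SLA, no on-call, monthly review.
# Tier 3: day-level gaps tolerated, best-effort, quarterly usage review.
TIER_POLICIES = {
    "business_critical": TierPolicy(60, True, 90),
    "analytical": TierPolicy(240, False, 30),
    "exploratory": TierPolicy(3 * 24 * 60, False, 90),
}

def policy_for(tier: str) -> TierPolicy:
    """Look up the maintenance standard for a connector's tier."""
    return TIER_POLICIES[tier]
```

Storing the policy as data rather than convention makes the "different standards per tier" rule auditable: a monitoring job can read the table and alert only when a connector violates its own tier's SLA.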
The most common cause of connector failure is not infrastructure outage; it is schema evolution. Source systems change their data structures: new fields are added, fields are renamed, data types change, tables are split or merged. Connectors that do not handle schema evolution gracefully fail silently — continuing to run while writing incorrect or incomplete data to the destination.
Schema evolution handling falls into three approaches: fail-on-change (the connector detects schema changes and fails, requiring manual intervention), adapt-on-change (the connector detects schema changes and adapts automatically by adding new columns, renaming mappings, or updating type conversions), and version-on-change (the connector creates a new destination table for each major schema version, preserving historical data under its original schema).
Fail-on-change is the safest approach and the most operationally expensive: every source schema change requires human intervention. Adapt-on-change is convenient but dangerous: automatic adaptation can silently introduce data quality issues if the adaptation logic does not correctly handle all change types. Version-on-change is the most principled approach for analytical workloads, but requires a downstream transformation layer that can merge different schema versions into a unified analytical model.
The practical recommendation is: use adapt-on-change for additive changes (new fields appear in the data and are added to the destination schema) and fail-on-change for breaking changes (field removal, type changes, table restructuring). Additive changes are typically safe to handle automatically; breaking changes require a human to understand the business impact and update downstream transformations accordingly.
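The hybrid rule, adapt for additive changes and fail for breaking ones, reduces to a schema diff. A minimal sketch, assuming each schema is available as a `{column: type}` mapping (the function name and return labels are illustrative):

```python
def classify_schema_change(old: dict, new: dict) -> str:
    """Compare two {column: type} schemas and decide the connector's response.

    Additive changes (new columns only) are safe to adapt automatically;
    removed or retyped columns are breaking and require human review.
    """
    removed = old.keys() - new.keys()
    retyped = {c for c in old.keys() & new.keys() if old[c] != new[c]}
    if removed or retyped:
        return "fail_on_change"    # breaking: stop and escalate
    if new.keys() - old.keys():
        return "adapt_on_change"   # additive: add new columns downstream
    return "no_change"
```

Note that this sketch cannot detect a rename (it sees one removal plus one addition), which is one reason breaking changes should always stop the pipeline rather than be guessed at.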
When data flows through 200 connectors from 200 sources, governing that data requires infrastructure. Manual governance approaches — relying on engineers to document what data flows where and how it is used — do not scale. The governance requirements that become mandatory at connector scale are:
Data lineage tracking. For any field in the analytical layer, the ability to trace its origin through connectors to source systems. When a financial report is questioned, lineage tracing enables rapid identification of where the number came from and what transformations it passed through. Without lineage, investigations that should take minutes take days.
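At its core, lineage tracing is a walk over upstream edges. A deliberately simplified sketch, assuming a linear chain stored as a `{node: parent}` mapping (real lineage metadata forms a DAG with many-to-one transformations; the field names here are hypothetical):

```python
def trace_lineage(field: str, upstream: dict) -> list:
    """Walk a field's upstream edges back to its source system.

    `upstream` maps each field/stage to its immediate parent,
    with None marking the original source.
    """
    path = [field]
    while upstream.get(path[-1]) is not None:
        path.append(upstream[path[-1]])
    return path

# Hypothetical chain: report column -> warehouse column -> source field.
upstream = {
    "report.revenue": "warehouse.opportunities.amount",
    "warehouse.opportunities.amount": "salesforce.Opportunity.Amount",
    "salesforce.Opportunity.Amount": None,
}
```

Even this toy version shows why lineage turns a days-long investigation into minutes: the questioned number's provenance is a lookup, not an archaeology project.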
PII detection and classification. Automated scanning of connector output for fields that may contain personally identifiable information. When a new connector brings in a CRM dataset, automated PII detection identifies fields like email, phone, name, and IP address before they propagate throughout the analytical layer without appropriate access controls.
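A basic value-scanning approach can be sketched with regular expressions. The patterns below are illustrative only; production PII scanners use far broader rule sets, validation logic, and confidence scoring:

```python
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "ip_address": re.compile(r"\b\d{1,3}(\.\d{1,3}){3}\b"),
}

def flag_pii_fields(sample_rows: list) -> dict:
    """Scan sampled connector output and return {field_name: {pii_types}}.

    Intended to run on a sample of rows when a new connector first lands
    data, before the destination schema is opened to general access.
    """
    flagged = {}
    for row in sample_rows:
        for field, value in row.items():
            if not isinstance(value, str):
                continue
            for pii_type, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    flagged.setdefault(field, set()).add(pii_type)
    return flagged
```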
Access control at connector output. The destination schemas populated by connectors need row-level and column-level access controls that enforce data sensitivity classifications. A connector that ingests employee compensation data should route to a destination schema accessible only to authorized HR and finance roles, not to the general analytical namespace.
Connector usage tracking. Which connectors are actually being used, by whom, and for what queries? Connectors with zero query activity in 90 days are candidates for sunset. Connectors with high query volume are candidates for SLA upgrades. Usage data makes the connector portfolio visible as an operational resource rather than an invisible maintenance burden.
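Given per-connector query logs, identifying sunset candidates is a cutoff comparison. A minimal sketch, assuming usage is recorded as a mapping from connector name to its most recent query timestamp (None if never queried):

```python
from datetime import datetime, timedelta

def sunset_candidates(last_queried: dict, now: datetime,
                      idle_days: int = 90) -> list:
    """Return connectors with no query activity inside the idle window.

    `last_queried` maps connector name -> timestamp of most recent query,
    or None for connectors that have never been queried.
    """
    cutoff = now - timedelta(days=idle_days)
    return sorted(
        name for name, ts in last_queried.items()
        if ts is None or ts < cutoff
    )
```

Running this as a quarterly report is what turns the Tier 3 sunset policy from an intention into a routine.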
Every connector will fail. API credentials expire. Rate limits are hit during high-load periods. Source systems undergo maintenance windows that are not communicated. The question is not whether connectors will fail but whether downstream analytics will degrade gracefully when they do.
The principle of last-known-good data applies directly to connector architecture. When a Tier 1 connector fails, analytical dashboards should continue serving the most recent successful data with a freshness indicator showing when the data was last updated, rather than going blank or surfacing error messages. Reporting with stale data is better than reporting nothing while users assume the data is current; the freshness indicator makes staleness explicit rather than invisible.
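The serving side of last-known-good can be sketched as a payload that always carries its own freshness metadata. The class and function names here are illustrative, not any particular tool's API:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DashboardPayload:
    rows: list
    as_of: datetime   # when the data was last successfully refreshed
    is_stale: bool    # True once the data exceeds the connector's SLA

def serve_with_freshness(cached_rows: list, last_success: datetime,
                         now: datetime, sla_minutes: int) -> DashboardPayload:
    """Serve last-known-good data with an explicit staleness flag,
    rather than erroring or going blank when the connector is down."""
    age_minutes = (now - last_success).total_seconds() / 60
    return DashboardPayload(rows=cached_rows, as_of=last_success,
                            is_stale=age_minutes > sla_minutes)
```

The dashboard layer then renders `as_of` and a visible staleness badge, which is exactly what makes stale data honest instead of misleading.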
Retry logic with exponential backoff is standard practice for connector failure recovery. What is less commonly implemented but equally important is retry budgets: maximum retry counts after which the system stops attempting recovery and escalates to human attention. Connectors that retry indefinitely can mask systemic problems and exhaust resources needed by other connectors. Failing fast after a defined retry budget and alerting operations is preferable to silent, endless retry loops.
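Backoff plus a retry budget fits in a few lines. A minimal sketch; the escalation hook is a stand-in for whatever paging or ticketing integration the team actually uses:

```python
import random
import time

def run_with_retry_budget(sync_once, max_attempts: int = 5,
                          base_delay_s: float = 1.0, escalate=print):
    """Retry a connector sync with exponential backoff and jitter,
    then escalate to a human once the retry budget is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return sync_once()
        except Exception as exc:
            if attempt == max_attempts:
                escalate(f"retry budget exhausted after {attempt} attempts: {exc}")
                raise  # fail loudly instead of looping forever
            # Exponential backoff with jitter: 1s, 2s, 4s, ... plus noise,
            # so many failing connectors do not retry in lockstep.
            time.sleep(base_delay_s * 2 ** (attempt - 1) + random.uniform(0, 0.5))
```

The jitter term matters in practice: without it, every connector hitting the same rate limit retries at the same instant and hits it again.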
The connector portfolio is infrastructure, not a project. It requires ongoing investment to remain reliable as source systems evolve, business requirements change, and the connector inventory grows. Organizations that treat it as infrastructure and staff it accordingly consistently extract more value from their data than those that treat connectors as solved problems.
Explore how Dataova's connector management layer provides unified monitoring, lineage tracking, and schema evolution handling across all 200+ pre-built connectors.