v0.3.0

Data Quality

Define quality rules, evaluate data, and track quality scores.

Overview

Data Quality lets you define rules that continuously evaluate your data and surface issues before they reach downstream consumers.

Rule Types

Dimension     What It Measures              Example Rule
------------  ----------------------------  ---------------------------------------------
Completeness  Missing or null values        “temperature must not be null”
Accuracy      Values within expected range  “temperature between -50 and 200”
Consistency   Cross-field agreement         “start_time < end_time”
Validity      Format and type correctness   “serial_number matches regex ^[A-Z]{2}\d{6}$”
Timeliness    Data freshness                “last record within 5 minutes”
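
To make the five dimensions concrete, the example rules can be written as plain Python predicates. This is only an illustrative sketch: the function names and the record layout (a dict keyed by field name) are assumptions, not the product's rule syntax, and the accuracy range is assumed to be inclusive.

```python
import re
from datetime import datetime, timedelta

# Pattern taken from the Validity example rule.
SERIAL_RE = re.compile(r"^[A-Z]{2}\d{6}$")


def completeness(record: dict) -> bool:
    """temperature must not be null."""
    return record.get("temperature") is not None


def accuracy(record: dict) -> bool:
    """temperature between -50 and 200 (inclusive bounds are an assumption)."""
    t = record.get("temperature")
    return t is not None and -50 <= t <= 200


def consistency(record: dict) -> bool:
    """start_time < end_time."""
    return record["start_time"] < record["end_time"]


def validity(record: dict) -> bool:
    """serial_number matches the regex; a missing value is treated as invalid."""
    return SERIAL_RE.fullmatch(record.get("serial_number") or "") is not None


def timeliness(last_record_time: datetime, now: datetime,
               max_age: timedelta = timedelta(minutes=5)) -> bool:
    """last record within 5 minutes of `now`."""
    return now - last_record_time <= max_age
```

Each predicate returns True when the record satisfies the rule, so a failing record is one for which the predicate returns False.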

Creating Rules

  1. Navigate to Analyze > Data Quality
  2. Click Create Rule
  3. Select the target data source or pipeline
  4. Choose the dimension and define the condition
  5. Set severity (info, warning, critical)
  6. Save
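
The fields collected in these steps can be pictured as a small record. The sketch below is a hypothetical in-memory representation, not the product's storage format or API: the class name `QualityRule` and its field names are assumptions, and only the dimension and severity values come from this page.

```python
from dataclasses import dataclass

# Values listed on this page.
DIMENSIONS = {"completeness", "accuracy", "consistency", "validity", "timeliness"}
SEVERITIES = {"info", "warning", "critical"}


@dataclass(frozen=True)
class QualityRule:
    """Hypothetical shape of a rule as entered in the Create Rule form."""
    name: str
    source: str      # target data source or pipeline (step 3)
    dimension: str   # one of DIMENSIONS (step 4)
    condition: str   # e.g. "temperature must not be null" (step 4)
    severity: str    # one of SEVERITIES (step 5)

    def __post_init__(self) -> None:
        # Reject values the form would not offer.
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")


rule = QualityRule(
    name="temperature-not-null",
    source="sensor-feed",
    dimension="completeness",
    condition="temperature must not be null",
    severity="critical",
)
```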

Evaluation

Rules are evaluated on a schedule or triggered by pipeline events:

  • Scheduled: Runs at a configured interval (e.g., every 5 minutes)
  • On-write: Evaluates when new data arrives via a pipeline

Each evaluation produces one pass/fail result per rule, along with details of any records that failed.
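
A single evaluation pass might look like the following sketch, which checks every rule against a batch of records and keeps the failing records for each rule. The names `RuleResult` and `evaluate` are assumptions for illustration, and a rule is treated as passing only when no record fails it, which is one plausible reading of "pass/fail per rule".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RuleResult:
    rule_name: str
    passed: bool
    failing_records: List[dict] = field(default_factory=list)


def evaluate(rules: Dict[str, Callable[[dict], bool]],
             records: List[dict]) -> List[RuleResult]:
    """Run every rule over the batch and collect one result per rule."""
    results = []
    for name, check in rules.items():
        # Keep the records that violate this rule.
        failing = [r for r in records if not check(r)]
        results.append(RuleResult(name, passed=not failing,
                                  failing_records=failing))
    return results


rules = {
    "temperature-not-null": lambda r: r.get("temperature") is not None,
    "temperature-in-range": lambda r: r.get("temperature") is not None
    and -50 <= r["temperature"] <= 200,
}
records = [{"temperature": 21.0}, {"temperature": 450.0}]
results = evaluate(rules, records)
```

Here the first rule passes for both records, while the second fails and reports the record with temperature 450.0.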

Quality Scores

Each data source receives an aggregate quality score (0-100%) based on:

  • Percentage of passing rules
  • Severity weighting (critical rules count more)
  • Recent trend (improving or degrading)

View scores and their historical trend charts on the Data Quality dashboard.
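
The exact scoring formula is not given here. As a rough sketch of how severity weighting can work, the example below computes a weighted pass percentage and reports the trend separately by comparing two scores. The weights (1, 2, 4), the "stable" label, and the choice to score an empty rule set as 100 are all assumptions.

```python
# Hypothetical weights: critical rules count more than warnings or info.
SEVERITY_WEIGHTS = {"info": 1, "warning": 2, "critical": 4}


def quality_score(results):
    """Weighted percentage of passing rules.

    `results` is a list of (severity, passed) tuples, one per rule.
    """
    total = sum(SEVERITY_WEIGHTS[sev] for sev, _ in results)
    if total == 0:
        return 100.0  # assumption: no rules means nothing is failing
    passing = sum(SEVERITY_WEIGHTS[sev] for sev, ok in results if ok)
    return 100.0 * passing / total


def trend(previous_score, current_score):
    """Label the change between two consecutive scores."""
    if current_score > previous_score:
        return "improving"
    if current_score < previous_score:
        return "degrading"
    return "stable"


score = quality_score([("critical", True), ("warning", False), ("info", True)])
```

With these weights, a failing critical rule lowers the score four times as much as a failing info rule would for the same rule set.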

Quality Alerts

Attach alert actions to quality rules:

  1. Open a rule
  2. Under Alerts, configure a notification channel
  3. When the rule fails, an alert fires to the configured channel

See Alerts for channel configuration details.
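
The alert flow above amounts to "for each failed rule with a configured channel, send a notification". A minimal sketch follows, assuming a hypothetical `send(channel, message)` callable stands in for whatever notification channel is configured. None of these names are part of the product.

```python
from typing import Callable, Dict, List, Tuple


def fire_alerts(results: Dict[str, bool],
                channels: Dict[str, str],
                send: Callable[[str, str], None]) -> List[Tuple[str, str]]:
    """Send one alert per failed rule that has a channel configured.

    `results` maps rule name to pass/fail; `channels` maps rule name to
    its configured channel. Returns the (rule, channel) pairs alerted.
    """
    sent = []
    for rule_name, passed in results.items():
        channel = channels.get(rule_name)
        if not passed and channel is not None:
            send(channel, f"Data quality rule failed: {rule_name}")
            sent.append((rule_name, channel))
    return sent


# Usage: capture messages in a list instead of calling a real channel.
outbox = []
sent = fire_alerts(
    results={"temperature-not-null": False, "serial-format": True,
             "freshness": False},
    channels={"temperature-not-null": "ops-email", "serial-format": "ops-email"},
    send=lambda channel, message: outbox.append((channel, message)),
)
```

Only the failing rule with a configured channel produces an alert; passing rules and rules without a channel are skipped.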

Monitoring

The Data Quality overview page shows:

  • Overall quality score per data source
  • Failing rules with severity indicators
  • Trend over time (last 24h, 7d, 30d)
  • Drill-down to individual rule results