Data Quality Validation

The Data Quality Validation activity runs a set of configurable data quality checks against one or more staging tables and writes the results — including error and warning counts — to workflow context variables.

Purpose

Use the Data Quality Validation activity to:

  • Validate staging data before loading it into the model
  • Catch data quality issues (nulls, out-of-range values, unexpected distributions) early in the pipeline
  • Gate downstream processing on data quality: fail the workflow if errors exceed a threshold
  • Produce an audit trail of data quality results in the workflow run log

Configuration

Staging Tables

One or more staging tables to check. For each table you specify:

  • Staging Table — select the table from the dropdown
  • Settings — click the gear icon to open the data quality rule editor for that table

Each table can have its own independent set of rules.

Context Keys

Three workflow context variables are populated after the activity runs:

| Key | Default Name | Description |
|---|---|---|
| Activity Run ID | DataQualityValidationWorkflowActivity_ActivityRunIdContextKey | The ID of this activity run (useful for linking to log entries) |
| Number of Errors | DataQualityValidationWorkflowActivity_NoOfErrorsContextKey | Count of rules that exceeded the error threshold |
| Number of Warnings | DataQualityValidationWorkflowActivity_NoOfWarningsContextKey | Count of rules that exceeded the warning threshold |

Rename these keys to something meaningful (e.g. DQErrors, DQWarnings) when you have multiple validation activities in the same workflow.

Data Quality Rules

Each rule is configured per column within a staging table. A rule measures the percentage of rows that meet a given condition and compares that percentage against the rule's error threshold. If the percentage exceeds the error threshold, the rule counts as an error. Rules that exceed the warning threshold are counted as warnings.

Each rule result is logged to the workflow run log in the format:

Data Quality Check = <Rule> -> <TableName>.<ColumnName>[<Value>] = <Percent>%

Results above the error threshold are logged at Error level. Results below the error threshold are logged at Warning level.
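
Because every rule result follows the single line format shown above, log entries can be parsed back into fields for reporting. The following is a minimal Python sketch and not part of the product. It assumes the exact spacing shown in the format and a plain decimal percentage. The names parse_log_line and DataQualityLogEntry, and the rule name NullCheck in the sample line, are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape, taken from the documented format:
# Data Quality Check = <Rule> -> <TableName>.<ColumnName>[<Value>] = <Percent>%
LOG_PATTERN = re.compile(
    r"Data Quality Check = (?P<rule>.+?) -> "
    r"(?P<table>[^.\s]+)\.(?P<column>[^\[]+)"
    r"\[(?P<value>.*)\] = (?P<percent>-?\d+(?:\.\d+)?)%"
)


class DataQualityLogEntry(NamedTuple):
    rule: str
    table: str
    column: str
    value: str
    percent: float


def parse_log_line(line: str) -> Optional[DataQualityLogEntry]:
    """Return the parsed fields, or None if the line is not a rule result."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return DataQualityLogEntry(
        rule=match["rule"],
        table=match["table"],
        column=match["column"],
        value=match["value"],
        percent=float(match["percent"]),
    )


# Hypothetical log line; "NullCheck" is an illustrative rule name.
entry = parse_log_line(
    "Data Quality Check = NullCheck -> StagingActuals.Amount[null] = 12.5%"
)
```

Lines that do not match the format return None, so the function can be applied to a whole run log without filtering it first.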

Behavior

  1. For each configured staging table, all rules are evaluated.
  2. Each rule result is logged.
  3. Error and warning counts are accumulated across all tables.
  4. All three context key values are written to the workflow context.
  5. The activity always returns Success — it does not automatically fail the workflow if errors are found.

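Because the activity always succeeds, acting on validation failures requires a downstream check. Read the error count (for example {{var:DQErrors}}, if the key has been renamed as suggested above) in an If activity, and route to a Stop or notification activity when the count is above zero.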
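
The steps above can be modeled in a few lines of Python. This is an illustrative sketch under stated assumptions, not the product's implementation: rule results are passed in already classified, the level labels "error", "warning", and "ok" are hypothetical, the context is a plain dictionary, and the key names follow the renamed keys used in the JSON reference. The table StagingBudget is made up for the example.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RuleResult:
    table: str
    column: str
    level: str  # hypothetical labels: "error", "warning", or "ok"


def run_data_quality_validation(
    results_by_table: Dict[str, List[RuleResult]],
    context: Dict[str, object],
    run_id_key: str = "DQActivityRunId",
    errors_key: str = "DQErrors",
    warnings_key: str = "DQWarnings",
) -> str:
    errors = 0
    warnings = 0
    # Steps 1-3: visit every table and accumulate counts across all of them.
    # (Step 2, logging each result, is omitted in this sketch.)
    for results in results_by_table.values():
        for result in results:
            if result.level == "error":
                errors += 1
            elif result.level == "warning":
                warnings += 1
    # Step 4: write all three context key values.
    context[run_id_key] = str(uuid.uuid4())
    context[errors_key] = errors
    context[warnings_key] = warnings
    # Step 5: the activity reports Success regardless of the counts.
    return "Success"


def should_stop(context: Dict[str, object], errors_key: str = "DQErrors") -> bool:
    """Mirror of a downstream If activity: stop when any error was counted."""
    return int(context.get(errors_key, 0)) > 0


context: Dict[str, object] = {}
status = run_data_quality_validation(
    {
        "StagingActuals": [
            RuleResult("StagingActuals", "Amount", "error"),
            RuleResult("StagingActuals", "Period", "warning"),
        ],
        "StagingBudget": [RuleResult("StagingBudget", "Amount", "error")],
    },
    context,
)
```

In this example the status is "Success" even though two errors were counted, which is why the gate has to live in a separate check such as should_stop.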

Usage Patterns

Validate Before Loading

Remote Data Load ──> Data Quality Validation ──> If ({{var:DQErrors}} > 0)
                     └── Tables: StagingActuals  ├─[Yes]──> Email ("Validation failed") ──> Stop
                                                 └─[No]───> ETL ("Import_Actuals")

Gate on Zero Errors

Data Quality Validation
└── Errors Key: ValidationErrors
        │
        ▼
If ({{var:ValidationErrors}} == 0)
├─[Yes]──> Continue pipeline
└─[No]───> Log Message + Stop

Usage Notes

  • At least one staging table must be configured. If the list is empty, the workflow fails validation and cannot be saved.
  • Rules are configured per table per column in the settings panel. The specific rule types (null checks, range checks, etc.) are configured there.
  • The activity does not automatically gate the workflow — you must add an If activity downstream to act on the error count.

Best Practices

  • Always follow Data Quality Validation with an If check on the error count. Silently letting invalid data through to the model is worse than a workflow failure.
  • Rename the context keys to short, descriptive names (DQErrors, DQWarnings) to make downstream If conditions readable.
  • Log the error count in a Log Message activity for permanent audit visibility even when counts are zero.

JSON Reference

{
  "discriminator": "DataQualityValidationWorkflowActivity",
  "activityId": "<uuid>",
  "name": "Data Quality Validation",
  "positionX": 0,
  "positionY": 0,
  "advanceRule": 2,
  "stagingTables": [
    {
      "tableName": "StagingActuals",
      "settings": null
    }
  ],
  "activityRunIdContextKey": "DQActivityRunId",
  "noOfErrorsContextKey": "DQErrors",
  "noOfWarningsContextKey": "DQWarnings"
}

| Property | Type | Description |
|---|---|---|
| stagingTables | array | Corresponds to the Staging Tables list. Array of { "tableName": string, "settings": array\|null } objects defining which tables to validate and their rules. |
| activityRunIdContextKey | string | Corresponds to the Activity Run ID context key. Workflow context key where the activity run ID is stored. |
| noOfErrorsContextKey | string | Corresponds to the Number of Errors context key. Workflow context key where the error count is stored. |
| noOfWarningsContextKey | string | Corresponds to the Number of Warnings context key. Workflow context key where the warning count is stored. |
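
If activity definitions are generated by a script rather than in the editor, the JSON above can be assembled programmatically. The following Python sketch is an assumption-laden illustration: build_dq_activity is a hypothetical helper, the fixed values (positionX, positionY, advanceRule) are copied from the sample without interpreting them, and the empty-list check only mirrors the rule that at least one staging table must be configured.

```python
import json
import uuid
from typing import Dict, List


def build_dq_activity(
    table_names: List[str],
    run_id_key: str = "DQActivityRunId",
    errors_key: str = "DQErrors",
    warnings_key: str = "DQWarnings",
) -> Dict[str, object]:
    """Build a definition shaped like the documented JSON sample."""
    if not table_names:
        # Mirrors the documented rule: an empty table list is not valid.
        raise ValueError("At least one staging table must be configured")
    return {
        "discriminator": "DataQualityValidationWorkflowActivity",
        "activityId": str(uuid.uuid4()),
        "name": "Data Quality Validation",
        "positionX": 0,
        "positionY": 0,
        "advanceRule": 2,  # copied from the sample; meaning not documented here
        "stagingTables": [
            # "settings": None serializes to null, as in the sample.
            {"tableName": name, "settings": None}
            for name in table_names
        ],
        "activityRunIdContextKey": run_id_key,
        "noOfErrorsContextKey": errors_key,
        "noOfWarningsContextKey": warnings_key,
    }


activity = build_dq_activity(["StagingActuals"])
activity_json = json.dumps(activity, indent=2)
```

Rule settings are left as null here because their structure is defined in the rule editor and is not described in this reference.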