Data Quality Validation
The Data Quality Validation activity runs a set of configurable data quality checks against one or more staging tables and writes the results — including error and warning counts — to workflow context variables.
Purpose
Use the Data Quality Validation activity to:
- Validate staging data before loading it into the model
- Catch data quality issues (nulls, out-of-range values, unexpected distributions) early in the pipeline
- Gate downstream processing on data quality: fail the workflow if errors exceed a threshold
- Produce an audit trail of data quality results in the workflow run log
Configuration
Staging Tables
One or more staging tables to check. For each table you specify:
- Staging Table — select the table from the dropdown
- Settings — click the gear icon to open the data quality rule editor for that table
Each table can have its own independent set of rules.
Context Keys
Three workflow context variables are populated after the activity runs:
| Key | Default Name | Description |
|---|---|---|
| Activity Run ID | DataQualityValidationWorkflowActivity_ActivityRunIdContextKey | The ID of this activity run (useful for linking to log entries) |
| Number of Errors | DataQualityValidationWorkflowActivity_NoOfErrorsContextKey | Count of rules that exceeded the error threshold |
| Number of Warnings | DataQualityValidationWorkflowActivity_NoOfWarningsContextKey | Count of rules that exceeded the warning threshold |
Rename these keys to something meaningful (e.g. DQErrors, DQWarnings) when you have multiple validation activities in the same workflow.
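The reason for renaming can be illustrated with a plain dictionary standing in for the workflow context. This is a minimal sketch. It assumes the context behaves like a key/value map in which a later write to the same key replaces the earlier value. The `run_activity` helper, the `DQErrorsActuals`/`DQErrorsBudget` key names, and the counts are hypothetical.

```python
# Hypothetical sketch: a dict stands in for the workflow context.
DEFAULT_ERRORS_KEY = "DataQualityValidationWorkflowActivity_NoOfErrorsContextKey"


def run_activity(context: dict, errors_key: str, error_count: int) -> None:
    """Write one validation activity's error count under the given key."""
    context[errors_key] = error_count


# Two validation activities sharing the default key: the second write wins.
shared = {}
run_activity(shared, DEFAULT_ERRORS_KEY, 3)  # e.g. validates StagingActuals
run_activity(shared, DEFAULT_ERRORS_KEY, 0)  # e.g. validates another table
print(shared[DEFAULT_ERRORS_KEY])  # 0: the first activity's 3 errors are lost

# With distinct keys, both counts survive for downstream If activities.
renamed = {}
run_activity(renamed, "DQErrorsActuals", 3)
run_activity(renamed, "DQErrorsBudget", 0)
print(renamed)  # {'DQErrorsActuals': 3, 'DQErrorsBudget': 0}
```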
Data Quality Rules
Each rule is configured per column within a staging table and checks the percentage of rows that meet a given condition. That percentage is compared against an error threshold: if it exceeds the threshold, the rule counts as an error.
Each rule result is logged to the workflow run log in the format:
Data Quality Check = <Rule> -> <TableName>.<ColumnName>[<Value>] = <Percent>%
Results above the error threshold are logged at Error level. Results below the error threshold are logged at Warning level.
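A single rule evaluation might look roughly like the sketch below. Several details are assumptions rather than documented behavior:

- A rule is a predicate applied to each row's value in the column.
- The reported percentage is the share of rows matching that predicate.
- The threshold test is a strict "exceeds".
- `<Value>` in the log line is the rule's parameter.
- The two-decimal formatting is a guess.

The `evaluate_rule` function, the `Null` rule label, and the sample values are hypothetical. Only the log line layout and the Error/Warning split follow the text above.

```python
from typing import Callable, Optional, Sequence


def evaluate_rule(
    rule: str,
    table: str,
    column: str,
    value: str,
    values: Sequence[Optional[float]],
    predicate: Callable[[Optional[float]], bool],
    error_threshold: float,
) -> tuple[str, str]:
    """Return (log_level, log_line) for one rule on one column.

    Hypothetical sketch: `predicate` marks the rows the rule flags, and
    `value` is assumed to be the rule parameter shown in brackets.
    """
    matching = sum(1 for v in values if predicate(v))
    percent = 100.0 * matching / len(values) if values else 0.0
    # Above the error threshold -> Error level; otherwise Warning level.
    level = "Error" if percent > error_threshold else "Warning"
    line = f"Data Quality Check = {rule} -> {table}.{column}[{value}] = {percent:.2f}%"
    return level, line


# Example: 2 of 8 Amount values are null (25%) against a 10% error threshold.
amounts = [10.0, None, 3.5, 7.0, None, 1.0, 2.0, 4.0]
level, line = evaluate_rule("Null", "StagingActuals", "Amount", "null",
                            amounts, lambda v: v is None, error_threshold=10.0)
print(level)  # Error
print(line)   # Data Quality Check = Null -> StagingActuals.Amount[null] = 25.00%
```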
Behavior
- For each configured staging table, all rules are evaluated.
- Each rule result is logged.
- Error and warning counts are accumulated across all tables.
- All three context key values are written to the workflow context.
- The activity always returns Success — it does not automatically fail the workflow if errors are found.
The activity itself always succeeds. To act on validation failures, read {{var:DQErrors}} in a downstream If activity and route to a Stop or notification activity when the count is above zero.
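The behavior list above can be summarized in a short sketch. It assumes each rule result has already been classified as "error", "warning", or "ok". How thresholds place a result in each bucket is deliberately not modeled, and per-rule logging is omitted. The `run_validation` helper, the `StagingBudget` table, and the use of a UUID for the run ID are hypothetical. The context key names match the JSON Reference example.

```python
import uuid

# Hypothetical pre-classified results: table name -> status of each rule.
RESULTS = {
    "StagingActuals": ["ok", "error", "warning"],
    "StagingBudget": ["warning", "ok"],
}


def run_validation(context: dict, results: dict,
                   run_id_key: str = "DQActivityRunId",
                   errors_key: str = "DQErrors",
                   warnings_key: str = "DQWarnings") -> str:
    """Accumulate counts across all tables and write the three context keys."""
    # Flatten every rule result from every configured table.
    all_statuses = [s for statuses in results.values() for s in statuses]
    # All three keys are written, even when both counts are zero.
    context[run_id_key] = str(uuid.uuid4())  # run ID format is an assumption
    context[errors_key] = all_statuses.count("error")
    context[warnings_key] = all_statuses.count("warning")
    return "Success"  # never fails the workflow on its own


context = {}
outcome = run_validation(context, RESULTS)
print(outcome, context["DQErrors"], context["DQWarnings"])  # Success 1 2
```

Returning "Success" unconditionally mirrors the documented behavior. Any gating has to happen in a separate If activity that reads the error key.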
Usage Patterns
Validate Before Loading
```
Remote Data Load ──> Data Quality Validation ──> If ({{var:DQErrors}} > 0)
                     └── Tables: StagingActuals  ├─[Yes]──> Email ("Validation failed") ──> Stop
                                                 └─[No]───> ETL ("Import_Actuals")
```
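The If step in this pattern reads the error count from the workflow context and picks a branch. The sketch below assumes `{{var:Name}}` placeholders are replaced with the matching context value before the condition is evaluated. `resolve` and `route_after_validation` are hypothetical helpers, not the workflow engine's API, and the returned strings simply mirror the diagram.

```python
import re

# Matches {{var:Name}} placeholders; the name syntax is an assumption.
PLACEHOLDER = re.compile(r"\{\{var:([A-Za-z0-9_]+)\}\}")


def resolve(template: str, context: dict) -> str:
    """Replace each {{var:Name}} placeholder with str(context[Name])."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), template)


def route_after_validation(context: dict) -> list[str]:
    """Mirror the diagram: errors -> Email + Stop, otherwise run the ETL."""
    error_count = int(resolve("{{var:DQErrors}}", context))
    if error_count > 0:  # [Yes] branch
        return ['Email ("Validation failed")', "Stop"]
    return ['ETL ("Import_Actuals")']  # [No] branch


print(route_after_validation({"DQErrors": 2}))  # ['Email ("Validation failed")', 'Stop']
print(route_after_validation({"DQErrors": 0}))  # ['ETL ("Import_Actuals")']
```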
Gate on Zero Errors
```
Data Quality Validation
└── Errors Key: ValidationErrors
        │
        ▼
If ({{var:ValidationErrors}} == 0)
├─[Yes]──> Continue pipeline
└─[No]───> Log Message + Stop
```
Usage Notes
- At least one staging table must be configured. If the list is empty, the workflow fails validation and cannot be saved.
- Rules are configured per column for each table in the settings panel, which is also where you choose the specific rule types (null checks, range checks, etc.).
- The activity does not automatically gate the workflow — you must add an If activity downstream to act on the error count.
Best Practices
- Always follow Data Quality Validation with an If check on the error count. Silently letting invalid data through to the model is worse than a failed workflow.
- Rename the context keys to short, descriptive names (DQErrors, DQWarnings) to make downstream If conditions readable.
- Log the error count in a Log Message activity for permanent audit visibility, even when the count is zero.
JSON Reference
```json
{
  "discriminator": "DataQualityValidationWorkflowActivity",
  "activityId": "<uuid>",
  "name": "Data Quality Validation",
  "positionX": 0,
  "positionY": 0,
  "advanceRule": 2,
  "stagingTables": [
    {
      "tableName": "StagingActuals",
      "settings": null
    }
  ],
  "activityRunIdContextKey": "DQActivityRunId",
  "noOfErrorsContextKey": "DQErrors",
  "noOfWarningsContextKey": "DQWarnings"
}
```
| Property | Type | Description |
|---|---|---|
| stagingTables | array | Corresponds to the Staging Tables list. Array of { "tableName": string, "settings": array or null } objects defining which tables to validate and their rules. |
| activityRunIdContextKey | string | Corresponds to the Activity Run ID context key. Workflow context key where the activity run ID is stored. |
| noOfErrorsContextKey | string | Corresponds to the Number of Errors context key. Workflow context key where the error count is stored. |
| noOfWarningsContextKey | string | Corresponds to the Number of Warnings context key. Workflow context key where the warning count is stored. |
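Before importing a workflow definition, it can help to check the activity JSON against the property table. The sketch below uses only Python's standard `json` module. The embedded JSON is the reference example trimmed to the properties described in the table.

`check_activity` is a hypothetical helper. It checks only what this page documents:

- a non-empty `stagingTables` list;
- a `tableName` string for each table;
- `settings` as an array or null;
- string context keys.

It is not the product's full schema.

```python
import json

# Reference example, trimmed to the properties described in the table above.
ACTIVITY_JSON = """
{
  "discriminator": "DataQualityValidationWorkflowActivity",
  "stagingTables": [{"tableName": "StagingActuals", "settings": null}],
  "activityRunIdContextKey": "DQActivityRunId",
  "noOfErrorsContextKey": "DQErrors",
  "noOfWarningsContextKey": "DQWarnings"
}
"""


def check_activity(activity: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    tables = activity.get("stagingTables")
    if not isinstance(tables, list) or not tables:
        # Mirrors the rule that an empty table list cannot be saved.
        problems.append("stagingTables must list at least one table")
    else:
        for i, table in enumerate(tables):
            if not isinstance(table, dict):
                problems.append(f"stagingTables[{i}] must be an object")
                continue
            name = table.get("tableName")
            if not isinstance(name, str) or not name:
                problems.append(f"stagingTables[{i}].tableName must be a non-empty string")
            settings = table.get("settings")
            if settings is not None and not isinstance(settings, list):
                problems.append(f"stagingTables[{i}].settings must be an array or null")
    for key in ("activityRunIdContextKey", "noOfErrorsContextKey", "noOfWarningsContextKey"):
        if not isinstance(activity.get(key), str):
            problems.append(f"{key} must be a string")
    return problems


print(check_activity(json.loads(ACTIVITY_JSON)))  # []
print(check_activity({"stagingTables": []})[0])   # stagingTables must list at least one table
```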