ETL (Extract, Transform, Load)

The ETL activity executes a configured ETL import definition to load data from staging tables into the application model.

Purpose

Use the ETL activity to:

  • Load data from staging tables into the model
  • Apply transformation rules during data import
  • Validate data before loading into the model
  • Automate data loading processes in workflows

Overview

The ETL activity uses pre-configured Import Definitions that specify:

  • Source staging table
  • Destination model tables
  • Column mappings
  • Transformation rules
  • Validation rules

This eliminates the need to write custom stored procedures for each data loading scenario.

Configuration

Import Definition

  • Select Import Definition: Choose from pre-configured ETL import definitions
  • Import definitions are created and configured separately in the ETL configuration section

Validation Options

Skip Validation

  • Disabled (default): All validation rules are enforced

    • Rows with validation errors are rejected
    • ETL process fails if errors are found
    • Errors are logged to the ETL log table
  • Enabled: Validation errors do not stop the process

    • Rows with errors are skipped
    • Valid rows are loaded into the model
    • Invalid rows are logged but don't stop the process
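
After a permissive run, the skipped rows can be pulled from the log. A minimal sketch, assuming the log exposes per-row details in columns named RowNumber and ErrorMessage (assumed names; check your schema):

-- Rows skipped in recent runs (RowNumber and ErrorMessage are assumed column names)
SELECT ExecutionDate, RowNumber, ErrorMessage
FROM App.ETLLog
WHERE ImportDefinitionID = [ID]
  AND ErrorMessage IS NOT NULL
ORDER BY ExecutionDate DESC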

Import Definitions

Import definitions specify the data transformation logic, including:

Column Mappings

  • Which staging table columns map to which model columns
  • Data type conversions
  • Default values
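
Conceptually, a mapping with a type conversion and a default value behaves like the SQL below. The staging table and column names are hypothetical; in practice this logic lives in the import definition, not hand-written SQL:

SELECT
    CAST(s.Amount AS decimal(18, 2)) AS Amount,    -- data type conversion
    COALESCE(s.Currency, 'USD') AS Currency        -- default value when the source is empty
FROM Staging.Transactions AS s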

Field Types

  • Account Name: Maps to account master data by name
  • Account Key: Maps to account by key/code
  • Account Caption: Maps to account by description
  • Dimension Members: Similar mapping for other dimensions
  • Regular Columns: Direct data mapping
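
The account field types differ only in which column resolves the staging value to a master data record. A conceptual sketch with hypothetical table and column names:

-- Account Name field type: resolve the account by its name
SELECT s.*, a.AccountID
FROM Staging.Transactions AS s
JOIN App.Account AS a ON a.Name = s.AccountName

-- Account Key field type: the same lookup, but by key/code
-- JOIN App.Account AS a ON a.AccountKey = s.AccountKey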

Validation Rules

  • Required field validation
  • Data type validation
  • Referential integrity checks
  • Business rule validation
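
These rules correspond to checks you could express in SQL. A sketch of a required-field and data type check, assuming a SQL Server backend and hypothetical staging columns:

-- Rows that would fail validation (illustrative only)
SELECT s.*
FROM Staging.Transactions AS s
WHERE s.AccountName IS NULL                      -- required field missing
   OR TRY_CONVERT(date, s.PostingDate) IS NULL   -- invalid date format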

ETL Logging

App.ETLLog Table

All ETL execution results are logged to the App.ETLLog table:

Logged Information:

  • Execution timestamp
  • Import definition used
  • Records processed
  • Records succeeded
  • Records failed
  • Error details for failed records
  • Validation issues

Common Errors Logged:

  • "Account not found: [AccountName]"
  • "Invalid date format in row [X]"
  • "Required field [FieldName] is missing"
  • "Duplicate key violation"

Accessing Logs

Query the ETL log table to:

  • Review import results
  • Debug data issues
  • Identify missing master data
  • Track import history

-- Most recent runs first for a given import definition
SELECT * FROM App.ETLLog
WHERE ImportDefinitionID = [ID]
ORDER BY ExecutionDate DESC

Common Use Cases

Standard Data Loading

1. Data Load Activity
└─ User uploads CSV/Excel to staging table

2. ETL Activity
└─ Import Definition: ClientDataImport
└─ Skip Validation: [unchecked]
└─ Loads valid data, logs errors

3. If errors exist
└─ Send notification with error details

Permissive Loading (Skip Invalid Rows)

1. Data Load Activity
└─ Upload file to staging

2. ETL Activity
└─ Import Definition: TransactionImport
└─ Skip Validation: [checked]
└─ Loads valid rows, skips invalid

3. Log Message
└─ Report: "Loaded {SuccessCount} records, skipped {ErrorCount}"
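
The counts for the report message can come from the log. A sketch, assuming a SQL Server backend and run totals stored in columns named RecordsSucceeded and RecordsFailed (names assumed from the logged information listed above):

-- Totals from the most recent run of this definition
SELECT TOP (1) RecordsSucceeded, RecordsFailed
FROM App.ETLLog
WHERE ImportDefinitionID = [ID]
ORDER BY ExecutionDate DESC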

Multiple Staging Tables

1. Load Accounts → ETL (AccountImport)
2. Load Transactions → ETL (TransactionImport)
3. Load Balances → ETL (BalanceImport)

Validation-Then-Load Pattern

1. Data Load

2. ETL with Skip Validation: [unchecked]
└─ Attempt to validate all data

3. If (validation fails)
├─ Send error report
├─ Clear State
└─ Loop back to Data Load

4. If (validation passes)
└─ Continue with approval/processing

Skip Validation Use Cases

When to Enable Skip Validation

Use when:

  • Data quality issues are expected and acceptable
  • Invalid rows should be skipped, not cause failure
  • Loading "best effort" data
  • Partial data is better than no data
  • Manual cleanup of errors will happen later

Example scenarios:

  • Historical data migration with known quality issues
  • Third-party data feeds with occasional bad records
  • Incremental loads where some records might be duplicates
  • Testing and development environments

When to Disable Skip Validation

Use when:

  • Data quality is critical
  • All records must be valid
  • Process should stop on errors for investigation
  • Regulatory compliance requires complete data
  • Master data must be complete before transaction loading

Example scenarios:

  • Financial transaction loading
  • Regulatory reporting data
  • Master data updates
  • Critical business processes

Example Configuration

Strict Validation

ETL Activity:
Import Definition: FinancialTransactions
Skip Validation: [unchecked]

Result: Any error stops the process
Use for: Critical financial data

Permissive Loading

ETL Activity:
Import Definition: OptionalDataImport
Skip Validation: [checked]

Result: Valid rows loaded, invalid rows skipped
Use for: Non-critical supplementary data

Error Handling Patterns

Validation with Notification

1. ETL Activity (Skip Validation: unchecked)

2. If (ETL succeeds)
└─ Continue workflow

3. If (ETL fails)
├─ Query App.ETLLog for errors
├─ Send email with error details
└─ Stop workflow

Permissive with Reporting

1. ETL Activity (Skip Validation: checked)

2. Query App.ETLLog for skipped records

3. If (errors exist)
└─ Send warning email with skipped records

4. Continue workflow regardless

Import Definition Configuration

Import definitions:

  • Are created by developers or data architects
  • Are configured in the ETL configuration section
  • Specify all mapping and transformation logic

Key Configuration Elements:

  • Source staging table schema
  • Target model table
  • Column mapping rules
  • Validation rules
  • Error handling preferences

Best Practices

  • Test First: Test import definitions with sample data
  • Log Review: Regularly review App.ETLLog for patterns
  • Master Data First: Load master data (accounts, dimensions) before transactions
  • Validation Strategy: Choose skip validation based on data criticality
  • Error Handling: Always plan for validation failures
  • Staging Cleanup: Clear staging tables after successful loads (see the sketch after this list)
  • Documentation: Document import definition purpose and mapping rules
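
For the staging cleanup step, a minimal sketch (the staging table name is hypothetical; run it only after the load has been verified):

-- Clear the staging table once the ETL run has been confirmed
TRUNCATE TABLE Staging.Transactions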

Master Data Dependencies

Common Pattern: Missing Master Data

Problem:

ETL loads transactions, but referenced accounts don't exist in the model.
Result: Validation errors such as "Account [NAME] not found"
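
Before loading, you can check which staging accounts are missing from the model. A sketch with hypothetical table and column names:

-- Staging account names with no match in the account dimension
SELECT DISTINCT s.AccountName
FROM Staging.Transactions AS s
LEFT JOIN App.Account AS a ON a.Name = s.AccountName
WHERE a.Name IS NULL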

Solution 1: Load Master Data First

1. Load Accounts → ETL (AccountMaster)
2. Load Transactions → ETL (Transactions)

Solution 2: Auto-Create (If Configured)

Import Definition configured to:
└─ Auto-create missing accounts (if business rules allow)
└─ Or map to default account

From Transcript:

"If an account is not found, perhaps add it to the account dimension. This would be part of the import definition. I don't think it's been done."

This suggests auto-creation is not yet implemented but is a potential future enhancement.

Workflow Integration Example

Complete Data Load Workflow

1. Workflow Parameters
└─ Period, DataType

2. Data Load Activity
└─ User uploads file to staging

3. ETL Activity
└─ Import Definition: [Selected based on DataType]
└─ Skip Validation: false

4. If (ETL succeeds)
├─ Log Message: "Import successful"
├─ Send notification
└─ Continue to calculations

5. If (ETL fails)
├─ Query ETL log
├─ Email error report
├─ Clear State
└─ Loop back to Data Load

Performance Considerations

  • Large data volumes may take significant time
  • Consider staging table size limits
  • Monitor ETL log table growth
  • Plan for cleanup of old log entries (see the sketch after this list)
  • Use batch processing for very large datasets
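
For the log cleanup, a minimal sketch, assuming a SQL Server backend (adjust the retention window to your needs):

-- Purge ETL log entries older than six months
DELETE FROM App.ETLLog
WHERE ExecutionDate < DATEADD(month, -6, GETDATE())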

Troubleshooting

ETL Fails with Validation Errors

  1. Query App.ETLLog for specific errors
  2. Check master data exists (accounts, dimensions)
  3. Verify staging data format matches import definition
  4. Review data quality in source file
  5. Consider enabling Skip Validation temporarily

Some Rows Not Loading

  1. Check App.ETLLog for skipped rows
  2. Verify Skip Validation setting
  3. Review validation rules in import definition
  4. Check for data type mismatches
  5. Verify required fields are populated

No Errors But Data Missing

  1. Verify import definition is correct
  2. Check staging table has data
  3. Review column mappings
  4. Ensure data types match
  5. Check for transformation logic issues

Comparison to Stored Procedures

Approach           ETL Activity        Stored Procedure
Configuration      Visual, reusable    Code-based
Validation         Built-in            Manual implementation
Error Logging      Automatic           Manual implementation
Reusability        High                Medium
Flexibility        Structured          Unlimited
Maintenance        Easier              Requires SQL knowledge

Use ETL activities for standard data loading patterns. Use stored procedures for complex custom logic.

Related Activities

  • Data Load: Uploads files to staging tables (typically precedes ETL)
  • Stored Procedure: For custom transformation logic
  • Clear State: To retry ETL after corrections
  • Log Message: To log ETL results
  • Email: To notify stakeholders of ETL results

Future Enhancements (Transcript Notes)

Potential improvements discussed:

  • Auto-creation of missing master data
  • More sophisticated validation rules
  • Better error reporting
  • Batch processing improvements

Summary

The ETL activity provides a powerful, configurable way to load data from staging to model without writing custom stored procedures. Key benefits:

  • Reusable: Import definitions can be used across workflows
  • Validated: Built-in validation with error logging
  • Flexible: Skip validation option for different scenarios
  • Auditable: All operations logged to App.ETLLog
  • Maintainable: Visual configuration easier than SQL

Best suited for standard data loading patterns where transformation rules can be pre-configured.