ETL (Extract, Transform, Load)
The ETL activity executes a configured ETL import definition to load data from staging tables into the application model.
Purpose
Use the ETL activity to:
- Load data from staging tables into the model
- Apply transformation rules during data import
- Validate data before loading into the model
- Automate data loading processes in workflows
Overview
The ETL activity uses pre-configured Import Definitions that specify:
- Source staging table
- Destination model tables
- Column mappings
- Transformation rules
- Validation rules
This eliminates the need to write custom stored procedures for each data loading scenario.
Configuration
Import Definition
- Select Import Definition: Choose from pre-configured ETL import definitions
- Import definitions are created and configured separately in the ETL configuration section
Validation Options
Skip Validation
- Disabled (default): All validation rules are enforced
  - Rows with validation errors are rejected
  - ETL process fails if errors are found
  - Errors are logged to the ETL log table
- Enabled: Validation errors are ignored
  - Rows with errors are skipped
  - Valid rows are loaded into the model
  - Invalid rows are logged but don't stop the process
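The two modes can be sketched as follows. This is an illustrative sketch only; `run_etl` and its arguments are invented names for the example, not product APIs.

```python
# Illustrative sketch of how the Skip Validation flag changes row handling.
# run_etl and validate are hypothetical names, not product APIs.
def run_etl(rows, validate, skip_validation=False):
    """Return (loaded, errors). In strict mode, any error fails the load."""
    loaded, errors = [], []
    for index, row in enumerate(rows):
        problems = validate(row)
        if problems:
            errors.append((index, problems))  # always logged, as in App.ETLLog
        else:
            loaded.append(row)
    if errors and not skip_validation:
        # Disabled (default): the whole process fails when any row is invalid
        raise ValueError(f"{len(errors)} row(s) failed validation")
    return loaded, errors
```

Either way, every invalid row is recorded; the flag only decides whether the process continues.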
Import Definitions
Import definitions specify the data transformation logic, including:
Column Mappings
- Which staging table columns map to which model columns
- Data type conversions
- Default values
Field Types
- Account Name: Maps to account master data by name
- Account Key: Maps to account by key/code
- Account Caption: Maps to account by description
- Dimension Members: Similar mapping for other dimensions
- Regular Columns: Direct data mapping
Validation Rules
- Required field validation
- Data type validation
- Referential integrity checks
- Business rule validation
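As a hedged illustration, the rule types above might look like the following checks. The real rules are configured in the import definition; the field names and master-data lookup here are assumptions made for the example.

```python
# Illustrative validators for the rule types listed above. Field names and
# the master-data lookup are assumptions, not the product's configuration.
def validate_row(row, required_fields, field_types, known_accounts):
    errors = []
    # Required field validation
    for field in required_fields:
        if row.get(field) in (None, ""):
            errors.append(f"Required field {field} is missing")
    # Data type validation
    for field, expected in field_types.items():
        value = row.get(field)
        if value is not None and not isinstance(value, expected):
            errors.append(f"Invalid value for {field}")
    # Referential integrity: the account must exist as master data
    account = row.get("Account")
    if account is not None and account not in known_accounts:
        errors.append(f"Account not found: {account}")
    return errors
```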
ETL Logging
App.ETLLog Table
All ETL execution results are logged to the App.ETLLog table:
Logged Information:
- Execution timestamp
- Import definition used
- Records processed
- Records succeeded
- Records failed
- Error details for failed records
- Validation issues
Common Errors Logged:
- "Account not found: [AccountName]"
- "Invalid date format in row [X]"
- "Required field [FieldName] is missing"
- "Duplicate key violation"
Accessing Logs
Query the ETL log table to:
- Review import results
- Debug data issues
- Identify missing master data
- Track import history
SELECT * FROM App.ETLLog
WHERE ImportDefinitionID = [ID]
ORDER BY ExecutionDate DESC
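The query above runs in any SQL client. As a sketch, here it is parameterised against a simplified stand-in for App.ETLLog; the real table has more columns than shown.

```python
import sqlite3

# Simplified stand-in for App.ETLLog; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ETLLog (
    ImportDefinitionID INTEGER,
    ExecutionDate TEXT,
    RecordsProcessed INTEGER,
    RecordsFailed INTEGER,
    ErrorDetails TEXT)""")
conn.executemany("INSERT INTO ETLLog VALUES (?, ?, ?, ?, ?)", [
    (1, "2024-03-02", 100, 0, None),
    (1, "2024-03-01", 100, 3, "Account not found: 4010"),
    (2, "2024-03-01", 50, 0, None)])

# Parameterised version of the SELECT above: one definition's history,
# newest execution first.
history = conn.execute(
    "SELECT ExecutionDate, RecordsProcessed, RecordsFailed FROM ETLLog "
    "WHERE ImportDefinitionID = ? ORDER BY ExecutionDate DESC", (1,)).fetchall()
```

Using a bound parameter instead of pasting the ID into the string keeps the query safe to reuse in automated reporting.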
Common Use Cases
Standard Data Loading
1. Data Load Activity
└─ User uploads CSV/Excel to staging table
2. ETL Activity
└─ Import Definition: ClientDataImport
└─ Skip Validation: [unchecked]
└─ Loads valid data, logs errors
3. If errors exist
└─ Send notification with error details
Permissive Loading (Skip Invalid Rows)
1. Data Load Activity
└─ Upload file to staging
2. ETL Activity
└─ Import Definition: TransactionImport
└─ Skip Validation: [checked]
└─ Loads valid rows, skips invalid
3. Log Message
└─ Report: "Loaded {SuccessCount} records, skipped {ErrorCount}"
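The placeholders in that report message resolve from the ETL result. A minimal sketch, where the `succeeded`/`failed` result keys are assumptions for the example:

```python
# Minimal sketch: fill the Log Message placeholders from an ETL result.
# The "succeeded"/"failed" keys are assumptions, not a documented shape.
def format_report(result):
    return "Loaded {SuccessCount} records, skipped {ErrorCount}".format(
        SuccessCount=result["succeeded"], ErrorCount=result["failed"])
```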
Multiple Staging Tables
1. Load Accounts → ETL (AccountImport)
2. Load Transactions → ETL (TransactionImport)
3. Load Balances → ETL (BalanceImport)
Validation-Then-Load Pattern
1. Data Load
2. ETL with Skip Validation: [unchecked]
└─ Attempt to validate all data
3. If (validation fails)
├─ Send error report
├─ Clear State
└─ Loop back to Data Load
4. If (validation passes)
└─ Continue with approval/processing
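The retry loop in this pattern can be sketched as below. All callables are stand-ins for the workflow activities, not real APIs.

```python
# Sketch of the validate-then-load loop. load_data, run_etl, send_error_report
# and clear_state stand in for the workflow activities shown above.
def validate_then_load(load_data, run_etl, send_error_report, clear_state,
                       max_attempts=3):
    for _ in range(max_attempts):
        rows = load_data()            # Data Load activity
        try:
            run_etl(rows)             # strict ETL: raises on validation errors
            return True               # continue with approval/processing
        except ValueError as error:
            send_error_report(error)  # notify users of the failure
            clear_state()             # reset so the load can be retried
    return False
```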
Skip Validation Use Cases
When to Enable Skip Validation
Use when:
- Data quality issues are expected and acceptable
- Invalid rows should be skipped, not cause failure
- Loading "best effort" data
- Partial data is better than no data
- Manual cleanup of errors will happen later
Example scenarios:
- Historical data migration with known quality issues
- Third-party data feeds with occasional bad records
- Incremental loads where some records might be duplicates
- Testing and development environments
When to Disable Skip Validation
Use when:
- Data quality is critical
- All records must be valid
- Process should stop on errors for investigation
- Regulatory compliance requires complete data
- Master data must be complete before transaction loading
Example scenarios:
- Financial transaction loading
- Regulatory reporting data
- Master data updates
- Critical business processes
Example Configuration
Strict Validation
ETL Activity:
Import Definition: FinancialTransactions
Skip Validation: [unchecked]
Result: Any error stops the process
Use for: Critical financial data
Permissive Loading
ETL Activity:
Import Definition: OptionalDataImport
Skip Validation: [checked]
Result: Valid rows loaded, invalid rows skipped
Use for: Non-critical supplementary data
Error Handling Patterns
Validation with Notification
1. ETL Activity (Skip Validation: unchecked)
2. If (ETL succeeds)
└─ Continue workflow
3. If (ETL fails)
├─ Query App.ETLLog for errors
├─ Send email with error details
└─ Stop workflow
Permissive with Reporting
1. ETL Activity (Skip Validation: checked)
2. Query App.ETLLog for skipped records
3. If (errors exist)
└─ Send warning email with skipped records
4. Continue workflow regardless
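A sketch of this permissive pattern, with stand-in callables for the workflow activities:

```python
# Sketch of the permissive pattern: run with Skip Validation on, then warn
# if any rows were skipped. All callables stand in for workflow activities.
def permissive_load(run_etl, query_skipped, send_warning):
    run_etl(skip_validation=True)      # never fails; invalid rows are skipped
    skipped = query_skipped()          # e.g. read App.ETLLog for this run
    if skipped:
        send_warning(f"{len(skipped)} record(s) skipped; see App.ETLLog")
    return len(skipped)                # workflow continues regardless
```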
Import Definition Configuration
Import definitions are:
- Created by developers or data architects
- Configured in the ETL configuration section
- Responsible for all mapping and transformation logic
Key Configuration Elements:
- Source staging table schema
- Target model table
- Column mapping rules
- Validation rules
- Error handling preferences
Best Practices
- Test First: Test import definitions with sample data
- Log Review: Regularly review App.ETLLog for patterns
- Master Data First: Load master data (accounts, dimensions) before transactions
- Validation Strategy: Choose skip validation based on data criticality
- Error Handling: Always plan for validation failures
- Staging Cleanup: Clear staging tables after successful loads
- Documentation: Document import definition purpose and mapping rules
Master Data Dependencies
Common Pattern: Missing Master Data
Problem:
ETL loads transactions, but the referenced accounts don't exist in the model
Result: Validation errors such as "Account [NAME] not found"
Solution 1: Load Master Data First
1. Load Accounts → ETL (AccountMaster)
2. Load Transactions → ETL (Transactions)
Solution 2: Auto-Create (If Configured)
Import Definition configured to:
└─ Auto-create missing accounts (if business rules allow)
└─ Or map to default account
From Transcript:
"If an account is not found, perhaps add it to the account dimension. This would be part of the import definition. I don't think it's been done."
This suggests auto-creation is not yet implemented but is a potential future enhancement.
Workflow Integration Example
Complete Data Load Workflow
1. Workflow Parameters
└─ Period, DataType
2. Data Load Activity
└─ User uploads file to staging
3. ETL Activity
└─ Import Definition: [Selected based on DataType]
└─ Skip Validation: false
4. If (ETL succeeds)
├─ Log Message: "Import successful"
├─ Send notification
└─ Continue to calculations
5. If (ETL fails)
├─ Query ETL log
├─ Email error report
├─ Clear State
└─ Loop back to Data Load
Performance Considerations
- Large data volumes may take significant time
- Consider staging table size limits
- Monitor ETL log table growth
- Plan for cleanup of old log entries
- Use batch processing for very large datasets
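For the log-growth point, periodic cleanup can be as simple as a dated delete. This is sketched against a simplified stand-in table; the one-year retention window is an assumption, not a product default.

```python
import sqlite3

# Housekeeping sketch: purge log rows older than a retention cutoff.
# The table shape is a simplified stand-in for App.ETLLog.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ETLLog (ExecutionDate TEXT, Detail TEXT)")
conn.executemany("INSERT INTO ETLLog VALUES (?, ?)",
                 [("2023-01-15", "old run"), ("2024-06-01", "recent run")])

cutoff = "2024-01-01"   # assumed policy: keep roughly the last year
conn.execute("DELETE FROM ETLLog WHERE ExecutionDate < ?", (cutoff,))
remaining = conn.execute("SELECT Detail FROM ETLLog").fetchall()
```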
Troubleshooting
ETL Fails with Validation Errors
- Query App.ETLLog for specific errors
- Check master data exists (accounts, dimensions)
- Verify staging data format matches import definition
- Review data quality in source file
- Consider enabling Skip Validation temporarily
Some Rows Not Loading
- Check App.ETLLog for skipped rows
- Verify Skip Validation setting
- Review validation rules in import definition
- Check for data type mismatches
- Verify required fields are populated
No Errors But Data Missing
- Verify import definition is correct
- Check staging table has data
- Review column mappings
- Ensure data types match
- Check for transformation logic issues
Comparison to Stored Procedures
| Aspect | ETL Activity | Stored Procedure |
|---|---|---|
| Configuration | Visual, reusable | Code-based |
| Validation | Built-in | Manual implementation |
| Error Logging | Automatic | Manual implementation |
| Reusability | High | Medium |
| Flexibility | Structured | Unlimited |
| Maintenance | Easier | Requires SQL knowledge |
Use ETL activities for standard data loading patterns. Use stored procedures for complex custom logic.
Related Activities
- Data Load: Uploads files to staging tables (typically precedes ETL)
- Stored Procedure: For custom transformation logic
- Clear State: To retry ETL after corrections
- Log Message: To log ETL results
- Email: To notify stakeholders of ETL results
Future Enhancements (Transcript Notes)
Potential improvements discussed:
- Auto-creation of missing master data
- More sophisticated validation rules
- Better error reporting
- Batch processing improvements
Summary
The ETL activity provides a powerful, configurable way to load data from staging to model without writing custom stored procedures. Key benefits:
- Reusable: Import definitions can be used across workflows
- Validated: Built-in validation with error logging
- Flexible: Skip validation option for different scenarios
- Auditable: All operations logged to App.ETLLog
- Maintainable: Visual configuration easier than SQL
Best suited for standard data loading patterns where transformation rules can be pre-configured.