AI Regression
The AI Regression activity trains or runs a regression model using ML.NET on data from a staging table, then writes predictions to an output staging table.
Purpose
Use the AI Regression activity to:
- Predict a numeric value (e.g. revenue, cost, quantity) from a set of input features
- Train a regression model on historical data and save it for future use
- Apply a previously trained model to new data to generate predictions
Modes
| Mode | Description |
|---|---|
| Train Model | Trains a new regression model on the input staging table and saves the .onnx model file to the configured file system path |
| Run Model | Loads a previously trained model from the file system and applies it to the input staging table, writing predictions to the output staging table |
Configuration
Mode
Select Train Model or Run Model.
Input Staging Table
The staging table containing the training or prediction data. Column names must not contain spaces, hyphens, semicolons, or brackets.
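A pre-flight check can catch invalid column names before the activity runs. The sketch below is illustrative Python, not part of the product. The function name `invalid_column_names` is hypothetical, and the sketch assumes that "brackets" covers round, square, and curly brackets and that any whitespace counts as a space.

```python
import re

# Characters the documentation says column names must not contain.
# Assumption: "brackets" means (), [] and {}; any whitespace is rejected too.
_FORBIDDEN = re.compile(r"[\s\-;()\[\]{}]")


def invalid_column_names(columns):
    """Return the column names that contain at least one forbidden character."""
    return [name for name in columns if _FORBIDDEN.search(name)]


if __name__ == "__main__":
    cols = ["Revenue", "Unit Cost", "Region-Code", "Qty[2024]", "Year"]
    # Lists the names that would need renaming before use.
    print(invalid_column_names(cols))
```

Renaming the reported columns upstream (for example `Unit Cost` to `UnitCost`) is one way to satisfy the rule.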
File System Path
- Train mode: The folder where the trained model file will be saved (as <ActivityName>.onnx).
- Run mode: The full path to the .onnx model file to load.
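The relationship between the two path settings can be sketched as follows: the folder used in Train mode plus the activity name gives the full file path that Run mode expects. This is a hedged illustration only. The helper `train_model_path` and the activity name `RevenueForecast` are hypothetical, and forward-slash paths are assumed.

```python
from pathlib import PurePosixPath


def train_model_path(folder, activity_name):
    """Build <folder>/<ActivityName>.onnx, the file Train mode is documented to save.

    Assumption: the activity name is used verbatim as the file name.
    """
    return str(PurePosixPath(folder) / f"{activity_name}.onnx")


if __name__ == "__main__":
    # Folder configured for Train mode; the result is what Run mode would point at.
    print(train_model_path("/Models/Regression", "RevenueForecast"))
```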
Algorithm (Train mode only)
The regression algorithm to use:
| Algorithm | Description |
|---|---|
| FastTreeTweedie | Gradient boosted trees with Tweedie loss — good for right-skewed targets (default) |
| FastTree | Standard gradient boosted decision trees |
| LightGbm | LightGBM — fast for large datasets |
Output Staging Table (Run mode)
The staging table where predictions are written. It contains all non-vector columns from the input data plus a PredictedValue column.
Transforms
A pipeline of pre-processing transforms applied before the model step. Transforms are configured in the visual transform editor. Common transforms include:
- Concatenate — combine multiple numeric columns into a single feature vector
- Drop Columns — remove columns not needed for training
- Convert Type — convert a column to a numeric type
- Normalize — normalize a column to mean 0, variance 1
- One Hot Encoding — encode a categorical column as a numeric vector
- Train Model — the final step that trains or applies the model
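To make the effect of the pre-processing transforms concrete, the following Python sketch imitates Normalize, One Hot Encoding, and Concatenate on plain lists. It illustrates the concepts only. The activity itself uses ML.NET, whose transforms may differ in detail (for example in slot ordering or the variance estimator), and all function names here are hypothetical.

```python
from statistics import fmean, pstdev


def normalize(values):
    """Scale values to mean 0 and variance 1 (population standard deviation)."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def one_hot(values):
    """Encode each category as a 0/1 vector with one slot per distinct category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [[1.0 if index[v] == i else 0.0 for i in range(len(categories))]
            for v in values]


def concatenate(*columns):
    """Combine per-row scalars and vectors into a single feature vector per row."""
    rows = []
    for parts in zip(*columns):
        vector = []
        for part in parts:
            vector.extend(part if isinstance(part, list) else [part])
        rows.append(vector)
    return rows


if __name__ == "__main__":
    quantity = normalize([2.0, 4.0, 6.0])
    region = one_hot(["EU", "US", "EU"])
    for features in concatenate(quantity, region):
        print(features)
```

The concatenated vector plays the role of the feature column that the Train Model step consumes.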
Row Filter (optional)
Filter the input data by a column condition before training or inference. Useful for selecting a subset of rows (e.g. only rows where Year = 2024).
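A row filter of this kind amounts to keeping only the rows whose column value satisfies a comparison. The Python sketch below shows the idea on a list of dictionaries. The function `filter_rows` and the set of operators are assumptions for illustration; the actual filter object used by the activity is not described here.

```python
import operator

# Hypothetical operator set; the product's supported conditions may differ.
_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def filter_rows(rows, column, op, value):
    """Keep the rows where `row[column] <op> value` holds."""
    compare = _OPS[op]
    return [row for row in rows if compare(row[column], value)]


if __name__ == "__main__":
    data = [
        {"Year": 2023, "Revenue": 120.0},
        {"Year": 2024, "Revenue": 150.0},
        {"Year": 2024, "Revenue": 175.0},
    ]
    # Equivalent to the "Year = 2024" example above.
    print(filter_rows(data, "Year", "=", 2024))
```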
Behavior (Train mode)
- Input data is loaded from the staging table.
- The transform pipeline is applied.
- The model is trained with 6-fold cross-validation. Metrics (MAE, MSE, RMSE, R²) are logged.
- The trained model is saved as <ActivityName>.onnx in the configured folder.
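For readers who want to interpret the logged metrics, the sketch below computes MAE, MSE, RMSE, and R² from actual and predicted values, and shows one simple way to split rows into 6 folds. It is a plain-Python illustration under stated assumptions, not the ML.NET implementation: the function names are hypothetical, and ML.NET's cross-validation may assign rows to folds differently.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R2 for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum(e * e for e in errors)
    # R2 is undefined when the target is constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


def k_fold_indices(n_rows, k=6):
    """Yield (train_idx, test_idx); every row lands in exactly one test fold."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


if __name__ == "__main__":
    print(regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0]))
    for train_idx, test_idx in k_fold_indices(12):
        print(len(train_idx), test_idx)
```

In cross-validation, the model is trained on each `train_idx` set and scored on the matching `test_idx` set, and the per-fold metrics are then summarized.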
Behavior (Run mode)
- Input data is loaded from the staging table.
- The transform pipeline is applied.
- The model is loaded from the .onnx file.
- Predictions are generated and written to the output staging table with a PredictedValue column.
- A sample of 3 prediction rows is logged for verification.
Output Schema (Run mode)
The output staging table contains all non-vector columns from the input data, plus:
| Column | Description |
|---|---|
| PredictedValue | The regression model's predicted value for each row |
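The output schema can be pictured as "input row minus vector columns, plus PredictedValue". The Python sketch below shows that shape on dictionaries. It is an assumption-laden illustration: `build_output_rows` is a hypothetical helper, and vector columns are modelled here as Python lists or tuples.

```python
def build_output_rows(input_rows, predictions):
    """Drop vector-valued columns and append PredictedValue to each row."""
    output = []
    for row, predicted in zip(input_rows, predictions):
        # Assumption: a "vector column" is any list- or tuple-valued field.
        out = {k: v for k, v in row.items() if not isinstance(v, (list, tuple))}
        out["PredictedValue"] = predicted
        output.append(out)
    return output


if __name__ == "__main__":
    rows = [
        {"Region": "EU", "Quantity": 4.0, "Features": [0.0, 1.0, 0.0]},
        {"Region": "US", "Quantity": 6.0, "Features": [1.2, 0.0, 1.0]},
    ]
    for out_row in build_output_rows(rows, [151.3, 178.9]):
        print(out_row)
```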
Usage Notes
- The feature column (used for prediction) and the estimated column (target variable, used in training) are configured within the Train Model transform step in the transform editor.
- Cross-validation metrics are logged at train time. Review them to assess model quality before using the model in production.
- The model file persists across workflow runs. Retrain periodically as data distributions change.
Best Practices
- Always review the cross-validation R² before relying on a trained model. An R² near 0 (or negative) means the model explains little of the variance in the target and should not be relied on for predictions.
- Separate training and inference workflows — train occasionally (e.g. monthly), run predictions on each data load.
- Place Drop Columns and Normalize transforms before the Train Model step to improve model accuracy.
Note: Verify the exact feature column and estimated column configuration within the Train Model transform step against the current UI, because it is configured visually inside the transform editor.
JSON Reference
{
  "discriminator": "FastTreeTweedieRegressionWorkflowActivity",
  "activityId": "<uuid>",
  "name": "AI Regression",
  "positionX": 0,
  "positionY": 0,
  "advanceRule": 2,
  "inputStagingTable": "StagingInput",
  "outputStagingTable": "StagingOutput",
  "fsoPath": "/Models/Regression",
  "mode": 0,
  "transforms": [],
  "filter": null,
  "estimatedColumnName": "Revenue",
  "featureColumnName": "Features",
  "algorithm": 0
}
| Property | Type | Description |
|---|---|---|
| inputStagingTable | string | Corresponds to the Input Staging Table field. The staging table containing training or prediction data. |
| outputStagingTable | string | Corresponds to the Output Staging Table field. The staging table where predictions are written (used in Run mode). |
| fsoPath | string | Corresponds to the File System Path field. Folder path (Train mode) or full file path to the .onnx model (Run mode). |
| mode | integer | Corresponds to the Mode field. 0 = TrainModel, 1 = RunModel. |
| transforms | array | Corresponds to the Transforms editor. Array of transform objects defining the pre-processing pipeline. |
| filter | object or null | Corresponds to the Row Filter field. An optional filter applied to input rows before training or inference. null means no filter. |
| estimatedColumnName | string | The target (label) column name for regression, i.e. the column to predict. |
| featureColumnName | string | The feature vector column name used as model input (typically the output of a Concatenate transform). |
| algorithm | integer | Corresponds to the Algorithm field. 0 = FastTreeTweedie, 1 = FastTree, 2 = LightGbm. |
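When reading or generating this JSON from a script, the integer codes for mode and algorithm are easy to mix up. The Python sketch below maps them to the names given in the table and summarizes an activity definition. It is a minimal sketch under assumptions: the enum classes and `describe_activity` are hypothetical helpers, and only the properties documented above are read.

```python
import json
from enum import IntEnum


class Mode(IntEnum):
    TrainModel = 0
    RunModel = 1


class Algorithm(IntEnum):
    FastTreeTweedie = 0
    FastTree = 1
    LightGbm = 2


def describe_activity(activity_json):
    """Parse an activity definition and return a human-readable summary dict."""
    cfg = json.loads(activity_json)
    mode = Mode(cfg["mode"])  # raises ValueError for an unknown code
    summary = {
        "mode": mode.name,
        "input": cfg["inputStagingTable"],
        "model_path": cfg["fsoPath"],
        "target": cfg["estimatedColumnName"],
        "features": cfg["featureColumnName"],
    }
    if mode is Mode.TrainModel:
        # The algorithm setting is documented as Train mode only.
        summary["algorithm"] = Algorithm(cfg["algorithm"]).name
    else:
        summary["output"] = cfg["outputStagingTable"]
    return summary


if __name__ == "__main__":
    example = json.dumps({
        "inputStagingTable": "StagingInput",
        "outputStagingTable": "StagingOutput",
        "fsoPath": "/Models/Regression",
        "mode": 0,
        "estimatedColumnName": "Revenue",
        "featureColumnName": "Features",
        "algorithm": 0,
    })
    print(describe_activity(example))
```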