AI Regression

The AI Regression activity trains or runs a regression model using ML.NET on data from a staging table, then writes predictions to an output staging table.

Purpose

Use the AI Regression activity to:

  • Predict a numeric value (e.g. revenue, cost, quantity) from a set of input features
  • Train a regression model on historical data and save it for future use
  • Apply a previously trained model to new data to generate predictions

Modes

| Mode | Description |
| --- | --- |
| Train Model | Trains a new regression model on the input staging table and saves the .onnx model file to the configured file system path |
| Run Model | Loads a previously trained model from the file system and applies it to the input staging table, writing predictions to the output staging table |

Configuration

Mode

Select Train Model or Run Model.

Input Staging Table

The staging table containing the training or prediction data. Column names must not contain spaces, hyphens, semicolons, or brackets.
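
The naming rule above can be checked before a workflow runs. The following is a minimal sketch, not part of the product: the function name `invalid_column_names` is hypothetical, and it assumes "brackets" covers round, square, and curly brackets, which the source does not specify.

```python
import re

# Characters the documentation says column names must not contain:
# spaces, hyphens, semicolons, and brackets.
# Assumption: "brackets" means (), [], and {}.
_FORBIDDEN = re.compile(r"[ \-;()\[\]{}]")


def invalid_column_names(columns):
    """Return the column names that contain a forbidden character."""
    return [name for name in columns if _FORBIDDEN.search(name)]


columns = ["Revenue", "Unit Cost", "order-date", "Qty[2024]", "Region"]
print(invalid_column_names(columns))
```

Renaming the reported columns in the staging table (for example `Unit Cost` to `UnitCost`) before the activity runs avoids the restriction.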

File System Path

  • Train mode: The folder where the trained model file will be saved (as <ActivityName>.onnx).
  • Run mode: The full path to the .onnx model file to load.
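
To make the difference between the two modes concrete, here is a small sketch of how the model file location could be derived from this field. It is illustrative only: `model_file_path` is a hypothetical helper, and it assumes forward-slash paths and that the activity name is used verbatim as the file name.

```python
import posixpath

TRAIN_MODEL = 0  # mode values as listed in the JSON reference
RUN_MODEL = 1


def model_file_path(mode, fs_path, activity_name):
    """Resolve the .onnx file location implied by the File System Path field."""
    if mode == TRAIN_MODEL:
        # Train mode: fs_path is a folder; the file is named after the activity.
        return posixpath.join(fs_path, activity_name + ".onnx")
    if mode == RUN_MODEL:
        # Run mode: fs_path already points at the model file.
        return fs_path
    raise ValueError(f"unknown mode: {mode}")


print(model_file_path(TRAIN_MODEL, "/Models/Regression", "AI Regression"))
print(model_file_path(RUN_MODEL, "/Models/Regression/AI Regression.onnx", "AI Regression"))
```

Under these assumptions, the path produced in Train mode is the value to enter as the File System Path of the matching Run mode activity.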

Algorithm (Train mode only)

The regression algorithm to use:

| Algorithm | Description |
| --- | --- |
| FastTreeTweedie | Gradient boosted trees with Tweedie loss; good for right-skewed targets (default) |
| FastTree | Standard gradient boosted decision trees |
| LightGbm | LightGBM; fast for large datasets |

Output Staging Table (Run mode)

The staging table where predictions are written. Includes all non-vector columns from the input data plus a PredictedValue column.

Transforms

A pipeline of pre-processing transforms applied before the model step. Transforms are configured in the visual transform editor. Common transforms include:

  • Concatenate — combine multiple numeric columns into a single feature vector
  • Drop Columns — remove columns not needed for training
  • Convert Type — convert a column to a numeric type
  • Normalize — normalize a column to mean 0, variance 1
  • One Hot Encoding — encode a categorical column as a numeric vector
  • Train Model — the final step that trains or applies the model
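
The effect of three of these transforms can be illustrated in plain Python. This is a conceptual sketch only, not the activity's implementation: the functions below are hypothetical stand-ins, and the actual ML.NET transforms may differ in detail (for example in how variance is estimated or how categories are ordered).

```python
from statistics import fmean, pstdev


def normalize(values):
    """Scale values to mean 0 and variance 1 (population variance)."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant column carries no signal
    return [(v - mean) / std for v in values]


def one_hot(values):
    """Encode each category as a 0/1 vector, one slot per distinct category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


def concatenate(*columns):
    """Combine per-row scalars or vectors into one feature vector per row."""
    rows = []
    for parts in zip(*columns):
        vector = []
        for part in parts:
            vector.extend(part if isinstance(part, list) else [part])
        rows.append(vector)
    return rows


quantity = [2.0, 4.0, 6.0]
region = ["East", "West", "East"]
features = concatenate(normalize(quantity), one_hot(region))
print(features)
```

Each row ends up with a single numeric vector (one normalized value followed by the one-hot slots), which is the shape the feature column is expected to have before the Train Model step.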

Row Filter (optional)

Filter the input data by a column condition before training or inference. Useful for selecting a subset of rows (e.g. only rows where Year = 2024).
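
Conceptually, the filter keeps only the rows that satisfy the condition and passes those on to training or inference. A minimal sketch, assuming a simple equality condition (the structure of the actual filter object is not documented here, and `filter_rows` is a hypothetical name):

```python
def filter_rows(rows, column, value):
    """Keep only rows whose `column` equals `value`."""
    return [row for row in rows if row.get(column) == value]


rows = [
    {"Year": 2023, "Revenue": 120.0},
    {"Year": 2024, "Revenue": 150.0},
    {"Year": 2024, "Revenue": 175.0},
]
print(filter_rows(rows, "Year", 2024))
```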

Behavior (Train mode)

  1. Input data is loaded from the staging table.
  2. The transform pipeline is applied.
  3. The model is trained with 6-fold cross-validation. Metrics (MAE, MSE, RMSE, R²) are logged.
  4. The trained model is saved as <ActivityName>.onnx in the configured folder.
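
For readers unfamiliar with the logged metrics, the sketch below shows how MAE, MSE, RMSE, and R² are computed and how a 6-fold split holds out each row once. It is an illustration under stated assumptions: the fold assignment (row index modulo 6) and the one-feature least-squares model are stand-ins chosen for brevity, not the activity's actual splitting strategy or trainer.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R2 for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R2 = 1 - SS_res / SS_tot; undefined when the target is constant.
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


def k_fold_indices(n_rows, k=6):
    """Yield (train, test) index lists; each row is held out exactly once."""
    for fold in range(k):
        test = [i for i in range(n_rows) if i % k == fold]
        train = [i for i in range(n_rows) if i % k != fold]
        yield train, test


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (stand-in for the real trainer)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


xs = [float(i) for i in range(12)]
ys = [3.0 * x + 1.0 for x in xs]  # noiseless toy data

for train, test in k_fold_indices(len(xs)):
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    preds = [a * xs[i] + b for i in test]
    print(regression_metrics([ys[i] for i in test], preds))
```

Because every fold is scored on rows the model did not see during fitting, the averaged metrics give a more honest estimate of prediction quality than scoring on the training data itself.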

Behavior (Run mode)

  1. Input data is loaded from the staging table.
  2. The transform pipeline is applied.
  3. The model is loaded from the .onnx file.
  4. Predictions are generated and written to the output staging table with a PredictedValue column.
  5. A sample of 3 prediction rows is logged for verification.

Output Schema (Run mode)

The output staging table contains all non-vector columns from the input data, plus:

| Column | Description |
| --- | --- |
| PredictedValue | The regression model's predicted value for each row |
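
The shape of the output can be sketched as follows. This is an illustration, not the activity's code: `build_output_rows` is a hypothetical helper, vector columns are assumed to be represented as Python lists or tuples, and the prediction values are made-up placeholders.

```python
def build_output_rows(input_rows, predictions):
    """Copy non-vector columns from each input row and append PredictedValue."""
    output = []
    for row, predicted in zip(input_rows, predictions):
        # Assumption: a "vector" column is represented here as a list or tuple.
        out = {k: v for k, v in row.items() if not isinstance(v, (list, tuple))}
        out["PredictedValue"] = predicted
        output.append(out)
    return output


input_rows = [
    {"Region": "East", "Quantity": 2.0, "Features": [2.0, 1.0, 0.0]},
    {"Region": "West", "Quantity": 4.0, "Features": [4.0, 0.0, 1.0]},
]
output_rows = build_output_rows(input_rows, [101.5, 198.2])  # placeholder predictions

for row in output_rows[:3]:  # mirrors the 3-row sample the activity logs
    print(row)
```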

Usage Notes

  • The feature column (used for prediction) and the estimated column (target variable, used in training) are configured within the Train Model transform step in the transform editor.
  • Cross-validation metrics are logged at train time. Review them to assess model quality before using the model in production.
  • The model file persists across workflow runs. Retrain periodically as data distributions change.

Best Practices

  • Always review the cross-validation R² before relying on a trained model. An R² near 0 (or negative) means the model explains little of the variance in the target and is not predictive.
  • Separate training and inference workflows — train occasionally (e.g. monthly), run predictions on each data load.
  • Place the Drop Columns and Normalize transforms before the Train Model step; removing irrelevant columns and scaling features can improve model accuracy.

Needs Review

The exact feature column and estimated column configuration within the Train Model transform step should be verified against the current UI, as this is configured visually inside the transforms editor.

JSON Reference

{
  "discriminator": "FastTreeTweedieRegressionWorkflowActivity",
  "activityId": "<uuid>",
  "name": "AI Regression",
  "positionX": 0,
  "positionY": 0,
  "advanceRule": 2,
  "inputStagingTable": "StagingInput",
  "outputStagingTable": "StagingOutput",
  "fsoPath": "/Models/Regression",
  "mode": 0,
  "transforms": [],
  "filter": null,
  "estimatedColumnName": "Revenue",
  "featureColumnName": "Features",
  "algorithm": 0
}

| Property | Type | Description |
| --- | --- | --- |
| inputStagingTable | string | Corresponds to the Input Staging Table field. The staging table containing training or prediction data. |
| outputStagingTable | string | Corresponds to the Output Staging Table field. The staging table where predictions are written (used in Run mode). |
| fsoPath | string | Corresponds to the File System Path field. Folder path (Train mode) or full file path to the .onnx model (Run mode). |
| mode | integer | Corresponds to the Mode field. 0 = TrainModel, 1 = RunModel. |
| transforms | array | Corresponds to the Transforms editor. Array of transform objects defining the pre-processing pipeline. |
| filter | object or null | Corresponds to the Row Filter field. An optional filter applied to input rows before training or inference. null means no filter. |
| estimatedColumnName | string | The target (label) column name for regression, i.e. the column to predict. |
| featureColumnName | string | The feature vector column name used as model input (typically the output of a Concatenate transform). |
| algorithm | integer | Corresponds to the Algorithm field. 0 = FastTreeTweedie, 1 = FastTree, 2 = LightGbm. |
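
When activity definitions are generated or reviewed outside the UI, it can help to decode the integer fields into readable names. The sketch below does this for `mode` and `algorithm` using the values listed in the table. The helper `describe_activity` is hypothetical, and the two "required" checks are assumptions inferred from the field descriptions rather than documented validation rules.

```python
import json
from enum import IntEnum


class Mode(IntEnum):
    TRAIN_MODEL = 0
    RUN_MODEL = 1


class Algorithm(IntEnum):
    FAST_TREE_TWEEDIE = 0
    FAST_TREE = 1
    LIGHT_GBM = 2


def describe_activity(raw_json):
    """Decode the documented integer fields of an AI Regression activity."""
    activity = json.loads(raw_json)
    mode = Mode(activity["mode"])  # raises ValueError for an unknown value
    algorithm = Algorithm(activity["algorithm"])
    problems = []
    # Assumed checks, based on how the fields are described above.
    if mode is Mode.RUN_MODEL and not activity.get("outputStagingTable"):
        problems.append("outputStagingTable is required in Run mode")
    if not activity.get("inputStagingTable"):
        problems.append("inputStagingTable is required")
    return {"mode": mode.name, "algorithm": algorithm.name, "problems": problems}


example = """
{
  "inputStagingTable": "StagingInput",
  "outputStagingTable": "StagingOutput",
  "fsoPath": "/Models/Regression",
  "mode": 0,
  "transforms": [],
  "filter": null,
  "estimatedColumnName": "Revenue",
  "featureColumnName": "Features",
  "algorithm": 0
}
"""
print(describe_activity(example))
```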