AI Regression

The AI Regression activity uses ML.NET in one of two ways. It can train a regression model on data from a staging table, or it can run a previously trained model against that data and write the predictions to an output staging table.

Purpose

Use the AI Regression activity to:

Predict a numeric value (e.g. revenue, cost, quantity) from a set of input features
Train a regression model on historical data and save it for future use
Apply a previously trained model to new data to generate predictions

Modes

Mode	Description
Train Model	Trains a new regression model on the input staging table and saves the `.onnx` model file to the configured file system path
Run Model	Loads a previously trained model from the file system and applies it to the input staging table, writing predictions to the output staging table

Configuration

Mode

Select Train Model or Run Model.

Input Staging Table

The staging table containing the training or prediction data. Column names must not contain spaces, hyphens, semicolons, or brackets.
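Checking the naming rule before the activity runs avoids a load failure. The following is a minimal Python sketch. It assumes "brackets" means square brackets `[` and `]`, and `invalid_columns` is a hypothetical helper that is not part of the product:

```python
import re

# Characters the docs say are not allowed in column names: space, hyphen,
# semicolon, and (assumed) square brackets. Extend the class if parentheses
# or braces are also rejected in your environment.
FORBIDDEN = re.compile(r"[ \-;\[\]]")


def invalid_columns(column_names):
    """Return the column names that contain at least one forbidden character."""
    return [name for name in column_names if FORBIDDEN.search(name)]


print(invalid_columns(["Region", "Unit Price", "order-count", "Revenue", "Qty[2024]"]))
# -> ['Unit Price', 'order-count', 'Qty[2024]']
```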

File System Path

Train mode: The folder where the trained model file will be saved (as <ActivityName>.onnx).
Run mode: The full path to the .onnx model file to load.
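The path means something different in each mode. A small resolver makes the distinction explicit. This sketch assumes POSIX-style separators and the `mode` values from the JSON reference (`0` = TrainModel, `1` = RunModel). `model_file_path` is a hypothetical helper, and its `.onnx` extension check is an illustrative safeguard, not documented behavior.

```python
from pathlib import PurePosixPath

TRAIN_MODEL = 0  # "mode" value for Train Model (from the JSON reference)
RUN_MODEL = 1    # "mode" value for Run Model


def model_file_path(mode, fso_path, activity_name):
    """Resolve the .onnx file the activity writes (Train) or reads (Run)."""
    if mode == TRAIN_MODEL:
        # Train mode: fso_path is a folder; the file is named after the activity.
        return str(PurePosixPath(fso_path) / f"{activity_name}.onnx")
    if mode == RUN_MODEL:
        # Run mode: fso_path must already point at the model file itself.
        # The extension check below is a hypothetical safeguard.
        if not fso_path.endswith(".onnx"):
            raise ValueError("Run mode expects the full path to a .onnx file")
        return fso_path
    raise ValueError(f"unknown mode: {mode}")
```

For example, `model_file_path(0, "/Models/Regression", "AI Regression")` resolves to `/Models/Regression/AI Regression.onnx`. A Run-mode activity would use that full path as its File System Path.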

Algorithm (Train mode only)

The regression algorithm to use:

Algorithm	Description
FastTreeTweedie	Gradient boosted trees with Tweedie loss — good for right-skewed targets (default)
FastTree	Standard gradient boosted decision trees
LightGbm	LightGBM — fast for large datasets
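Tweedie loss suits right-skewed targets for two reasons. With power 1 < p < 2, a Tweedie model assumes the variance grows roughly as μ^p, so the same absolute error is penalized less at large predicted values. It also allows exact zeros alongside positive amounts. The following sketch computes the standard unit Tweedie deviance in Python. The value p = 1.5 is an illustrative choice. The sketch does not reproduce ML.NET's internal implementation or its parameter defaults.

```python
def tweedie_deviance(y, mu, p=1.5):
    """Unit Tweedie deviance for power 1 < p < 2 (compound Poisson-gamma).

    Requires y >= 0 and mu > 0; the deviance is 0 when mu == y.
    """
    if not 1 < p < 2:
        raise ValueError("this sketch only covers 1 < p < 2")
    if y < 0 or mu <= 0:
        raise ValueError("requires y >= 0 and mu > 0")
    return 2 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )
```

Comparing `tweedie_deviance(10, 15)` with `tweedie_deviance(100, 105)` shows the effect. Both predictions are off by 5, but the second costs far less, because an error of that size is relatively small for a large target.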

Output Staging Table (Run mode)

The staging table where predictions are written. It contains all non-vector input columns plus a PredictedValue column.

Transforms

A pipeline of pre-processing transforms applied before the model step. Transforms are configured in the visual transform editor. Common transforms include:

Concatenate — combine multiple numeric columns into a single feature vector
Drop Columns — remove columns not needed for training
Convert Type — convert a column to a numeric type
Normalize — normalize a column to mean 0, variance 1
One Hot Encoding — encode a categorical column as a numeric vector
Train Model — the final step that trains or applies the model
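The following plain-Python sketch makes the pre-processing steps concrete. Rows are represented as dictionaries. The sketch only mirrors the behavior described in the list above. The function names are hypothetical, and this is not the ML.NET transform API the activity uses internally. Normalize uses the population standard deviation here, which is an assumption.

```python
from statistics import mean, pstdev


def drop_columns(rows, names):
    # Drop Columns: remove columns that should not reach the model.
    return [{k: v for k, v in row.items() if k not in names} for row in rows]


def convert_type(rows, name):
    # Convert Type: parse a column into a float.
    return [{**row, name: float(row[name])} for row in rows]


def normalize(rows, name):
    # Normalize: rescale a numeric column to mean 0, variance 1.
    values = [row[name] for row in rows]
    mu, sigma = mean(values), pstdev(values)
    return [{**row, name: (row[name] - mu) / sigma if sigma else 0.0} for row in rows]


def one_hot(rows, name):
    # One Hot Encoding: replace a categorical value with a 0/1 vector.
    categories = sorted({row[name] for row in rows})
    return [{**row, name: [1.0 if row[name] == c else 0.0 for c in categories]}
            for row in rows]


def concatenate(rows, output, inputs):
    # Concatenate: flatten scalar and vector columns into one feature vector.
    result = []
    for row in rows:
        vector = []
        for col in inputs:
            value = row[col]
            vector.extend(value if isinstance(value, list) else [value])
        result.append({**row, output: vector})
    return result
```

Apply the functions in the order configured in the editor. A typical sequence ends with something like `concatenate(rows, "Features", ["Units", "Region"])`, whose output column would then serve as the feature column for the Train Model step.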

Row Filter (optional)

Filters the input data by a column condition before training or inference. This is useful for selecting a subset of rows, for example only rows where Year = 2024.
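A row filter is a predicate evaluated once per row. In this sketch the condition is a `(column, operator, value)` tuple. That shape is hypothetical, because this page does not document the actual structure of the `filter` object.

```python
import operator

# Hypothetical condition format: (column, operator, value).
OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


def apply_row_filter(rows, condition):
    """Keep only the rows that satisfy the condition; None keeps every row."""
    if condition is None:  # mirrors "filter": null in the JSON reference
        return list(rows)
    column, op, value = condition
    compare = OPERATORS[op]
    return [row for row in rows if compare(row[column], value)]
```

For example, `apply_row_filter(rows, ("Year", "=", 2024))` keeps only the 2024 rows.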

Behavior (Train mode)

Input data is loaded from the staging table.
The transform pipeline is applied.
The model is trained with 6-fold cross-validation. Metrics (MAE, MSE, RMSE, R²) are logged.
The trained model is saved as <ActivityName>.onnx in the configured folder.
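The following sketch shows the logged metrics and the 6-fold loop, with R² computed as 1 − SS_res / SS_tot. It makes three assumptions:

- Rows are shuffled into 6 folds with a fixed seed.
- Per-fold metrics are averaged.
- A one-feature least-squares line stands in for the boosted-tree model.

The activity's own fold assignment and aggregation may differ.

```python
import math
import random


def regression_metrics(actual, predicted):
    """MAE, MSE, RMSE and R² (1 - SS_res / SS_tot) for one set of predictions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    mse = ss_res / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }


def k_fold_indices(n, k=6, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Stand-in model: 1-D least-squares line (NOT a boosted-tree trainer)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cross_validate(xs, ys, k=6):
    """Train on k-1 folds, score the held-out fold, and average the metrics."""
    fold_metrics = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        predictions = [model(xs[i]) for i in test_idx]
        fold_metrics.append(regression_metrics([ys[i] for i in test_idx], predictions))
    return {name: sum(m[name] for m in fold_metrics) / len(fold_metrics)
            for name in fold_metrics[0]}
```

Averaging across held-out folds estimates how the model will behave on unseen rows. A single fit scored on its own training data would overstate its accuracy.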

Behavior (Run mode)

Input data is loaded from the staging table.
The transform pipeline is applied.
The model is loaded from the .onnx file.
Predictions are generated and written to the output staging table with a PredictedValue column.
A sample of 3 prediction rows is logged for verification.
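The scoring and sample-logging steps look roughly like the following Python sketch. `predict` is a hypothetical callable standing in for the loaded `.onnx` model. The logger name and message format are illustrative, not the activity's actual log output.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai_regression_sketch")


def score_rows(rows, predict, sample_size=3):
    """Attach a PredictedValue to every row and log the first few for checking."""
    scored = [{**row, "PredictedValue": float(predict(row))} for row in rows]
    for row in scored[:sample_size]:
        log.info("sample prediction: %s", row)
    return scored


# Usage with a toy stand-in model (hypothetical; not a real .onnx model).
results = score_rows(
    [{"Units": 1}, {"Units": 2}, {"Units": 3}, {"Units": 4}],
    predict=lambda row: 10 * row["Units"],
)
```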

Output Schema (Run mode)

The output staging table contains all non-vector columns from the input data, plus:

Column	Description
`PredictedValue`	The regression model's predicted value for each row
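The following sketch shows how one output row is assembled. It assumes vector-valued columns, such as the concatenated feature column, are represented as lists. `output_row` is a hypothetical helper:

```python
def output_row(input_row, predicted_value):
    """Keep the non-vector input columns and append PredictedValue."""
    kept = {k: v for k, v in input_row.items() if not isinstance(v, (list, tuple))}
    kept["PredictedValue"] = predicted_value
    return kept
```

For example, an input row with `Region`, `Features` and `Revenue` produces an output row with `Region`, `Revenue` and `PredictedValue`.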

Usage Notes

Two columns are configured within the Train Model transform step in the transform editor. The feature column is the vector of inputs the model reads, in both training and prediction. The estimated column is the target variable the model learns to predict, used in training.
Cross-validation metrics are logged at train time. Review them to assess model quality before using the model in production.
The model file persists across workflow runs. Retrain periodically as data distributions change.

Best Practices

Always review the logged cross-validation R² before relying on a trained model. An R² near 0, or negative, means the model explains little of the variation in the target and is not predictive.
Separate training and inference workflows — train occasionally (e.g. monthly), run predictions on each data load.
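Use Drop Columns to remove irrelevant columns before the Train Model step. The available algorithms (FastTreeTweedie, FastTree, LightGbm) are all tree-based and largely insensitive to feature scaling, so Normalize usually has little effect on their accuracy.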

Needs Review

The exact feature column and estimated column configuration within the Train Model transform step should be verified against the current UI, as this is configured visually inside the transforms editor.

JSON Reference

```json
{
  "discriminator": "FastTreeTweedieRegressionWorkflowActivity",
  "activityId": "<uuid>",
  "name": "AI Regression",
  "positionX": 0,
  "positionY": 0,
  "advanceRule": 2,
  "inputStagingTable": "StagingInput",
  "outputStagingTable": "StagingOutput",
  "fsoPath": "/Models/Regression",
  "mode": 0,
  "transforms": [],
  "filter": null,
  "estimatedColumnName": "Revenue",
  "featureColumnName": "Features",
  "algorithm": 0
}
```

Property	Type	Description
`inputStagingTable`	string	Corresponds to the Input Staging Table field. The staging table containing training or prediction data.
`outputStagingTable`	string	Corresponds to the Output Staging Table field. The staging table where predictions are written (used in Run mode).
`fsoPath`	string	Corresponds to the File System Path field. Folder path (Train mode) or full file path to the `.onnx` model (Run mode).
`mode`	integer	Corresponds to the Mode field. `0` = TrainModel, `1` = RunModel.
`transforms`	array	Corresponds to the Transforms editor. Array of transform objects defining the pre-processing pipeline.
`filter`	object | null	Corresponds to the Row Filter field. An optional filter applied to input rows before training or inference. `null` means no filter.
`estimatedColumnName`	string	Set in the Train Model transform step. The target (label) column name for regression, i.e. the column to predict.
`featureColumnName`	string	Set in the Train Model transform step. The feature vector column name used as model input (typically the output of a Concatenate transform).
`algorithm`	integer	Corresponds to the Algorithm field. `0` = FastTreeTweedie, `1` = FastTree, `2` = LightGbm.
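A quick structural check can catch bad enum values before a hand-edited activity definition is imported. The following validator is hypothetical. Its enum mappings come from the property table above. The "required" checks and the Run-mode path checks are assumptions about what the activity needs.

```python
import json

MODES = {0: "TrainModel", 1: "RunModel"}
ALGORITHMS = {0: "FastTreeTweedie", 1: "FastTree", 2: "LightGbm"}


def check_activity(config):
    """Return a list of problems found in an AI Regression activity object."""
    problems = []
    mode = config.get("mode")
    if mode not in MODES:
        problems.append("mode must be 0 (TrainModel) or 1 (RunModel)")
    if "algorithm" in config and config["algorithm"] not in ALGORITHMS:
        problems.append("algorithm must be 0 (FastTreeTweedie), 1 (FastTree) or 2 (LightGbm)")
    if not config.get("inputStagingTable"):
        problems.append("inputStagingTable is required")
    if not isinstance(config.get("transforms", []), list):
        problems.append("transforms must be an array")
    if config.get("filter") is not None and not isinstance(config["filter"], dict):
        problems.append("filter must be an object or null")
    if mode == 1:
        # Assumed Run-mode requirements, based on the field descriptions.
        if not config.get("outputStagingTable"):
            problems.append("Run mode needs outputStagingTable")
        if not str(config.get("fsoPath", "")).endswith(".onnx"):
            problems.append("Run mode expects fsoPath to be the full .onnx file path")
    return problems


def check_activity_json(text):
    """Parse an activity definition from JSON text and validate it."""
    return check_activity(json.loads(text))
```

Run against the JSON example on this page, the validator reports no problems. Switching that example to `"mode": 1` without changing `fsoPath` is flagged, because a folder path is only valid in Train mode.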
