AI Anomaly Detector

The AI Anomaly Detector activity trains a Randomized PCA anomaly detection model on data from a staging table and writes anomaly scores and labels to an output staging table.

Purpose

Use the AI Anomaly Detector activity to:

Identify rows in a dataset that are statistically anomalous compared to the rest of the data
Flag unusual transactions, data points, or entities for review
Produce an anomaly score per row that can be used for filtering or ranking

Algorithm

The activity uses the Randomized PCA (Principal Component Analysis) trainer from ML.NET. The algorithm:

Projects the data into a lower-dimensional principal component space.
Computes how far each row is from the "normal" subspace.
Assigns a score (higher = more anomalous) and a PredictedLabel (true = anomaly).
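
The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions: it finds the principal components exactly by power iteration rather than with the randomized approximation ML.NET uses, the function names are hypothetical, and the score is the length of each row's residual after projection, which may be scaled differently from the Score the activity writes. No PredictedLabel is computed because the threshold is internal to ML.NET.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centre(rows):
    """Subtract each column's mean so every column averages to zero."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def principal_components(rows, rank, iterations=200, seed=0):
    """Return up to `rank` unit-length principal directions of centred rows."""
    rng = random.Random(seed)
    dim = len(rows[0])
    components = []
    for _ in range(rank):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(iterations):
            # Multiply v by X^T X without forming the matrix.
            w = [0.0] * dim
            for row in rows:
                p = dot(row, v)
                for j in range(dim):
                    w[j] += p * row[j]
            # Deflation: remove the directions that were already found.
            for c in components:
                p = dot(w, c)
                w = [x - p * y for x, y in zip(w, c)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                return components  # no variance left to explain
            v = [x / norm for x in w]
        components.append(v)
    return components


def anomaly_scores(rows, rank, ensure_zero_mean=True):
    """Score each row by its distance from the principal subspace."""
    data = centre(rows) if ensure_zero_mean else [list(r) for r in rows]
    components = principal_components(data, rank)
    scores = []
    for row in data:
        residual = list(row)
        for c in components:
            p = dot(row, c)
            residual = [r - p * y for r, y in zip(residual, c)]
        scores.append(math.sqrt(dot(residual, residual)))
    return scores


# Ten rows on the line y = 2x plus one row that sits far off that line.
rows = [[float(x), 2.0 * x] for x in range(10)] + [[4.5, 0.0]]
scores = anomaly_scores(rows, rank=1)
most_anomalous = max(range(len(rows)), key=scores.__getitem__)
```

With a rank of 1, the single component follows the line that most rows share, so the off-line row (the last one) ends up with the largest residual.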

Configuration

Input Staging Table

The staging table containing the data to analyse. All columns used for anomaly detection must be numeric.

Output Staging Table

The staging table where results are written. It contains the non-vector columns from the input plus the anomaly detection outputs (see Output Schema).

Feature Column

The name of the vector column that contains the features to use for anomaly detection. This is typically the output column of a Concatenate transform in the transform pipeline.

Rank

The number of principal components to use. A higher rank captures more of the variance in the data but takes longer to train.

Default: 5
Minimum: 1

Ensure Zero Mean

When enabled (default), the data is centred to zero mean before PCA. Recommended unless the features have already been centred, for example by an earlier mean-based normalisation step.
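
As a rough illustration of what centring to zero mean does, the sketch below subtracts each column's mean from its values. The helper name and the sample values are hypothetical; the activity performs this step internally when the option is enabled.

```python
def centre_columns(rows):
    """Subtract each column's mean so that every column averages to zero."""
    count = len(rows)
    means = [sum(column) / count for column in zip(*rows)]
    centred = [[value - mean for value, mean in zip(row, means)] for row in rows]
    return centred, means


rows = [[10.0, 200.0], [12.0, 220.0], [14.0, 240.0]]
centred, means = centre_columns(rows)
# means   -> [12.0, 220.0]
# centred -> [[-2.0, -20.0], [0.0, 0.0], [2.0, 20.0]]
```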

Transforms

A pre-processing pipeline applied before the anomaly detection step. Typically used to:

Concatenate multiple numeric columns into a single feature vector
Normalize values before detection
Drop non-numeric or irrelevant columns
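
The effect of such a pipeline can be sketched in Python. This is not the activity's transform syntax, which is configured in the Transforms editor; the column names, helper functions, and the choice of min-max scaling are assumptions made for illustration.

```python
def concatenate(row, columns):
    """Build one feature vector from the named numeric columns of a row."""
    return [float(row[name]) for name in columns]


def min_max_normalise(vectors):
    """Rescale every feature to the range 0..1 (constant columns become 0)."""
    lows = [min(col) for col in zip(*vectors)]
    highs = [max(col) for col in zip(*vectors)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(vec, lows, highs)]
        for vec in vectors
    ]


# Hypothetical input rows; "Id" and "Note" are not numeric features.
rows = [
    {"Id": "A1", "Amount": 100.0, "Quantity": 1, "Note": "ok"},
    {"Id": "A2", "Amount": 300.0, "Quantity": 3, "Note": "ok"},
    {"Id": "A3", "Amount": 500.0, "Quantity": 2, "Note": "check"},
]
feature_columns = ["Amount", "Quantity"]  # irrelevant columns are left out
features = min_max_normalise([concatenate(r, feature_columns) for r in rows])
# features -> [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

After scaling, Amount and Quantity contribute on the same 0..1 range, so neither dominates purely because of its units.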

Row Filter (optional)

Filter the input data before processing.

Output Schema

The output staging table contains all non-vector columns from the input, plus:

| Column | Description |
| --- | --- |
| `Score` | Anomaly score; higher values indicate more anomalous rows |
| `PredictedLabel` | Boolean; `true` if the row is classified as an anomaly |

A sample of 3 rows (with score and label) is logged to the workflow run log for inspection.
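
Once the output rows have been read from the staging table, the two columns can be used for filtering and ranking. The sketch below assumes the rows are available as Python dictionaries; the `Id` column and all values are made up for illustration.

```python
# Hypothetical output rows; real rows also carry the non-vector input columns.
output_rows = [
    {"Id": "A1", "Score": 0.08, "PredictedLabel": False},
    {"Id": "A2", "Score": 0.91, "PredictedLabel": True},
    {"Id": "A3", "Score": 0.15, "PredictedLabel": False},
    {"Id": "A4", "Score": 0.67, "PredictedLabel": True},
]

# Filtering: keep only rows classified as anomalies, most anomalous first.
flagged = sorted(
    (row for row in output_rows if row["PredictedLabel"]),
    key=lambda row: row["Score"],
    reverse=True,
)

# Ranking: take the top N rows by Score regardless of the label.
top_three = sorted(output_rows, key=lambda row: row["Score"], reverse=True)[:3]

flagged_ids = [row["Id"] for row in flagged]
top_three_ids = [row["Id"] for row in top_three]
```

Ranking by Score can be useful when the built-in label flags too many or too few rows for the review capacity available.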

Usage Notes

Unlike the regression activity, the Anomaly Detector has no Train/Run distinction — it trains and applies the model in a single pass each time it runs.
The algorithm is unsupervised — it does not use labelled anomaly data. It learns "normal" from the bulk of the input data.
A high Rank value relative to the number of features or rows may cause performance issues.

Best Practices

Use Concatenate in the transforms to combine relevant numeric columns into a single feature vector before the anomaly detection step.
Review a sample of flagged rows (PredictedLabel = true) to calibrate whether the sensitivity is appropriate for your use case.
Normalise input features before anomaly detection to prevent columns with large absolute values from dominating the anomaly score.

Needs Review

The exact threshold that separates "anomaly" from "normal" in PredictedLabel is determined by the ML.NET algorithm internals. Whether a configurable sensitivity parameter is exposed in the UI still needs to be confirmed and documented.

JSON Reference

{
  "discriminator": "RandomizedPcaAnomalyWorkflowActivity",
  "activityId": "<uuid>",
  "name": "AI Anomaly Detector",
  "positionX": 0,
  "positionY": 0,
  "advanceRule": 2,
  "inputStagingTable": "StagingInput",
  "outputStagingTable": "StagingOutput",
  "fsoPath": "",
  "mode": 0,
  "transforms": [],
  "filter": null,
  "featureColumnName": "Features",
  "ensureZeroMean": true,
  "rank": 5
}

| Property | Type | Description |
| --- | --- | --- |
| `inputStagingTable` | string | Corresponds to the Input Staging Table field. The staging table containing the data to analyse. |
| `outputStagingTable` | string | Corresponds to the Output Staging Table field. The staging table where anomaly detection results are written. |
| `fsoPath` | string | File system path. Not used by this activity but present as an inherited field. |
| `mode` | integer | `0` = TrainModel, `1` = RunModel. Not applicable: this activity always trains and applies in a single pass. |
| `transforms` | array | Corresponds to the Transforms editor. Array of transform objects defining the pre-processing pipeline. |
| `filter` | object \| null | Corresponds to the Row Filter field. An optional filter applied to input rows before processing. `null` means no filter. |
| `featureColumnName` | string | Corresponds to the Feature Column field. The vector column containing the features used for anomaly detection. |
| `ensureZeroMean` | boolean | Corresponds to the Ensure Zero Mean field. When `true`, data is centred to zero mean before PCA. |
| `rank` | integer | Corresponds to the Rank field. Number of principal components to use. Default: 5. |
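
When activity definitions are generated or edited outside the UI, a small check can catch obvious mistakes before import. The following Python sketch validates a definition against the property names, types, and Rank minimum documented above. The function name is hypothetical, and treating these particular properties as required is an assumption; the product may apply different or additional rules.

```python
import json

# Property names and the discriminator come from the JSON reference;
# treating these five properties as required is an assumption of this sketch.
EXPECTED_TYPES = {
    "inputStagingTable": str,
    "outputStagingTable": str,
    "featureColumnName": str,
    "ensureZeroMean": bool,
    "rank": int,
}


def validate_activity(config):
    """Return a list of problems found in an activity definition."""
    problems = []
    if config.get("discriminator") != "RandomizedPcaAnomalyWorkflowActivity":
        problems.append("unexpected discriminator")
    for name, expected in EXPECTED_TYPES.items():
        value = config.get(name)
        # bool is a subclass of int in Python, so reject True/False for "rank".
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            problems.append(f"{name} should be of type {expected.__name__}")
    rank = config.get("rank")
    if isinstance(rank, int) and not isinstance(rank, bool) and rank < 1:
        problems.append("rank must be at least 1")
    if not isinstance(config.get("transforms", []), list):
        problems.append("transforms should be an array")
    return problems


activity = json.loads("""
{
  "discriminator": "RandomizedPcaAnomalyWorkflowActivity",
  "inputStagingTable": "StagingInput",
  "outputStagingTable": "StagingOutput",
  "transforms": [],
  "filter": null,
  "featureColumnName": "Features",
  "ensureZeroMean": true,
  "rank": 5
}
""")
problems = validate_activity(activity)  # an empty list means no problems found
```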
