AI Anomaly Detector
The AI Anomaly Detector activity trains a Randomized PCA anomaly detection model on data from a staging table and writes anomaly scores and labels to an output staging table.
Purpose
Use the AI Anomaly Detector activity to:
- Identify rows in a dataset that are statistically anomalous compared to the rest of the data
- Flag unusual transactions, data points, or entities for review
- Produce an anomaly score per row that can be used for filtering or ranking
Algorithm
The activity uses Randomized PCA (Principal Component Analysis) from ML.NET. The algorithm:
- Projects the data into a lower-dimensional principal component space.
- Computes how far each row is from the "normal" subspace.
- Assigns a Score (higher = more anomalous) and a PredictedLabel (true = anomaly).
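The steps above can be illustrated with a minimal, standard-library sketch. It is not the ML.NET implementation: it uses plain power iteration instead of a randomized algorithm, keeps a single principal component (equivalent to a rank of 1), and reports the raw distance to the subspace, so the score scale will differ from what the activity writes. All function names and the sample data are hypothetical.

```python
import math
import random

def center(rows):
    """Subtract each column's mean so the data has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def top_component(rows, iterations=200, seed=0):
    """Approximate the first principal component by power iteration."""
    d = len(rows[0])
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        # Multiply v by X^T X without building the covariance matrix.
        proj = [sum(r[j] * v[j] for j in range(d)) for r in rows]
        w = [sum(proj[i] * rows[i][j] for i in range(len(rows))) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def reconstruction_errors(rows, component):
    """Distance from each row to its projection onto the component."""
    errors = []
    for r in rows:
        t = sum(a * b for a, b in zip(r, component))
        residual = [a - t * b for a, b in zip(r, component)]
        errors.append(math.sqrt(sum(x * x for x in residual)))
    return errors

# Nine points on the line y = 2x, plus one point well off that line.
data = [[float(i), 2.0 * i] for i in range(1, 10)] + [[5.0, 0.0]]
centered = center(data)
scores = reconstruction_errors(centered, top_component(centered))
most_anomalous = max(range(len(data)), key=lambda i: scores[i])
```

In this toy dataset the last row (index 9) receives the largest score because it sits furthest from the direction that explains most of the variance. Note that the outlier itself tilts the fitted component, which is one reason a single extreme row can also raise the scores of ordinary rows.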
Configuration
Input Staging Table
The staging table containing the data to analyse. All columns used for anomaly detection must be numeric.
Output Staging Table
The staging table where results are written. It contains the original non-vector input columns plus the anomaly detection outputs (see Output Schema).
Feature Column
The name of the (vector) column that contains the features to use for anomaly detection. This is typically the output column of a Concatenate transform in the transform pipeline.
Rank
The number of principal components to use. Higher rank captures more variance but is slower.
- Default: 5
- Minimum: 1
Ensure Zero Mean
When enabled (default), the data is centred to zero mean before PCA. Recommended unless the data has already been normalised.
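As an illustration of what centring to zero mean does, the sketch below subtracts each column's mean from its values. This is only a conceptual example with made-up numbers and a hypothetical helper name; the activity performs the centring internally when the option is enabled.

```python
def center_columns(rows):
    """Return the rows with each column shifted to zero mean, plus the means."""
    n = len(rows)
    width = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(width)]
    centered = [[row[j] - means[j] for j in range(width)] for row in rows]
    return centered, means

rows = [[10.0, 200.0], [12.0, 220.0], [14.0, 240.0]]
centered, means = center_columns(rows)
# means is [12.0, 220.0]; every centred column now sums to zero.
```

Without this step, the first principal component tends to point from the origin towards the mean of the data rather than along the direction of greatest variation.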
Transforms
A pre-processing pipeline applied before the anomaly detection step. Typically used to:
- Concatenate multiple numeric columns into a single feature vector
- Normalize values before detection
- Drop non-numeric or irrelevant columns
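The effect of such a pipeline can be sketched in plain Python. The column names (Amount, Quantity, Region) are hypothetical, and min-max scaling is only one possible normalisation; the scaling applied by the product's Normalize transform is not specified here and may differ.

```python
def min_max_normalize(values):
    """Scale values to the range [0, 1]; constant columns become 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def build_feature_vectors(rows, numeric_columns):
    """Normalise the chosen columns and concatenate them into one vector per row.

    Columns not listed in numeric_columns are simply left out of the vector.
    """
    normalized = {
        col: min_max_normalize([float(row[col]) for row in rows])
        for col in numeric_columns
    }
    return [[normalized[col][i] for col in numeric_columns] for i in range(len(rows))]

rows = [
    {"Amount": 100.0, "Quantity": 1, "Region": "North"},
    {"Amount": 300.0, "Quantity": 3, "Region": "South"},
    {"Amount": 500.0, "Quantity": 2, "Region": "North"},
]
features = build_feature_vectors(rows, ["Amount", "Quantity"])
# features is [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

The non-numeric Region column is excluded, and both numeric columns end up on the same scale so that neither dominates the feature vector.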
Row Filter (optional)
Filter the input data before processing.
Output Schema
The output staging table contains all non-vector columns from the input, plus:
| Column | Description |
|---|---|
| Score | Anomaly score; higher values indicate more anomalous rows |
| PredictedLabel | Boolean; true if the row is classified as an anomaly |
A sample of 3 rows (with score and label) is logged to the workflow run log for inspection.
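Once the output rows have been read from the staging table, the two output columns can be used for ranking and filtering. The sketch below assumes the rows are available as dictionaries; the OrderId column and all values are invented for illustration, and how you read the staging table is outside its scope.

```python
def rank_by_score(rows, limit=None):
    """Sort rows from most to least anomalous, optionally keeping the top few."""
    ordered = sorted(rows, key=lambda row: row["Score"], reverse=True)
    return ordered if limit is None else ordered[:limit]

def flagged_only(rows):
    """Keep only the rows the model labelled as anomalies."""
    return [row for row in rows if row["PredictedLabel"]]

output_rows = [
    {"OrderId": 101, "Score": 0.08, "PredictedLabel": False},
    {"OrderId": 102, "Score": 0.91, "PredictedLabel": True},
    {"OrderId": 103, "Score": 0.35, "PredictedLabel": False},
    {"OrderId": 104, "Score": 0.77, "PredictedLabel": True},
]
top_two = rank_by_score(output_rows, limit=2)
review_queue = rank_by_score(flagged_only(output_rows))
```

Ranking by Score is useful when you want a fixed-size review list regardless of how many rows were labelled; filtering on PredictedLabel relies on the model's own cutoff.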
Usage Notes
- Unlike the regression activity, the Anomaly Detector has no Train/Run distinction — it trains and applies the model in a single pass each time it runs.
- The algorithm is unsupervised — it does not use labelled anomaly data. It learns "normal" from the bulk of the input data.
- A high Rank value relative to the number of features or rows may cause performance issues.
Best Practices
- Use Concatenate in the transforms to combine relevant numeric columns into a single feature vector before the anomaly detection step.
- Review a sample of flagged rows (PredictedLabel = true) to calibrate whether the sensitivity is appropriate for your use case.
- Normalise input features before anomaly detection to prevent columns with large absolute values from dominating the anomaly score.
The exact threshold that separates "anomaly" from "normal" in the PredictedLabel is controlled by the ML.NET algorithm internals. Document whether there is a configurable sensitivity parameter exposed in the UI.
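One way to support the review step is to summarise how many rows were flagged and where the flagged scores begin. The sketch below is a hypothetical helper operating on output rows held as dictionaries, with invented scores; it only inspects the results and does not change the model's threshold.

```python
def summarize_flags(rows):
    """Report the flagged share and the scores on either side of the cutoff."""
    flagged = [row["Score"] for row in rows if row["PredictedLabel"]]
    normal = [row["Score"] for row in rows if not row["PredictedLabel"]]
    return {
        "flagged_fraction": len(flagged) / len(rows),
        "lowest_flagged_score": min(flagged) if flagged else None,
        "highest_normal_score": max(normal) if normal else None,
    }

scores = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.80, 0.95]
rows = [{"Score": s, "PredictedLabel": s >= 0.80} for s in scores]
summary = summarize_flags(rows)
# summary["flagged_fraction"] is 0.25
```

If the flagged fraction looks too high or too low for your data, you can apply your own cutoff to the Score column downstream instead of relying on PredictedLabel.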
JSON Reference
{
  "discriminator": "RandomizedPcaAnomalyWorkflowActivity",
  "activityId": "<uuid>",
  "name": "AI Anomaly Detector",
  "positionX": 0,
  "positionY": 0,
  "advanceRule": 2,
  "inputStagingTable": "StagingInput",
  "outputStagingTable": "StagingOutput",
  "fsoPath": "",
  "mode": 0,
  "transforms": [],
  "filter": null,
  "featureColumnName": "Features",
  "ensureZeroMean": true,
  "rank": 5
}
| Property | Type | Description |
|---|---|---|
| inputStagingTable | string | Corresponds to the Input Staging Table field. The staging table containing the data to analyse. |
| outputStagingTable | string | Corresponds to the Output Staging Table field. The staging table where anomaly detection results are written. |
| fsoPath | string | File system path. Not used by this activity but present as an inherited field. |
| mode | integer | 0 = TrainModel, 1 = RunModel. Not applicable: this activity always trains and applies in a single pass. |
| transforms | array | Corresponds to the Transforms editor. Array of transform objects defining the pre-processing pipeline. |
| filter | object or null | Corresponds to the Row Filter field. An optional filter applied to input rows before processing. null means no filter. |
| featureColumnName | string | Corresponds to the Feature Column field. The vector column containing the features used for anomaly detection. |
| ensureZeroMean | boolean | Corresponds to the Ensure Zero Mean field. When true, data is centred to zero mean before PCA. |
| rank | integer | Corresponds to the Rank field. Number of principal components to use. Default: 5. Minimum: 1. |
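If you generate activity definitions programmatically, a small helper can fill in the documented properties and enforce the minimum rank. This is a sketch only: the function name is hypothetical, the fixed values are copied from the sample JSON above, and whether the product accepts definitions produced this way is an assumption.

```python
import json
import uuid

def build_anomaly_activity(input_table, output_table,
                           feature_column="Features", rank=5,
                           ensure_zero_mean=True):
    """Return a dict shaped like the sample activity JSON."""
    if not isinstance(rank, int) or rank < 1:
        raise ValueError("rank must be an integer >= 1")
    return {
        "discriminator": "RandomizedPcaAnomalyWorkflowActivity",
        "activityId": str(uuid.uuid4()),
        "name": "AI Anomaly Detector",
        "positionX": 0,
        "positionY": 0,
        "advanceRule": 2,    # copied from the sample; meaning not documented here
        "inputStagingTable": input_table,
        "outputStagingTable": output_table,
        "fsoPath": "",       # inherited field, not used by this activity
        "mode": 0,           # not applicable to this activity
        "transforms": [],    # empty pre-processing pipeline
        "filter": None,      # serialises to null, meaning no row filter
        "featureColumnName": feature_column,
        "ensureZeroMean": ensure_zero_mean,
        "rank": rank,
    }

activity = build_anomaly_activity("StagingInput", "StagingOutput")
activity_json = json.dumps(activity, indent=2)
```

The resulting `activity_json` string has the same property names as the sample, with a freshly generated identifier in `activityId`.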