Create a Statistical Model (Datasets)
The new functionality implemented in Version 5 allows Users to easily create a statistical model to find anomalies in data containing a large number of dimension values. This process can surface specific anomalies across thousand of records and dimensions, allowing reports and metrics to be easily created with this reduced information.
For an example, we will create a Stats Model from an existing Dataset containing Sales data from our test database. We want to find any values of 'Cost of Product', 'Sales Amount' or 'Gross Profit' that fall outside of 2 standard deviations from a 30-day moving average. We want to perform this evaluation across these dimensions: Country, Channel, and Product sub-category.
You can only create Stats Models on Datasets that 'keep history', aka Snapshot Datasets, and you need to use the 'All Data' View.
Security requirements for Power Users;
1. Creating Statistical Models from the Dataset Viewer
Stats Models can only be created from the 'All Data' Views
In our example, the All data View of the Dataset provides the following:
- Cost of product
- Sales amount
- Gross profit amount
2. Actions > Build Stats Model will open the Stats Model editor
- Name - default is 'Statistical Model' appended to your Dataset Name - Name may be modified.
- Date column - choose a Date from the Results field or use 'Snapshot Date' if your Dataset is snapshotted.
- Moving Average Interval - select a moving average range that will produce statistically relevant data.
- Upper and Lower Control Limits provide calculations used in I-MR statistics. Provide the multiplier used to generate upper and lower control boundaries.
- The generated Stats Model returns only records that fall outside of a specified number of standard deviations. Lower numbers will include more results, higher numbers fewer records but significantly anomalous results. Set your Volatility limit to control the number of records returned in your stats model.
- You can compute volatility for the either the current(latest) Calendar date or for all Calendar values.
- Dimension value is defaulted but can be changed or added to by clicking on [+ Dimension]
- Measure values is also defaulted but can be changed or added to by clicking on [+ Field]
- Dimensions - We have selected 3 filters. We have also selected Include Totals for each dimension to generate stats for every unique combination of filter values against all other filter values including aggregated totals. In this example, a Total value will be calculated for All Countries, All Channels, All Product-subcategories, as well as all channels and products for each country, all countries and products for each channel, etc.
- Measures - select those measures you are interested in
- You can add any number of computed fields via [+ Derived Field]
- Check to Include a generated column that concatenates all Measure Names greater than your specified volatility limit (in this example, all measures with values greater than 2 standard deviations from 30-day moving average)
- If you have checked #4, you may elect to alternately display one row for each separate measure with an anomaly instead of concatenating.
- Save to create your Stat Dataset - note that this may take some time and that you can exit this page while this processing occurs
6. Review Results of this Model will quickly highlight those elements with Anomalies
- For email Marketing in Australia, Chardonnay has anomalies in all 3 Measures - Total Cost, Total Sales Amount, Total Gross Profit
- For Website Visits in Australia, the Gift Set has only one measure showing an anomaly - Total Gross Profit
- And Germany is showing anomalies for Champagne in two channels
7. Save this new Stat Model as a new View
You can create Rules against the Stats dataset and save these as views (in example, Stats Records where the total sales are over 1 Million, etc.)
- Click [Save as View]
- Add meaningful Name for your View on the Pop-up
- Select Public if you want others to be able to View and Use this Stat Model, otherwise it this view will only be available to you
- Statistical Models are saved as Views
- All editing of Statistical Models must be done from the Dataset Viewer as Stat Datasets will not appear in Content > Dataset menu
Selecting Action > Build Report will take you directly to a defaulted report that you can edit, or simply Publish and Enable to display on your Homepage
For details on creating elements from Statistical Datasets refer to this article for more information: Sourcing Reports / Metrics / Dimensions from "Existing Datasets - SQL"
We can simply edit this newly created report to just display the fields we are interested in and then [Enable & publish]
Then the Report can be Viewed, Burst, or Shared
For details on Constructing Derived fields see: Derived fields for Data Processing (Overview / Create / Aggregate) and Expression Syntax for Derived Fields Formulas