Create a Dataset from any Data Source

Datasets serve as a staging layer between Data Sources and Metric Insights elements (Metrics and Reports).

The ultimate goal of Dataset functionality is to separate data loading, staging and discovery from data display and distribution. A Dataset (and its Dataset Views) can be used as the source for multiple Reports and Metrics, allowing a single source for many different elements and use-cases.

To understand how our Security model interacts with Datasets, see Datasets in the Controlling Access within Metric Insights manual.

[6.3.1] Datasets can be moved between the available Data Storages.

Click the Move icon next to the Data Storage on Data tab and follow the prompts to complete the operation.

Video Tutorial

1. Access New or Content menu > Datasets

1.1. New Menu >Dataset > Create New

1.2. Content menu > Datasets > New dataset
Access Admin > Datasets

Below the grid, click [+ New dataset]

1.3. Add Dataset from a Dataset view (Materialize a Dataset)

New feature in Release 6.2.2 is the ability to Create a new Dataset from a Dataset View via the Actions dropdown.

Dataset: Daily sales (Demo) - Google Chrome

The new Dataset is automatically created and enabled. Continue with the steps below to modify any of the  defaults as needed.

One new Dataset may be created from a View.  After one is created, you will no longer see the option on the Action Dropdown.

2. [Info tab] Define the basics

[Info tab] Define the basics
  1. Measured: select the measurement interval that applies to the level of aggregation that you want in your result set.
  2. Collecting: new Datasets are always disabled by default to make sure that you can take time to configure them properly before enabling. This setting is duplicated at the top of the screen.
  3. Name: provide a unique name for your dataset. Preferably, the dataset name should explain what kind of data it contains.

Move to the Data tab to define the source of data and how often it should be updated.

3. [Data tab] Configure data collection

3.1. [Data tab] for a Plug-in (Tableau example)

PREREQUISITES:

  • Data Source: Metric Insights must have a working data source connection to Tableau. If you have not yet configured a data source connection, see instructions for your particular BI Tool here: Connecting To Data Sources.

3.1.1. First find the Tableau Report you want to further use as a Dataset

At the Tableau dashboard pay attention to:

  1.  Worksheet name: You will need to remember this when defining your Data Source
  2. You can filter elements sourced from your BI Tool, see Setting Filters below. You can also include some of these filter values into your fetch command. Both of these will enable MI to load data selectively, choosing only the values you really need instead of fetching all the data from this worksheet.

3.1.2. [Configuration tab] This is the 5.6 version, your display may differ

  1. Data Source: Select the Tableau plug-in serving as a Data Source for this Dataset. For more info, see: Establish Connectivity to Tableau Server
  2. Set the Data Collection Trigger which is going to initiate updating information in a Metric or Dimension Values. If there is no option matching your requirements, scroll down to the bottom of the drop-down list and click Add New Data Collection Trigger.
  3. Select the Tableau Worksheet from the drop-down list.

3.1.3. Setting Filters

For more details, see examples in  Pre-filtering BI tools (External Reports)

  1. Select Manage Filters
  2. The Add Tableau Filter pop-up opens
  3. You must exactly match the Filter Names used in Tableau - see First find the Tableau Report you want to further use as a Dataset above or select the on-screen Help link.
  4. You can enter the Filter Values manually or by mapping to existing Filters in Dimensions or other Datasets
  5. The Values grid will be populated with your choice
  6. Save the changes made and optionally add more Filters

3.1.4. Optionally you can limit the values

By default, filters are set to 'All'. To change this setting:

  1. Click Edit (a Pencil icon)
  2. The Tableau Filter Values pop-up opens
  3. You have 3 options to choose from:
    • 'Use All Values'
    • 'Use Only Selected Values'
    • 'Ignore Filter'
  4. In case you need to include only certain values, select them in the list.

3.1.5. Plugin Command: [Option 1] Visual Editor

  1. Select the Visual Editor option.
  2. Select Columns
  3. The Tableau Query Builder pop-up opens
  4. Select the parameters you would like to include to your Plugin command
  5. Save your selection.

3.1.6. Plugin Command: [Option 2] Manual Entry

  1. In the Plugin command box choose the Command option.
  2. Write your command. For a hint on plugin commands, refer to Plugin commands

3.1.7. Select "Validate" to preview data

3.2. [Data tab] for SQL (Database example)
[Data tab] Configure data collection
  1. Data Source: Choose a data fetch method from drop-down list. A SQL-based Data Source is used in this example. For more details on other available data sources and specific fetch method requirements, see Understanding Data Sources
  2. Data Collection Schedule: select how often data should be recollected for this Dataset to ensure that it contains relevant data.
  3. Write a SQL Statement defining the data to be extracted from the database. You may use :measurement_time in your statement to bind in a date or series of date values.  See also:  How do I use the :measurement_time bind variable?
  4. Validate your statement. If you have included :measurement_time parameter in your statement, it is required to specify Measurement Time for Parameter Substitution.
  5. If the statement is valid, a Validation Rows Preview section pops-up at the bottom of the screen. You can also open it by clicking Show validation rows under the SQL Statement box.

4. Optionally, customize the display of your Dataset values

4.1. SQL example

[New in 5.0.4] Optionally, customize the display of your Dataset values
  1. Click Save to display the Dataset Columns table.
  2. Dataset Columns: This table can be used to rename column names and define precision for numeric fields if needed. In the given example, there are a few fields that could use better Column Names and let's say we would like the Total_unit_count to be a whole number rather than the default value of 2 decimal points.
  3. Click the Gear for each field you would like to rename
  4. On the resulting pop-up:
    • Type in a new Column Name
    • If the data element is a floating-point integer, you can also change the number of decimals to display using the drop-down list on the Precision field.

4.2. BI Tool example

  1. Select Modify Columns
  2. Query Builder opens
  3. Apply Overrides and Aggregations as needed

4.3. Special and accented characters

Special and accented characters (Éé Êê Èè Ëë Ââ Àà Ää Ïï Îî Üü Ûû Ùù Ÿÿ Çç Ôô Öö ẞß Ææ Œœ)

Regardless of the Dataset's Data Source, Metric Insights supports special and accented characters. Please note, that after the command is validated, and data is distributed to columns and the Dataset Columns table is shown, the special characters may be converted to the underscore symbol [ _ ] in the Reference Name column. This behavior is only characteristic for the Dataset Editor and doesn't cause issues in the Viewer.

5. Advanced settings

Advanced settings
  1. Snapshot Datasets are associated with keeping Dataset history and having the ability to compare Dataset instances over time.
    • If this the Dataset is not defined as a Snapshot dataset (this field is set to 'no'), then only the most recent instance of the Dataset will be retained
    • If it is a Snapshot Dataset (the history is going to be kept, for more details refer to: Snapshot Datasets: Comparing Instances), then additional settings will be exposed (see below)
  2. This field is shown for Snapshot Datasets only.
    1. If Can historical instances be backfilled? field is set to 'no', then only one instance of the Dataset will be computed at run time and it is required to set the value for the ':measurement_time' variable in the field below. It's important to note that while only one instance of the Dataset will be computed at run time, a new instance of the Dataset will be computed at each succeeding refresh interval. Since history is kept (the Snapshot Dataset? field is set to 'yes'), all instances will be retained. This technique can be used to create 'snapshots' of your underlying data at fixed time intervals.
    2. If this field is set to 'yes', then multiple instances of the Dataset can be computed at run time and the ':measurement_time' variable is defined automatically by the system.
  3. This field is shown for Snapshot Datasets in the Advanced tab.
    1. Set Data Retention Period

 This example shows a Daily Report keeping three years of data (Daily: 1095 = 3*365 (3 years)).

See also: How do I use the: measurement_time bind variable?

At the upper right corner of the screen, click Enable and Publish.

6. Enable and Publish to open the Dataset Viewer

You are redirected to the Dataset Viewer

Your Dataset is now ready for use.

7. What would you like to do next?