Configure Search to Include Table and Column Metadata from Databricks Unity Catalog

With the introduction of Analytics Marketplace in v7.1.0, discovery and search capabilities across multiple data catalog platforms has been significantly expanded.

You can search for the associated database table assets without leaving the Metric Insights UI and without recreating them as objects in Metric Insights. Global Search will return the assets with the list of the available metadata and links to proceed to the asset in the data catalog tool.

PREREQUISITES:

1. Create Dataset from All Table Assets

For more details on how to create Datasets, refer to Understanding Datasets.

For more details on how to create a Dataset form Databricks Unity Catalog, refer to Create Dataset from Databricks Unity Catalog.

  1. Create a Dataset from the All Table Assets object under Analytic Marketplace.
  2. Note the Dataset ID to be used in Step 2.

2. Create Dataset from All Table Assets Dataset

  1. Data Source: select the Data Source used as Data Storage when creating Dataset from the Databricks Unity Catalog All Table Assets object;
  2. Query: the query is intended to:
    • Set Object Type Column to "Table" and Content Type Column to "Databricks";
    • Transform standard Databricks Unity Catalog fields, such as Catalog, Schema Name, Table Type, etc., into JSON objects.
    • Transform user-added fields into JSON objects.
      • This step will ensure that metadata is searchable and appears in search properly formatted.
      • The query must include the ID of Dataset created from the Databricks Unity Catalog All Table Assets object in "FROM dataset_datasetID".
  3. Make sure the Dataset Column Length for "columns" is set high enough. This may be necessary if your table contains many columns or very detailed descriptions.  
Example Query
select JSON_OBJECT('Catalog', catalog) AS "Catalog"
, JSON_OBJECT('Schema Name', schema_name) AS "Schema Name"
, table_name
, "databricks" as content_type
, "Table" as mi_object_type
, JSON_OBJECT('Table Type', type) AS "Table Type" 
, owner
, comment
, JSON_OBJECT('Created By', created_by
, 'Created At', created_at
) AS Created 
, JSON_OBJECT('Updated By', updated_by
, 'Updated At', updated_at
) AS Updated
, catalog_url
, JSON_OBJECT('Number of Columns', columns_number) AS "Number of Columns" 
, columns as Columns
, tags as Tags
from   dataset_XXX

Acceptable JSON Formats:

{"Pii Data": ""}
{"Data Steward": "Clyde, Curt", "Business Owner": "Clint", "Technical Owner": "Kyle"}
[{"label":"Catalog","value":"Gifts"}]
[{"name":"Product Type","description":"High level grouping of our products."}]

Access Admin > System > Search Setup

  1. Under Datasets to include in Search, [+Add Dataset].
  2. Dataset & View: choose the Dataset created in Step Create Dataset from All Table Assets Dataset.
  3. [Save]

4. Map Dataset Columns

Use the following mapping example with Allow click through to external tool enabled.

Field Dataset Column
Title Column table_name
Description Column comment
External Catalog Type Databricks
External Catalog Asset URL catalog_url
Content Type Column connection_type
Object Type Column mi_object_type
Business Owner/Technical Owner/Data Steward* owner

*If retrieving email addresses from the data catalog system, they can be mapped to Metric Insights’ native owner fields. For example, the "owner" field from Databricks Unity Catalog can be mapped to the Business Owner field. Otherwise, owner fields can instead be pulled in as JSON objects.

For more details on configuring search across Datasets and Access, refer to Configure Search Across Specific Dataset (v7.1.0+).

5. Add Glossary and Custom Fields Mapping

  1. Under Dataset Columns for Search, [+Add Dataset Columns].
  2. Dataset Column: choose a custom field column from the Dataset.
  3. Column data is formatted in JSON: enable. Must be enabled for all added columns, if multiple.
  4. Icon: optionally, select an icon.
  5. [Save] and add as many as needed.
  6. [Back to Search Setup] and [Update All Search Indexes] at the upper right.

6. Verify Result