Advanced Search Tuning: Understanding Rate Calculation
The total rate of the search result comprises three types of score:
- Token score: weight of a word and its relation to a phrase
- Field score; e.g., the Element name field is rated higher than the Element description field
- Table-relation score; e.g., an Element found by its Dimension Value is rated higher than one found by a Tag
DEFINITION OF "TOKEN": A token is a unique word or combination of characters in a text.
During processing, text is segmented into words, punctuation, and unique character sets by applying rules specific to each language. For example, punctuation at the end of a sentence is ignored whereas U.K. remains one token.
Token Score
Each lemma (a basic form of a word) receives 100 points by default, unless it is a stop word.
Stop words receive 50 points; however, if a phrase consists only of stop words, each of them receives 100 points. An additional 50 points are divided by the number of words in the phrase, and the result is added to each word's score to give its final token score.
Consider the following example:
- There is an element and a related Dataset:
- Element(name="Sales metric", description="Canada sales")
- Dataset(name="Daily sales in Canada")
- The token score is calculated during indexation and is as follows:
- Element name [Sales metric] →
- sales[100 + 50 / 2], metric[100 + 50 / 2] →
- sales[125], metric[125]
- Element description [Canada sales] →
- canada[100 + 50 / 2], sales[100 + 50 / 2] →
- canada[125], sales[125]
- Dataset name [Daily sales in Canada] →
- daily[100 + 50 / 4], sales[100 + 50 / 4], in[50 + 50 / 4], canada[100 + 50 / 4] →
- daily[112.5], sales[112.5], in[62.5], canada[112.5]
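The indexing rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the product's actual implementation: the function name `token_scores`, the one-word stop-word list, and the whitespace tokenizer (with no lemmatization) are assumptions made for the example.

```python
# Assumption: the real stop-word list is not given in this article.
STOP_WORDS = {"in"}


def token_scores(phrase, stop_words=STOP_WORDS):
    """Return {token: score} for an indexed phrase, following the rules above."""
    # Naive whitespace tokenizer; real processing is language-specific.
    tokens = phrase.lower().split()
    if not tokens:
        return {}
    # If the phrase consists only of stop words, each one counts as a normal word.
    only_stop_words = all(t in stop_words for t in tokens)
    # The 50 additional points are shared equally between the words of the phrase.
    bonus = 50 / len(tokens)
    scores = {}
    for t in tokens:
        base = 50 if (t in stop_words and not only_stop_words) else 100
        scores[t] = base + bonus
    return scores


print(token_scores("Sales metric"))
# {'sales': 125.0, 'metric': 125.0}
print(token_scores("Daily sales in Canada"))
# {'daily': 112.5, 'sales': 112.5, 'in': 62.5, 'canada': 112.5}
```

The printed values match the Element name and Dataset name rows of the example.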
Result Score
Tokens that satisfy the search argument are identified during the search. After that, the program:
- Sums the token scores
- Calculates the rate based on the order of words (tokens) in a phrase and adds it to the result
- Multiplies the result by a Field score and divides it by 100 (field scores are used as coefficients)
- Multiplies the result by the table relation score and divides it by 100
The score is calculated as illustrated in Steps 1-4 below.
1. Sum Token Scores
The query “Canada daily sales” returns the following tokens:
- Token score for Element name [Sales metric] = sales[125] = 125
- Token score for Element description [Canada sales] = canada[125] + sales[125] = 250
- Token score for Dataset name [Daily sales in Canada] = daily[112.5] + sales[112.5] + canada[112.5] = 337.5
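As a rough sketch of this step, the indexed token scores can be stored per field and summed for every token that also occurs in the query. The dictionary `INDEXED` and the helper `matched_token_sum` are hypothetical names; the scores are copied from the indexing example, and the query is split on whitespace without lemmatization.

```python
# Token scores copied from the indexing example earlier in this article.
INDEXED = {
    "Element name": {"sales": 125, "metric": 125},
    "Element description": {"canada": 125, "sales": 125},
    "Dataset name": {"daily": 112.5, "sales": 112.5, "in": 62.5, "canada": 112.5},
}


def matched_token_sum(query, field_tokens):
    """Sum the indexed scores of the tokens that also occur in the query."""
    query_tokens = set(query.lower().split())
    return sum(score for token, score in field_tokens.items() if token in query_tokens)


for field, tokens in INDEXED.items():
    print(field, matched_token_sum("Canada daily sales", tokens))
# Element name 125
# Element description 250
# Dataset name 337.5
```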
2. Calculate the Rate Based on Word Order
- Element name: "Canada daily sales" vs "Sales metric" → 0 points (only one match, "sales"): 125 + 0 = 125
- Element description: "Canada daily sales" vs "Canada sales" → 1 point ("canada" precedes "sales", but another word separates them in the query): 250 + 1 = 251
- Dataset name: "Canada daily sales" vs "Daily sales in Canada" → 2 points ("canada" is at the end of the phrase, but "sales" directly follows "daily"): 337.5 + 2 = 339.5
3. Multiply by Field Score
Field rates can be changed under Admin > System > Search Setup > Advanced Search Tuning to adjust rankings according to the needs of your organization.
For the purpose of this article, default values will be used.
- Score for Element name [100] = 125 * 100 / 100 = 125
- Score for Element description [50] = 251 * 50 / 100 = 125.5
- Score for Dataset name [100] = 339.5 * 100 / 100 = 339.5
4. Multiply by Table-Relation Score
- Score for Element → Element (for Element name) [100] = 125 * 100 / 100 = 125
- Score for Element → Element (for Element description) [100] = 125.5 * 100 / 100 = 125.5
- Score for Dataset → Element (for Dataset name) [50] = 339.5 * 50 / 100 = 169.75
- Score for Dataset → Dataset (for Dataset name) [100] = 339.5 * 100 / 100 = 339.5
The search returns two entities: the Element and the Dataset. The Element has three scores: 125, 125.5, and 169.75. The program therefore treats the Element as found by Dataset name, since that match has the highest score (169.75). The Dataset is found by its own name.
Search results are as follows:
- Dataset [339.5]
- Element [169.75]
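Steps 3 and 4 and the final ranking can be reproduced with the short sketch below. It is an illustration under stated assumptions, not the actual search code: the names `FIELD_SCORE`, `RELATION_SCORE`, `MATCHES`, and `rank` are hypothetical, the coefficients are the default values used in this article, and the word-order points from Step 2 are entered by hand rather than computed.

```python
# Default coefficients used in this article (configurable in the product).
FIELD_SCORE = {"Element name": 100, "Element description": 50, "Dataset name": 100}
RELATION_SCORE = {
    ("Element", "Element"): 100,
    ("Dataset", "Element"): 50,
    ("Dataset", "Dataset"): 100,
}

# (table that owns the field, field, token sum + word-order points) from Steps 1 and 2.
MATCHES = [
    ("Element", "Element name", 125 + 0),
    ("Element", "Element description", 250 + 1),
    ("Dataset", "Dataset name", 337.5 + 2),
]


def rank(matches):
    """Return [(entity, best score, field that produced it)], highest score first."""
    best = {}
    for (source, target), relation in RELATION_SCORE.items():
        for table, field, base in matches:
            if table != source:
                continue
            # Step 3: field coefficient; Step 4: table-relation coefficient.
            score = base * FIELD_SCORE[field] / 100 * relation / 100
            if target not in best or score > best[target][0]:
                best[target] = (score, field)
    ordered = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(entity, score, field) for entity, (score, field) in ordered]


for entity, score, field in rank(MATCHES):
    print(f"{entity} [{score}] found by {field}")
# Dataset [339.5] found by Dataset name
# Element [169.75] found by Dataset name
```

Each entity keeps only its highest-scoring match, which is why the Element is reported with 169.75 rather than 125 or 125.5.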
Additional Rules
When entities have the same score, additional ordering rules are applied:
- Certified elements rank higher than non-certified ones
- “Metric” and “Multi-Metric” element types have more weight than other types
- Elements with higher engagement rates have higher rankings
- Elements are ranked based on their internal ID (in ascending order)
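One way to picture these rules is as a composite sort key. The sketch below assumes the rules are applied in the order listed, each one breaking ties left by the previous one; the article does not state this explicitly. The `Hit` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    id: int              # internal ID
    score: float         # final search score
    certified: bool
    element_type: str
    engagement: float    # engagement rate


PREFERRED_TYPES = {"Metric", "Multi-Metric"}


def sort_key(hit):
    """Higher score first, then the tie-breaking rules in the order listed above."""
    return (
        -hit.score,                               # main ranking criterion
        not hit.certified,                        # certified elements first
        hit.element_type not in PREFERRED_TYPES,  # Metric and Multi-Metric first
        -hit.engagement,                          # higher engagement first
        hit.id,                                   # internal ID, ascending
    )


hits = [
    Hit(3, 125, False, "Metric", 0.9),
    Hit(2, 125, True, "Report", 0.1),
    Hit(1, 125, True, "Metric", 0.1),
    Hit(4, 125, True, "Metric", 0.1),
    Hit(5, 200, False, "Report", 0.0),
]
print([h.id for h in sorted(hits, key=sort_key)])
# [5, 1, 4, 2, 3]
```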
[6.4.2] In partial search, sublemmas are used. If several sublemmas match the same word, this does not add to the score of the search result. For example, in a search for "sal der", the results "salamander", "sale", and "commander" all have the same score.
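The "no extra score" behavior can be illustrated with a deliberately simplified model. It assumes that each query fragment acts as a sublemma matched by plain substring search and that a matching word receives a flat base score of 100; both are assumptions made for this example, and the function names are hypothetical.

```python
def partial_match(query, word):
    """True if at least one query fragment occurs inside the word."""
    lowered = word.lower()
    return any(fragment in lowered for fragment in query.lower().split())


def partial_token_score(query, word, base=100):
    """All-or-nothing score: extra matching fragments in the same word add nothing."""
    return base if partial_match(query, word) else 0


for word in ["salamander", "sale", "commander", "table"]:
    print(word, partial_token_score("sal der", word))
# salamander 100
# sale 100
# commander 100
# table 0
```

"salamander" contains both fragments, yet it scores the same as "sale" and "commander", which contain one each.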