Today, I needed to find the most popular capitalization of user entered product brands. Previously, I’d just used lowercase brands, something like this:
However, this was returning “whole foods” instead of “Whole Foods”, which was considered more desirable.
Given this data:
Here is how it works:
The first part of the query gives us the total count for each brand, grouped by lower case, and for each unique capitalization of that brand provides a count and row number.
In this query, the rows with row_number == 1 are all the unique capitalizations
Of those, we want the ones with the highest count. Again, we partition with another common table expression. This time, ordering by the count per unique capitalization to ensure that the first row for each group was the most popular
From here, we just pick the first row for each brand and get the most popular capitalization
Simple enough eh?