Mastering Hierarchical Faceting with Elasticsearch
Explore how to implement and optimize hierarchical faceting in Elasticsearch for powerful, drill-down search experiences, enhancing data exploration and user interaction.
Hierarchical faceting is a powerful feature in search applications that allows users to refine their search results by navigating through categories organized in a tree-like structure. Unlike flat facets, which present all options at once, hierarchical facets enable a drill-down experience, revealing subcategories only after a parent category has been selected. This approach significantly improves usability for large datasets with complex categorization, such as e-commerce product catalogs or document management systems.
Understanding Hierarchical Faceting
In a hierarchical facet, each level of the hierarchy represents a more specific classification. For example, in an e-commerce store, a user might first select 'Electronics', then 'Computers', and finally 'Laptops'. Each selection filters the results and updates the available sub-facets. Elasticsearch, with its robust aggregation framework, provides several ways to implement this functionality effectively. The key is to model your data and queries in a way that supports this nested filtering and aggregation.
flowchart TD
    A[User Search Query] --> B{"Initial Facets (e.g., Category)"}
    B --> C["Select 'Electronics'"]
    C --> D{"Update Results & Sub-Facets (e.g., Type)"}
    D --> E["Select 'Computers'"]
    E --> F{"Update Results & Sub-Sub-Facets (e.g., Brand)"}
    F --> G["Select 'Laptops'"]
    G --> H[Final Filtered Results]
Flow of a user interacting with hierarchical facets
Data Modeling for Hierarchical Facets
Effective hierarchical faceting starts with proper data modeling. There are primarily two common approaches: storing the full path as a single field or using an array of categories. Each has its advantages and disadvantages depending on the specific use case and desired query flexibility.
Consider using a keyword field for each level of the hierarchy, or a single keyword field storing the full path, rather than relying solely on nested types for faceting, which can be more resource-intensive.
Implementation Strategies in Elasticsearch
Elasticsearch's aggregation framework is central to building hierarchical facets. We'll explore two primary methods: path-based faceting with one terms aggregation per level, and terms aggregations nested inside filter aggregations.
Strategy 1: Path-Based Faceting
This strategy involves storing the full hierarchical path of a category in a single field, often as a delimited string (e.g., Electronics > Computers > Laptops). Storing each level in its own keyword field alongside the path, as the mapping below does, allows for straightforward filtering and aggregation.
PUT /products
{
  "mappings": {
    "properties": {
      "category_path": {
        "type": "keyword"
      },
      "category_level1": {
        "type": "keyword"
      },
      "category_level2": {
        "type": "keyword"
      },
      "category_level3": {
        "type": "keyword"
      }
    }
  }
}

POST /products/_doc/1
{
  "name": "Gaming Laptop",
  "category_path": "Electronics > Computers > Laptops",
  "category_level1": "Electronics",
  "category_level2": "Computers",
  "category_level3": "Laptops"
}
Mapping and example document for path-based hierarchical data
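Keeping the path and the per-level fields in sync by hand is error-prone, so it can help to derive them from one ordered list of category names at indexing time. The following is a minimal sketch, not part of any Elasticsearch client: the helper name category_fields and the PATH_DELIMITER constant are assumptions made for illustration, and a hierarchy deeper than three levels would need matching keyword fields added to the mapping.

```python
from typing import Dict, List

# Assumed delimiter, chosen to match the example document above.
PATH_DELIMITER = " > "


def category_fields(levels: List[str]) -> Dict[str, str]:
    """Derive category_path and category_levelN fields from an ordered list of names."""
    if not levels:
        raise ValueError("at least one category level is required")
    fields = {"category_path": PATH_DELIMITER.join(levels)}
    # Level fields are 1-indexed to match category_level1..category_level3 in the mapping.
    for depth, name in enumerate(levels, start=1):
        fields[f"category_level{depth}"] = name
    return fields


# Build the body of the example document from its category list.
doc = {"name": "Gaming Laptop", **category_fields(["Electronics", "Computers", "Laptops"])}
```

The resulting dictionary has the same fields as the example document and can be passed as the document body to whichever client you use for indexing.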
To facet on this hierarchy, run a terms aggregation on the field for the level you want to display. When a user selects a facet, you filter the results by that value and then re-run the aggregation on the next level's field to show the next set of facets.
GET /products/_search
{
  "size": 0,
  "aggs": {
    "level1_categories": {
      "terms": {
        "field": "category_level1",
        "size": 10
      }
    }
  }
}

// After selecting 'Electronics'
GET /products/_search
{
  "size": 0,
  "query": {
    "term": {
      "category_level1": "Electronics"
    }
  },
  "aggs": {
    "level2_categories": {
      "terms": {
        "field": "category_level2",
        "size": 10
      }
    }
  }
}
Aggregating on hierarchical levels using the terms aggregation
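The two requests above follow one pattern: filter on every level the user has already selected, then facet on the next level down. A small helper can generate the request body for any depth. This is a sketch under assumptions: the function name drilldown_query, the aggregation name next_level, and the max_depth default of 3 (taken from the three level fields in the mapping) are illustrative choices, and the body is a plain dictionary to be sent with whatever client you use.

```python
from typing import Any, Dict, List


def drilldown_query(
    selected: List[str], facet_size: int = 10, max_depth: int = 3
) -> Dict[str, Any]:
    """Build a search body that filters on the selected levels and facets on the next one."""
    body: Dict[str, Any] = {"size": 0}

    # One term filter per selected level: category_level1, category_level2, ...
    filters = [
        {"term": {f"category_level{depth}": name}}
        for depth, name in enumerate(selected, start=1)
    ]
    if filters:
        # Filter context: the clauses only include or exclude documents, they are not scored.
        body["query"] = {"bool": {"filter": filters}}

    # Only ask for the next level if the hierarchy has one.
    if len(selected) < max_depth:
        next_field = f"category_level{len(selected) + 1}"
        body["aggs"] = {
            "next_level": {"terms": {"field": next_field, "size": facet_size}}
        }
    return body
```

Calling drilldown_query([]) yields the top-level facet request, and drilldown_query(["Electronics"]) yields a body equivalent to the second request above, with the single term query wrapped in a bool filter.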
Strategy 2: Using filter and terms Aggregations for Dynamic Hierarchies
This approach is more flexible and can be used when categories are stored in an array or when you want to dynamically build the hierarchy based on user selections. It involves using filter aggregations to narrow down the scope for subsequent terms aggregations.
PUT /products_dynamic
{
  "mappings": {
    "properties": {
      "categories": {
        "type": "keyword"
      }
    }
  }
}

POST /products_dynamic/_doc/1
{
  "name": "Ultra HD Monitor",
  "categories": ["Electronics", "Displays", "Monitors"]
}
Mapping and example document for dynamic hierarchical data
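To implement hierarchical faceting with this model, you'd typically use a series of nested filter and terms aggregations. Each filter aggregation represents a selected parent category, and the terms aggregation within it finds the categories that co-occur with it. Because a flat array does not record which level a value belongs to, that terms aggregation returns every other category on the matching documents, including deeper levels, so the application has to decide which buckets to present as the next level.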
GET /products_dynamic/_search
{
  "size": 0,
  "aggs": {
    "all_categories": {
      "terms": {
        "field": "categories",
        "size": 10
      }
    },
    "electronics_filter": {
      "filter": {
        "term": {
          "categories": "Electronics"
        }
      },
      "aggs": {
        "electronics_subcategories": {
          "terms": {
            "field": "categories",
            "size": 10,
            "exclude": ["Electronics"]
          }
        }
      }
    }
  }
}
Dynamic hierarchical aggregation using filter and terms
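The same request can be generated for any set of selected categories rather than hard-coding 'Electronics'. The sketch below is illustrative only: array_facet_query and the aggregation names selected_filter and subcategories are hypothetical, and the body is a plain dictionary. It excludes every already-selected value from the buckets, but since the array carries no level information, the remaining buckets can still mix direct children with deeper descendants.

```python
from typing import Any, Dict, List


def array_facet_query(selected: List[str], facet_size: int = 10) -> Dict[str, Any]:
    """Build a filter + terms aggregation body for the array-of-categories model."""
    terms: Dict[str, Any] = {"field": "categories", "size": facet_size}

    if not selected:
        # Nothing selected yet: facet over every category value.
        return {"size": 0, "aggs": {"all_categories": {"terms": terms}}}

    # Hide values the user has already picked from the returned buckets.
    terms["exclude"] = list(selected)
    return {
        "size": 0,
        "aggs": {
            "selected_filter": {
                # Documents must carry every selected category.
                "filter": {
                    "bool": {
                        "filter": [{"term": {"categories": name}} for name in selected]
                    }
                },
                "aggs": {"subcategories": {"terms": terms}},
            }
        },
    }
```

With selected set to ["Electronics"], the inner aggregation matches the electronics_subcategories aggregation in the request above.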
When using exclude in terms aggregations, be mindful of performance implications for very large numbers of terms. For complex exclusions, a script-based approach might be more flexible but is also more resource-intensive.
Advanced Considerations and Best Practices
Beyond the basic implementation, several factors can optimize your hierarchical faceting experience.
Performance Optimization
- Fielddata vs. Doc Values: Ensure your category fields are mapped as keyword to leverage doc values, which are stored on disk and optimized for aggregations and sorting, rather than fielddata on text fields, which is built in heap memory.
- Caching: Elasticsearch caches the results of size: 0 requests, which includes aggregation-only facet queries, in the shard request cache. Design your queries to maximize cache hits by using consistent query structures.
- Pruning: For very deep hierarchies, consider only showing the most relevant next level of facets rather than pre-calculating all possible paths.
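One way to keep query structures consistent is to serialize request bodies deterministically before sending them. The Elasticsearch documentation describes the shard request cache as keyed on the JSON body of the request, so two bodies that differ only in key order would be treated as different requests. The sketch below assumes that behavior; canonical_body is a hypothetical helper name, and it only normalizes key order and whitespace, not the order of clauses inside lists.

```python
import json
from typing import Any, Dict


def canonical_body(body: Dict[str, Any]) -> str:
    """Serialize a request body with sorted keys and fixed separators."""
    return json.dumps(body, sort_keys=True, separators=(",", ":"))


# Two logically identical bodies built with different key order.
first = {"size": 0, "query": {"term": {"category_level1": "Electronics"}}}
second = {"query": {"term": {"category_level1": "Electronics"}}, "size": 0}

# Both serialize to the same string, so they are sent as identical requests.
same = canonical_body(first) == canonical_body(second)
```

If filter clauses are collected in an arbitrary order, sorting them before building the body gives the same benefit for lists.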
User Experience
- Breadcrumbs: Always provide breadcrumbs to show the user's current position within the hierarchy.
- Facet Counts: Display the number of matching documents for each facet option.
- Clear All/Reset: Offer an easy way for users to clear all selected facets and start over.
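The first two points map directly onto data the application already has: the list of selected categories and the buckets of a terms aggregation response. The following sketch shows one possible shape for that view data. The function names, the output dictionaries, and the sample response are assumptions for illustration; only the aggregations, buckets, key, and doc_count fields follow the terms aggregation response format.

```python
from typing import Any, Dict, List


def breadcrumbs(selected: List[str]) -> List[Dict[str, Any]]:
    """Return one crumb per selected level, each carrying the path needed to jump back to it."""
    return [
        {"label": name, "path": selected[: index + 1]}
        for index, name in enumerate(selected)
    ]


def facet_options(response: Dict[str, Any], agg_name: str) -> List[Dict[str, Any]]:
    """Turn terms aggregation buckets into label/count pairs for display."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return [{"label": b["key"], "count": b["doc_count"]} for b in buckets]


# Hypothetical response fragment, used only to illustrate the bucket shape.
sample_response = {
    "aggregations": {
        "level2_categories": {
            "buckets": [
                {"key": "Computers", "doc_count": 12},
                {"key": "Displays", "doc_count": 5},
            ]
        }
    }
}

crumbs = breadcrumbs(["Electronics", "Computers"])
options = facet_options(sample_response, "level2_categories")
```

A reset control then amounts to rendering with an empty selection, for which breadcrumbs returns an empty list.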
graph TD
    A[Data Modeling] --> B{Keyword Fields vs. Nested}
    B --> C[Path-based]
    B --> D[Array of Categories]
    C --> E[Terms Aggregation]
    D --> F[Filter + Terms Aggregation]
    E --> G[Performance Optimization]
    F --> G
    G --> H[User Experience]
    H --> I[Hierarchical Faceting Implemented]
Decision flow for implementing hierarchical faceting