Hierarchical Faceting with Elasticsearch

Diagram illustrating a tree-like structure representing hierarchical facets in Elasticsearch

Explore how to implement and optimize hierarchical faceting in Elasticsearch for powerful, drill-down search experiences, enhancing data exploration and user interaction.

Hierarchical faceting is a powerful feature in search applications that allows users to refine their search results by navigating through categories organized in a tree-like structure. Unlike flat facets, which present all options at once, hierarchical facets enable a drill-down experience, revealing subcategories only after a parent category has been selected. This approach significantly improves usability for large datasets with complex categorization, such as e-commerce product catalogs or document management systems.

Understanding Hierarchical Faceting

In a hierarchical facet, each level of the hierarchy represents a more specific classification. For example, in an e-commerce store, a user might first select 'Electronics', then 'Computers', and finally 'Laptops'. Each selection filters the results and updates the available sub-facets. Elasticsearch, with its robust aggregation framework, provides several ways to implement this functionality effectively. The key is to model your data and queries in a way that supports this nested filtering and aggregation.

flowchart TD
    A[User Search Query] --> B{"Initial Facets (e.g., Category)"};
    B --> C["Select 'Electronics'"];
    C --> D{"Update Results & Sub-Facets (e.g., Type)"};
    D --> E["Select 'Computers'"];
    E --> F{"Update Results & Sub-Sub-Facets (e.g., Brand)"};
    F --> G["Select 'Laptops'"];
    G --> H[Final Filtered Results];

Flow of a user interacting with hierarchical facets

Data Modeling for Hierarchical Facets

Effective hierarchical faceting starts with proper data modeling. There are primarily two common approaches: storing the full path as a single field or using an array of categories. Each has its advantages and disadvantages depending on the specific use case and desired query flexibility.
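To make the two shapes concrete, here is a minimal Python sketch that turns one delimited path into a document for each model. The helper names (`split_path`, `as_path_document`, `as_array_document`) are hypothetical; the field names follow the mappings used later in this article, and the `>` delimiter is an assumption taken from the example path.

```python
from typing import Dict, List


def split_path(category_path: str) -> List[str]:
    """Split a delimited path such as 'A > B > C' into its level labels."""
    return [part.strip() for part in category_path.split(">") if part.strip()]


def as_path_document(name: str, category_path: str) -> Dict[str, str]:
    """Path-based model: the full path plus one keyword field per level."""
    levels = split_path(category_path)
    doc = {"name": name, "category_path": " > ".join(levels)}
    for depth, label in enumerate(levels, start=1):
        doc[f"category_level{depth}"] = label
    return doc


def as_array_document(name: str, category_path: str) -> Dict[str, object]:
    """Array model: every level label in a single multi-valued field."""
    return {"name": name, "categories": split_path(category_path)}
```

The path-based document keeps the depth of each label explicit, at the cost of one field per level. The array document is more compact, but the order of values in the array is not available to aggregations, so the level of each label has to be recovered some other way.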

Implementation Strategies in Elasticsearch

Elasticsearch's aggregation framework is central to building hierarchical facets. We'll explore two methods: path-based faceting with a terms aggregation per level, and dynamic faceting that combines filter and terms aggregations.

Strategy 1: Path-Based Faceting

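This strategy stores the full hierarchical path of a category in a single field, often as a delimited string (e.g., Electronics > Computers > Laptops). Because a terms aggregation on that field alone would only return complete paths, the mapping below also stores each level in its own keyword field, which keeps filtering and aggregation at any depth straightforward.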

PUT /products
{
  "mappings": {
    "properties": {
      "category_path": {
        "type": "keyword"
      },
      "category_level1": {
        "type": "keyword"
      },
      "category_level2": {
        "type": "keyword"
      },
      "category_level3": {
        "type": "keyword"
      }
    }
  }
}

POST /products/_doc/1
{
  "name": "Gaming Laptop",
  "category_path": "Electronics > Computers > Laptops",
  "category_level1": "Electronics",
  "category_level2": "Computers",
  "category_level3": "Laptops"
}

Mapping and example document for path-based hierarchical data

To build the facets, run a terms aggregation on the field for the level you want to display. When a user selects a facet, filter the results by that value and re-run the aggregation on the next level's field.

GET /products/_search
{
  "size": 0,
  "aggs": {
    "level1_categories": {
      "terms": {
        "field": "category_level1",
        "size": 10
      }
    }
  }
}

// After selecting 'Electronics'
GET /products/_search
{
  "size": 0,
  "query": {
    "term": {
      "category_level1": "Electronics"
    }
  },
  "aggs": {
    "level2_categories": {
      "terms": {
        "field": "category_level2",
        "size": 10
      }
    }
  }
}

Aggregating on hierarchical levels using terms aggregation
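The two requests above follow one pattern, which can be generated from the list of facets selected so far. The following is a minimal sketch, not a definitive implementation: `build_level_request` is a hypothetical helper, and it assumes the `category_levelN` fields from the mapping above. It wraps the selections in a `bool` filter rather than the single `term` query shown above, so that several selected levels can be combined without affecting scoring.

```python
from typing import Any, Dict, List


def build_level_request(selected: List[str], size: int = 10) -> Dict[str, Any]:
    """Build a search body that facets on the level below the current selection.

    `selected` holds the labels chosen so far, top level first,
    e.g. ["Electronics", "Computers"].
    """
    next_level = len(selected) + 1
    body: Dict[str, Any] = {
        "size": 0,  # facet counts only, no hits
        "aggs": {
            f"level{next_level}_categories": {
                "terms": {"field": f"category_level{next_level}", "size": size}
            }
        },
    }
    if selected:
        # One term filter per selected level, combined in filter context.
        body["query"] = {
            "bool": {
                "filter": [
                    {"term": {f"category_level{depth}": label}}
                    for depth, label in enumerate(selected, start=1)
                ]
            }
        }
    return body
```

Calling `build_level_request([])` reproduces the first request, and `build_level_request(["Electronics"])` is equivalent to the second. The caller is responsible for not asking for a level deeper than the mapping defines (three levels in this example).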

Strategy 2: Using filter and terms Aggregations for Dynamic Hierarchies

This approach is more flexible and can be used when categories are stored in an array or when you want to build the hierarchy dynamically from user selections. It uses filter aggregations to narrow the scope of the terms aggregations nested inside them.

PUT /products_dynamic
{
  "mappings": {
    "properties": {
      "categories": {
        "type": "keyword"
      }
    }
  }
}

POST /products_dynamic/_doc/1
{
  "name": "Ultra HD Monitor",
  "categories": ["Electronics", "Displays", "Monitors"]
}

Mapping and example document for dynamic hierarchical data

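To implement hierarchical faceting with this model, you'd typically use a series of nested filter and terms aggregations. Each filter aggregation represents a selected parent category, and the terms aggregation within it counts the other category values on the matching documents. Note that a flat array does not record which value belongs to which level: for the example document, filtering on Electronics returns both Displays and Monitors. To show only the next level, the application has to know the hierarchy itself or fall back to per-level fields as in Strategy 1.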

GET /products_dynamic/_search
{
  "size": 0,
  "aggs": {
    "all_categories": {
      "terms": {
        "field": "categories",
        "size": 10
      }
    },
    "electronics_filter": {
      "filter": {
        "term": {
          "categories": "Electronics"
        }
      },
      "aggs": {
        "electronics_subcategories": {
          "terms": {
            "field": "categories",
            "size": 10,
            "exclude": ["Electronics"]
          }
        }
      }
    }
  }
}

Dynamic hierarchical aggregation using filter and terms
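The request above hard-codes a single selection. As a sketch of how it might be generated for any chain of selections, the hypothetical `build_dynamic_request` below produces the same filter and terms structure for the `categories` field. Since the terms aggregation returns co-occurring values at every depth, the second helper, `direct_children`, trims the buckets to immediate children using a taxonomy dictionary that the application is assumed to maintain; that dictionary is an illustration and not something Elasticsearch provides.

```python
from typing import Any, Dict, List


def build_dynamic_request(selected: List[str], size: int = 10) -> Dict[str, Any]:
    """Build filter + terms aggregations for the array-based model."""
    aggs: Dict[str, Any] = {
        "all_categories": {"terms": {"field": "categories", "size": size}}
    }
    if selected:
        aggs["selected_filter"] = {
            # Documents must carry every selected category.
            "filter": {
                "bool": {"filter": [{"term": {"categories": c}} for c in selected]}
            },
            "aggs": {
                "subcategories": {
                    "terms": {
                        "field": "categories",
                        "size": size,
                        # Hide the values the user has already chosen.
                        "exclude": list(selected),
                    }
                }
            },
        }
    return {"size": 0, "aggs": aggs}


def direct_children(
    buckets: List[Dict[str, Any]], parent: str, taxonomy: Dict[str, List[str]]
) -> List[Dict[str, Any]]:
    """Keep only buckets whose key is an immediate child of `parent`.

    `taxonomy` maps a parent label to its child labels (application-side data).
    """
    allowed = set(taxonomy.get(parent, []))
    return [bucket for bucket in buckets if bucket["key"] in allowed]
```

With a taxonomy such as `{"Electronics": ["Displays", "Computers"]}`, passing the `subcategories` buckets through `direct_children` drops deeper values like Monitors from the Electronics facet.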

Advanced Considerations and Best Practices

Beyond the basic implementation, several factors can optimize your hierarchical faceting experience.

Performance Optimization

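  • Fielddata vs. Doc Values: Map your category fields as keyword so that aggregations read from doc values, an on-disk columnar structure optimized for aggregations and sorting. Aggregating on a text field instead requires fielddata, which is built in heap memory and is disabled by default.
  • Caching: The shard request cache stores the results of requests with size set to 0, which is how the facet queries above are written. Keep request bodies identical for identical selections so that repeated requests can be served from the cache.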
  • Pruning: For very deep hierarchies, consider only showing the most relevant next level of facets rather than pre-calculating all possible paths.
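One way to keep query structures consistent is to serialize request bodies in a canonical form before sending them. The sketch below assumes that the shard request cache keys on the serialized request body, as the Elasticsearch documentation describes, so two bodies that differ only in key order would otherwise be cached separately. `canonical_body` is a hypothetical helper built on the standard library.

```python
import json
from typing import Any, Dict


def canonical_body(body: Dict[str, Any]) -> str:
    """Serialize a request body with sorted keys and fixed separators."""
    return json.dumps(body, sort_keys=True, separators=(",", ":"))


# Two logically identical bodies built with different key order.
first = {"size": 0, "aggs": {"level1_categories": {"terms": {"field": "category_level1"}}}}
second = {"aggs": {"level1_categories": {"terms": {"field": "category_level1"}}}, "size": 0}

# Both serialize to the same string, so they would share a cache entry.
same = canonical_body(first) == canonical_body(second)
```

Sorting keys does not reorder list elements, so if the order of filter clauses can vary between requests, the application would need to sort those as well.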

User Experience

  • Breadcrumbs: Always provide breadcrumbs to show the user's current position within the hierarchy.
  • Facet Counts: Display the number of matching documents for each facet option.
  • Clear All/Reset: Offer an easy way for users to clear all selected facets and start over.
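The first two points map directly onto data the application already has: the list of selected facets and the buckets of a terms aggregation response. A minimal sketch, with hypothetical helper names, might look like this. It assumes the standard terms aggregation response shape, where each bucket carries `key` and `doc_count`.

```python
from typing import Any, Dict, List, Tuple


def breadcrumbs(selected: List[str]) -> List[Dict[str, Any]]:
    """One crumb per selected level, each carrying the path needed to return to it."""
    return [
        {"label": selected[depth - 1], "path": selected[:depth]}
        for depth in range(1, len(selected) + 1)
    ]


def facet_options(response: Dict[str, Any], agg_name: str) -> List[Tuple[str, int]]:
    """Extract (label, count) pairs from a terms aggregation in a search response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]
```

Clicking a crumb re-runs the search with that crumb's `path` as the selection, and a reset is simply a search with an empty selection, for which `breadcrumbs([])` returns an empty list.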

graph TD
    A[Data Modeling] --> B{Keyword Fields vs. Nested};
    B --> C[Path-based];
    B --> D[Array of Categories];
    C --> E[Terms Aggregation];
    D --> F[Filter + Terms Aggregation];
    E --> G[Performance Optimization];
    F --> G;
    G --> H[User Experience];
    H --> I[Hierarchical Faceting Implemented];

Decision flow for implementing hierarchical faceting