What Does JOLT [&] Syntax Mean?


Understanding JOLT's '&' Syntax for Dynamic JSON Transformations


Explore the powerful '&' syntax in JOLT transformations, a key feature for dynamically referencing input data during JSON restructuring. Learn how to use it effectively with practical examples.

JOLT is a JSON-to-JSON transformation library in which the transformation spec is itself a JSON document. It is often used in data processing pipelines such as Apache NiFi. Among its wildcards, the & syntax is particularly important for dynamic transformations. It lets you reuse keys matched along the input path when building output paths, so a single spec can adapt to varying input structures.

The Core Concept of '&'

At its heart, the & symbol in JOLT acts as a placeholder for a key that was matched in the input path. It is used chiefly in the shift operation; the default operation has no & wildcard. When JOLT processes a shift spec, it resolves & by walking back up the matched path: & (or &0) is the key matched at the current level, &1 is the key one level up, and so on. This lets you build output keys dynamically from the input's own structure rather than hardcoding them.

flowchart TD
    A[Input JSON] --> B{JOLT Spec Evaluation}
    B --> C{"Match Input Path (e.g., 'data.items.*.id')"}
    C --> D{"Encounter '&' in Spec (e.g., 'output.&1.&')"}
    D --> E["Resolve '&' to Matched Input Keys (e.g., '0' and 'id')"]
    E --> F[Construct Output Key/Value Dynamically]
    F --> G[Output JSON]
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style G fill:#bbf,stroke:#333,stroke-width:2px

Flow of JOLT's '&' syntax resolution
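
The resolution step in the flow above can be modeled in a few lines of Python. This is a toy sketch, not JOLT's implementation: the function name resolve_ampersands and the list-of-keys representation of the matched path are assumptions made for illustration, and only the bare & and &N forms are handled (not forms such as &(1,2)).

```python
import re


def resolve_ampersands(rhs: str, walked_keys: list) -> str:
    """Replace & and &N in an output path with keys from the matched input path.

    walked_keys lists the matched input keys from the root down to the
    current spec key, so level 0 is the last element of the list.
    """
    def lookup(match):
        level = int(match.group(1) or 0)  # a bare "&" behaves like "&0"
        return walked_keys[-1 - level]

    return re.sub(r"&(\d*)", lookup, rhs)


# Matched input path: data -> items -> element 0 -> id
matched = ["data", "items", "0", "id"]
print(resolve_ampersands("output.&1.&", matched))
```

Counting levels from the end of the list mirrors how & looks upward from the field currently being matched.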

Basic Usage: Referencing Input Keys

The most common use case for & is to take a key from the input JSON and use it as a key in the output. Consider a scenario where you have a list of items, and you want to promote a specific field's value to become the key for each item in the output.

[
  {
    "id": "item123",
    "name": "Product A",
    "value": 100
  },
  {
    "id": "item456",
    "name": "Product B",
    "value": 200
  }
]
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "id": "&1.id",
        "name": "&1.name",
        "value": "&1.value"
      }
    }
  }
]

The spec above runs, but it does not produce the desired outcome. &1 resolves to the key matched one level above the current field, which here is the array index matched by *, so the items end up keyed by "0" and "1" rather than by their id. The & wildcard only ever refers to keys in the matched input path; it cannot read a field's value.

To use an input value as part of the output path, use the @ operator on the right-hand side. @(1,id) means "go up one level from the matched field and read the value of id there". Combining it with & (the name of the field currently being matched) turns the array into an object keyed by id:

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": "@(1,id).&"
      }
    }
  }
]

For the first item, @(1,id) evaluates to item123 and & evaluates in turn to id, name, and value, so the output paths are item123.id, item123.name, and item123.value. To leave id out of the nested object, replace the inner wildcard with only the fields you want (for example "name": "@(1,id).&"). Before combining operators further, it helps to look at & on its own.

Understanding & with Wildcards and Array Indices

The & syntax becomes especially useful when dealing with arrays and wildcards. It allows you to capture parts of the input path that were matched by wildcards (*) or array indices ([]).

{
  "data": {
    "items": [
      {
        "id": "A1",
        "value": 10
      },
      {
        "id": "B2",
        "value": 20
      }
    ]
  }
}
[
  {
    "operation": "shift",
    "spec": {
      "data": {
        "items": {
          "*": {
            "id": "output.&1.id",
            "value": "output.&1.value"
          }
        }
      }
    }
  }
]

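In this spec, &1 refers to the key matched one level above the current field (id or value). That level is the * wildcard, and because items is an array, the matched key is the array index. So, for the first item, &1 becomes 0, and for the second item, &1 becomes 1. Because the output path uses plain dot notation, the indices become object keys ("0" and "1") rather than array positions.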

{
  "output": {
    "0": {
      "id": "A1",
      "value": 10
    },
    "1": {
      "id": "B2",
      "value": 20
    }
  }
}
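
To make the index-to-key behavior concrete, here is a minimal Python sketch that mimics what this particular spec does. It is only a hand-written model of this one transformation, not a general JOLT interpreter; the function name shift_by_index is hypothetical.

```python
def shift_by_index(doc: dict) -> dict:
    """Mimic the spec: copy id and value to output.<array index>.<field>."""
    out: dict = {}
    for index, item in enumerate(doc["data"]["items"]):
        parent_key = str(index)  # what &1 resolves to: the matched array index
        for field in ("id", "value"):  # the fields listed in the spec
            if field in item:
                out.setdefault("output", {}).setdefault(parent_key, {})[field] = item[field]
    return out


doc = {
    "data": {
        "items": [
            {"id": "A1", "value": 10},
            {"id": "B2", "value": 20},
        ]
    }
}
result = shift_by_index(doc)
print(result)
```

Note that the index is converted to a string before use, which is why the result is an object with keys "0" and "1" rather than an array.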

Dynamic Key Generation with & and @

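On its own, & can only refer to keys from the matched input path. To use a value from the input as an output key, combine & with the @ operator, which reads a value from the input tree. ($ is different: on the left-hand side of a spec it writes a matched key name to the output as a value.) Let's revisit the idea of using an id field's value as an output key.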

[
  {
    "id": "user_101",
    "name": "Alice",
    "email": "alice@example.com"
  },
  {
    "id": "user_102",
    "name": "Bob",
    "email": "bob@example.com"
  }
]
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "name": "users.@(1,id).&",
        "email": "users.@(1,id).&"
      }
    }
  }
]

In this spec, the right-hand side users.@(1,id).& does the work. When name or email is matched, @(1,id) goes up one level to the enclosing array element and reads the value of its id field (e.g., user_101), which becomes the key under users. The trailing & resolves to the name of the field currently being matched (name or email). Because id itself is not listed in the spec, it is not copied into the nested object. The result is a structure where the id value is the key, and the rest of the object's fields are nested under it.

{
  "users": {
    "user_101": {
      "name": "Alice",
      "email": "alice@example.com"
    },
    "user_102": {
      "name": "Bob",
      "email": "bob@example.com"
    }
  }
}
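
The re-keying shown above, where an id value becomes an output key and the remaining fields are nested under it, can be modeled in plain Python. This is a sketch of the idea only, not of JOLT's internals; rekey_by_field and its parameters are hypothetical names.

```python
def rekey_by_field(items: list, key_field: str, root: str) -> dict:
    """Turn a list of objects into {root: {<key_field value>: {other fields}}}."""
    out: dict = {root: {}}
    for item in items:
        new_key = item[key_field]  # a value read from the input, used as an output key
        for field, value in item.items():
            if field == key_field:
                continue  # the key field itself is not copied into the nested object
            # the matched field name is reused as the innermost output key
            out[root].setdefault(new_key, {})[field] = value
    return out


users = [
    {"id": "user_101", "name": "Alice", "email": "alice@example.com"},
    {"id": "user_102", "name": "Bob", "email": "bob@example.com"},
]
result = rekey_by_field(users, key_field="id", root="users")
print(result)
```

The two lookups play different roles: one reads a value from the input object, the other reuses a field name from the matched path.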

Advanced & Usage: Combining with $ and #

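JOLT's & can be combined with other operators for more complex transformations. On the left-hand side of a shift spec, $ writes a matched key name to the output as a value, and # writes a literal value; on the right-hand side, # appears inside array brackets (for example [#2]) to build output arrays. The example below combines & with @ at a deeper nesting level, where counting levels correctly matters.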

{
  "products": [
    {
      "code": "P001",
      "details": {
        "name": "Laptop",
        "price": 1200
      }
    },
    {
      "code": "P002",
      "details": {
        "name": "Mouse",
        "price": 25
      }
    }
  ]
}
[
  {
    "operation": "shift",
    "spec": {
      "products": {
        "*": {
          "details": {
            "name": "catalog.&3.@(2,code).productName",
            "price": "catalog.&3.@(2,code).productPrice"
          }
        }
      }
    }
  }
]

In this example, levels are counted upward from the matched field: name (or price) is level 0, details is level 1, the array index matched by * is level 2, and products is level 3. So &3 resolves to the key products, and @(2,code) reads the value of the code field two levels up, on the product object itself (i.e., P001 or P002). This spec dynamically creates output paths like catalog.products.P001.productName.

{
  "catalog": {
    "products": {
      "P001": {
        "productName": "Laptop",
        "productPrice": 1200
      },
      "P002": {
        "productName": "Mouse",
        "productPrice": 25
      }
    }
  }
}
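
Level counting is the part that most often goes wrong in nested specs, so here is a small Python sketch that makes it explicit for this input. It is a hand-written model under stated assumptions, not JOLT code: walked_path, key_at, value_at, and build_catalog are hypothetical helpers. Counting upward from the matched leaf, name or price is level 0, details is level 1, the array index is level 2, and products is level 3.

```python
doc = {
    "products": [
        {"code": "P001", "details": {"name": "Laptop", "price": 1200}},
        {"code": "P002", "details": {"name": "Mouse", "price": 25}},
    ]
}


def walked_path(doc: dict, index: int, field: str) -> list:
    """Return (key, node) pairs from the root key down to the matched leaf."""
    product = doc["products"][index]
    details = product["details"]
    return [
        ("products", doc["products"]),  # level 3
        (str(index), product),          # level 2
        ("details", details),           # level 1
        (field, details[field]),        # level 0
    ]


def key_at(path: list, level: int) -> str:
    """Model of &level: the key matched `level` steps above the leaf."""
    return path[-1 - level][0]


def value_at(path: list, level: int, name: str):
    """Model of @(level,name): read `name` from the node `level` steps up."""
    return path[-1 - level][1][name]


def build_catalog(doc: dict) -> dict:
    out: dict = {}
    renames = {"name": "productName", "price": "productPrice"}
    for index in range(len(doc["products"])):
        for field, new_name in renames.items():
            path = walked_path(doc, index, field)
            group = key_at(path, 3)            # the key "products"
            code = value_at(path, 2, "code")   # the product's code value
            leaf_value = path[-1][1]
            out.setdefault("catalog", {}).setdefault(group, {}).setdefault(code, {})[new_name] = leaf_value
    return out


result = build_catalog(doc)
print(result)
```

Writing the path out as a list makes it easy to check which level holds the key or value you need before committing a number to the spec.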

The & syntax is a cornerstone of dynamic JOLT transformations, allowing you to create flexible and robust JSON processing logic. By understanding how it interacts with wildcards, array indices, and other path operators, you can unlock the full potential of JOLT for complex data restructuring tasks.