Issue with the IS NULL function in BigQuery
Understanding and Troubleshooting IS NULL in BigQuery

Explore the nuances of the IS NULL operator in Google BigQuery, including common pitfalls with empty strings, arrays, and structs, and learn best practices for accurate null value detection.
In Google BigQuery, the IS NULL operator is fundamental for identifying records where a specific column or expression has no assigned value. While seemingly straightforward, it can produce unexpected results when the data contains empty strings, empty arrays, or structs. This article examines how IS NULL behaves in BigQuery across these data types and provides practical examples to help you accurately detect and handle null values in your datasets.
The Basics of IS NULL
At its core, IS NULL checks whether an expression evaluates to the SQL NULL value. This is distinct from an empty string (''), an empty array ([]), or a struct whose fields are all NULL, each of which is a non-null value in BigQuery. Understanding this distinction is crucial for writing accurate queries and avoiding misinterpretation of your data.
SELECT
  column_name,
  column_name IS NULL AS is_null_check
FROM
  `your_project.your_dataset.your_table`
WHERE
  column_name IS NULL;
Basic usage of IS NULL to filter for null values.
Distinguishing NULL from Empty Values
A common source of confusion arises when users expect IS NULL to catch empty strings or empty arrays. BigQuery, adhering to standard SQL principles, treats these as distinct, non-null values. An empty string is a string of zero length, an empty array contains no elements, and a struct can exist even when none of its fields hold a value. None of these are NULL.
flowchart TD
    A[Data Value] --> B{Is it NULL?}
    B -- Yes --> C[IS NULL = TRUE]
    B -- No --> D{Is it an Empty String/Array/Struct?}
    D -- Yes --> E[IS NULL = FALSE]
    D -- No --> F[IS NULL = FALSE]
Decision flow for IS NULL evaluation in BigQuery.
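The distinction can be checked directly against literal values, without any table. The following is a minimal sketch; the column aliases are illustrative names chosen here, not part of any real schema.

```sql
-- Each expression applies IS NULL to a literal of a different type.
SELECT
  CAST(NULL AS STRING) IS NULL AS null_string,          -- TRUE: a genuine NULL
  '' IS NULL AS empty_string,                           -- FALSE: zero-length string
  ARRAY<STRING>[] IS NULL AS empty_array,               -- FALSE: array with no elements
  STRUCT(CAST(NULL AS STRING) AS name) IS NULL
    AS struct_with_null_field;                          -- FALSE: the struct itself exists
```

Only the first column evaluates to TRUE; the other three values are present, even though they carry no content.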
NULL signifies the absence of a value, whereas an empty string, array, or struct signifies the presence of a value that happens to be empty.
Handling Different Data Types
To accurately identify truly null values alongside empty representations, you often need to combine IS NULL with other checks specific to the data type. Below are examples for common scenarios.
Strings
-- To find truly NULL strings or empty strings
SELECT
  string_column,
  (string_column IS NULL OR string_column = '') AS is_null_or_empty
FROM
  `your_project.your_dataset.your_table`;
Arrays
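-- To find truly NULL arrays or empty arrays
-- Note: BigQuery translates a NULL array into an empty array in query results,
-- so the ARRAY_LENGTH check is the one that matters for returned data.
SELECT
  array_column,
  (array_column IS NULL OR ARRAY_LENGTH(array_column) = 0) AS is_null_or_empty_array
FROM
  `your_project.your_dataset.your_table`;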
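To see how the two array checks interact, the sketch below builds three inline rows (a populated, an empty, and a NULL array) and applies both tests. The CTE name `sample` and the columns `label` and `tags` are hypothetical. ARRAY_LENGTH returns NULL for a NULL array, which is why the IS NULL branch is needed inside the query.

```sql
-- Inline sample data; no table required.
WITH sample AS (
  SELECT 'populated' AS label, ['a', 'b'] AS tags
  UNION ALL
  SELECT 'empty', ARRAY<STRING>[]
  UNION ALL
  SELECT 'null', CAST(NULL AS ARRAY<STRING>)
)
SELECT
  label,
  tags IS NULL AS is_null_array,
  ARRAY_LENGTH(tags) AS element_count,  -- NULL when tags is NULL
  (tags IS NULL OR ARRAY_LENGTH(tags) = 0) AS is_null_or_empty_array
FROM sample;
```

The `tags` column is deliberately left out of the final SELECT so that the result shows only the flags computed inside the query.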
Structs
-- To find truly NULL structs
-- BigQuery has no direct 'empty struct' check comparable to those for arrays and strings.
-- A struct whose fields are all NULL is not itself NULL, so check the struct and, where needed, its fields.
SELECT
  struct_column,
  struct_column IS NULL AS is_null_struct
FROM
  `your_project.your_dataset.your_table`;
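When a struct should count as "empty" if none of its fields are populated, one option is to test the fields explicitly. The sketch below assumes a hypothetical struct column `address` with two STRING fields, `name` and `city`; adapt the field list to your own schema.

```sql
-- Inline sample data: a populated struct, a struct of NULL fields, and a NULL struct.
WITH sample AS (
  SELECT 1 AS id, STRUCT('Ada' AS name, 'London' AS city) AS address
  UNION ALL
  SELECT 2, STRUCT(CAST(NULL AS STRING) AS name, CAST(NULL AS STRING) AS city)
  UNION ALL
  SELECT 3, CAST(NULL AS STRUCT<name STRING, city STRING>)
)
SELECT
  id,
  address IS NULL AS struct_is_null,
  -- Field access on a NULL struct yields NULL, so this also covers row 3.
  (address.name IS NULL AND address.city IS NULL) AS all_fields_null
FROM sample
ORDER BY id;
```

Under these assumptions, row 2 should be flagged only by the field-level check, while row 3 is flagged by both.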
Be careful when using COALESCE with empty strings. COALESCE(string_column, 'default') returns 'default' only if string_column is NULL, not if it is an empty string. To treat empty strings as missing too, use IFNULL(NULLIF(string_column, ''), 'default') or similar logic.
Practical Example: Data Cleaning
Consider a scenario where you're cleaning user input data. You want to identify records where a user_email field is either genuinely missing (NULL) or was submitted as an empty string. This requires a combined approach.
SELECT
  user_id,
  user_email,
  CASE
    WHEN user_email IS NULL THEN 'Missing Email (NULL)'
    WHEN user_email = '' THEN 'Missing Email (Empty String)'
    ELSE 'Valid Email'
  END AS email_status
FROM
  `your_project.your_dataset.user_data`;
Categorizing email status based on NULL or empty string values.
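If the two kinds of missing email do not need to be told apart, a common alternative is to normalize empty strings to NULL with NULLIF so that a single IS NULL or COALESCE handles both. The sketch below uses inline rows in place of the real table; the sample addresses and the 'unknown' default are placeholders.

```sql
-- Inline stand-in for the user_data table.
WITH user_data AS (
  SELECT 1 AS user_id, 'ada@example.com' AS user_email
  UNION ALL
  SELECT 2, ''
  UNION ALL
  SELECT 3, CAST(NULL AS STRING)
)
SELECT
  user_id,
  NULLIF(user_email, '') AS normalized_email,                   -- '' becomes NULL
  COALESCE(NULLIF(user_email, ''), 'unknown') AS email_or_default,
  NULLIF(user_email, '') IS NULL AS is_missing                  -- TRUE for users 2 and 3
FROM user_data
ORDER BY user_id;
```

Normalizing once, early in a pipeline, keeps later filters simple, at the cost of losing the distinction between "never provided" and "submitted blank".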
By understanding the precise definition of NULL in BigQuery and how it differs from empty values across various data types, you can write more robust and accurate SQL queries for data validation, cleaning, and analysis. Always test your assumptions, especially when dealing with data that might contain a mix of NULLs and empty representations.