Sorting Records with Null Values
Last updated on: 2025-05-30
In the previous article, we covered how sorting works in Spark and how it helps you read and organize data efficiently. But what happens when your dataset contains null values? How does Spark treat them during sorting operations? This is a common scenario in big data processing, where incomplete or missing data is often inevitable.
In this article, we’ll explore how Spark handles sorting with null values and the specific functions it offers to manage them effectively. Before proceeding, you may want to revisit our Sort Functions article for a quick refresher.
Let’s consider a sample studentData DataFrame containing some null values:
+----+--------+-----+
|Roll| Name|Marks|
+----+--------+-----+
| 1| Ajay| 85|
| 2|Bharghav| 76|
| 3| Chaitra| null|
| 4| null| 90|
| 5| Sohaib| null|
+----+--------+-----+
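If you want to follow along, here is a minimal sketch for building this DataFrame. It assumes an active SparkSession named spark (for example, the one provided by spark-shell); the Option values stand in for the nulls.
import spark.implicits._

// Option encodes the missing Name and Marks entries as nulls in the DataFrame.
val studentData = Seq(
  (1, Some("Ajay"),     Some(85)),
  (2, Some("Bharghav"), Some(76)),
  (3, Some("Chaitra"),  None),
  (4, None,             Some(90)),
  (5, Some("Sohaib"),   None)
).toDF("Roll", "Name", "Marks")

studentData.show()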
Spark provides specific methods to sort data with null values, allowing you to decide whether nulls should appear at the beginning or the end of the sorted results. These functions are:
1. asc_nulls_first
Sorts the specified column in ascending order and places null values at the beginning (hence the name 'first').
import org.apache.spark.sql.functions.col

// Sort by Name in ascending order, with nulls placed first.
val sortNullValues = studentData.orderBy(col("Name").asc_nulls_first)
sortNullValues.show()
Output
+----+--------+-----+
|Roll| Name|Marks|
+----+--------+-----+
| 4| null| 90|
| 1| Ajay| 85|
| 2|Bharghav| 76|
| 3| Chaitra| null|
| 5| Sohaib| null|
+----+--------+-----+
Since the sorting is applied only to the student names, null values in other columns are ignored.
Note: By default, Spark arranges the values in ascending order.
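Building on that note: a plain orderBy on Name (with no null qualifier) also sorts ascending and, because Spark treats nulls as the smallest values, places them first, matching asc_nulls_first. A quick sketch, assuming the same studentData DataFrame and col import from above:
// Plain ascending sort: nulls come first by default.
val defaultSort = studentData.orderBy(col("Name"))
defaultSort.show()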
2. asc_nulls_last
Sorts the specified column in ascending order, placing null values at the end.
val sortNullValues = studentData.orderBy(col("Name").asc_nulls_last)
sortNullValues.show()
Output
+----+--------+-----+
|Roll| Name|Marks|
+----+--------+-----+
| 1| Ajay| 85|
| 2|Bharghav| 76|
| 3| Chaitra| null|
| 5| Sohaib| null|
| 4| null| 90|
+----+--------+-----+
3. desc_nulls_first
Sorts the specified column in descending order and places null values at the beginning.
val descNullValues = studentData.orderBy(col("Marks").desc_nulls_first)
descNullValues.show()
Output
+----+--------+-----+
|Roll| Name|Marks|
+----+--------+-----+
| 5| Sohaib| null|
| 3| Chaitra| null|
| 4| null| 90|
| 1| Ajay| 85|
| 2|Bharghav| 76|
+----+--------+-----+
Here the sorting is done in descending order on the Marks column, so records with a null in Marks are placed on top. The null in the Name column (Roll 4) does not affect the ordering, since Name is not the sort key.
4. desc_nulls_last
Sorts the specified column in descending order and places null values at the end.
val descNullValues = studentData.orderBy(col("Marks").desc_nulls_last)
descNullValues.show()
Output
+----+--------+-----+
|Roll| Name|Marks|
+----+--------+-----+
| 4| null| 90|
| 1| Ajay| 85|
| 2|Bharghav| 76|
| 5| Sohaib| null|
| 3| Chaitra| null|
+----+--------+-----+
Summary
In this article, we learned how Spark handles sorting with null values and explored four dedicated functions:
- asc_nulls_first: Ascending order with nulls at the top
- asc_nulls_last: Ascending order with nulls at the bottom
- desc_nulls_first: Descending order with nulls at the top
- desc_nulls_last: Descending order with nulls at the bottom
These functions provide fine-grained control over how missing values are treated in sort operations, which is essential when working with real-world datasets containing incomplete information.
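These sort expressions can also be combined in a single orderBy call. As a rough sketch using the same studentData DataFrame, the following sorts primarily by Marks in descending order with nulls last, breaking ties by Name in ascending order with nulls first:
val combinedSort = studentData.orderBy(
  col("Marks").desc_nulls_last,  // primary key: highest marks first, missing marks at the end
  col("Name").asc_nulls_first    // tie-breaker: names ascending, missing names first
)
combinedSort.show()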