Sort Functions
Last updated on: 2025-05-30
When working with large datasets, the data is often unstructured and may include null values, making it difficult to read or process efficiently. Spark provides functions to sort and organize this data, which improves readability and makes downstream processing easier.
In this article, we'll explore how to sort records in Spark DataFrames using the following functions:
- orderBy()
- sort() (an alias for orderBy())
We’ll use the Shopping Bill DataFrame from our previous discussions on grouping operations; see our earlier articles on grouping if you’d like to revisit those topics.
Here’s our sample DataFrame:
+--------+-----------+----------+---+------------------+-----------------+
|Item no.| Item Name| Category|MRP| Discounted Price| Price After Tax|
+--------+-----------+----------+---+------------------+-----------------+
| 1|Paper Clips|Stationery| 23| 20.7| 24.84|
| 2| Butter| Dairy| 57| 51.3000000004| 61.56|
| 3| Jeans| Clothes|799| 719.1| 862.92|
| 4| Shirt| Clothes|570| 513.0| 615.6|
| 5|Butter Milk| Dairy| 50| 45.0| 54.0|
| 6| Bag| Apparel|455| 409.5| 491.4|
| 7| Shoes| Apparel|901| 810.9| 973.079999999|
| 8| Stapler|Stationery| 50| 45.0| 54.0|
| 9| Pens|Stationery|120| 108.0| 129.6|
+--------+-----------+----------+---+------------------+-----------------+
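If you’d like to follow along, here is a minimal sketch that rebuilds this DataFrame in a local Spark shell or application. The SparkSession setup and the variable name priceAfterTax are assumptions chosen to match the examples below, and the numeric values are rounded from the table above.
import org.apache.spark.sql.SparkSession

// Assumed local setup; in spark-shell a SparkSession named `spark` already exists.
val spark = SparkSession.builder()
  .appName("SortFunctions")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Rebuild the Shopping Bill DataFrame shown above (values rounded).
val priceAfterTax = Seq(
  (1, "Paper Clips", "Stationery", 23, 20.7, 24.84),
  (2, "Butter", "Dairy", 57, 51.3, 61.56),
  (3, "Jeans", "Clothes", 799, 719.1, 862.92),
  (4, "Shirt", "Clothes", 570, 513.0, 615.6),
  (5, "Butter Milk", "Dairy", 50, 45.0, 54.0),
  (6, "Bag", "Apparel", 455, 409.5, 491.4),
  (7, "Shoes", "Apparel", 901, 810.9, 973.08),
  (8, "Stapler", "Stationery", 50, 45.0, 54.0),
  (9, "Pens", "Stationery", 120, 108.0, 129.6)
).toDF("Item no.", "Item Name", "Category", "MRP", "Discounted Price", "Price After Tax")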
Sort Alphabetically by Item Name
Use orderBy() to sort the DataFrame alphabetically by Item Name.
import org.apache.spark.sql.functions.col

val sortPriceAfterTax = priceAfterTax.orderBy(col("Item Name"))
sortPriceAfterTax.show()
Output
+--------+-----------+----------+---+------------------+-----------------+
|Item no.| Item Name| Category|MRP| Discounted Price| Price After Tax|
+--------+-----------+----------+---+------------------+-----------------+
| 6| Bag| Apparel|455| 409.5| 491.4|
| 2| Butter| Dairy| 57| 51.3000000004| 61.56|
| 5|Butter Milk| Dairy| 50| 45.0| 54.0|
| 3| Jeans| Clothes|799| 719.1| 862.92|
| 1|Paper Clips|Stationery| 23| 20.7| 24.84|
| 9| Pens|Stationery|120| 108.0| 129.6|
| 4| Shirt| Clothes|570| 513.0| 615.6|
| 7| Shoes| Apparel|901| 810.9| 973.079999999|
| 8| Stapler|Stationery| 50| 45.0| 54.0|
+--------+-----------+----------+---+------------------+-----------------+
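As a side note, orderBy() also accepts plain column-name strings, so a simple single-column sort can be written without col(). The variant below is equivalent to the example above.
// Equivalent string-based form; produces the same output as col("Item Name").
val sortByName = priceAfterTax.orderBy("Item Name")
sortByName.show()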
Sort Alphabetically Within Each Category
Spark can also sort the records in a DataFrame by a sub-category. To accomplish this, pass multiple columns to the orderBy() function: records are sorted by the first column, and then by the second column within each group.
val sortedItems = priceAfterTax.orderBy(col("Category"), col("Item Name"))
sortedItems.show()
Output
+--------+-----------+----------+---+------------------+-----------------+
|Item no.| Item Name| Category|MRP| Discounted Price| Price After Tax|
+--------+-----------+----------+---+------------------+-----------------+
| 6| Bag| Apparel|455| 409.5| 491.4|
| 7| Shoes| Apparel|901| 810.9| 973.079999999|
| 3| Jeans| Clothes|799| 719.1| 862.92|
| 4| Shirt| Clothes|570| 513.0| 615.6|
| 2| Butter| Dairy| 57| 51.3000000004| 61.56|
| 5|Butter Milk| Dairy| 50| 45.0| 54.0|
| 1|Paper Clips|Stationery| 23| 20.7| 24.84|
| 9| Pens|Stationery|120| 108.0| 129.6|
| 8| Stapler|Stationery| 50| 45.0| 54.0|
+--------+-----------+----------+---+------------------+-----------------+
Result: Categories are sorted alphabetically, and items within each category are also alphabetically sorted.
Note: Pay attention to the order of the column names in orderBy(), as Spark sorts by the columns in the order they are specified.
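To see why the order matters, here is a hypothetical variation that swaps the two columns: Item Name becomes the primary sort key, and Category only breaks ties between identical item names.
// Swapped column order: records are now sorted by Item Name first.
val sortedByNameFirst = priceAfterTax.orderBy(col("Item Name"), col("Category"))
sortedByNameFirst.show()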
sort() function
The sort() function behaves the same as orderBy() and is an alias for it, i.e., they return the same output.
val sortPrice = priceAfterTax.sort(col("Price After Tax").asc)
sortPrice.show()
Output
+--------+-----------+----------+---+------------------+-----------------+
|Item no.| Item Name| Category|MRP| Discounted Price| Price After Tax|
+--------+-----------+----------+---+------------------+-----------------+
| 1|Paper Clips|Stationery| 23| 20.7| 24.84|
| 5|Butter Milk| Dairy| 50| 45.0| 54.0|
| 8| Stapler|Stationery| 50| 45.0| 54.0|
| 2| Butter| Dairy| 57| 51.3000000004| 61.56|
| 9| Pens|Stationery|120| 108.0| 129.6|
| 6| Bag| Apparel|455| 409.5| 491.4|
| 4| Shirt| Clothes|570| 513.0| 615.6|
| 3| Jeans| Clothes|799| 719.1| 862.92|
| 7| Shoes| Apparel|901| 810.9| 973.079999999|
+--------+-----------+----------+---+------------------+-----------------+
Note: By default, Spark sorts in ascending order unless otherwise specified. It is good practice to state the sort direction explicitly.
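If you want to make the ascending order explicit, both the .asc method on a Column and the asc() helper from org.apache.spark.sql.functions can be used; the two calls in this sketch are equivalent to the default behaviour shown above.
import org.apache.spark.sql.functions.asc

// Both calls are equivalent to the default ascending sort.
priceAfterTax.sort(col("Price After Tax").asc).show()
priceAfterTax.sort(asc("Price After Tax")).show()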
Sort in descending order
To arrange the records in descending order, use .desc.
val sortPriceDesc = priceAfterTax.sort(col("Price After Tax").desc)
sortPriceDesc.show()
Output
+--------+-----------+----------+---+------------------+-----------------+
|Item no.| Item Name| Category|MRP| Discounted Price| Price After Tax|
+--------+-----------+----------+---+------------------+-----------------+
| 7| Shoes| Apparel|901| 810.9| 973.079999999|
| 3| Jeans| Clothes|799| 719.1| 862.92|
| 4| Shirt| Clothes|570| 513.0| 615.6|
| 6| Bag| Apparel|455| 409.5| 491.4|
| 9| Pens|Stationery|120| 108.0| 129.6|
| 2| Butter| Dairy| 57| 51.3000000004| 61.56|
| 5|Butter Milk| Dairy| 50| 45.0| 54.0|
| 8| Stapler|Stationery| 50| 45.0| 54.0|
| 1|Paper Clips|Stationery| 23| 20.7| 24.84|
+--------+-----------+----------+---+------------------+-----------------+
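As mentioned at the start of the article, real datasets often contain null values. Spark’s Column API also offers variants that control where nulls end up in the sorted result; the sketch below assumes a price column that may contain nulls (the sample data here has none).
// Control where null values appear when sorting.
val nullsLast = priceAfterTax.sort(col("Price After Tax").desc_nulls_last)
val nullsFirst = priceAfterTax.sort(col("Price After Tax").asc_nulls_first)
nullsLast.show()
nullsFirst.show()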
Mixed Order Sorting: Descending + Ascending
This can be accomplished with either orderBy() or sort(), specifying each column together with the direction in which it should be sorted.
val multiSort = priceAfterTax.orderBy(col("Category").desc, col("Item Name").asc)
multiSort.show()
Output
+--------+-----------+----------+---+------------------+-----------------+
|Item no.| Item Name| Category|MRP| Discounted Price| Price After Tax|
+--------+-----------+----------+---+------------------+-----------------+
| 1|Paper Clips|Stationery| 23| 20.7| 24.84|
| 9| Pens|Stationery|120| 108.0| 129.6|
| 8| Stapler|Stationery| 50| 45.0| 54.0|
| 2| Butter| Dairy| 57|51.300000000000004| 61.56|
| 5|Butter Milk| Dairy| 50| 45.0| 54.0|
| 3| Jeans| Clothes|799| 719.1| 862.92|
| 4| Shirt| Clothes|570| 513.0| 615.6|
| 6| Bag| Apparel|455| 409.5| 491.4|
| 7| Shoes| Apparel|901| 810.9|973.0799999999999|
+--------+-----------+----------+---+------------------+-----------------+
Summary
In this article, we explored how to sort records in a Spark DataFrame using the orderBy() and sort() functions. Sorting helps organize large datasets and makes them easier to read and work with.
- You can arrange records alphabetically using orderBy("column_name").
- To sort by multiple columns (e.g., category and item name), pass them in sequence to the orderBy() function.
- The sort() function works the same as orderBy() and can be used interchangeably.
- You can specify .asc or .desc with col() to control the sort order.
- You can sort one column in descending order and another in ascending order using multiple col() expressions.
Note: The order of columns in sorting matters. Spark follows the order provided in the function.