Sort Functions
Last updated on: 2025-05-30
When working with large datasets, the data is often unstructured and may include null values, making it difficult to read or process efficiently. Spark provides functions to sort and organize this data, which improves readability and makes downstream processing easier.
In this article, we'll explore how to sort records in Spark DataFrames using the following functions:
- orderBy()
- sort() (an alias for orderBy())
We’ll use the Shopping Bill DataFrame from our previous discussions on grouping operations; see our earlier articles on grouping if you’d like to revisit those topics.
Here’s our sample DataFrame:
+--------+-----------+----------+---+------------------+-----------------+
|Item no.| Item Name| Category|MRP| Discounted Price| Price After Tax|
+--------+-----------+----------+---+------------------+-----------------+
| 1|Paper Clips|Stationery| 23| 20.7| 24.84|
| 2| Butter| Dairy| 57| 51.3000000004| 61.56|
| 3| Jeans| Clothes|799| 719.1| 862.92|
| 4| Shirt| Clothes|570| 513.0| 615.6|
| 5|Butter Milk| Dairy| 50| 45.0| 54.0|
| 6| Bag| Apparel|455| 409.5| 491.4|
| 7| Shoes| Apparel|901| 810.9| 973.079999999|
| 8| Stapler|Stationery| 50| 45.0| 54.0|
| 9| Pens|Stationery|120| 108.0| 129.6|
+--------+-----------+----------+---+------------------+-----------------+
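If you’d like to follow along, here is a minimal sketch that rebuilds this DataFrame in a local Spark shell or application. The SparkSession setup and the variable name priceAfterTax are assumptions chosen to match the examples below, and the numeric values are rounded from the table above.
import org.apache.spark.sql.SparkSession

// Assumed local setup; in spark-shell a SparkSession named `spark` already exists.
val spark = SparkSession.builder()
  .appName("SortFunctions")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Rebuild the Shopping Bill DataFrame shown above (values rounded).
val priceAfterTax = Seq(
  (1, "Paper Clips", "Stationery", 23, 20.7, 24.84),
  (2, "Butter", "Dairy", 57, 51.3, 61.56),
  (3, "Jeans", "Clothes", 799, 719.1, 862.92),
  (4, "Shirt", "Clothes", 570, 513.0, 615.6),
  (5, "Butter Milk", "Dairy", 50, 45.0, 54.0),
  (6, "Bag", "Apparel", 455, 409.5, 491.4),
  (7, "Shoes", "Apparel", 901, 810.9, 973.08),
  (8, "Stapler", "Stationery", 50, 45.0, 54.0),
  (9, "Pens", "Stationery", 120, 108.0, 129.6)
).toDF("Item no.", "Item Name", "Category", "MRP", "Discounted Price", "Price After Tax")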
Sort Alphabetically by Item Name
Use orderBy() to sort the DataFrame alphabetically by Item Name.
import org.apache.spark.sql.functions.col

val sortPriceAfterTax = priceAfterTax.orderBy(col("Item Name"))
sortPriceAfterTax.show()
Output
+--------+-----------+----------+---+------------------+-----------------+
|Item no.| Item Name| Category|MRP| Discounted Price| Price After Tax|
+--------+-----------+----------+---+------------------+-----------------+
| 6| Bag| Apparel|455| 409.5| 491.4|
| 2| Butter| Dairy| 57| 51.3000000004| 61.56|
| 5|Butter Milk| Dairy| 50| 45.0| 54.0|
| 3| Jeans| Clothes|799| 719.1| 862.92|
| 1|Paper Clips|Stationery| 23| 20.7| 24.84|
| 9| Pens|Stationery|120| 108.0| 129.6|
| 4| Shirt| Clothes|570| 513.0| 615.6|
| 7| Shoes| Apparel|901| 810.9| 973.079999999|
| 8| Stapler|Stationery| 50| 45.0| 54.0|
+--------+-----------+----------+---+------------------+-----------------+
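As a side note, orderBy() also accepts plain column-name strings, so a simple single-column sort can be written without col(). The variant below is equivalent to the example above.
// Equivalent string-based form; produces the same output as col("Item Name").
val sortByName = priceAfterTax.orderBy("Item Name")
sortByName.show()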
Sort Alphabetically Within Each Category
Spark can also sort the records in a DataFrame by a sub-category. To accomplish this, pass multiple columns to the orderBy() function: records are sorted by the first column, and then by the second column within each group.
val sortedItems = priceAfterTax.orderBy(col("Category"), col("Item Name"))
sortedItems.show()
Output
+--------+-----------+----------+---+------------------+-----------------+
|Item no.| Item Name| Category|MRP| Discounted Price| Price After Tax|
+--------+-----------+----------+---+------------------+-----------------+
| 6| Bag| Apparel|455| 409.5| 491.4|
| 7| Shoes| Apparel|901| 810.9| 973.079999999|
| 3| Jeans| Clothes|799| 719.1| 862.92|
| 4| Shirt| Clothes|570| 513.0| 615.6|
| 2| Butter| Dairy| 57| 51.3000000004| 61.56|
| 5|Butter Milk| Dairy| 50| 45.0| 54.0|
| 1|Paper Clips|Stationery| 23| 20.7| 24.84|
| 9| Pens|Stationery|120| 108.0| 129.6|
| 8| Stapler|Stationery| 50| 45.0| 54.0|
+--------+-----------+----------+---+------------------+-----------------+
Result: Categories are sorted alphabetically, and items within each category are also alphabetically sorted.
Note: Pay attention to the order of the column names in orderBy(), as Spark sorts by the columns in the order they are specified.
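To see why the order matters, here is a hypothetical variation that swaps the two columns: Item Name becomes the primary sort key, and Category only breaks ties between identical item names.
// Swapped column order: records are now sorted by Item Name first.
val sortedByNameFirst = priceAfterTax.orderBy(col("Item Name"), col("Category"))
sortedByNameFirst.show()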
sort() function
The sort() function behaves the same as orderBy() and is an alias for it, i.e., they return the same output.
val sortPrice = priceAfterTax.sort(col("Price After Tax").asc)
sortPrice.show()
Output
+--------+-----------+----------+---+------------------+-----------------+
|Item no.| Item Name| Category|MRP| Discounted Price| Price After Tax|
+--------+-----------+----------+---+------------------+-----------------+
| 1|Paper Clips|Stationery| 23| 20.7| 24.84|
| 5|Butter Milk| Dairy| 50| 45.0| 54.0|
| 8| Stapler|Stationery| 50| 45.0| 54.0|
| 2| Butter| Dairy| 57| 51.3000000004| 61.56|
| 9| Pens|Stationery|120| 108.0| 129.6|
| 6| Bag| Apparel|455| 409.5| 491.4|
| 4| Shirt| Clothes|570| 513.0| 615.6|
| 3| Jeans| Clothes|799| 719.1| 862.92|
| 7| Shoes| Apparel|901| 810.9| 973.079999999|
+--------+-----------+----------+---+------------------+-----------------+
Note: By default, Spark sorts in ascending order unless otherwise specified. It is good practice to state the sort direction explicitly.
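If you want to make the ascending order explicit, both the .asc method on a Column and the asc() helper from org.apache.spark.sql.functions can be used; the two calls in this sketch are equivalent to the default behaviour shown above.
import org.apache.spark.sql.functions.asc

// Both calls are equivalent to the default ascending sort.
priceAfterTax.sort(col("Price After Tax").asc).show()
priceAfterTax.sort(asc("Price After Tax")).show()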
Sort in descending order
To arrange the records in descending order, use .desc.
val sortPriceDesc = priceAfterTax.sort(col("Price After Tax").desc)
sortPriceDesc.show()
Output
+--------+-----------+----------+---+------------------+-----------------+
|Item no.| Item Name| Category|MRP| Discounted Price| Price After Tax|
+--------+-----------+----------+---+------------------+-----------------+
| 7| Shoes| Apparel|901| 810.9| 973.079999999|
| 3| Jeans| Clothes|799| 719.1| 862.92|
| 4| Shirt| Clothes|570| 513.0| 615.6|
| 6| Bag| Apparel|455| 409.5| 491.4|
| 9| Pens|Stationery|120| 108.0| 129.6|
| 2| Butter| Dairy| 57| 51.3000000004| 61.56|
| 5|Butter Milk| Dairy| 50| 45.0| 54.0|
| 8| Stapler|Stationery| 50| 45.0| 54.0|
| 1|Paper Clips|Stationery| 23| 20.7| 24.84|
+--------+-----------+----------+---+------------------+-----------------+
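As mentioned at the start of the article, real datasets often contain null values. Spark’s Column API also offers variants that control where nulls end up in the sorted result; the sketch below assumes a price column that may contain nulls (the sample data here has none).
// Control where null values appear when sorting.
val nullsLast = priceAfterTax.sort(col("Price After Tax").desc_nulls_last)
val nullsFirst = priceAfterTax.sort(col("Price After Tax").asc_nulls_first)
nullsLast.show()
nullsFirst.show()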
Mixed Order Sorting: Descending + Ascending
This can be accomplished with either orderBy() or sort(), specifying each column together with the direction in which it should be sorted.
val multiSort = priceAfterTax.orderBy(col("Category").desc, col("Item Name").asc)
multiSort.show()
Output
+--------+-----------+----------+---+------------------+-----------------+
|Item no.| Item Name| Category|MRP| Discounted Price| Price After Tax|
+--------+-----------+----------+---+------------------+-----------------+
| 1|Paper Clips|Stationery| 23| 20.7| 24.84|
| 9| Pens|Stationery|120| 108.0| 129.6|
| 8| Stapler|Stationery| 50| 45.0| 54.0|
| 2| Butter| Dairy| 57|51.300000000000004| 61.56|
| 5|Butter Milk| Dairy| 50| 45.0| 54.0|
| 3| Jeans| Clothes|799| 719.1| 862.92|
| 4| Shirt| Clothes|570| 513.0| 615.6|
| 6| Bag| Apparel|455| 409.5| 491.4|
| 7| Shoes| Apparel|901| 810.9|973.0799999999999|
+--------+-----------+----------+---+------------------+-----------------+
Summary
In this article, we explored how to sort records in a Spark DataFrame using the orderBy() and sort() functions. Sorting helps organize large datasets and makes them easier to read and work with.
- You can arrange records alphabetically using orderBy("column_name").
- To sort by multiple columns (e.g., category and item name), pass them in sequence to the orderBy() function.
- The sort() function works the same as orderBy() and can be used interchangeably.
- You can specify .asc or .desc with col() to control the sort order.
- You can sort one column in descending order and another in ascending order using multiple col() expressions.
Note: The order of columns in sorting matters. Spark follows the order provided in the function.