Row Operations
Last updated on: 2025-05-30
In the previous article, we explored column operations on a DataFrame. In this article, we’ll focus on row operations on a DataFrame. Let's consider the following DataFrame to understand row operations.
+----+--------+-----+
|Roll| Name|Marks|
+----+--------+-----+
| 1| Ajay| 55|
| 2|Bharghav| 63|
| 3| Chaitra| 60|
| 4| Kamal| 75|
| 5| Sohaib| 70|
+----+--------+-----+
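The article never shows how this sample DataFrame is built; a minimal sketch follows, assuming a local SparkSession (the app name and `local[*]` master are illustrative choices, not part of the original article):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("RowOperations")   // illustrative name
  .master("local[*]")         // local mode, for experimentation
  .getOrCreate()

// toDF() on a Seq of tuples needs the implicits of this session
import spark.implicits._

val df = Seq(
  (1, "Ajay",     55),
  (2, "Bharghav", 63),
  (3, "Chaitra",  60),
  (4, "Kamal",    75),
  (5, "Sohaib",   70)
).toDF("Roll", "Name", "Marks")

df.show()
```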
Add a New Row to an Existing DataFrame
Spark DataFrames are immutable, which means new rows can't be added directly to an existing DataFrame. To add a new row, create a new DataFrame and combine it with the original one using the .union() method.
val newRow = Seq(
  (6, "Tanmay", 77)
).toDF("Roll", "Name", "Marks")

val updatedDF = df.union(newRow)
updatedDF.show()
Output
+----+--------+-----+
|Roll| Name|Marks|
+----+--------+-----+
| 1| Ajay| 55|
| 2|Bharghav| 63|
| 3| Chaitra| 60|
| 4| Kamal| 75|
| 5| Sohaib| 70|
| 6| Tanmay| 77|
+----+--------+-----+
Add Multiple Rows to an Existing DataFrame
To add multiple rows, follow a similar approach:
val multipleRows = Seq(
  (7, "Tina", 67),
  (8, "Utkarsh", 65)
).toDF("Roll", "Name", "Marks")

val moreRows = updatedDF.union(multipleRows)
moreRows.show()
Output
+----+--------+-----+
|Roll| Name|Marks|
+----+--------+-----+
| 1| Ajay| 55|
| 2|Bharghav| 63|
| 3| Chaitra| 60|
| 4| Kamal| 75|
| 5| Sohaib| 70|
| 6| Tanmay| 77|
| 7| Tina| 67|
| 8| Utkarsh| 65|
+----+--------+-----+
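One caveat worth knowing: union() matches columns by position, not by name. If the second DataFrame lists its columns in a different order, the values will be paired with the wrong columns. In that case unionByName() (available since Spark 2.3) aligns columns by name instead. A sketch, assuming a local SparkSession (the DataFrames here are illustrative, not from the article):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("UnionByName")   // illustrative name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val base = Seq((1, "Ajay", 55)).toDF("Roll", "Name", "Marks")

// Columns deliberately in a different order
val reordered = Seq(("Vikram", 9, 72)).toDF("Name", "Roll", "Marks")

// union() would pair Roll with Name here; unionByName() matches by column name
val combined = base.unionByName(reordered)
combined.show()
```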
Delete Rows from a DataFrame
As DataFrames are immutable, rows can’t be removed directly. Instead, we use filtering to exclude unwanted rows.
// Keep only the rows with Marks greater than 63 (i.e. drop rows with Marks <= 63)
val delRow = df.filter($"Marks" > 63)
delRow.show()
Output
+----+------+-----+
|Roll| Name|Marks|
+----+------+-----+
| 4| Kamal| 75|
| 5|Sohaib| 70|
+----+------+-----+
Alternate Way to Delete Rows: expr() + except()
The expr() function builds a column expression that selects the rows to be removed, and except() then returns only the rows of the original DataFrame that are not in that selection. The expr() function lives in the org.apache.spark.sql.functions._ object, which must be imported before use; except() is a method on the DataFrame itself. We will discuss the expr() function in detail in the Spark Expressions article.
// Select the rows to remove: those with even Marks
val rmRows = updatedDF.filter(expr("Marks % 2 = 0"))
// Keep only the rows that are not in rmRows
val newDf = updatedDF.except(rmRows)
newDf.show()
Output
+----+--------+-----+
|Roll| Name|Marks|
+----+--------+-----+
| 1| Ajay| 55|
| 2|Bharghav| 63|
| 4| Kamal| 75|
| 6| Tanmay| 77|
+----+--------+-----+
Get Distinct Rows of the DataFrame
Consider the following DataFrame (with a duplicate row):
+----+--------+-----+
|Roll| Name|Marks|
+----+--------+-----+
| 1| Ajay| 55|
| 2|Bharghav| 63|
| 3| Chaitra| 60|
| 4| Kamal| 75|
| 5| Sohaib| 70|
| 6| Tanmay| 77|
| 7| Tina| 67|
| 8| Utkarsh| 65|
| 8| Utkarsh| 65|
+----+--------+-----+
To retrieve only the distinct rows of a DataFrame, use the distinct() method.
val distinctRows = moreRows.distinct()
distinctRows.show()
Output
+----+--------+-----+
|Roll| Name|Marks|
+----+--------+-----+
| 1| Ajay| 55|
| 2|Bharghav| 63|
| 3| Chaitra| 60|
| 4| Kamal| 75|
| 5| Sohaib| 70|
| 6| Tanmay| 77|
| 7| Tina| 67|
| 8| Utkarsh| 65|
+----+--------+-----+
Sort Rows of a DataFrame
Use the orderBy() method to sort the DataFrame by the values of a specific column.
import org.apache.spark.sql.functions.col

val descendingOrder = moreRows.orderBy(col("Marks").desc)
descendingOrder.show()

val ascendingOrder = moreRows.orderBy(col("Marks").asc)
ascendingOrder.show()
Output
+----+--------+-----+
|Roll| Name|Marks|
+----+--------+-----+
| 6| Tanmay| 77|
| 4| Kamal| 75|
| 5| Sohaib| 70|
| 7| Tina| 67|
| 8| Utkarsh| 65|
| 2|Bharghav| 63|
| 3| Chaitra| 60|
| 1| Ajay| 55|
+----+--------+-----+
+----+--------+-----+
|Roll| Name|Marks|
+----+--------+-----+
| 1| Ajay| 55|
| 3| Chaitra| 60|
| 2|Bharghav| 63|
| 8| Utkarsh| 65|
| 7| Tina| 67|
| 5| Sohaib| 70|
| 4| Kamal| 75|
| 6| Tanmay| 77|
+----+--------+-----+
A detailed explanation of orderBy()
and sorting functions is available in the Sort functions article.
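Before moving on, note that orderBy() also accepts multiple sort keys, which is handy when the primary column contains ties. A sketch, assuming a local SparkSession and an illustrative DataFrame with tied Marks:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("MultiKeySort")   // illustrative name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val scores = Seq(
  (1, "Ajay",    70),
  (2, "Bina",    70),   // same Marks as Ajay
  (3, "Chaitra", 60)
).toDF("Roll", "Name", "Marks")

// Primary key: Marks descending; tie-breaker: Name ascending
val sorted = scores.orderBy(col("Marks").desc, col("Name").asc)
sorted.show()
```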
Summary
In this article, you learned about:
- The immutability of Spark DataFrames and how it affects row-level operations.
- Adding single and multiple rows using union().
- Deleting rows using filter() and except().
- Retrieving distinct rows using distinct().
- Sorting rows using orderBy().
In the next article, we’ll explore Spark Expressions and how they help in building complex row-wise logic.