Row Operations

Last updated on: 2025-05-30

In the previous article, we explored column operations on a DataFrame. In this article, we’ll focus on row operations on a DataFrame. Let's consider the following DataFrame to understand row operations.

+----+--------+-----+
|Roll|    Name|Marks|
+----+--------+-----+
|   1|    Ajay|   55|
|   2|Bharghav|   63|
|   3| Chaitra|   60|
|   4|   Kamal|   75|
|   5|  Sohaib|   70|
+----+--------+-----+
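The examples below assume this DataFrame was built roughly as follows (a minimal sketch; `spark` is an active SparkSession, and the implicits import enables `toDF()`):

```scala
import spark.implicits._  // assumes an active SparkSession named `spark`

// Build the sample DataFrame used throughout this article
val df = Seq(
  (1, "Ajay", 55),
  (2, "Bharghav", 63),
  (3, "Chaitra", 60),
  (4, "Kamal", 75),
  (5, "Sohaib", 70)
).toDF("Roll", "Name", "Marks")

df.show()
```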

Add a New Row to an Existing DataFrame

Spark DataFrames are immutable, which means rows can't be added to an existing DataFrame in place. To add a new row, create a single-row DataFrame and combine it with the original using the .union() method.

val newRow = Seq(
  (6,"Tanmay", 77)
).toDF("Roll", "Name", "Marks")

val updatedDF = df.union(newRow)

updatedDF.show()

Output

+----+--------+-----+
|Roll|    Name|Marks|
+----+--------+-----+
|   1|    Ajay|   55|
|   2|Bharghav|   63|
|   3| Chaitra|   60|
|   4|   Kamal|   75|
|   5|  Sohaib|   70|
|   6|  Tanmay|   77|
+----+--------+-----+
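One caveat worth knowing: union() matches columns by position, not by name. If the columns of the new rows might be in a different order, unionByName() is the safer choice (a sketch using the same newRow as above):

```scala
// union() is positional; unionByName() aligns columns by name instead,
// so a reordered schema in newRow would still combine correctly
val safeUnion = df.unionByName(newRow)

safeUnion.show()
```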

Add Multiple Rows to an Existing DataFrame

To add multiple rows, follow a similar approach:

val multipleRows = Seq(
  (7,"Tina",67),
  (8,"Utkarsh",65)
).toDF("Roll", "Name", "Marks")

val moreRows = updatedDF.union(multipleRows)

moreRows.show()

Output

+----+--------+-----+
|Roll|    Name|Marks|
+----+--------+-----+
|   1|    Ajay|   55|
|   2|Bharghav|   63|
|   3| Chaitra|   60|
|   4|   Kamal|   75|
|   5|  Sohaib|   70|
|   6|  Tanmay|   77|
|   7|    Tina|   67|
|   8| Utkarsh|   65|
+----+--------+-----+

Delete Rows from a DataFrame

As DataFrames are immutable, rows can’t be removed directly. Instead, we use filtering to exclude unwanted rows.

// keep only the rows of df whose Marks are greater than 63
val delRow = df.filter($"Marks" > 63)

delRow.show()

Output

+----+------+-----+
|Roll|  Name|Marks|
+----+------+-----+
|   4| Kamal|   75|
|   5|Sohaib|   70|
+----+------+-----+
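filter() also accepts a SQL string condition, which is equivalent to the Column expression used above (a sketch; both forms produce the same result on df):

```scala
// equivalent ways to express the same filter
val delRowCol  = df.filter($"Marks" > 63)   // Column expression
val delRowSql  = df.filter("Marks > 63")    // SQL string condition

delRowSql.show()
```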

Alternate Way to Delete Rows: expr() + except()

The expr() function parses a SQL expression and selects the rows to be removed; except() then returns the rows of the original DataFrame that are not present in that result.

expr() is defined in the org.apache.spark.sql.functions object, so add import org.apache.spark.sql.functions._ before using it. We will discuss the expr() function in detail in the Spark Expressions article.

val rmRows = updatedDF.filter(expr("Marks % 2 = 0"))
val newDf = updatedDF.except(rmRows)

newDf.show()

Output

+----+--------+-----+
|Roll|    Name|Marks|
+----+--------+-----+
|   1|    Ajay|   55|
|   2|Bharghav|   63|
|   4|   Kamal|   75|
|   6|  Tanmay|   77|
+----+--------+-----+

Get Distinct Rows of the DataFrame

Suppose the moreRows DataFrame ended up with a duplicate row (here, the Utkarsh row appears twice):

+----+--------+-----+
|Roll|    Name|Marks|
+----+--------+-----+
|   1|    Ajay|   55|
|   2|Bharghav|   63|
|   3| Chaitra|   60|
|   4|   Kamal|   75|
|   5|  Sohaib|   70|
|   6|  Tanmay|   77|
|   7|    Tina|   67|
|   8| Utkarsh|   65|
|   8| Utkarsh|   65|
+----+--------+-----+

To retrieve only the distinct rows of a DataFrame, use the distinct() method.

val distinctRows = moreRows.distinct()

distinctRows.show()

Output

+----+--------+-----+
|Roll|    Name|Marks|
+----+--------+-----+
|   1|    Ajay|   55|
|   2|Bharghav|   63|
|   3| Chaitra|   60|
|   4|   Kamal|   75|
|   5|  Sohaib|   70|
|   6|  Tanmay|   77|
|   7|    Tina|   67|
|   8| Utkarsh|   65|
+----+--------+-----+
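Relatedly, if duplicates should be detected on a subset of columns rather than on whole rows, dropDuplicates() accepts column names (a sketch; here Roll alone determines what counts as a duplicate):

```scala
// keeps the first row encountered for each distinct Roll value
val dedupByRoll = moreRows.dropDuplicates("Roll")

dedupByRoll.show()
```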

Sort Rows of a DataFrame

Use the orderBy() method to sort the DataFrame by the values of a specific column.

val descendingOrder = moreRows.orderBy(
  col("Marks").desc
)

descendingOrder.show()

val ascendingOrder = moreRows.orderBy(
  col("Marks").asc
)

ascendingOrder.show()

Output

+----+--------+-----+
|Roll|    Name|Marks|
+----+--------+-----+
|   6|  Tanmay|   77|
|   4|   Kamal|   75|
|   5|  Sohaib|   70|
|   7|    Tina|   67|
|   8| Utkarsh|   65|
|   2|Bharghav|   63|
|   3| Chaitra|   60|
|   1|    Ajay|   55|
+----+--------+-----+

+----+--------+-----+
|Roll|    Name|Marks|
+----+--------+-----+
|   1|    Ajay|   55|
|   3| Chaitra|   60|
|   2|Bharghav|   63|
|   8| Utkarsh|   65|
|   7|    Tina|   67|
|   5|  Sohaib|   70|
|   4|   Kamal|   75|
|   6|  Tanmay|   77|
+----+--------+-----+
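orderBy() also accepts multiple sort keys, applied in order. For example, sorting by descending Marks with Name as a tiebreaker (a sketch using the same moreRows DataFrame):

```scala
// primary key: Marks descending; tiebreaker: Name ascending
val multiSort = moreRows.orderBy(col("Marks").desc, col("Name").asc)

multiSort.show()
```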

A detailed explanation of orderBy() and sorting functions is available in the Sort functions article.

Summary

In this article, you learned about:

  • The immutability of Spark DataFrames and how it affects row-level operations.

  • Adding single and multiple rows using union().

  • Deleting rows using filter() and except().

  • Retrieving distinct rows using distinct().

  • Sorting rows using orderBy().

In the next article, we’ll explore Spark Expressions and how they help in building complex row-wise logic.
