Math Functions: Add, Subtract, Multiply & Divide

Last updated on: 2025-05-30

Mathematical functions transform the data in a DataFrame so that it can be analyzed. They range from basic arithmetic to averaging and rounding operations, such as:

  • Addition
  • Subtraction
  • Multiplication
  • Division
  • Average
  • Floor values
  • Ceil values
  • ...and many more.

Let’s explore these using the following sample DataFrame:

+----+--------+-----------+-----------+------------+
|Roll|    Name|Final Marks|Float Marks|Double Marks|
+----+--------+-----------+-----------+------------+
|   1|    Ajay|        300|       55.5|       92.75|
|   2|Bharghav|        350|       63.2|        88.5|
|   3| Chaitra|        320|       60.1|        75.8|
|   4|   Kamal|        360|       75.0|        82.3|
|   5|  Sohaib|        450|       70.8|        90.6|
+----+--------+-----------+-----------+------------+
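The article does not show how df is created. One way to build it is sketched below. It assumes a local SparkSession, and it assumes that "Float Marks" is a Float column and "Double Marks" is a Double column; those types are inferred from the column names and are not stated in the article.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimentation.
// In spark-shell a session named `spark` already exists.
val spark = SparkSession.builder()
  .appName("MathFunctionsDemo")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Assumed column types: Int, String, Int, Float, Double.
val df = Seq(
  (1, "Ajay",     300, 55.5f, 92.75),
  (2, "Bharghav", 350, 63.2f, 88.5),
  (3, "Chaitra",  320, 60.1f, 75.8),
  (4, "Kamal",    360, 75.0f, 82.3),
  (5, "Sohaib",   450, 70.8f, 90.6)
).toDF("Roll", "Name", "Final Marks", "Float Marks", "Double Marks")

df.show()
```

The snippets in the rest of the article can then be run against this df, provided col is imported from org.apache.spark.sql.functions.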

Sum of columns of a DataFrame

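We can add two columns using the .withColumn() method and the + operator:

import org.apache.spark.sql.functions.col

// Add the two score columns row by row and store the result in a new column.
val sumCol = df.withColumn("Sum of scores",
  col("Float Marks") + col("Double Marks")
)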

sumCol.show()

Output

+----+--------+-----------+-----------+------------+------------------+
|Roll|    Name|Final Marks|Float Marks|Double Marks|     Sum of scores|
+----+--------+-----------+-----------+------------+------------------+
|   1|    Ajay|        300|       55.5|       92.75|            148.25|
|   2|Bharghav|        350|       63.2|        88.5|151.70000076293945|
|   3| Chaitra|        320|       60.1|        75.8| 135.8999984741211|
|   4|   Kamal|        360|       75.0|        82.3|             157.3|
|   5|  Sohaib|        450|       70.8|        90.6| 161.4000030517578|
+----+--------+-----------+-----------+------------+------------------+
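The long decimals in rows 2, 3 and 5 are not a mistake in the addition. A likely explanation, assuming "Float Marks" is stored as a 32-bit Float and "Double Marks" as a 64-bit Double, is that the float value is widened to a double before the sum, which makes the float's rounding error visible. The plain Scala sketch below reproduces the effect for row 2 without Spark; the variable names are illustrative only.

```scala
// 63.2 cannot be represented exactly as a 32-bit float.
val floatMark: Float = 63.2f
val doubleMark: Double = 88.5

// Widening keeps the float's approximation, now shown at double precision.
val widened: Double = floatMark.toDouble
val total: Double = widened + doubleMark

println(widened)
println(total)

// The sum is close to, but not exactly, 151.7.
println(total == 151.7)
```

If the extra digits are unwanted in a report, rounding the result column is the usual remedy.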

Difference of two columns of a DataFrame

We can subtract one column from another using the .withColumn() method and the - operator:

// Subtract "Float Marks" from "Double Marks" row by row.
val diffCol = df.withColumn("Difference of scores",
  col("Double Marks") - col("Float Marks")
)

diffCol.show()

Output

+----+--------+-----------+-----------+------------+--------------------+
|Roll|    Name|Final Marks|Float Marks|Double Marks|Difference of scores|
+----+--------+-----------+-----------+------------+--------------------+
|   1|    Ajay|        300|       55.5|       92.75|               37.25|
|   2|Bharghav|        350|       63.2|        88.5|  25.299999237060547|
|   3| Chaitra|        320|       60.1|        75.8|  15.700001525878903|
|   4|   Kamal|        360|       75.0|        82.3|   7.299999999999997|
|   5|  Sohaib|        450|       70.8|        90.6|  19.799996948242182|
+----+--------+-----------+-----------+------------+--------------------+
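The same subtraction can also be written as a SQL expression string with expr. This is a minimal sketch under the same assumed column types (Float and Double), using a hypothetical two-row stand-in named marks instead of the full sample data. Backticks are needed because the column names contain spaces.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

val spark = SparkSession.builder()
  .appName("ExprDemo")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Two-row stand-in for the sample DataFrame (types are an assumption).
val marks = Seq((1, 55.5f, 92.75), (4, 75.0f, 82.3))
  .toDF("Roll", "Float Marks", "Double Marks")

// Column names with spaces must be quoted with backticks inside expr.
val diffExpr = marks.withColumn(
  "Difference of scores",
  expr("`Double Marks` - `Float Marks`")
)

diffExpr.show()
```

Whether to use operators on col(...) or an expr string is mostly a matter of taste; the operator form is checked by the Scala compiler, while the string form is only parsed at run time.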

Multiplication Operation on a DataFrame column

Multiplication follows the same pattern, using the * operator with .withColumn(). Here each value is multiplied by a constant:

// Wrap the column name in col(); a bare String cannot be multiplied by a Double.
val productCol = df.withColumn("Updated scores",
  col("Float Marks") * 1.5
)

productCol.show()

Output

+----+--------+-----------+-----------+------------+------------------+
|Roll|    Name|Final Marks|Float Marks|Double Marks|    Updated scores|
+----+--------+-----------+-----------+------------+------------------+
|   1|    Ajay|        300|       55.5|       92.75|             83.25|
|   2|Bharghav|        350|       63.2|        88.5| 94.80000114440918|
|   3| Chaitra|        320|       60.1|        75.8| 90.14999771118164|
|   4|   Kamal|        360|       75.0|        82.3|             112.5|
|   5|  Sohaib|        450|       70.8|        90.6|106.20000457763672|
+----+--------+-----------+-----------+------------+------------------+

Division Operation on a DataFrame column

Column values of a DataFrame can be divided using the / operator and the .withColumn() method:

// Divide every value in "Final Marks" by 2.
val divisionCol = df.withColumn("Divided Score",
  col("Final Marks") / 2
)

divisionCol.show()

Output

+----+--------+-----------+-----------+------------+-------------+
|Roll|    Name|Final Marks|Float Marks|Double Marks|Divided Score|
+----+--------+-----------+-----------+------------+-------------+
|   1|    Ajay|        300|       55.5|       92.75|        150.0|
|   2|Bharghav|        350|       63.2|        88.5|        175.0|
|   3| Chaitra|        320|       60.1|        75.8|        160.0|
|   4|   Kamal|        360|       75.0|        82.3|        180.0|
|   5|  Sohaib|        450|       70.8|        90.6|        225.0|
+----+--------+-----------+-----------+------------+-------------+
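Note that the result is shown as 150.0 rather than 150. Spark's / operator performs fractional division, so the new column is a Double even when both operands are whole numbers. The sketch below checks this on a hypothetical two-row DataFrame named finals, assuming "Final Marks" is an Int column, and contrasts it with plain Scala Int division, which truncates.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

val spark = SparkSession.builder()
  .appName("DivisionDemo")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Assumption: "Final Marks" is an integer column.
val finals = Seq((1, 300), (2, 351)).toDF("Roll", "Final Marks")

val halved = finals.withColumn("Divided Score", col("Final Marks") / 2)

// The schema shows the new column as double.
halved.printSchema()
halved.show()

// Plain Scala Int division drops the fractional part.
val scalaHalf: Int = 351 / 2
println(scalaHalf)
```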

Beyond Basics

Other commonly used mathematical functions include:

  • Square root (sqrt)

  • Absolute value (abs)

  • Trigonometric operations (sin, cos, tan)

  • Rounding functions (floor, ceil, round)

These are covered in detail in the Advanced Mathematical Functions article.
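As a brief preview, the functions listed above are available in org.apache.spark.sql.functions and are applied to columns in the same way as the arithmetic operators. The sketch below uses a hypothetical two-row DataFrame named scores; the new column names and the reference value 80 are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{abs, ceil, col, floor, round, sqrt}

val spark = SparkSession.builder()
  .appName("AdvancedMathPreview")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Stand-in data taken from the "Double Marks" column of the sample table.
val scores = Seq((1, 92.75), (3, 75.8)).toDF("Roll", "Double Marks")

val advanced = scores
  .withColumn("Sqrt", sqrt(col("Double Marks")))
  .withColumn("Distance from 80", abs(col("Double Marks") - 80))
  .withColumn("Floor", floor(col("Double Marks")))
  .withColumn("Ceil", ceil(col("Double Marks")))
  .withColumn("Rounded", round(col("Double Marks"), 1)) // one decimal place

advanced.show()
```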

Aggregate Functions

Aggregate functions help summarize large datasets by returning a single result per group or overall. Examples include:

  • sum()

  • count()

  • min()

  • max()

  • avg()

These will be explored further in the Aggregate Functions article.
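For a quick impression of how these differ from the row-wise operations above, the sketch below aggregates the "Final Marks" values from the sample table into a single summary row. The DataFrame name finals and the output column names are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col, count, max, min, sum}

val spark = SparkSession.builder()
  .appName("AggregatePreview")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// "Final Marks" values from the sample table.
val finals = Seq(300, 350, 320, 360, 450).toDF("Final Marks")

// agg() collapses all rows into one row of summary values.
val summary = finals.agg(
  sum(col("Final Marks")).as("Total"),
  count(col("Final Marks")).as("Count"),
  min(col("Final Marks")).as("Lowest"),
  max(col("Final Marks")).as("Highest"),
  avg(col("Final Marks")).as("Average")
)

summary.show()
```

Unlike .withColumn(), which adds a value to every row, agg() returns one row for the whole DataFrame, or one row per group when it follows groupBy().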

Summary

In this article, we explored:

  • Basic mathematical operations in Spark using .withColumn()

  • Real DataFrame examples of addition, subtraction, multiplication, and division

  • A glimpse into advanced and aggregate mathematical functions

These operations are fundamental for data cleaning, transformation, and deriving insights from your datasets.
