Math Functions: Add, Subtract, Multiply & Divide
Last updated on: 2025-05-30
Mathematical functions transform the column values of a DataFrame to prepare data for analysis. They range from basic arithmetic to averaging and rounding, for example:
- Addition
- Subtraction
- Multiplication
- Division
- Average
- Floor values
- Ceil values
- ...and many more.
Let’s explore the four basic arithmetic operations using the following sample DataFrame:
+----+--------+-----------+-----------+------------+
|Roll|    Name|Final Marks|Float Marks|Double Marks|
+----+--------+-----------+-----------+------------+
|   1|    Ajay|        300|       55.5|       92.75|
|   2|Bharghav|        350|       63.2|        88.5|
|   3| Chaitra|        320|       60.1|        75.8|
|   4|   Kamal|        360|       75.0|        82.3|
|   5|  Sohaib|        450|       70.8|        90.6|
+----+--------+-----------+-----------+------------+
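The snippets in this article use a DataFrame named df without showing how it is created. One way to build it is sketched below. The SparkSession settings (app name, local[*] master) and the column types (Int for Roll and Final Marks, Float for Float Marks, Double for Double Marks) are assumptions inferred from the column names and the printed values, not the author's confirmed setup.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session; in spark-shell a `spark` value already exists.
val spark = SparkSession.builder()
  .appName("MathFunctionsExample")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._ // enables .toDF on Scala collections

// Tuple element types decide the column types: Int, String, Int, Float, Double.
val df = Seq(
  (1, "Ajay",     300, 55.5f, 92.75),
  (2, "Bharghav", 350, 63.2f, 88.5),
  (3, "Chaitra",  320, 60.1f, 75.8),
  (4, "Kamal",    360, 75.0f, 82.3),
  (5, "Sohaib",   450, 70.8f, 90.6)
).toDF("Roll", "Name", "Final Marks", "Float Marks", "Double Marks")

df.show()
```

Calling df.printSchema() afterwards is a quick way to confirm which columns ended up as float and which as double.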
Sum of columns of a DataFrame
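We can add two columns using the .withColumn() method and the + operator:
import org.apache.spark.sql.functions.col

// Add the two score columns row by row and store the result in a new column
val sumCol = df.withColumn("Sum of scores",
  col("Float Marks") + col("Double Marks")
)
sumCol.show()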
Output
+----+--------+-----------+-----------+------------+------------------+
|Roll|    Name|Final Marks|Float Marks|Double Marks|     Sum of scores|
+----+--------+-----------+-----------+------------+------------------+
|   1|    Ajay|        300|       55.5|       92.75|            148.25|
|   2|Bharghav|        350|       63.2|        88.5|151.70000076293945|
|   3| Chaitra|        320|       60.1|        75.8| 135.8999984741211|
|   4|   Kamal|        360|       75.0|        82.3|             157.3|
|   5|  Sohaib|        450|       70.8|        90.6| 161.4000030517578|
+----+--------+-----------+-----------+------------+------------------+
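Values such as 151.70000076293945 are not a Spark bug. Assuming Float Marks is stored as a 32-bit float, it is widened to a double before being added to Double Marks, and the widened value exposes the float's rounding error. The plain Scala sketch below reproduces the effect for row 2 without Spark; the variable names are illustrative.

```scala
// 63.2 cannot be stored exactly in a 32-bit float.
val floatMark: Float = 63.2f
val doubleMark: Double = 88.5

// Widening to Double keeps the float's rounding error visible.
val widened: Double = floatMark.toDouble
val sum: Double = widened + doubleMark

println(widened) // slightly above 63.2
println(sum)     // slightly above 151.7, as in the table

// Rounding to two decimals for display hides the artifact.
val rounded = BigDecimal(sum).setScale(2, BigDecimal.RoundingMode.HALF_UP)
println(rounded) // 151.70
```

Rows 1 and 4 look clean because 55.5 and 75.0 happen to be exactly representable as floats.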
Difference of two columns of a DataFrame
We can subtract two columns using the .withColumn() method and the - operator:
// Subtract Float Marks from Double Marks for each row
val diffCol = df.withColumn("Difference of scores",
  col("Double Marks") - col("Float Marks")
)
diffCol.show()
Output
+----+--------+-----------+-----------+------------+--------------------+
|Roll|    Name|Final Marks|Float Marks|Double Marks|Difference of scores|
+----+--------+-----------+-----------+------------+--------------------+
|   1|    Ajay|        300|       55.5|       92.75|               37.25|
|   2|Bharghav|        350|       63.2|        88.5|  25.299999237060547|
|   3| Chaitra|        320|       60.1|        75.8|  15.700001525878903|
|   4|   Kamal|        360|       75.0|        82.3|   7.299999999999997|
|   5|  Sohaib|        450|       70.8|        90.6|  19.799996948242182|
+----+--------+-----------+-----------+------------+--------------------+
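If the long decimal tails in this output are unwanted, one option is to round the result. The sketch below uses round from org.apache.spark.sql.functions and assumes the sample DataFrame df is in scope. The choice of two decimal places and the name roundedDiff are illustrative.

```scala
import org.apache.spark.sql.functions.{col, round}

// Subtract, then round the difference to 2 decimal places.
val roundedDiff = df.withColumn("Difference of scores",
  round(col("Double Marks") - col("Float Marks"), 2)
)
roundedDiff.show()
```

Rounding changes the stored values, not just how they print, so it is usually best applied as a last step before display or export.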
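Multiplication Operation on a DataFrame column
Multiplying column values follows the same pattern, using the * operator with .withColumn(). Here every Float Marks value is scaled by 1.5:
// Scale every Float Marks value by 1.5; col() turns the column name into a Column
val productCol = df.withColumn("Updated scores",
  col("Float Marks") * 1.5
)
productCol.show()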
Output
+----+--------+-----------+-----------+------------+------------------+
|Roll|    Name|Final Marks|Float Marks|Double Marks|    Updated scores|
+----+--------+-----------+-----------+------------+------------------+
|   1|    Ajay|        300|       55.5|       92.75|             83.25|
|   2|Bharghav|        350|       63.2|        88.5| 94.80000114440918|
|   3| Chaitra|        320|       60.1|        75.8| 90.14999771118164|
|   4|   Kamal|        360|       75.0|        82.3|             112.5|
|   5|  Sohaib|        450|       70.8|        90.6|106.20000457763672|
+----+--------+-----------+-----------+------------+------------------+
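The * operator is a method on Spark's Column class, and a plain Scala number on its right-hand side is treated as a literal value. The sketch below shows three spellings that should produce the same column, assuming the sample DataFrame df is in scope; the val names are made up for illustration.

```scala
import org.apache.spark.sql.functions.{col, lit}

// Three equivalent ways to scale "Float Marks" by 1.5.
val viaOperator = df.withColumn("Updated scores", col("Float Marks") * 1.5)
val viaLiteral  = df.withColumn("Updated scores", col("Float Marks") * lit(1.5))
val viaMethod   = df.withColumn("Updated scores", col("Float Marks").multiply(1.5))

viaMethod.show()
```

The named forms (plus, minus, multiply, divide) exist alongside the symbolic operators; which one to use is mostly a matter of readability.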
Division Operation on a DataFrame column
Column values of a DataFrame can be divided using the / operator and the .withColumn() method:
// Divide every Final Marks value by 2
val divisionCol = df.withColumn("Divided Score",
  col("Final Marks") / 2
)
divisionCol.show()
Output
+----+--------+-----------+-----------+------------+-------------+
|Roll|    Name|Final Marks|Float Marks|Double Marks|Divided Score|
+----+--------+-----------+-----------+------------+-------------+
|   1|    Ajay|        300|       55.5|       92.75|        150.0|
|   2|Bharghav|        350|       63.2|        88.5|        175.0|
|   3| Chaitra|        320|       60.1|        75.8|        160.0|
|   4|   Kamal|        360|       75.0|        82.3|        180.0|
|   5|  Sohaib|        450|       70.8|        90.6|        225.0|
+----+--------+-----------+-----------+------------+-------------+
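Divided Score is printed as 150.0 rather than 150 because Spark's / operator returns a fractional type even when both operands are integers. If a whole-number column is needed, one option is to cast the result back. The sketch assumes df is in scope and that Final Marks is an integer column; the column name Halved Marks is made up for illustration.

```scala
import org.apache.spark.sql.functions.col

// "/" yields a double column; cast("int") truncates it back to an integer.
val halved = df.withColumn("Halved Marks",
  (col("Final Marks") / 2).cast("int")
)
halved.printSchema() // shows the types of the original and the new column
halved.show()
```

Casting truncates any fractional part, so an odd mark such as 301 would lose its .5 after the cast.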
Beyond Basics
Other commonly used mathematical functions include:
- Square root (sqrt)
- Absolute value (abs)
- Trigonometric operations (sin, cos, tan)
- Rounding functions (floor, ceil, round)
These are covered in detail in the Advanced Mathematical Functions article.
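As a brief preview, the sketch below applies several of these functions from org.apache.spark.sql.functions in a single select. It assumes the sample DataFrame df is in scope; the output column names are arbitrary.

```scala
import org.apache.spark.sql.functions.{abs, ceil, col, floor, round, sqrt}

// Apply a handful of built-in math functions to the sample columns.
val extras = df.select(
  col("Name"),
  sqrt(col("Final Marks")).as("sqrt"),
  abs(col("Float Marks") - col("Double Marks")).as("abs diff"),
  floor(col("Double Marks")).as("floor"),
  ceil(col("Double Marks")).as("ceil"),
  round(col("Double Marks"), 1).as("rounded")
)
extras.show()
```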
Aggregate Functions
Aggregate functions help summarize large datasets by returning a single result per group or overall. Examples include:
- sum()
- count()
- min()
- max()
- avg()
These will be explored further in the Aggregate Functions article.
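To illustrate the "single result overall" case, the sketch below summarizes the Final Marks column with all five functions. It assumes the sample DataFrame df is in scope; the aliases (total, rows, and so on) are arbitrary.

```scala
import org.apache.spark.sql.functions.{avg, count, max, min, sum}

// One output row summarising the whole "Final Marks" column.
val summary = df.agg(
  sum("Final Marks").as("total"),
  count("Final Marks").as("rows"),
  min("Final Marks").as("lowest"),
  max("Final Marks").as("highest"),
  avg("Final Marks").as("average")
)
summary.show()
```

For the five sample rows the marks add up to 1780, so the average should come out as 356.0. Preceding agg with groupBy(...) would give one such row per group instead.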
Summary
In this article, we explored:
- Basic mathematical operations in Spark using .withColumn()
- Real DataFrame examples of addition, subtraction, multiplication, and division
- A glimpse into advanced and aggregate mathematical functions
These operations are fundamental for data cleaning, transformation, and deriving insights from your datasets.