DateTime Datatype
Last updated on: 2025-05-30
When working with large-scale datasets, especially in domains like e-commerce, it's common to encounter records with date and timestamp fields. For example, each transaction may carry timestamps such as order placement time and dispatch time. To manage and analyze such data efficiently, it's important to be well-versed in Spark's date and timestamp functions, which make it easy to extract, transform, and present temporal data.
Let’s use the following DataFrame as an example:
+----+--------+-----+---------------+
|Roll| Name|Marks|Enrollment Date|
+----+--------+-----+---------------+
| 1| Ajay| 55| 2024-09-15|
| 2|Bharghav| 63| 2024-09-10|
| 3| Chaitra| 60| 2024-09-20|
| 4| Kamal| 75| 2024-09-12|
| 5| Sohaib| 70| 2024-09-18|
+----+--------+-----+---------------+
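The snippets below operate on a DataFrame named studentData matching this table. A minimal setup sketch for reference (the SparkSession construction and the column types are assumptions for illustration, not part of the original example):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumed setup: a local SparkSession, for illustration only
val spark = SparkSession.builder()
  .appName("DateTimeExamples")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Build the sample DataFrame; the date strings are cast to DateType
val studentData = Seq(
  (1, "Ajay", 55, "2024-09-15"),
  (2, "Bharghav", 63, "2024-09-10"),
  (3, "Chaitra", 60, "2024-09-20"),
  (4, "Kamal", 75, "2024-09-12"),
  (5, "Sohaib", 70, "2024-09-18")
).toDF("Roll", "Name", "Marks", "Enrollment Date")
  .withColumn("Enrollment Date", to_date(col("Enrollment Date")))
```

The `import org.apache.spark.sql.functions._` line brings year(), month(), date_format(), and the other functions used below into scope.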
Extracting Year, Month, and Day from a Date Field
To retrieve the year, month, and day from a column with date values, Spark provides the functions year(), month(), and dayofmonth():
val dfWithComponents = studentData
  .withColumn("Enrollment Year", year(col("Enrollment Date")))
  .withColumn("Enrollment Month", month(col("Enrollment Date")))
  .withColumn("Enrollment Day", dayofmonth(col("Enrollment Date")))

dfWithComponents.show()
Output
+----+--------+-----+---------------+---------------+----------------+--------------+
|Roll| Name|Marks|Enrollment Date|Enrollment Year|Enrollment Month|Enrollment Day|
+----+--------+-----+---------------+---------------+----------------+--------------+
| 1| Ajay| 55| 2024-09-15| 2024| 9| 15|
| 2|Bharghav| 63| 2024-09-10| 2024| 9| 10|
| 3| Chaitra| 60| 2024-09-20| 2024| 9| 20|
| 4| Kamal| 75| 2024-09-12| 2024| 9| 12|
| 5| Sohaib| 70| 2024-09-18| 2024| 9| 18|
+----+--------+-----+---------------+---------------+----------------+--------------+
Deriving Day of Year, Week Number, and Quarter
Similar functions, dayofyear(), weekofyear(), and quarter(), help extract higher-level date components:
val moreComponents = studentData
  .withColumn("Day of Year", dayofyear(col("Enrollment Date")))
  .withColumn("Week of Year", weekofyear(col("Enrollment Date")))
  .withColumn("Quarter", quarter(col("Enrollment Date")))

moreComponents.show()
Output
+----+--------+-----+---------------+-----------+------------+-------+
|Roll| Name|Marks|Enrollment Date|Day of Year|Week of Year|Quarter|
+----+--------+-----+---------------+-----------+------------+-------+
| 1| Ajay| 55| 2024-09-15| 259| 37| 3|
| 2|Bharghav| 63| 2024-09-10| 254| 37| 3|
| 3| Chaitra| 60| 2024-09-20| 264| 38| 3|
| 4| Kamal| 75| 2024-09-12| 256| 37| 3|
| 5| Sohaib| 70| 2024-09-18| 262| 38| 3|
+----+--------+-----+---------------+-----------+------------+-------+
Formatting Date Values
Customizing how dates appear is helpful for readability and localization. We use the date_format() method to change the date format.
val dfFormatted = studentData
  .withColumn("Formatted Date", date_format(col("Enrollment Date"), "yyyy-MM-dd"))
  .withColumn("Another Format", date_format(col("Enrollment Date"), "MM/dd/yyyy"))
  .withColumn("Month Name", date_format(col("Enrollment Date"), "MMMM"))

dfFormatted.show()
Output
+----+--------+-----+---------------+--------------+--------------+----------+
|Roll| Name|Marks|Enrollment Date|Formatted Date|Another Format|Month Name|
+----+--------+-----+---------------+--------------+--------------+----------+
| 1| Ajay| 55| 2024-09-15| 2024-09-15| 09/15/2024| September|
| 2|Bharghav| 63| 2024-09-10| 2024-09-10| 09/10/2024| September|
| 3| Chaitra| 60| 2024-09-20| 2024-09-20| 09/20/2024| September|
| 4| Kamal| 75| 2024-09-12| 2024-09-12| 09/12/2024| September|
| 5| Sohaib| 70| 2024-09-18| 2024-09-18| 09/18/2024| September|
+----+--------+-----+---------------+--------------+--------------+----------+
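The reverse operation, parsing a formatted string back into a DateType column, uses Spark's to_date() with the same pattern letters. A brief sketch, reusing the dfFormatted DataFrame above:

```scala
// Parse the "MM/dd/yyyy" strings back into a proper DateType column
val dfParsed = dfFormatted.withColumn(
  "Parsed Date",
  to_date(col("Another Format"), "MM/dd/yyyy")
)

dfParsed.select("Another Format", "Parsed Date").show()
```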
Retrieving the Current Timestamp
To capture the system’s current timestamp, Spark offers the current_timestamp() function.
val dfWithTimestamp = studentData.withColumn("Registration Time", current_timestamp())
dfWithTimestamp.show(truncate = false)
Output
+----+--------+-----+---------------+----------------------+
|Roll|Name |Marks|Enrollment Date|Registration Time |
+----+--------+-----+---------------+----------------------+
|1 |Ajay |55 |2024-09-15 |2025-03-05 10:17:51.87|
|2 |Bharghav|63 |2024-09-10 |2025-03-05 10:17:51.87|
|3 |Chaitra |60 |2024-09-20 |2025-03-05 10:17:51.87|
|4 |Kamal |75 |2024-09-12 |2025-03-05 10:17:51.87|
|5 |Sohaib |70 |2024-09-18 |2025-03-05 10:17:51.87|
+----+--------+-----+---------------+----------------------+
Converting Timestamps to a Specific Time Zone
To adjust timestamps for different regions, use the from_utc_timestamp() function, which treats the input as UTC and converts it to the specified time zone (e.g., "Asia/Kolkata"):
val dfWithTimeZone = dfWithTimestamp
  .withColumn("IST Registration Time",
    from_utc_timestamp(col("Registration Time"), "Asia/Kolkata"))

dfWithTimeZone.show(truncate = false)
Output
+----+--------+-----+---------------+-----------------------+-----------------------+
|Roll|Name |Marks|Enrollment Date|Registration Time |IST Registration Time |
+----+--------+-----+---------------+-----------------------+-----------------------+
|1 |Ajay |55 |2024-09-15 |2025-03-05 10:17:51.965|2025-03-05 15:47:51.965|
|2 |Bharghav|63 |2024-09-10 |2025-03-05 10:17:51.965|2025-03-05 15:47:51.965|
|3 |Chaitra |60 |2024-09-20 |2025-03-05 10:17:51.965|2025-03-05 15:47:51.965|
|4 |Kamal |75 |2024-09-12 |2025-03-05 10:17:51.965|2025-03-05 15:47:51.965|
|5 |Sohaib |70 |2024-09-18 |2025-03-05 10:17:51.965|2025-03-05 15:47:51.965|
+----+--------+-----+---------------+-----------------------+-----------------------+
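Spark also provides the inverse, to_utc_timestamp(), which treats the input as local time in the given zone and converts it back to UTC. A short sketch building on the DataFrame above:

```scala
// Convert the IST timestamps back to UTC
val dfBackToUtc = dfWithTimeZone.withColumn(
  "UTC Time",
  to_utc_timestamp(col("IST Registration Time"), "Asia/Kolkata")
)

dfBackToUtc.select("IST Registration Time", "UTC Time").show(truncate = false)
```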
Summary
This article walked through how to:
- Work with datetime fields in a Spark DataFrame
- Extract components such as year, month, day, week, and quarter
- Reformat dates for clarity
- Capture the current timestamp
- Convert timestamps to specific time zones
Mastering these operations is crucial for efficient time-based data analysis in Spark.