Date and Time Parsing

Last updated on: 2025-05-30

Date and time values show up in nearly every dataset, from financial transaction logs to records of when students submitted an exam.

However, working with this type of data can be challenging. Different regions or systems may store date and time in different formats, which can lead to errors or difficulties during data analysis.

To solve this issue, Apache Spark provides options to correctly read and process date and time values from files. In this article, we’ll learn how to handle such values using Spark.

Sample JSON File: studentSubmitData.json

Consider the following JSON file with student data, studentSubmitData.json:

[
  {
    "Roll": 1,
    "Name": "Ajay",
    "Age": 14,
    "DOB": "01/01/2010",
    "Submit Time": "2025/02/17 12:30:45",
    "Final Marks": 92.7
  },
  {
    "Roll": 2,
    "Name": "Bharghav",
    "Age": 15,
    "DOB": "04/06/2009",
    "Submit Time": "2025/02/17 12:35:30",
    "Final Marks": 88.5
  },
  ...
]

1. Parsing date values in JSON file

JSON has no built-in date or timestamp type. It stores values only as strings, numbers, booleans, nulls, arrays, or objects. As a result, Spark reads the DOB and Submit Time columns as strings.

Let's read the JSON file:

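// Read the multi-line JSON array. The JSON reader infers the schema on its own,
// so no "inferSchema" option is needed (that option belongs to the CSV reader).
val submitData = spark.read
  .option("multiLine", "true")
  .json("jsonFiles/studentSubmitData.json")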

submitData.printSchema()

Output

root
 |-- Age: long (nullable = true)
 |-- DOB: string (nullable = true)
 |-- Final Marks: double (nullable = true)
 |-- Name: string (nullable = true)
 |-- Roll: long (nullable = true)
 |-- Submit Time: string (nullable = true)

As you can see, the DOB and Submit Time columns are interpreted as string types.
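An alternative to converting the columns afterwards is to declare the types up front and tell the JSON reader which patterns to expect, using its dateFormat and timestampFormat options. The following is a minimal sketch under stated assumptions, not the method used in the rest of this article: the object name ReadWithSchema and the helper readStudents are hypothetical, the session runs locally, and the record is passed in memory (one JSON object per string) so the example does not depend on a file.

```scala
import org.apache.spark.sql.{DataFrame, Encoders, SparkSession}
import org.apache.spark.sql.types._

object ReadWithSchema {
  // Schema declared up front: DOB and Submit Time get their target types directly.
  val studentSchema: StructType = StructType(Seq(
    StructField("Roll", LongType),
    StructField("Name", StringType),
    StructField("Age", LongType),
    StructField("DOB", DateType),
    StructField("Submit Time", TimestampType),
    StructField("Final Marks", DoubleType)
  ))

  // Hypothetical helper: parses JSON records given as strings (one object per string),
  // using the reader options to describe the date and timestamp patterns.
  def readStudents(spark: SparkSession, records: Seq[String]): DataFrame =
    spark.read
      .schema(studentSchema)
      .option("dateFormat", "dd/MM/yyyy")
      .option("timestampFormat", "yyyy/MM/dd HH:mm:ss")
      .json(spark.createDataset(records)(Encoders.STRING))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadWithSchema")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()

    val record =
      """{"Roll": 1, "Name": "Ajay", "Age": 14, "DOB": "01/01/2010", "Submit Time": "2025/02/17 12:30:45", "Final Marks": 92.7}"""

    val students = readStudents(spark, Seq(record))
    students.printSchema() // DOB should be reported as date, Submit Time as timestamp
    students.show()

    spark.stop()
  }
}
```

To apply the same idea to the file itself, the in-memory dataset can be swapped for .option("multiLine", "true").json("jsonFiles/studentSubmitData.json").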

2. Converting String to Date Format

To convert the DOB field to a proper Date type, we can use the to_date() function from org.apache.spark.sql.functions.

import org.apache.spark.sql.functions._

// Read the file, then replace the DOB string with a parsed date.
// The pattern must match the input: day/month/four-digit year.
val submitData = spark.read
  .option("multiLine", "true")
  .json("jsonFiles/studentSubmitData.json")
  .withColumn("DOB", to_date(col("DOB"), "dd/MM/yyyy"))

submitData.show()
submitData.printSchema()

Output

+---+----------+-----------+--------+----+-------------------+
|Age|       DOB|Final Marks|    Name|Roll|        Submit Time|
+---+----------+-----------+--------+----+-------------------+
| 14|2010-01-01|       92.7|    Ajay|   1|2025/02/17 12:30:45|
| 15|2009-06-04|       88.5|Bharghav|   2|2025/02/17 12:35:30|
| 13|2010-12-12|       75.8| Chaitra|   3|2025/02/17 12:45:10|
| 14|2010-08-25|       82.3|   Kamal|   4|2025/02/17 12:40:05|
| 13|2009-04-14|       90.6|  Sohaib|   5|2025/02/17 12:55:20|
+---+----------+-----------+--------+----+-------------------+

root
 |-- Age: long (nullable = true)
 |-- DOB: date (nullable = true)
 |-- Final Marks: double (nullable = true)
 |-- Name: string (nullable = true)
 |-- Roll: long (nullable = true)
 |-- Submit Time: string (nullable = true)

Now, DOB is correctly converted to the Date data type.
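The pattern passed to to_date() matters because a value such as "04/06/2009" is valid under more than one pattern. The sketch below, with a hypothetical object DayMonthOrder and helper parseBothWays, parses the same string as day/month and as month/day so the two results can be compared side by side. It assumes a local Spark session and uses a value taken from the sample file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_date}

object DayMonthOrder {
  // Hypothetical helper: parses each DOB string under two different patterns.
  def parseBothWays(spark: SparkSession, values: Seq[String]): DataFrame = {
    import spark.implicits._
    values.toDF("DOB")
      .withColumn("as_dd_MM", to_date(col("DOB"), "dd/MM/yyyy")) // day first
      .withColumn("as_MM_dd", to_date(col("DOB"), "MM/dd/yyyy")) // month first
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DayMonthOrder")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()

    // "04/06/2009" is 4 June under dd/MM/yyyy but 6 April under MM/dd/yyyy.
    parseBothWays(spark, Seq("04/06/2009")).show()

    spark.stop()
  }
}
```

Both columns hold valid dates, so Spark cannot flag the wrong choice on its own. The pattern has to come from knowledge of how the source system writes its dates.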

3. Parsing JSON files with timestamp records

To convert the Submit Time field into a Timestamp, we use the to_timestamp() function. Because the values are not written in Spark's default timestamp layout (yyyy-MM-dd HH:mm:ss), we pass a format string that matches the input.

// Parse both columns: DOB as a date, Submit Time as a timestamp.
// Each pattern mirrors the layout of the corresponding string in the file.
val submitTime = spark.read
  .option("multiLine", "true")
  .json("jsonFiles/studentSubmitData.json")
  .withColumn("DOB", to_date(col("DOB"), "dd/MM/yyyy"))
  .withColumn("Submit Time", to_timestamp(col("Submit Time"), "yyyy/MM/dd HH:mm:ss"))

submitTime.show()
submitTime.printSchema()

Output

+---+----------+-----------+--------+----+-------------------+
|Age|       DOB|Final Marks|    Name|Roll|        Submit Time|
+---+----------+-----------+--------+----+-------------------+
| 14|2010-01-01|       92.7|    Ajay|   1|2025-02-17 12:30:45|
| 15|2009-06-04|       88.5|Bharghav|   2|2025-02-17 12:35:30|
| 13|2010-12-12|       75.8| Chaitra|   3|2025-02-17 12:45:10|
| 14|2010-08-25|       82.3|   Kamal|   4|2025-02-17 12:40:05|
| 13|2009-04-14|       90.6|  Sohaib|   5|2025-02-17 12:55:20|
+---+----------+-----------+--------+----+-------------------+

root
 |-- Age: long (nullable = true)
 |-- DOB: date (nullable = true)
 |-- Final Marks: double (nullable = true)
 |-- Name: string (nullable = true)
 |-- Roll: long (nullable = true)
 |-- Submit Time: timestamp (nullable = true)

The Submit Time field now has the Timestamp data type.
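When some values do not match the pattern, it helps to find them before they silently affect later steps. The sketch below assumes ANSI mode is disabled (spark.sql.ansi.enabled set to false), in which case to_timestamp() yields null for input it cannot parse. With ANSI mode enabled, the same input raises an error instead. The object UnparsedTimestamps, the helper unparsedRows, and the second sample value are hypothetical and not part of the sample file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_timestamp}

object UnparsedTimestamps {
  // Hypothetical helper: keeps rows whose value is present but does not match the pattern.
  // Relies on to_timestamp returning null for unparseable input (ANSI mode off).
  def unparsedRows(df: DataFrame, column: String, pattern: String): DataFrame =
    df.filter(col(column).isNotNull && to_timestamp(col(column), pattern).isNull)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("UnparsedTimestamps")
      .master("local[*]")                        // assumption: local run
      .config("spark.sql.ansi.enabled", "false") // assumption: nulls instead of errors
      .getOrCreate()
    import spark.implicits._

    // The first value follows the pattern; the second (made up for this sketch) does not.
    val sample = Seq("2025/02/17 12:30:45", "17-02-2025 12:30").toDF("Submit Time")

    unparsedRows(sample, "Submit Time", "yyyy/MM/dd HH:mm:ss").show(false)

    spark.stop()
  }
}
```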

Summary

In this article, we learned:

  • JSON has no date or timestamp type, so Spark reads such values as strings unless told otherwise.

  • We can convert string columns to proper Date and Timestamp types using to_date() and to_timestamp() functions in Spark.

  • Correct parsing of date and time fields is important for accurate analysis, especially when working with non-standard formats.
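As a small illustration of the last point, sorting on the raw dd/MM/yyyy strings gives a different order than sorting on the parsed dates. This is a sketch with a hypothetical object StringVsDateOrder and hypothetical helper names, assuming a local Spark session and using the two DOB values shown in the sample file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_date}

object StringVsDateOrder {
  // Orders rows by the raw string, which compares character by character.
  def dobOrderedAsString(df: DataFrame): Seq[String] =
    df.orderBy(col("DOB")).collect().map(_.getAs[String]("DOB")).toSeq

  // Orders rows by the parsed date, which compares calendar dates.
  def dobOrderedAsDate(df: DataFrame): Seq[String] =
    df.orderBy(to_date(col("DOB"), "dd/MM/yyyy")).collect().map(_.getAs[String]("DOB")).toSeq

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StringVsDateOrder")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    val dobs = Seq("01/01/2010", "04/06/2009").toDF("DOB")

    // As strings, "01/01/2010" sorts first because '0' < '4' in the day position.
    println(dobOrderedAsString(dobs))
    // As dates, 4 June 2009 comes before 1 January 2010.
    println(dobOrderedAsDate(dobs))

    spark.stop()
  }
}
```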
