Struct Data Type

Last updated on: 2025-05-30

StructType is a complex data type in Apache Spark used to store a group of columns as a single entity. Simply put, structs are like dataframes nested within dataframes. A StructType is a collection of StructField elements, each of which includes:

  • A name
  • A datatype
  • A nullable flag indicating whether the field allows null values

Creating a dataframe with StructType

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.col

val studentSchema = StructType(Array(
  StructField("ID", IntegerType, nullable = false),
  StructField("Name", StringType, nullable = false),
  StructField("Marks", IntegerType, nullable = false),
  StructField("Hobbies", StructType(Array(
    StructField("hobby1", StringType, nullable = false),
    StructField("hobby2", StringType, nullable = false)
  )), nullable = true)
))

val sampleData = Seq(
  Row(1, "Ajay", 55, Row("Singing", "Sudoku")),
  Row(2, "Bhargav", 63, Row("Dancing", "Painting")),
  Row(3, "Chaitra", 60, Row("Chess", "Long Walks")),
  Row(4, "Kamal", 75, Row("Reading Books", "Cooking")),
  Row(5, "Sohaib", 70, Row("Singing", "Cooking"))
)

val df = spark.createDataFrame(
  spark.sparkContext.parallelize(sampleData), 
  studentSchema)

df.show(truncate = false)

Output

+---+-------+-----+------------------------+
|ID |Name   |Marks|Hobbies                 |
+---+-------+-----+------------------------+
|1  |Ajay   |55   |[Singing, Sudoku]       |
|2  |Bhargav|63   |[Dancing, Painting]     |
|3  |Chaitra|60   |[Chess, Long Walks]     |
|4  |Kamal  |75   |[Reading Books, Cooking]|
|5  |Sohaib |70   |[Singing, Cooking]      |
+---+-------+-----+------------------------+
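As an aside, the same schema can also be built incrementally with StructType's add() method, which some find more readable for deeply nested schemas. A sketch of the equivalent definition:

    import org.apache.spark.sql.types._

    // Inner struct for the two hobby fields
    val hobbiesSchema = new StructType()
      .add("hobby1", StringType, nullable = false)
      .add("hobby2", StringType, nullable = false)

    // Equivalent to the StructType(Array(...)) definition above
    val studentSchemaAlt = new StructType()
      .add("ID", IntegerType, nullable = false)
      .add("Name", StringType, nullable = false)
      .add("Marks", IntegerType, nullable = false)
      .add("Hobbies", hobbiesSchema, nullable = true)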

Access the nested fields of the dataframe

To access individual fields inside the Hobbies struct, it helps to first inspect the schema:

df.printSchema()  

Output

root
 |-- ID: integer (nullable = false)
 |-- Name: string (nullable = false)
 |-- Marks: integer (nullable = false)
 |-- Hobbies: struct (nullable = true)
 |    |-- hobby1: string (nullable = false)
 |    |-- hobby2: string (nullable = false)
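If you need the schema programmatically rather than printed, df.schema returns the StructType itself, and each element of it is a StructField. A rough sketch of walking the top-level fields:

    // df.schema is a StructType; iterate over its StructField elements
    df.schema.fields.foreach { field =>
      println(s"${field.name}: ${field.dataType.simpleString}, nullable = ${field.nullable}")
    }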

Accessing Fields Inside a Struct

Once you're familiar with the schema, you can extract individual nested fields like this:

val accessNestedFields = df.select(col("Name"),
  col("Hobbies.hobby1"))

accessNestedFields.show()

Output

+-------+-------------+
|   Name|       hobby1|
+-------+-------------+
|   Ajay|      Singing|
|Bhargav|      Dancing|
|Chaitra|        Chess|
|  Kamal|Reading Books|
| Sohaib|      Singing|
+-------+-------------+

You can apply the same logic to extract other nested fields.
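Equivalently, you can reach a nested field with getField() on the struct column, which is handy when the field name is held in a variable. This sketch selects the same hobby1 field:

    import org.apache.spark.sql.functions.col

    // Same result as col("Hobbies.hobby1"), written with getField()
    val viaGetField = df.select(col("Name"), col("Hobbies").getField("hobby1"))
    viaGetField.show()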

Renaming a nested column

val renameColumns = df.select(col("Name"), 
  col("Hobbies.hobby1")
    .alias("hobby")
)
  
renameColumns.show()

Output

+-------+-------------+
|   Name|        hobby|
+-------+-------------+
|   Ajay|      Singing|
|Bhargav|      Dancing|
|Chaitra|        Chess|
|  Kamal|Reading Books|
| Sohaib|      Singing|
+-------+-------------+
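Note that alias() only renames the extracted column; the field inside the Hobbies struct keeps its original name. If you want to rename fields while keeping the struct intact, one approach is to rebuild the struct with the struct() function (a sketch; the new field names "primary" and "secondary" here are arbitrary examples):

    import org.apache.spark.sql.functions.{col, struct}

    // Rebuild Hobbies with renamed inner fields
    val renamedStruct = df.withColumn("Hobbies",
      struct(
        col("Hobbies.hobby1").alias("primary"),
        col("Hobbies.hobby2").alias("secondary")
      ))

    renamedStruct.printSchema()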

Flattening Struct columns

You can flatten nested struct fields using selectExpr().

val flattenStruct = df.selectExpr("Name",
  "Hobbies.hobby1 as hobby1",
  "Hobbies.hobby2 as hobby2")
  
flattenStruct.show()

Output

+-------+-------------+----------+
|   Name|       hobby1|    hobby2|
+-------+-------------+----------+
|   Ajay|      Singing|    Sudoku|
|Bhargav|      Dancing|  Painting|
|Chaitra|        Chess|Long Walks|
|  Kamal|Reading Books|   Cooking|
| Sohaib|      Singing|   Cooking|
+-------+-------------+----------+
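When a struct has many fields, listing each one gets tedious. Spark also supports star expansion on struct columns, which flattens every field in one go:

    // "Hobbies.*" expands into one column per field of the struct
    val flattenAll = df.select("Name", "Hobbies.*")
    flattenAll.show()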

To read more about select() and selectExpr(), check out this article: Select vs SelectExpr.

Filtering dataframe rows based on nested field values

You can also apply filters to nested fields within a struct:

val filterColumns = df.filter(col("Hobbies.hobby1") === "Singing")

filterColumns.show(truncate = false)

Output

+---+------+-----+------------------+
| ID|  Name|Marks|           Hobbies|
+---+------+-----+------------------+
|  1|  Ajay|   55| [Singing, Sudoku]|
|  5|Sohaib|   70|[Singing, Cooking]|
+---+------+-----+------------------+
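Nested fields behave like ordinary columns in predicates, so they can be combined with boolean operators. As a sketch, this keeps students whose first hobby is singing or whose second hobby is cooking:

    import org.apache.spark.sql.functions.col

    // Combine predicates on different nested fields
    val singersOrCooks = df.filter(
      col("Hobbies.hobby1") === "Singing" || col("Hobbies.hobby2") === "Cooking")

    singersOrCooks.show(truncate = false)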

Summary

  • The StructType in Spark enables you to combine multiple fields into one structured column, making it easier to manage nested data.

  • Each nested field is defined using StructField, specifying its name, type, and nullability.

  • You can access, rename, flatten, and filter nested fields using Spark SQL functions like col(), alias(), and selectExpr().