Pyspark Selecting columns from Dataframe by providing schema as StructType


alokt
Hi,

In Scala, selecting columns from a DataFrame by providing a schema as a StructType can be done like this:
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object UserSchema {
  val Name = "name"
  val NameField = StructField(Name, StringType, nullable = true)

  val Age = "age"
  val AgeField = StructField(Age, IntegerType, nullable = true)

  val Address = "address"
  val AddressField = StructField(Address, StringType, nullable = true)
}

val TestSchema = StructType(List(UserSchema.NameField,
    UserSchema.AgeField,
    UserSchema.AddressField))

val TestColumns = TestSchema.fieldNames.toList.map(col)

val outDF = inDF.select(TestColumns: _*)

How can the same be achieved in pyspark?

Thank you.