
Dataframe replace 'collect()' going in indefinite time loop


Saurabh Adhikary
final_schema_noise_data = sqlContext.createDataFrame(noise_data_parts,noise_data_struct_schema)

for a_name in name_field_names:
  final_schema_noise_data = final_schema_noise_data.withColumn(a_name, spaceDeleteUDF(a_name))
  # Up to this point, final_schema_noise_data.collect() works.
  for t in noise_chars:
    final_schema_noise_data = final_schema_noise_data.na.replace(t,'',a_name)
    print a_name,t
# The loops above run to completion, but final_schema_noise_data.collect() then returns nothing: the prompt moves to the next line and processing continues for hours with no output.

# Before the inner for loop, df.collect() returns in seconds; after the loop completes, it produces no output for hours.
Is there a known issue with the df.na.replace function?
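
For comparison, here is a minimal sketch of a single-pass alternative. It rests on two points: df.na.replace matches whole cell values rather than substrings, and every call in the nested loop adds another layer to the query plan, which may be what makes collect() so slow (I have not confirmed this as the cause). The helper names build_noise_pattern and strip_noise are hypothetical, and the sketch assumes noise_chars is a list of single characters. regexp_replace and col are imported from pyspark.sql.functions.

```python
import re


def build_noise_pattern(noise_chars):
    """Combine all noise characters into one regex character class."""
    # re.escape guards characters such as '.', ']' or '^' that are special in a regex.
    # Assumption: every entry in noise_chars is a single character.
    return "[" + "".join(re.escape(c) for c in noise_chars) + "]"


def strip_noise(df, columns, noise_chars):
    """Remove every noise character from the given string columns,
    using one regexp_replace per column instead of one na.replace per character."""
    # Imported here so build_noise_pattern can be used without a Spark installation.
    from pyspark.sql.functions import col, regexp_replace

    pattern = build_noise_pattern(noise_chars)
    for name in columns:
        df = df.withColumn(name, regexp_replace(col(name), pattern, ""))
    return df
```

With the names from the post, the call would be final_schema_noise_data = strip_noise(final_schema_noise_data, name_field_names, noise_chars). This adds one expression per column to the plan rather than one per column and character.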