How to propagate the last non-empty value in a Spark Dataset

Dear All,

I am trying to propagate the last valid observation (i.e. the last non-null value) forward into the null values that follow it in a Dataset.

Below is my partial solution:

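    // tmp800 holds the three input columns; the expression before the column list is truncated in this post
    Dataset<Row> tmp800 = ... ("uuid", "eventTime", "Washer_rinseCycles");
    // One window per uuid, ordered by event time
    WindowSpec wspec = Window.partitionBy(tmp800.col("uuid")).orderBy(tmp800.col("eventTime"));
    // Value of the immediately preceding row, which may itself be null
    Column c1 = org.apache.spark.sql.functions.lag(tmp800.col("Washer_rinseCycles"), 1).over(wspec);
    Dataset<Row> tmp900 = tmp800.withColumn("Washer_rinseCyclesFilled",
            org.apache.spark.sql.functions.when(tmp800.col("Washer_rinseCycles").isNull(), c1).otherwise(tmp800.col("Washer_rinseCycles")));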
However, this does not solve the whole problem: lag(..., 1) returns the value of the row immediately before the current row even if that value is itself NULL, so a run of consecutive nulls is only partly filled (see the table below).
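To make the difference concrete, here is a small plain-Java model of the two behaviours, independent of Spark. The class and method names (`FillDemo`, `lagFill`, `forwardFill`) and the sample values are made up for this illustration: `lagFill` mimics what my `lag(..., 1)` attempt does, while `forwardFill` produces the result I am actually after.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FillDemo {

    // Mimics the lag(col, 1) attempt: a null is replaced by the value of the
    // immediately preceding row of the ORIGINAL column, even if that is null too.
    public static List<Integer> lagFill(List<Integer> values) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            Integer current = values.get(i);
            Integer previous = (i > 0) ? values.get(i - 1) : null;
            out.add(current != null ? current : previous);
        }
        return out;
    }

    // Desired behaviour: carry the last non-null value forward over any run of nulls.
    public static List<Integer> forwardFill(List<Integer> values) {
        List<Integer> out = new ArrayList<>();
        Integer lastSeen = null;
        for (Integer v : values) {
            if (v != null) {
                lastSeen = v;
            }
            out.add(lastSeen);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical rinse-cycle readings for a single uuid, ordered by eventTime
        List<Integer> rinseCycles = Arrays.asList(3, null, null, 5, null);
        System.out.println(lagFill(rinseCycles));     // [3, 3, null, 5, 5]
        System.out.println(forwardFill(rinseCycles)); // [3, 3, 3, 5, 5]
    }
}
```

The second null in the run stays null with the lag-based version, because the row before it is null as well.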

Is there a method in Spark similar to Pandas’ “backfill” for DataFrames?

Is it possible to do this with the Spark API and, if so, how?
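One direction I am considering, as an unverified sketch: instead of `lag`, take `last(column, true)` (the second argument asks Spark to ignore nulls) over a window frame that runs from the start of the partition up to the current row. The names `ForwardFill` and `forwardFill` are placeholders of mine, and I am assuming that `eventTime` orders the rows unambiguously within each `uuid`. In Pandas terms this would be a forward fill rather than a backfill.

```java
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;

import static org.apache.spark.sql.functions.last;

public final class ForwardFill {

    private ForwardFill() {
    }

    // Adds "Washer_rinseCyclesFilled": for each row, the most recent non-null
    // "Washer_rinseCycles" seen so far within the same uuid (ordered by eventTime).
    // Rows that precede the first non-null value of their uuid stay null.
    public static Dataset<Row> forwardFill(Dataset<Row> df) {
        WindowSpec upToCurrentRow = Window
                .partitionBy(df.col("uuid"))
                .orderBy(df.col("eventTime"))
                // Frame: every row from the start of the partition up to this one
                .rowsBetween(Window.unboundedPreceding(), Window.currentRow());

        // last(..., true) skips nulls, so it returns the latest non-null value in the frame
        Column filled = last(df.col("Washer_rinseCycles"), true).over(upToCurrentRow);

        return df.withColumn("Washer_rinseCyclesFilled", filled);
    }
}
```

With the Dataset from my partial solution this would be called as `ForwardFill.forwardFill(tmp800)`. Reversing the frame (current row to `Window.unboundedFollowing()`) and using `first(column, true)` should give the backfill direction, though I have not tried that either.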

Many thanks in advance.
Best Regards,