Hi,

You don't need to run approxPercentile against a list. Since it is an aggregate function, you can apply it directly in the agg() call after the groupBy:

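// Just to illustrate the idea. This uses Spark's internal Catalyst classes
// (written against Spark 2.x/3.x), which may change between versions.
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
import org.apache.spark.sql.functions.{col, collect_list}

val percentage = 0.5 // example value (the median); must be in [0.0, 1.0]

// The aggregate takes Catalyst expressions, so pass the column's underlying expression.
val approxPercentile = new ApproximatePercentile(col("v1").expr, Literal(percentage))

val agg_approx_percentile = new Column(approxPercentile.toAggregateExpression())

df.groupBy("k1", "k2", "k3").agg(collect_list("v1"), agg_approx_percentile)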
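
If you would rather stay away from the internal Catalyst classes, the same aggregate is also exposed as the SQL function percentile_approx, which you can reach through expr(). The following is only a minimal sketch under some assumptions: the object name GroupedQuantiles, the helper quantilesByGroup, the output column name v1_quantiles, and the local SparkSession settings are all made up for illustration, and the percentages 0.1 and 0.5 stand in for the 10th and 50th quantiles from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object GroupedQuantiles {

  // Hypothetical helper: one row per (k1, k2, k3) group, with an array
  // holding the approximate 10th and 50th percentiles of v1.
  def quantilesByGroup(df: DataFrame): DataFrame =
    df.groupBy("k1", "k2", "k3")
      .agg(expr("percentile_approx(v1, array(0.1, 0.5))").as("v1_quantiles"))

  def main(args: Array[String]): Unit = {
    // Local session only for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("grouped-quantiles")
      .getOrCreate()
    import spark.implicits._

    // Sample rows taken from the question.
    val df = Seq(
      ("a1", "b1", "c1", 879),
      ("a2", "b2", "c2", 769),
      ("a1", "b1", "c1", 129),
      ("a2", "b2", "c2", 323)
    ).toDF("k1", "k2", "k3", "v1")

    quantilesByGroup(df).show(truncate = false)
    spark.stop()
  }
}
```

Passing an array of percentages returns all requested quantiles in one pass over each group, so collect_list(v1) is only needed if you also want the raw values.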

Rishi wrote:

I need to compute quantiles in Spark on a numeric field after a group by operation. Is there a way to apply approxPercentile to an aggregated list instead of a column?

E.g. The Dataframe looks like

k1 | k2 | k3 | v1
a1 | b1 | c1 | 879
a2 | b2 | c2 | 769
a1 | b1 | c1 | 129
a2 | b2 | c2 | 323

I need to first run groupBy (k1, k2, k3) and collect_list(v1), and then compute quantiles [10th, 50th...] on the resulting list of v1 values.