Re: [SPARK-2.0][SQL] UDF containing non-serializable object does not work as expected
The `show` behavior was the result of an optimization that short-circuits any real Spark computation when the input is a local collection, and the result is simply the first few rows. That's why it completed without serializing anything.
This is somewhat inconsistent. One way to eliminate the inconsistency is to always serialize the query plan, even for local execution. We did that back in the day for the RDD code path, and we could do something similar for the SQL code path. However, serialization is not free, and it would slow down execution by a small percentage.
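To make the "always serialize" idea concrete, here is a minimal standalone Scala sketch with no Spark dependency. It eagerly pushes a function through plain Java serialization before running it, so a function that captures a non-serializable object fails immediately, just as it would on the distributed path. The names `SerializationCheck`, `Parser`, `makeUdf`, `plainUdf` and `check` are hypothetical, chosen for illustration only. This is not Spark's actual implementation.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Hypothetical stand-in for a helper a UDF might capture. It does not
  // extend Serializable, so Java serialization rejects it.
  class Parser {
    def parse(s: String): Int = s.trim.toInt
  }

  // Builds a function that closes over `parser`. Scala function literals are
  // serializable, but serializing this one also serializes the captured Parser.
  def makeUdf(parser: Parser): String => Int = s => parser.parse(s)

  // A function that captures nothing and therefore serializes cleanly.
  val plainUdf: String => Int = s => s.trim.toInt

  // Eagerly serializes `obj` with Java serialization, the mechanism used to
  // ship closures to executors. Returns None on success, or the
  // NotSerializableException message (the offending class name) on failure.
  def check(obj: AnyRef): Option[String] = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      None
    } catch {
      case e: NotSerializableException => Some(e.getMessage)
    } finally {
      out.close()
    }
  }
}
```

Calling `makeUdf(new Parser)("42")` directly works, which mirrors the local short-circuit. Running `check` on the same function first surfaces the problem up front. The extra `writeObject` call on every execution is the cost described above.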