On 2 Jan 2018, at 09:45, Jacek Laskowski <[hidden email]> wrote:
I was wondering what's wrong with FileSystem.getContentSummary in CommandUtils.calculateLocationSize, as "expressed" in this comment:
// This method is mainly based on
// org.apache.hadoop.hive.ql.stats.StatsUtils.getFileSizeForTable(HiveConf, Table)
// in Hive 0.13 (except that we do not use fs.getContentSummary).
// TODO: Generalize statistics collection.
// TODO: Why fs.getContentSummary returns wrong size on Jenkins?
// Can we use fs.getContentSummary in future?
// Seems fs.getContentSummary returns wrong table size on Jenkins. So we use
// countFileSize to count the table size.
until I found out that there seems to be no issue whatsoever, since DetermineTableStats uses it just fine.
Why does CommandUtils.calculateLocationSize *not* use what DetermineTableStats uses successfully?
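
To make the question concrete, here is a minimal sketch of the two approaches being compared. It is not the Spark source: the object name LocationSize, both method names, and the skipPrefix parameter are hypothetical and only for illustration. The Hadoop calls used (getContentSummary, getFileStatus, listStatus) are the standard FileSystem API. Skipping directories by name prefix is an assumption about why a manual walk might report a different size than the summary; the quoted comment does not confirm it.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, for illustration only (not the Spark implementation).
object LocationSize {

  // Approach 1: ask the file system for an aggregate summary of the path.
  def sizeViaContentSummary(fs: FileSystem, path: Path): Long =
    fs.getContentSummary(path).getLength

  // Approach 2: walk the directory tree and sum the file lengths by hand.
  // Entries whose name starts with skipPrefix are ignored (an assumption,
  // e.g. to leave out temporary or staging directories).
  def sizeViaListing(fs: FileSystem, path: Path, skipPrefix: String): Long = {
    val status = fs.getFileStatus(path)
    if (status.isDirectory) {
      fs.listStatus(path)
        .filterNot(_.getPath.getName.startsWith(skipPrefix))
        .map(child => sizeViaListing(fs, child.getPath, skipPrefix))
        .sum
    } else {
      status.getLen
    }
  }
}
```

With a Path p and val fs = p.getFileSystem(new Configuration()), the two calls should agree whenever nothing under p matches skipPrefix. They can differ when temporary directories are present, which would be one possible (unverified) explanation for a "wrong size" on a busy build machine.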
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming