[ANNOUNCE] Spark 1.6.0 Release Preview

[ANNOUNCE] Spark 1.6.0 Release Preview

Michael Armbrust
In order to facilitate community testing of Spark 1.6.0, I'm excited to announce the availability of an early preview of the release. This is not a release candidate, so there is no voting involved. However, it'd be awesome if community members could start testing with this preview package and report any problems they encounter.

This preview package contains all commits to branch-1.6 up to commit 308381420f51b6da1007ea09a02d740613a226e0.

The staging Maven repository for this preview build can be found here: https://repository.apache.org/content/repositories/orgapachespark-1162
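
If you want to compile against the preview artifacts, a minimal sbt sketch like the following should work (the "1.6.0" version string below is an assumption; check the staging repository for the exact value published there):

  resolvers += "Spark 1.6.0 preview staging" at
    "https://repository.apache.org/content/repositories/orgapachespark-1162"

  // spark-core is just one example coordinate; the version is assumed, not confirmed above.
  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"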

Binaries for this preview build can be found here: http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-preview2-bin/

A build of the docs can also be found here: http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-preview2-docs/

The full change log for this release can be found on JIRA.

== How can you help? ==

If you are a Spark user, you can help us test this release by taking a Spark workload, running it on this preview release, and reporting any regressions.

== Major Features ==

When testing, we'd appreciate it if users could focus on areas that have changed in this release.  Some notable new features include:

SPARK-11787 Parquet Performance - Improve Parquet scan performance when using flat schemas.
SPARK-10810 Session Management - Multiple users of the Thrift (JDBC/ODBC) server now have isolated sessions, including their own default database (e.g. USE mydb), even on shared clusters.
SPARK-9999  Dataset API - A new, experimental type-safe API (similar to RDDs) that performs many operations directly on serialized binary data and uses code generation (i.e. Project Tungsten); see the first sketch after this list.
SPARK-10000 Unified Memory Management - Memory for execution and caching is now shared dynamically instead of being split into fixed, exclusive regions.
SPARK-10978 Datasource API Avoid Double Filter - When implementing a datasource with filter pushdown, developers can now tell Spark SQL to avoid double-evaluating a pushed-down filter; see the second sketch after this list.
SPARK-2629  New improved state management - trackStateByKey, a DStream transformation for stateful stream processing that supersedes updateStateByKey in functionality and performance.
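
As a quick illustration of the Dataset API, here is a minimal spark-shell sketch (the class and file names are illustrative, not taken from the change log):

  case class Person(name: String, age: Long)

  import sqlContext.implicits._                 // sqlContext is provided by spark-shell

  // Build a Dataset directly from objects and use typed, compile-time-checked operations.
  val people = Seq(Person("Ann", 34), Person("Bob", 17)).toDS()
  val adultNames = people.filter(_.age >= 18).map(_.name)
  adultNames.collect()                          // Array(Ann)

  // An existing DataFrame can also be converted to a typed Dataset:
  // val ds = sqlContext.read.json("people.json").as[Person]

And a rough sketch of the datasource change: the relation below is hypothetical, but the unhandledFilters and buildScan hooks are part of the public sources API in this release. A source that reports a pushed-down filter as handled will no longer have that filter re-evaluated by Spark SQL after the scan.

  import org.apache.spark.rdd.RDD
  import org.apache.spark.sql.{Row, SQLContext}
  import org.apache.spark.sql.sources.{BaseRelation, EqualTo, Filter, PrunedFilteredScan}
  import org.apache.spark.sql.types.StructType

  class MyRelation(override val sqlContext: SQLContext, override val schema: StructType)
    extends BaseRelation with PrunedFilteredScan {

    // Return only the filters this source does NOT fully apply itself; anything
    // omitted here is trusted and not evaluated a second time by Spark SQL.
    override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
      filters.filterNot(_.isInstanceOf[EqualTo])   // assume equality filters are applied during the scan

    override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
      // ... apply the EqualTo filters while reading and return matching rows ...
      sqlContext.sparkContext.emptyRDD[Row]
    }
  }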

Happy testing!

Michael

Re: [ANNOUNCE] Spark 1.6.0 Release Preview

mkhaitman-2
The author has deleted this message.

Re: [ANNOUNCE] Spark 1.6.0 Release Preview

Dean Wampler
In reply to this post by Michael Armbrust
I'm seeing an RPC timeout with the Scala 2.11 build, but not with the Hadoop 1 / Scala 2.10 build. The following session with two uses of sc.parallelize triggers it almost every time. Occasionally I don't see the stack trace, and I don't see it with just a single sc.parallelize, even the bigger, second one. When the error occurs, the shell pauses for about two minutes with no output before the stack trace appears. I've elided some output; why all the non-log4j warnings appear at startup is another question:


$ pwd
/Users/deanwampler/projects/spark/spark-1.6.0-bin-hadoop1-scala2.11
$ ./bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.security.Groups).
log4j:WARN Please initialize the log4j system properly.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Spark context available as sc.
15/11/23 13:01:45 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/23 13:01:45 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/23 13:01:49 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
15/11/23 13:01:49 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
15/11/23 13:01:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/23 13:01:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/23 13:01:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
SQL context available as sqlContext.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0-SNAPSHOT
      /_/

Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc.parallelize((1 to 100000), 10000).count()

[Stage 0:>                                                      (0 + 0) / 10000]
[Stage 0:==>                                                  (420 + 4) / 10000]
[Stage 0:===>                                                 (683 + 4) / 10000]
... elided ...
[Stage 0:==========================================>         (8264 + 4) / 10000]
[Stage 0:==============================================>     (8902 + 6) / 10000]
[Stage 0:=================================================>  (9477 + 4) / 10000]

res0: Long = 100000

scala> sc.parallelize((1 to 1000000), 100000).count()

[Stage 1:>                                                     (0 + 0) / 100000]
[Stage 1:>                                                     (0 + 0) / 100000]
[Stage 1:>                                                     (0 + 0) / 100000]15/11/23 13:04:09 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@7f9d659c,BlockManagerId(driver, localhost, 55188))] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
  at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
  at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
  at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
  at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
  at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
  at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:452)
  at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:472)
  at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:472)
  at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:472)
  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1708)
  at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:472)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:744)
    Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
  at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
  at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
  at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
  at scala.concurrent.Await$.result(package.scala:190)
  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
  ... 15 more
    15/11/23 13:04:56 WARN NettyRpcEndpointRef: Ignore message Success(HeartbeatResponse(false))

[Stage 1:=>                                                 (2204 + 6) / 100000]
[Stage 1:=>                                                 (2858 + 4) / 100000]
[Stage 1:=>                                                 (3616 + 5) / 100000]
... elided ...
[Stage 1:=================================================>(98393 + 4) / 100000]
[Stage 1:=================================================>(99347 + 4) / 100000]
[Stage 1:=================================================>(99734 + 4) / 100000]

res1: Long = 1000000

scala>
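
For anyone else reproducing this: the timeout named in the stack trace is configurable, so a temporary mitigation (not a fix for whatever is causing the stall) is to start the shell with a larger value, e.g.:

  $ ./bin/spark-shell --conf spark.rpc.askTimeout=600s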




Re: [ANNOUNCE] Spark 1.6.0 Release Preview

Ted Yu
In reply to this post by Michael Armbrust
If I am not mistaken, the binaries for Scala 2.11 were generated against Hadoop 1.

What about binaries for Scala 2.11 against Hadoop 2.x?

Cheers
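
For reference, such a package can also be built from the branch-1.6 sources; a rough sketch based on the 1.6 building docs (double-check the script and profile names against the docs linked above):

  $ ./dev/change-scala-version.sh 2.11
  $ ./make-distribution.sh --name hadoop2.6-scala2.11 --tgz -Pyarn -Phadoop-2.6 -Dscala-2.11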
