benefits of code gen

Koert Kuipers
so i have been looking for a while now at all the catalyst expressions, and all the relatively complex codegen going on.

so first off i get the benefit of codegen to turn a bunch of chained iterator transformations into a single codegen stage for spark. that makes sense to me, because it avoids a bunch of overhead.

but what i am not so sure about is what the benefit is of converting the actual stuff that happens inside the iterator transformations into codegen.

say we have an expression that has 2 children and creates a struct for them. why would this be faster in codegen by re-creating the code to do this in a string (which is complex and error prone) compared to simply having the codegen call the normal method for this in my class?

i see so much trivial code being re-created in codegen. stuff like this:

  private[this] def castToDateCode(
      from: DataType,
      ctx: CodegenContext): CastFunction = from match {
    case StringType =>
      val intOpt = ctx.freshName("intOpt")
      (c, evPrim, evNull) => s"""
        scala.Option<Integer> $intOpt =
          org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate($c);
        if ($intOpt.isDefined()) {
          $evPrim = ((Integer) $intOpt.get()).intValue();
        } else {
          $evNull = true;
        }
       """

is this really faster than simply calling an equivalent function from the codegen, and keeping the codegen logic restricted to the "unrolling" of chained iterators?

Re: benefits of code gen

rxin
With complex types it doesn't work as well, but for primitive types the biggest benefit of whole stage codegen is that we don't even need to put the intermediate data into rows or columns anymore. They are just variables (stored in CPU registers).
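
To make the "just variables" point concrete, here is a minimal sketch in plain Scala, with no Spark dependency. The object and method names are hypothetical and only illustrate the idea: `chained` passes every intermediate value between operators as a boxed element of a generic row, while `fused` collapses the same project/filter/sum pipeline into one loop whose intermediate value is a local primitive.

```scala
// Hypothetical illustration only; none of these names come from Spark.
object FusedLoopSketch {

  // Iterator-chain style: each operator is its own iterator, and the
  // intermediate values travel between operators boxed inside a generic row.
  def chained(input: Array[Double]): Double = {
    val rows: Iterator[Array[Any]] = input.iterator.map(d => Array[Any](d))
    val projected = rows.map(r => Array[Any](r(0).asInstanceOf[Double] * 2.0))
    val filtered = projected.filter(r => r(0).asInstanceOf[Double] > 10.0)
    filtered.map(r => r(0).asInstanceOf[Double]).sum
  }

  // Fused style: the same three operators collapsed into a single loop.
  // The intermediate value `doubled` is a local primitive, so nothing is
  // written to a row and nothing is boxed.
  def fused(input: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < input.length) {
      val doubled = input(i) * 2.0
      if (doubled > 10.0) sum += doubled
      i += 1
    }
    sum
  }
}
```

Both methods compute the same result; whether `doubled` actually ends up in a CPU register is up to the JIT, but the fused form at least makes that possible.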

In reply to Koert Kuipers (Fri, Feb 10, 2017 at 8:22 PM)


Re: benefits of code gen

Koert Kuipers
based on that i take it that math functions would be the primary beneficiaries since they work on primitives.

so if i take UnaryMathExpression as an example, would i not get the same benefit if i change it to this?

abstract class UnaryMathExpression(val f: Double => Double, name: String)
  extends UnaryExpression with Serializable with ImplicitCastInputTypes {

  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType)
  override def dataType: DataType = DoubleType
  override def nullable: Boolean = true
  override def toString: String = s"$name($child)"
  override def prettyName: String = name

  protected override def nullSafeEval(input: Any): Any = {
    f(input.asInstanceOf[Double])
  }

  // name of function in java.lang.Math
  def funcName: String = name.toLowerCase

  def function(d: Double): Double = f(d)

  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    val self = ctx.addReferenceObj(name, this, getClass.getName)
    defineCodeGen(ctx, ev, c => s"$self.function($c)")
  }
}

admittedly in this case the benefit in terms of removing complex codegen is not there (the codegen was only one line), but if i can remove codegen here i could also remove it in lots of other places where it does get very unwieldy simply by replacing it with calls to methods.

Function1 is specialized, so i think (or hope) that my version does no extra boxing/unboxing.

In reply to Reynold Xin (Fri, Feb 10, 2017 at 2:24 PM)



Re: benefits of code gen

Michael Armbrust
Function1 is specialized, but nullSafeEval is Any => Any, so that's still going to box in the non-codegened execution path.
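
A small sketch of the two call shapes being compared, again in plain Scala without Spark. `BoxingSketch` and `nullSafeEvalLike` are hypothetical names that only mirror the signatures in the proposed class. The `Any => Any` method has to return a boxed `java.lang.Double`, whereas a method typed `Double => Double` can stay primitive, assuming the compiler routes the call through the specialized `Function1` variant.

```scala
// Hypothetical stand-in for the two evaluation paths; not Spark code.
object BoxingSketch {
  val f: Double => Double = (d: Double) => math.sqrt(d)

  // Shape of the interpreted path: Any in, Any out. The argument arrives
  // boxed and the result must be boxed again to fit the Any return type.
  def nullSafeEvalLike(input: Any): Any = f(input.asInstanceOf[Double])

  // Shape of what generated code would call: primitive in, primitive out.
  def function(d: Double): Double = f(d)
}
```

For example, `BoxingSketch.nullSafeEvalLike(4.0)` yields a `java.lang.Double`, while `BoxingSketch.function(9.0)` returns a primitive `double`.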

In reply to Koert Kuipers (Fri, Feb 10, 2017 at 1:32 PM)




Re: benefits of code gen

Koert Kuipers
yes agreed. however i believe nullSafeEval is not used for codegen?

In reply to Michael Armbrust (Fri, Feb 10, 2017 at 4:56 PM)





Re: benefits of code gen

Sumedh Wale
In reply to this post by Koert Kuipers
The difference is a closure invocation instead of a static java.lang.Math call. In many cases the JIT may not be able to perform inlining and related code optimizations, though in this specific case it should. This is highly dependent on the specific case, but when inlining cannot be done and it leads to a method call (especially a virtual call), the difference is quite large: a few nanoseconds per evaluation vs tens of nanoseconds in my experiments.
Serialization of an additional object as a reference can have a measurable effect for low-latency jobs, though it can usually be ignored.

What has been observed is that if an expression uses CodegenFallback then it becomes an order of magnitude slower or more. Most of that is due to UnsafeRow read/write overhead, which is avoided here, but care still needs to be taken with (virtual) function calls too. In some cases the JIT does inline virtual calls, but this may not always happen. In my experience the only reliable case where it does inline is when the virtual call is on a local variable that does not change for multiple invocations (e.g. a final local variable outside the while loop of a doProduce).

I think what should work better is encapsulating such code in methods of a Scala object rather than a class, so that they can be invoked in generated code like static methods. Such calls should be equivalent to inline code generation in most cases, since the JIT will inline the calls where it determines a significant benefit. In some cases such method calls will have better CPU instruction cache hits (i.e. if the same inline code is emitted multiple times vs common method calls). All this needs thorough micro/macro-benchmarking.
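
A rough sketch of that idea, without Spark on the classpath. `MathHelpers`, `safeLog` and `genCall` are hypothetical names, and the object is assumed to be top-level in the default package (real generated code would use the fully qualified name, as the `DateTimeUtils.stringToDate` call in the castToDateCode snippet does). The logic lives in an ordinary method of a Scala object, and the code generator only emits a one-line Java statement that calls it like a static method.

```scala
// Hypothetical helper object: the real logic is ordinary, testable Scala.
object MathHelpers {
  def safeLog(d: Double): Double =
    if (d <= 0.0) Double.NaN else math.log(d)
}

// Hypothetical generator: emits only a Java call to the helper instead of
// re-creating the helper's body inside a string.
object CodegenCallSketch {
  def genCall(resultVar: String, inputVar: String): String =
    s"$resultVar = MathHelpers.safeLog($inputVar);"
}
```

For example, `CodegenCallSketch.genCall("value_1", "input_0")` produces the Java statement `value_1 = MathHelpers.safeLog(input_0);`. Whether the JIT then inlines `safeLog` is exactly the part that would need benchmarking.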

However, I don't recall any large pieces of generated code where this can help. Most complex pieces, like those in HashAggregateExec/SortMergeJoinExec/BroadcastHashJoinExec, are complex because they generate schema-specific code (to avoid virtual calls and boxing/unboxing, and UnsafeRow read/write in some cases), which is significantly faster than the equivalent generic code in doExecute. Or in your "castToDateCode" example, I don't see how you can reduce it, since the bulk of the code is already in the static stringToDate call.


Re: benefits of code gen

Koert Kuipers
thanks for that detailed response!

In reply to Sumedh Wale (Mon, Feb 13, 2017 at 12:56 AM)




