Question on Spark code

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Question on Spark code

kant kodali
Hi All,

I came across this file https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/Logging.scala and I am wondering what is the purpose of this? Especially it doesn't prevent any string concatenation and also the if checks are already done by the library itself right?

Thanks!

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Question on Spark code

Herman van Hövell tot Westerflier-2
I am not getting the question. The logging trait does exactly what is says on the box, I don't see what string concatenation has to do with it.

On Sun, Jun 25, 2017 at 11:27 AM, kant kodali <[hidden email]> wrote:
Hi All,

I came across this file https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/Logging.scala and I am wondering what is the purpose of this? Especially it doesn't prevent any string concatenation and also the if checks are already done by the library itself right?

Thanks!




--

Herman van Hövell

Software Engineer

Databricks Inc.

[hidden email]

+31 6 420 590 27

databricks.com

http://databricks.com



Announcing Databricks Serverless. The first serverless data science and big data platform. Watch the demo from Spark Summit 2017.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Question on Spark code

Sean Owen
In reply to this post by kant kodali
Maybe you are looking for declarations like this. "=> String" means the arg isn't evaluated until it's used, which is just what you want with log statements. The message isn't constructed unless it will be logged.

protected def logInfo(msg: => String) {

On Sun, Jun 25, 2017 at 10:28 AM kant kodali <[hidden email]> wrote:
Hi All,

I came across this file https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/Logging.scala and I am wondering what is the purpose of this? Especially it doesn't prevent any string concatenation and also the if checks are already done by the library itself right?

Thanks!

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Question on Spark code

kant kodali
@Sean Got it! I come from Java world so I guess I was wrong in assuming that arguments are evaluated during the method invocation time. How about the conditional checks to see if the log is InfoEnabled or DebugEnabled? For Example, 

if (log.isInfoEnabled) log.info(msg)

I hear we should use guard condition only when the string "msg" construction is expensive otherwise we will be taking a performance hit because of the additional "if" check unless the "log" itself is declared static final and scala compiler will strip away the "if" check and produce efficient byte code. Also log.info does the log.isInfoEnabled check inside the body prior to logging the messages.


Please correct me if I am wrong.




On Sun, Jun 25, 2017 at 3:04 AM, Sean Owen <[hidden email]> wrote:
Maybe you are looking for declarations like this. "=> String" means the arg isn't evaluated until it's used, which is just what you want with log statements. The message isn't constructed unless it will be logged.

protected def logInfo(msg: => String) {


On Sun, Jun 25, 2017 at 10:28 AM kant kodali <[hidden email]> wrote:
Hi All,

I came across this file https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/Logging.scala and I am wondering what is the purpose of this? Especially it doesn't prevent any string concatenation and also the if checks are already done by the library itself right?

Thanks!


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Question on Spark code

Sean Owen
I think it's more precise to say args like any expression are evaluated when their value is required. It's just that this special syntax causes extra code to be generated that makes it effectively a function passed, not value, and one that's lazily evaluated. Look at the bytecode if you're curious.

An if conditional is pretty trivial to evaluate here. I don't think that guidance is sound. The point is that it's not worth complicating the caller code in almost all cases by checking the guard condition manually.

I'm not sure what you're referring to, but no, no compiler can elide these conditions. They're based on runtime values that can change at runtime.

scala has an @elidable annotation which you can use to indicate to the compiler that the declaration should be entirely omitted when configured to elide above a certain detail level. This is how scalac elides assertions if you ask it to and you can do it to your own code. But this is something different, not what's happening here, and a fairly niche/advanced feature.


On Sun, Jun 25, 2017 at 8:25 PM kant kodali <[hidden email]> wrote:
@Sean Got it! I come from Java world so I guess I was wrong in assuming that arguments are evaluated during the method invocation time. How about the conditional checks to see if the log is InfoEnabled or DebugEnabled? For Example, 

if (log.isInfoEnabled) log.info(msg)

I hear we should use guard condition only when the string "msg" construction is expensive otherwise we will be taking a performance hit because of the additional "if" check unless the "log" itself is declared static final and scala compiler will strip away the "if" check and produce efficient byte code. Also log.info does the log.isInfoEnabled check inside the body prior to logging the messages.


Please correct me if I am wrong.




On Sun, Jun 25, 2017 at 3:04 AM, Sean Owen <[hidden email]> wrote:
Maybe you are looking for declarations like this. "=> String" means the arg isn't evaluated until it's used, which is just what you want with log statements. The message isn't constructed unless it will be logged.

protected def logInfo(msg: => String) {


On Sun, Jun 25, 2017 at 10:28 AM kant kodali <[hidden email]> wrote:
Hi All,

I came across this file https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/Logging.scala and I am wondering what is the purpose of this? Especially it doesn't prevent any string concatenation and also the if checks are already done by the library itself right?

Thanks!


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Question on Spark code

kant kodali
impressive! I need to learn more about scala.

What I mean stripping away conditional check in Java is this.

static final boolean isLogInfoEnabled = false;

public void logMessage(String message) {
    if(isLogInfoEnabled) {
        log.info(message)
    }
}

If you look at the byte code the dead if check will be removed.








On Sun, Jun 25, 2017 at 12:46 PM, Sean Owen <[hidden email]> wrote:
I think it's more precise to say args like any expression are evaluated when their value is required. It's just that this special syntax causes extra code to be generated that makes it effectively a function passed, not value, and one that's lazily evaluated. Look at the bytecode if you're curious.

An if conditional is pretty trivial to evaluate here. I don't think that guidance is sound. The point is that it's not worth complicating the caller code in almost all cases by checking the guard condition manually.

I'm not sure what you're referring to, but no, no compiler can elide these conditions. They're based on runtime values that can change at runtime.

scala has an @elidable annotation which you can use to indicate to the compiler that the declaration should be entirely omitted when configured to elide above a certain detail level. This is how scalac elides assertions if you ask it to and you can do it to your own code. But this is something different, not what's happening here, and a fairly niche/advanced feature.


On Sun, Jun 25, 2017 at 8:25 PM kant kodali <[hidden email]> wrote:
@Sean Got it! I come from Java world so I guess I was wrong in assuming that arguments are evaluated during the method invocation time. How about the conditional checks to see if the log is InfoEnabled or DebugEnabled? For Example, 

if (log.isInfoEnabled) log.info(msg)

I hear we should use guard condition only when the string "msg" construction is expensive otherwise we will be taking a performance hit because of the additional "if" check unless the "log" itself is declared static final and scala compiler will strip away the "if" check and produce efficient byte code. Also log.info does the log.isInfoEnabled check inside the body prior to logging the messages.


Please correct me if I am wrong.




On Sun, Jun 25, 2017 at 3:04 AM, Sean Owen <[hidden email]> wrote:
Maybe you are looking for declarations like this. "=> String" means the arg isn't evaluated until it's used, which is just what you want with log statements. The message isn't constructed unless it will be logged.

protected def logInfo(msg: => String) {


On Sun, Jun 25, 2017 at 10:28 AM kant kodali <[hidden email]> wrote:
Hi All,

I came across this file https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/Logging.scala and I am wondering what is the purpose of this? Especially it doesn't prevent any string concatenation and also the if checks are already done by the library itself right?

Thanks!



Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Question on Spark code

Steve Loughran

On 25 Jun 2017, at 20:57, kant kodali <[hidden email]> wrote:

impressive! I need to learn more about scala.

What I mean stripping away conditional check in Java is this.

static final boolean isLogInfoEnabled = false;

public void logMessage(String message) {
    if(isLogInfoEnabled) {
        log.info(message)
    }
}

If you look at the byte code the dead if check will be removed.





Generally it's skipped in Java too now people move to SLF4J APIs, which does on-demand string expansion

LOG.info("network IO failure from {}  source to {}", src, dest, ex). That only builds the final string callis src.toString() and dest.toString() when needed; handling null values too. So you can skip those guards everywhere. But the string template is still constructed; it's not free, and there's some merit in maintaining the guard @ debug level, though I don't personally bother.

The spark one takes a closure, so it can do much more. However, you shouldn't do anything with side effects, or indeed, anything prone to throwing exceptions. Always try to write .toString() methods which are robust against null values, that is: valid for the entire life of an instance. Your debuggers will appreciate it too.

Loading...