Removing references to slave (and maybe in the future master)

Holden Karau
Hi Folks,

I've started working on cleaning up the Spark code to remove references to slave, since the word has a lot of negative connotations and we can generally replace it with more accurate/descriptive words in our code base. The PR is at https://github.com/apache/spark/pull/28864 (I'm a little uncertain about the place where I chose the name "AgentLost" as the replacement; suggestions welcome).

At some point I think we should explore deprecating master as well, but that is used very broadly inside of our code and in our APIs, so while it is visible to more people, changing it would be more work. I think having consensus around removing slave, though, is a good first step.

Cheers,

Holden

--
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 

Re: Removing references to slave (and maybe in the future master)

rxin
Thanks for doing this. I think this is a great thing to do.

But we gotta be careful with API compatibility.


On Thu, Jun 18, 2020 at 11:32 AM, Holden Karau <[hidden email]> wrote:
Hi Folks,

I've started working on cleaning up the Spark code to remove references to slave, since the word has a lot of negative connotations and we can generally replace it with more accurate/descriptive words in our code base. The PR is at https://github.com/apache/spark/pull/28864 (I'm a little uncertain about the place where I chose the name "AgentLost" as the replacement; suggestions welcome).

At some point I think we should explore deprecating master as well, but that is used very broadly inside of our code and in our APIs, so while it is visible to more people, changing it would be more work. I think having consensus around removing slave, though, is a good first step.

Cheers,

Holden

--
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 



Re: Removing references to slave (and maybe in the future master)

Holden Karau
Thank you. I agree being careful with API compatibility is important. I think in situations where the terms are exposed in our API we can introduce alternatives and deprecate the old ones to allow for a smooth migration.
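
As a rough illustration of the "introduce alternatives and deprecate the old ones" idea, here is a minimal Python sketch. It is not taken from the PR: `list_agents`, `list_slaves`, and `deprecated_alias` are hypothetical names made up for this example, and Spark's own code base would use its own deprecation mechanisms rather than this helper.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Return a wrapper that forwards to new_func and warns about old_name."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        # stacklevel=2 attributes the warning to the caller of the old name.
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)

    # Keep the old name on the wrapper so tracebacks stay readable.
    wrapper.__name__ = old_name
    return wrapper


def list_agents(cluster):
    """New, preferred entry point (hypothetical API)."""
    return sorted(cluster)


# The old name stays available as a thin alias, so existing callers keep
# working while they migrate (hypothetical legacy name).
list_slaves = deprecated_alias(list_agents, "list_slaves")
```

Callers of the old name get the same result plus a `DeprecationWarning`, which gives them a release or two to switch before the alias is removed.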

On Thu, Jun 18, 2020 at 12:28 PM Reynold Xin <[hidden email]> wrote:
Thanks for doing this. I think this is a great thing to do.

But we gotta be careful with API compatibility.



Re: Removing references to slave (and maybe in the future master)

Matei Zaharia
Yup, it would be great to do this. FWIW, I would propose using “worker” everywhere instead unless it already means something in that context, just to have a single word for this (instead of multiple words such as agent, replica, etc), but I haven’t looked into whether that would make anything confusing.

On Jun 18, 2020, at 1:14 PM, Holden Karau <[hidden email]> wrote:

Thank you. I agree being careful with API compatibility is important. I think in situations where the terms are exposed in our API we can introduce alternatives and deprecate the old ones to allow for a smooth migration.


Re: Removing references to slave (and maybe in the future master)

Holden Karau
So I think using Worker everywhere would be a bit confusing, since the relationship between worker and blockmanager replica is complex. Also, in the current PR `AgentLost` is not `WorkerLost` because it doesn't necessarily mean the worker is lost (there's a flag for whether the worker has been lost).
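
To make the `AgentLost` versus `WorkerLost` distinction concrete, here is a small Python sketch of a loss reason that carries a separate flag for whether the worker itself is gone. This is an assumption-laden illustration, not the class from the PR: the field names `reason` and `worker_lost` and the `describe` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentLost:
    """Hypothetical loss reason: the agent is gone, the worker may not be."""

    reason: str
    # Separate flag: losing the agent does not by itself mean the worker
    # is lost, so the default is False.
    worker_lost: bool = False


def describe(loss):
    """Render a loss reason, distinguishing the two cases."""
    if loss.worker_lost:
        return f"agent and its worker lost: {loss.reason}"
    return f"agent lost, worker may still be alive: {loss.reason}"
```

Naming the type `WorkerLost` would suggest the flag is always true, which is why a more general name plus an explicit flag reads more accurately here.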

On Thu, Jun 18, 2020 at 1:21 PM Matei Zaharia <[hidden email]> wrote:
Yup, it would be great to do this. FWIW, I would propose using “worker” everywhere instead unless it already means something in that context, just to have a single word for this (instead of multiple words such as agent, replica, etc), but I haven’t looked into whether that would make anything confusing.


Re: Removing references to slave (and maybe in the future master)

Erik Krogen
Thanks a lot for proposing this, Holden.

I'd be curious to know how others feel about also tackling the word blacklist -- while I think most would agree it is not as egregious as master/slave, it seems to be an appropriate time to use the momentum to really make a best effort at removing any trace of language that would alienate potential community members. There is some discussion of this term in this blog post, which I also encourage reading: https://lethargy.org/~jesus/writes/a-guide-to-nomenclature-selection/

On Thu, Jun 18, 2020 at 1:27 PM Holden Karau <[hidden email]> wrote:
So I think using Worker everywhere would be a bit confusing, since the relationship between worker and blockmanager replica is complex. Also, in the current PR `AgentLost` is not `WorkerLost` because it doesn't necessarily mean the worker is lost (there's a flag for whether the worker has been lost).


Re: Removing references to slave (and maybe in the future master)

RussS
In reply to this post by Holden Karau
I really dislike the use of "worker" in the code base since it describes a process which doesn't actually do work, but I don't think it's in the scope for this ticket. I would definitely prefer we use "agent" instead of "worker" (or some other name) and have master switched to something like "resource manager" or something that actually describes the purpose of the process. 

I realize that touching "master" is going to disrupt just about everything, but these name choices are usually the first thing that trips up new Spark users. In my experience, I usually have to spend at least 15-20 minutes explaining that a worker will not actually do work, and the master won't run their application.

Thanks Holden for doing all the legwork on this!

Re: Removing references to slave (and maybe in the future master)

Ryan Blue
Thanks for getting this started! I think it will be worth the effort, and it's great to get started early in the 3.x release line to give the most time to prepare for this.

On Thu, Jun 18, 2020 at 3:44 PM Russell Spitzer <[hidden email]> wrote:
I really dislike the use of "worker" in the code base since it describes a process which doesn't actually do work, but I don't think it's in the scope for this ticket. I would definitely prefer we use "agent" instead of "worker" (or some other name) and have master switched to something like "resource manager" or something that actually describes the purpose of the process. 

I realize that touching "master" is going to disrupt just about everything, but these name choices are usually the first thing that trips up new Spark users. In my experience, I usually have to spend at least 15-20 minutes explaining that a worker will not actually do work, and the master won't run their application.

Thanks Holden for doing all the legwork on this!


--
Ryan Blue
Software Engineer
Netflix

Re: Removing references to slave (and maybe in the future master)

Erik Krogen
I've created SPARK-32036 and SPARK-32037 for changes related to "blacklist"/"whitelist" terminology, the latter of which focuses on the blacklisting feature. I invite all of you to participate in the relevant discussion on SPARK-32037 in particular, given that it would be a substantial rename.

On Fri, Jun 19, 2020 at 1:29 PM Ryan Blue <[hidden email]> wrote:
Thanks for getting this started! I think it will be worth the effort, and it's great to get started early in the 3.x release line to give the most time to prepare for this.
