DAG in Pipeline

Pranay Tonpay
Hi,
As of now, a Pipeline seems to run a series of transformers and estimators serially.
Is it possible to create a DAG-like structure instead?
For example: two custom-built Transformers run in parallel to cleanse the data in some way, and then their two outputs are fed into another custom-built Transformer that performs some sort of correlation.

Let me know -

thx
pranay

Re: DAG in Pipeline

Joseph Bradley
Hi Pranay,

Yes, you can do this.  The DAG structure should be specified via the various Transformers' input and output columns, where a Transformer can have multiple input and/or output columns.  Most of the classification and regression Models are good examples of Transformers with multiple input and output columns.

Hope this helps!
Joseph
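
To make the column-based wiring concrete, here is a minimal plain-Python sketch. It is a toy model and not Spark's API: the `Stage` class, the `run_pipeline` function, and every column name are hypothetical. Each stage reads named input columns and writes one output column. Two cleansing stages read different raw columns, and a third stage consumes both of their outputs, so the stage list is linear while the data dependencies form a DAG.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# A "table" is modeled as a dict mapping column name -> list of values.
Table = Dict[str, list]


@dataclass
class Stage:
    """Toy stand-in for a Transformer: reads input columns, writes one output column."""
    name: str
    input_cols: Sequence[str]
    output_col: str
    fn: Callable[..., list]

    def transform(self, table: Table) -> Table:
        missing = [c for c in self.input_cols if c not in table]
        if missing:
            raise KeyError(f"{self.name}: missing input columns {missing}")
        out = dict(table)  # shallow copy; existing columns are left untouched
        out[self.output_col] = self.fn(*(table[c] for c in self.input_cols))
        return out


def run_pipeline(stages: Sequence[Stage], table: Table) -> Table:
    """Apply the stages one after another, in list order."""
    for stage in stages:
        table = stage.transform(table)
    return table


def clip_negatives(xs):
    return [max(x, 0.0) for x in xs]


def fill_missing(xs):
    # Treat None as a missing value and replace it with 0.0.
    return [0.0 if x is None else x for x in xs]


def elementwise_product(xs, ys):
    return [x * y for x, y in zip(xs, ys)]


# Two independent cleansing branches, then one stage that joins them.
clean_a = Stage("clean_a", ["raw_a"], "a_clean", clip_negatives)
clean_b = Stage("clean_b", ["raw_b"], "b_clean", fill_missing)
combine = Stage("combine", ["a_clean", "b_clean"], "ab_product", elementwise_product)

raw = {"raw_a": [1.0, -2.0, 3.0], "raw_b": [2.0, None, 4.0]}
result = run_pipeline([clean_a, clean_b, combine], raw)
print(result["ab_product"])  # [2.0, 0.0, 12.0]
```

Swapping `clean_a` and `clean_b` in the list gives the same result, because neither reads the other's output. In Spark ML, a stage such as `VectorAssembler`, which takes several input columns, plays a role comparable to `combine` here.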

(In reply to Pranay Tonpay's message of Wed, Jun 8, 2016, 9:59 PM.)

Re: DAG in Pipeline

Joseph Bradley
One more note: When you specify the stages in the Pipeline, they need to be in topological order according to the DAG.
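
As a rough illustration of what "topological order" means here, the sketch below derives a valid stage order from each stage's input and output columns using Python's standard-library `graphlib`. The stage names, column names, and the `topological_stage_order` helper are hypothetical; this is not something Spark does for you, only a way to check or compute an order before building the stage list by hand.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stages, deliberately listed out of order:
# stage name -> (input columns, output column).
stages = {
    "combine": (["a_clean", "b_clean"], "ab_product"),
    "clean_b": (["raw_b"], "b_clean"),
    "clean_a": (["raw_a"], "a_clean"),
}


def topological_stage_order(stages):
    """Return stage names so that each stage follows the producers of its inputs."""
    # Which stage produces each output column.
    producer = {out: name for name, (_, out) in stages.items()}
    # Map each stage to the stages it depends on. Columns with no producer
    # (such as raw_a) are assumed to come from the input data.
    deps = {
        name: {producer[c] for c in inputs if c in producer}
        for name, (inputs, _) in stages.items()
    }
    # static_order() raises graphlib.CycleError if the dependencies are cyclic.
    return list(TopologicalSorter(deps).static_order())


order = topological_stage_order(stages)
print(order)  # "combine" comes last; the two cleansing stages may appear in either order
```

Any order in which both cleansing stages precede `combine` is valid, which is why the two independent branches can be listed either way round.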

(Follow-up to Joseph Bradley's message of Sun, Jun 12, 2016, 10:47 AM.)


Re: DAG in Pipeline

Hillel
This post has NOT been accepted by the mailing list yet.
Joseph, can you take a look at this SO thread? Based on the code snippets provided there, it seems that the pipeline is always linear and that specifying inputCol/outputCol has no effect on how the graph is created or executed.

If that's not the case, can you comment here or on the SO thread and provide some evidence to the contrary? If inputCol/outputCol are indeed used, do you happen to know the answers to the questions in the SO thread? (what happens when the user doesn't specify an inputCol or outputCol, etc.)

Re: DAG in Pipeline

Danil Kirsanov
This post has NOT been accepted by the mailing list yet.
Thank you for the clarifying discussion.

I can see how you can put a DAG in a pipeline if all the DataFrames have the same number of rows along the way.
What if I have two DataFrames with different numbers of rows? Or what about splitting a DataFrame into two parts inside the pipeline?

In other words, are there any plans to extend Pipeline to more general DAGs?

Thank you,
Danil

Re: DAG in Pipeline

Srikanth Sampath
In reply to this post by Joseph Bradley
Hi,
Pranay/Joseph, can you share an example of an ML DAG pipeline?
Thanks,
-Srikanth



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
