Implementation of RNN/LSTM in Spark

Implementation of RNN/LSTM in Spark

Disha Shrivastava
Hi,

I wanted to know if someone is working on implementing RNN/LSTM in Spark, or has already done so. I am willing to contribute to it and would welcome some guidance on how to go about it.

Thanks and Regards
Disha
Masters Student, IIT Delhi

Re: Implementation of RNN/LSTM in Spark

Sasaki Kai
Hi, Disha

There seems to be no JIRA on RNN/LSTM directly, but there are several tickets about other types of deep learning networks.

Stacked Auto Encoder
https://issues.apache.org/jira/browse/SPARK-2623
CNN

Roadmap of MLlib deep learning

I think it may be good to join the discussion on SPARK-5575. 
Best

Kai Sasaki


In reply to Disha Shrivastava's post of Nov 2, 2015, 1:59 PM.

Re: Implementation of RNN/LSTM in Spark

Disha Shrivastava
I would love to work on this. Could anyone share ideas on how it could be done, or suggest some papers as a starting point? Also, I would like to know whether Spark would be an ideal platform for a distributed implementation of RNN/LSTM.

In reply to Sasaki Kai's post of Mon, Nov 2, 2015, 10:52 AM.

Re: Implementation of RNN/LSTM in Spark

Julio Antonio Soto de Vicente
Hi,
It is my understanding that little research has been done so far on distributed computation (without access to shared memory) for RNNs. I also look forward to contributing in this respect.

In reply to Disha Shrivastava's post of 3 Nov 2015, 16:00.

Re: Implementation of RNN/LSTM in Spark

Disha Shrivastava
Hi Julio,

Can you please cite some references on distributed implementations?

In reply to Julio Antonio Soto de Vicente's post of Tue, Nov 3, 2015, 8:52 PM.

Re: Implementation of RNN/LSTM in Spark

Sasaki Kai
Hi, Disha

deeplearning4j seems to implement distributed RNNs in its core and scaleout packages.
http://deeplearning4j.org/recurrentnetwork.html
https://github.com/deeplearning4j/deeplearning4j
It might be helpful to refer to its implementation of distributed RNNs.

I also found a resource about efficient RNNs and their implementation.

Best

Kai

In reply to Disha Shrivastava's post of Nov 4, 2015, 12:25 AM.

Re: Implementation of RNN/LSTM in Spark

n1kt0
Hi,
Can anyone tell me what the current status of RNNs in Spark is?

Re: Implementation of RNN/LSTM in Spark

Nick Pentreath
The short answer is that there is none, and it is highly unlikely to be in Spark MLlib any time in the near future.

The best bet is to look at other DL libraries that run on Spark. For the JVM there are Deeplearning4J and BigDL (there are others, but these seem to be the most comprehensive I have come across). There are also various flavours of TensorFlow / Caffe on Spark. And of course there are libraries such as Torch, Keras, TensorFlow, MXNet, Caffe, etc. Some of them have Java or Scala APIs and some form of Spark integration out there in the community (in varying states of development).

Integrations with Spark are a bit patchy currently but include the "XOnSpark" flavours mentioned above and TensorFrames (again, there may be others).

In reply to n1kt0's post of Thu, 23 Feb 2017, 14:23.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Implementation-of-RNN-LSTM-in-Spark-tp14866p21060.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


RE: Implementation of RNN/LSTM in Spark

Joeri Hermans
Hi Nikita,

We are actively working on this: https://github.com/cerndb/dist-keras This will allow you to run Keras on Spark (with distributed optimization algorithms) through PySpark. I recommend checking the examples: https://github.com/cerndb/dist-keras/tree/master/examples However, be aware that distributed optimization is still a research topic, with several approaches and caveats. I wrote a blog post on this, in case you would like some additional information: https://db-blog.web.cern.ch/blog/joeri-hermans/2017-01-distributed-deep-learning-apache-spark-and-keras

However, if you don't want to use a distributed optimization algorithm, we also support a "sequential trainer" which allows you to train a model on Spark dataframes.
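
As a rough illustration of what a distributed optimization algorithm can look like, here is a minimal sketch of synchronous parameter averaging in plain Python. It is not dist-keras code and uses neither Spark nor Keras: the names `local_sgd`, `average_models` and `train`, the toy linear model, and the learning rate are all assumptions made for this example. Each "worker" trains a copy of the model on its own partition, and the "driver" averages the results after every round.

```python
# Hypothetical sketch: synchronous parameter averaging for data-parallel SGD.
# Plain Python only; the "workers" are simulated by looping over partitions.


def local_sgd(weights, partition, lr=0.05):
    """One pass of SGD for the model y = w*x + b over a single partition."""
    w, b = weights
    for x, y in partition:
        err = (w * x + b) - y  # prediction error on this sample
        w -= lr * err * x      # gradient step for the slope
        b -= lr * err          # gradient step for the intercept
    return w, b


def average_models(models):
    """Combine the workers' models by averaging each parameter."""
    n = len(models)
    return (sum(m[0] for m in models) / n, sum(m[1] for m in models) / n)


def train(partitions, rounds=300):
    """Alternate local training on every partition with a global average."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        # On a cluster this step would run in parallel, one task per partition.
        models = [local_sgd(weights, part) for part in partitions]
        weights = average_models(models)
    return weights


if __name__ == "__main__":
    # Noise-free samples of y = 2x + 1, split into two partitions.
    data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(20)]
    print(train([data[0::2], data[1::2]]))
```

Averaging after every pass is only one of several possible schemes; asynchronous variants trade this synchronization barrier for stale updates, which is one source of the caveats mentioned above.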

Kind regards,

Joeri

In reply to Nick Pentreath's post of 23 February 2017, 13:39.

Re: Implementation of RNN/LSTM in Spark

Yuhao Yang
You are welcome to try out and contribute to our BigDL: https://github.com/intel-analytics/BigDL

It runs natively on Spark and is fast because it leverages Intel MKL.

In reply to Joeri Hermans's post of 2017-02-23 4:51 GMT-08:00.

Re: Implementation of RNN/LSTM in Spark

Michael Allman-2
Hi Yuhao,

BigDL looks very promising and it's a framework we're considering using. It seems the general approach to high performance DL is via GPUs. Your project mentions performance on a Xeon comparable to that of a GPU, but where does this claim come from? Can you provide benchmarks?

Thanks,

Michael

In reply to Yuhao Yang's post of Feb 27, 2017, 11:11 PM.

Re: Implementation of RNN/LSTM in Spark

Liang-Chi Hsieh
In reply to this post by Nick Pentreath

Yeah, I'd agree with Nick.

To have an implementation of RNN/LSTM in Spark, you would need a comprehensive abstraction of neural networks that is general enough to represent the computation (think of Torch, Keras, TensorFlow, MXNet, Caffe, etc.), and you would have to modify the current computation engine to work with various computing units such as GPUs. I don't think we will have such a thing in Spark in the near future.

There are many efforts to integrate Spark with the specialized frameworks that do well at this abstraction and at parallel computation. I think the best approach is to look at these efforts and contribute to them if possible.
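
To make "the computation" a little more concrete, here is a minimal sketch of a plain (Elman-style) recurrent cell in pure Python. It is only an illustration under my own assumptions: the names `RNNCell` and `run_sequence` are made up for this example and do not correspond to any Spark or framework API. The point is that each hidden state depends on the previous one, so the time steps of one sequence cannot simply be spread over partitions; the parallelism has to come from batching sequences or from the matrix operations inside each step.

```python
import math


class RNNCell:
    """Elman-style cell: h_t = tanh(W_x * x_t + W_h * h_{t-1} + b)."""

    def __init__(self, w_x, w_h, b):
        # Weight matrices are lists of rows; b is a list with one bias per unit.
        self.w_x, self.w_h, self.b = w_x, w_h, b

    def step(self, x, h):
        """Compute the next hidden state from input x and previous state h."""
        return [
            math.tanh(
                sum(wx * xi for wx, xi in zip(self.w_x[j], x))
                + sum(wh * hi for wh, hi in zip(self.w_h[j], h))
                + self.b[j]
            )
            for j in range(len(self.b))
        ]


def run_sequence(cell, xs, h0):
    """Feed a sequence through the cell; each step needs the previous state."""
    h = h0
    for x in xs:
        h = cell.step(x, h)
    return h


if __name__ == "__main__":
    cell = RNNCell(w_x=[[1.0]], w_h=[[1.0]], b=[0.0])
    print(run_sequence(cell, [[0.5], [0.5]], [0.0]))
```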

Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/