
multinomial logistic regression


multinomial logistic regression

Michael Kun Yang
Hi Spark-ers,

I implemented an SGD version of multinomial logistic regression based on
mllib's optimization package. If this classifier is in mllib's future
plans, I will be happy to contribute my code.

Cheers
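[For readers of the archive: the classifier Michael describes can be sketched as the standard softmax SGD update. This is a minimal pure-Python illustration with made-up toy data and learning rate, not the actual contributed code.]

```python
import math

def softmax(z):
    m = max(z)                        # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def sgd_step(W, x, y, lr=0.1):
    """One SGD update for multinomial (softmax) logistic regression.
    W is a list of per-class weight vectors; y is the true class index."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W]
    p = softmax(scores)
    # Gradient of the cross-entropy loss w.r.t. W[k] is (p[k] - 1{k == y}) * x
    return [[w_i - lr * (p[k] - (1.0 if k == y else 0.0)) * x_i
             for w_i, x_i in zip(W[k], x)]
            for k in range(len(W))]

# Toy usage: two classes, two features
W = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(200):
    W = sgd_step(W, [1.0, 0.0], 0)
    W = sgd_step(W, [0.0, 1.0], 1)
```

In MLlib's architecture this per-example update would be plugged into the optimization package as a `Gradient` implementation rather than looped over locally.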
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: multinomial logistic regression

Evan R. Sparks
Hi Michael,

What strategy are you using to train the multinomial classifier? One-vs-all? I've got an optimized version of that method that I've been meaning to clean up and commit for a while. In particular, rather than shipping a (potentially very big) model with each map task, I ship it once before each iteration with a broadcast variable. Perhaps we can compare versions and incorporate some of my optimizations into your code?

Thanks,
Evan
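[The one-vs-all strategy Evan mentions trains one binary classifier per class and predicts by the highest score. A rough pure-Python sketch under made-up toy data; in Spark the per-iteration model shipping he describes would use a broadcast variable instead of closure capture.]

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_binary(data, lr=0.5, iters=300):
    """Binary logistic regression via gradient descent; data is (x, 0/1 label)."""
    w = [0.0] * len(data[0][0])
    for _ in range(iters):
        for x, t in data:
            p = sigmoid(sum(a * b for a, b in zip(w, x)))
            w = [w_i - lr * (p - t) * x_i for w_i, x_i in zip(w, x)]
    return w

def train_one_vs_all(data, num_classes):
    # One binary classifier per class: class k vs. everything else
    return [train_binary([(x, 1.0 if y == k else 0.0) for x, y in data])
            for k in range(num_classes)]

def predict(models, x):
    scores = [sum(a * b for a, b in zip(w, x)) for w in models]
    return scores.index(max(scores))

# Toy usage with three classes
data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 2)]
models = train_one_vs_all(data, 3)
```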


Re: multinomial logistic regression

Michael Kun Yang
I actually have two versions:
one is based on gradient descent, like the logistic regression in mllib;
the other is based on Newton iteration. It is not as fast as SGD, but we
can get all the statistics from it, like deviance, p-values, and Fisher
information.

We can get a confusion matrix from both versions.

The gradient descent version is just a modification of logistic regression
with my own implementation. I did not use the LabeledPoint class.
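[For context: the Newton (IRLS) iteration Michael contrasts with SGD accumulates the Fisher information matrix, whose inverse supplies the variance estimates behind standard errors and p-values. A minimal pure-Python sketch for the binary, two-coefficient case, with made-up non-separable toy data:]

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def newton_logistic(X, y, iters=15):
    """Newton (IRLS) iterations for binary logistic regression with two
    coefficients. Returns the fitted weights and the Fisher information
    matrix X^T diag(p(1-p)) X; inverting it yields the variance estimates
    used for standard errors and p-values."""
    w = [0.0, 0.0]
    for _ in range(iters):
        g = [0.0, 0.0]                    # gradient X^T (y - p)
        F = [[0.0, 0.0], [0.0, 0.0]]      # Fisher information
        for x, t in zip(X, y):
            p = sigmoid(w[0] * x[0] + w[1] * x[1])
            for i in range(2):
                g[i] += (t - p) * x[i]
                for j in range(2):
                    F[i][j] += p * (1 - p) * x[i] * x[j]
        # Newton step: solve F d = g (closed form for a 2x2 system)
        det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
        d0 = (F[1][1] * g[0] - F[0][1] * g[1]) / det
        d1 = (F[0][0] * g[1] - F[1][0] * g[0]) / det
        w = [w[0] + d0, w[1] + d1]
    return w, F

# Toy non-separable data: first column is an intercept term
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0.0, 1.0, 0.0, 1.0]
w, F = newton_logistic(X, y)
```

The quadratic convergence is why so few iterations suffice here, and why each iteration costs more than an SGD pass: it builds and solves against the full information matrix.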



Re: multinomial logistic regression

Hossein
Hi Michael,

This sounds great. Would you please send these as a pull request?
Especially if you can make your Newton method implementation generic, such
that it can later be used by other algorithms, it would be very helpful.
For example, you could add it as another optimization method under
mllib/optimization.

Was there a particular reason you chose not to use LabeledPoint?

We have some instructions for contributions here: <
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark>

Thanks,

--Hossein



Re: multinomial logistic regression

Michael Kun Yang
Hi Hossein,

I can still use LabeledPoint with a little modification. Currently I
convert the category into a {0, 1} sequence, but I can do the conversion
in the body of the methods or functions.

To make the code run faster, I try not to use the DoubleMatrix
abstraction, to avoid memory allocation; another reason is that jblas has
no data structure to handle symmetric matrix addition efficiently.

My code is not very pretty because I handle matrix operations manually (by
indexing).

If you think it is ok, I will make a pull request.
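[As an aside for readers: the {0, 1} conversion Michael describes is one-hot encoding, and a common workaround for a library lacking a symmetric matrix type is packed upper-triangular storage. A rough sketch with hypothetical helper names:]

```python
def one_hot(label, num_classes):
    """Encode a category index as a {0, 1} indicator sequence."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def packed_index(i, j, n):
    """Map (i, j) into a flat upper-triangular array for a symmetric n x n
    matrix, so addition touches n*(n+1)/2 entries instead of n*n."""
    if i > j:
        i, j = j, i                     # symmetry: (i, j) and (j, i) coincide
    return i * n - i * (i - 1) // 2 + (j - i)

def packed_add(a, b):
    # Adding two symmetric matrices in packed form is plain elementwise addition
    return [x + y for x, y in zip(a, b)]
```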



Re: multinomial logistic regression

rxin
Thanks. Why don't you submit a PR and then we can work on it?


Re: multinomial logistic regression

Michael Kun Yang
Thanks, will do.



Re: multinomial logistic regression

Michael Kun Yang
I just sent the PR for multinomial logistic regression.



Re: multinomial logistic regression

Michael Kun Yang
I will follow up with the Newton one later.



Re: multinomial logistic regression

Michael Kun Yang
I just sent the PR: I fixed a typo in a comment and added some comments
and a unit test. Please let me know if you receive the patch.

