how to retain part of the features in LogisticRegressionModel (spark2.0)

jinhong lu

I train my LogisticRegressionModel as shown below. I want the model to retain only some of the features (e.g. 500 of them), not all 5555. What should I do?
I use .setElasticNetParam(1.0), but all the features are still present in lrModel.coefficients.

 import org.apache.spark.ml.classification.LogisticRegression

 // Load the LIBSVM-format data, declaring 5555 feature dimensions
 val data = spark.read.format("libsvm").option("numFeatures", "5555").load("/tmp/data/training_data3")
 // Split 50/50 into training and test sets with a fixed seed
 val Array(trainingData, testData) = data.randomSplit(Array(0.5, 0.5), seed = 1234L)

 // Fit a logistic regression model with default parameters
 val lr = new LogisticRegression()
 val lrModel = lr.fit(trainingData)
 println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

 // Score the held-out data
 val predictions = lrModel.transform(testData)
 predictions.show()


Thanks, 
lujinhong

Re: how to retain part of the features in LogisticRegressionModel (spark2.0)

jinhong lu
By the way, I found that in Spark 2.1 I can use setFamily() to choose between binomial and multinomial, but how can I do the same thing in Spark 2.0.2?
If it is not supported, which one is used in Spark 2.0.2: binomial or multinomial?

On March 19, 2017, at 18:12, jinhong lu <[hidden email]> wrote:

I want my model to retain only some of the features (e.g. 500 of them), not all 5555. What should I do?
I use .setElasticNetParam(1.0), but all the features are still present in lrModel.coefficients.


Thanks,
lujinhong

Re: how to retain part of the features in LogisticRegressionModel (spark2.0)

Dhanesh Padmanabhan
Binomial. For multi-class problems in Spark 2.0.2, please use it in combination with OneVsRest.
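A minimal sketch of that combination, meant for spark-shell (where `spark` is the session). The three-class toy data and the name `multiclass` are made up for illustration; only `LogisticRegression` and `OneVsRest` come from the thread.

```scala
import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}
import org.apache.spark.ml.linalg.Vectors

// Toy three-class data (hypothetical values); `spark` is the spark-shell session.
val multiclass = spark.createDataFrame(Seq(
  (0.0, Vectors.dense(1.0, 0.0)),
  (0.0, Vectors.dense(0.9, 0.1)),
  (1.0, Vectors.dense(0.0, 1.0)),
  (1.0, Vectors.dense(0.1, 0.9)),
  (2.0, Vectors.dense(1.0, 1.0)),
  (2.0, Vectors.dense(0.9, 1.1))
)).toDF("label", "features")

// Binomial logistic regression serves as the base classifier.
val binaryLr = new LogisticRegression().setMaxIter(50)

// OneVsRest fits one binary model per class and predicts the most confident one.
val ovr = new OneVsRest().setClassifier(binaryLr)
val ovrModel = ovr.fit(multiclass)

ovrModel.transform(multiclass).select("label", "prediction").show()
```

Each fitted binary model is available through `ovrModel.models`, so per-class coefficients can still be inspected.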

Dhanesh
+91-9741125245

On Sun, Mar 19, 2017 at 4:29 PM, jinhong lu <[hidden email]> wrote:
By the way, I found that in Spark 2.1 I can use setFamily() to choose between binomial and multinomial, but how can I do the same thing in Spark 2.0.2?
If it is not supported, which one is used in Spark 2.0.2: binomial or multinomial?

Re: how to retain part of the features in LogisticRegressionModel (spark2.0)

jinhong lu
Thanks, Dhanesh. And how about the features question?

On March 19, 2017, at 19:08, Dhanesh Padmanabhan <[hidden email]> wrote:

Dhanesh

Thanks,
lujinhong

Re: how to retain part of the features in LogisticRegressionModel (spark2.0)

Dhanesh Padmanabhan
It shouldn't be difficult to convert the coefficients to a sparse vector. I'm not sure if that is what you are looking for.
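One way this conversion might look, as a sketch: `toSparse` is a method on `org.apache.spark.ml.linalg.Vector`. The dense vector below is a made-up stand-in for `lrModel.coefficients`, so the snippet does not need a trained model.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Hypothetical stand-in for lrModel.coefficients (values invented for illustration).
val coefficients: Vector = Vectors.dense(0.0, 1.5, 0.0, 0.0, -0.3)

// toSparse keeps only the non-zero entries, together with their indices.
val sparse = coefficients.toSparse

println(sparse.indices.mkString(",")) // prints 1,4
println(sparse.values.mkString(","))  // prints 1.5,-0.3
println(sparse.size)                  // prints 5: the vector length is unchanged
```

Note that the vector size stays at the full number of features; only the storage becomes sparse.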

-Dhanesh

On Sun, Mar 19, 2017 at 5:02 PM jinhong lu <[hidden email]> wrote:
Thanks, Dhanesh. And how about the features question?

--
Dhanesh
+91-9741125245

Re: how to retain part of the features in LogisticRegressionModel (spark2.0)

Yanbo Liang-2
Do you want a sparse model in which most of the coefficients are zero? If so, L1 regularization leads to sparsity. However, the size of the LogisticRegressionModel coefficients vector is still equal to the number of features; you can extract the non-zero elements manually. In fact, the coefficients will be stored as a sparse vector (or a sparse matrix in the multinomial case) if they are sparse enough.
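A rough sketch of extracting the non-zero elements manually and then keeping only those features, for spark-shell. The coefficient values and the toy DataFrame are invented stand-ins; using `VectorSlicer` to drop the other features is one possible follow-up and an assumption here, not something proposed in the thread.

```scala
import org.apache.spark.ml.feature.VectorSlicer
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Hypothetical stand-in for lrModel.coefficients after an L1-regularized fit.
val coefficients: Vector = Vectors.dense(0.0, 1.5, 0.0, 0.0, -0.3)

// Indices of the non-zero coefficients, collected manually.
val selected: Array[Int] = coefficients.toArray.zipWithIndex.collect {
  case (weight, index) if weight != 0.0 => index
}

// Toy data with the same 5-dimensional feature space; `spark` is the spark-shell session.
val data = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(0.1, 2.0, 0.0, 3.0, 1.0)),
  (0.0, Vectors.dense(0.5, 0.0, 1.0, 0.0, 4.0))
)).toDF("label", "features")

// VectorSlicer keeps only the listed indices; it needs at least one index.
val slicer = new VectorSlicer()
  .setInputCol("features")
  .setOutputCol("selectedFeatures")
  .setIndices(selected)

val reduced = slicer.transform(data)
reduced.select("selectedFeatures").show(false)
```

A model retrained with `.setFeaturesCol("selectedFeatures")` would then have one coefficient per retained feature. To keep a fixed number such as 500, the indices could instead be ranked by absolute coefficient value before slicing.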

Thanks
Yanbo

On Sun, Mar 19, 2017 at 5:02 AM, Dhanesh Padmanabhan <[hidden email]> wrote:
It shouldn't be difficult to convert the coefficients to a sparse vector. I'm not sure if that is what you are looking for.

-Dhanesh

Re: how to retain part of the features in LogisticRegressionModel (spark2.0)

颜发才(Yan Facai)
Hi, jinhong.
Do you call `setRegParam`? It is 0.0 by default.

Both elasticNetParam and regParam must be set if regularization is needed:

val regParamL1 = $(elasticNetParam) * $(regParam)
val regParamL2 = (1.0 - $(elasticNetParam)) * $(regParam)
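A small sketch of what these two lines imply when configuring the estimator. The value 0.1 for regParam is purely illustrative, not a recommendation; a suitable strength would have to be tuned for the data.

```scala
import org.apache.spark.ml.classification.LogisticRegression

// Illustrative values only: 0.1 is a hypothetical regularization strength.
val regParam = 0.1
val elasticNetParam = 1.0 // 1.0 = pure L1, 0.0 = pure L2

// Effective penalties, computed as in the two lines quoted above.
val regParamL1 = elasticNetParam * regParam
val regParamL2 = (1.0 - elasticNetParam) * regParam
println(s"L1 = $regParamL1, L2 = $regParamL2")

// With the default regParam of 0.0 both penalties are 0.0, whatever elasticNetParam is,
// so setting elasticNetParam alone cannot produce sparse coefficients.
val lr = new LogisticRegression()
  .setRegParam(regParam)
  .setElasticNetParam(elasticNetParam)
```

Larger values of regParam generally push more coefficients to exactly zero under the L1 penalty.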




On Mon, Mar 20, 2017 at 6:31 PM, Yanbo Liang <[hidden email]> wrote:
Do you want a sparse model in which most of the coefficients are zero? If so, L1 regularization leads to sparsity. However, the size of the LogisticRegressionModel coefficients vector is still equal to the number of features; you can extract the non-zero elements manually. In fact, the coefficients will be stored as a sparse vector (or a sparse matrix in the multinomial case) if they are sparse enough.

Thanks
Yanbo
