Issue on SPARK 3.0.0 loading MultilayerPerceptronClassificationModel


Steve Taylor

Hi,

 

I'm not sure if this is the right place to raise this; if not, hopefully you can direct me to the right place.

I believe I have discovered a bug when loading MultilayerPerceptronClassificationModel in Spark 3.0.0 with Scala 2.12. I have tested and can see that it is not present in at least Spark 2.4.3 with Scala 2.11. (I'm not sure if the Scala version is important.)

 

I am using PySpark on a Databricks cluster and importing the class with "from pyspark.ml.classification import MultilayerPerceptronClassificationModel".

When I load a saved model with MultilayerPerceptronClassificationModel.load and then call model.transform(df), I get the following error: IllegalArgumentException: MultilayerPerceptronClassifier_8055d1368e78 parameter solver given invalid value auto.

 

 

This issue can be easily replicated by running the example given in the Spark documentation: http://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier

Then add save-model, load-model, and transform statements as follows:

 

from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Load training data
data = spark.read.format("libsvm")\
    .load("data/mllib/sample_multiclass_classification_data.txt")

# Split the data into train and test
splits = data.randomSplit([0.6, 0.4], 1234)
train = splits[0]
test = splits[1]

# specify layers for the neural network:
# input layer of size 4 (features), two intermediate of size 5 and 4
# and output of size 3 (classes)
layers = [4, 5, 4, 3]

# create the trainer and set its parameters
trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers, blockSize=128, seed=1234)

# train the model
model = trainer.fit(train)

# compute accuracy on the test set
result = model.transform(test)
predictionAndLabels = result.select("prediction", "label")
evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
print("Test set accuracy = " + str(evaluator.evaluate(predictionAndLabels)))

# save the model, load it back, and transform with the loaded model
from pyspark.ml.classification import MultilayerPerceptronClassificationModel

model.save(Save_location)
model2 = MultilayerPerceptronClassificationModel.load(Save_location)

# this call raises the IllegalArgumentException on Spark 3.0.0
result_from_loaded = model2.transform(test)
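
Save_location in the statements above is a placeholder for a path of your choice. Spark's model.save fails if the target path already exists (unless model.write().overwrite() is used), so a fresh path is needed on every run. A minimal sketch of one way to get one, assuming Spark runs in local mode so that a local temporary directory is visible to it; the helper name fresh_save_location is hypothetical, and on a cluster such as Databricks a path on shared storage would be needed instead:

```python
import os
import tempfile


def fresh_save_location(prefix="mlp_model_"):
    """Return a path that does not exist yet, inside a new temporary directory."""
    # mkdtemp creates the parent directory on the local filesystem.
    base_dir = tempfile.mkdtemp(prefix=prefix)
    # The "model" subpath is deliberately not created; model.save creates it.
    return os.path.join(base_dir, "model")


Save_location = fresh_save_location()
```

With this in place the save, load, and transform statements can be rerun without clearing the previous output by hand.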

 

 


Re: Issue on SPARK 3.0.0 loading MultilayerPerceptronClassificationModel

Sean Owen-2
Yeah that's a bug, I can reproduce it. Can you open a JIRA?
It works in Scala, so it must be an issue with the Python wrapper. The
serialized model is fine; the problem is in loading it back.

I think it's because MultilayerPerceptronParams extends HasSolver,
which defaults to 'auto', and doesn't seem to override that default
correctly, so it picks up 'auto', which isn't valid for MLP.
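
To make the suspected mechanism concrete, here is a toy model in plain Python. It is only a sketch of the hypothesis above, not Spark's actual code: the classes HasSolver, ToyMLPParams, and FixedMLPParams are hypothetical stand-ins, and the only facts borrowed from Spark are that the shared solver default is 'auto' and that MLP accepts 'l-bfgs' and 'gd'.

```python
# Toy model only: hypothetical stand-ins, not Spark's real classes.


class HasSolver:
    """Shared mixin whose default solver is 'auto'."""

    default_solver = "auto"


class ToyMLPParams(HasSolver):
    """MLP-style params that accept only 'l-bfgs' or 'gd'."""

    allowed_solvers = ("l-bfgs", "gd")

    def __init__(self, uid, solver=None):
        self.uid = uid
        # Suspected problem: with no explicit value, the inherited
        # default ('auto') is used instead of an MLP-specific one.
        self.solver = solver if solver is not None else self.default_solver

    def validate(self):
        """Raise ValueError if the solver is not valid for MLP."""
        if self.solver not in self.allowed_solvers:
            raise ValueError(
                f"{self.uid} parameter solver given invalid value {self.solver}."
            )


class FixedMLPParams(ToyMLPParams):
    """Same params, with the inherited default overridden by a valid value."""

    default_solver = "l-bfgs"


try:
    ToyMLPParams("toy_mlp").validate()
except ValueError as err:
    print(err)  # toy_mlp parameter solver given invalid value auto.

FixedMLPParams("toy_mlp").validate()  # passes: default is now 'l-bfgs'
```

If the real cause is along these lines, the fix would be for the Python-side MLP params to declare their own solver default rather than inheriting the shared one.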

Huaxin maybe you have some insight? I think you have worked on this
code recently.

On Wed, Jul 8, 2020 at 4:05 AM Steve Taylor
<[hidden email]> wrote:

