LibSVM should have just one input file


darion.yaphet
Hi team,

Currently, when we use SVM to train on a dataset, we find that the input is limited to a single file.

The relevant source code is as follows:

val path = if (dataFiles.length == 1) {
  dataFiles.head.getPath.toUri.toString
} else if (dataFiles.isEmpty) {
  throw new IOException("No input path specified for libsvm data")
} else {
  throw new IOException("Multiple input paths are not supported for libsvm data.")
}
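
To make the three branches easier to see, here is a minimal self-contained sketch of the same check. It is not the Spark source: the object name `SingleInputCheck`, the method name `resolveSinglePath`, and the use of plain path strings (instead of the objects with `getPath` in the snippet above) are assumptions made for illustration.

```scala
import java.io.IOException

object SingleInputCheck {
  // Assumption: paths are plain strings here; the original snippet works on
  // objects that expose getPath, which are not reproduced in this sketch.
  def resolveSinglePath(dataFiles: Seq[String]): String =
    dataFiles match {
      // Exactly one file: use it.
      case Seq(only) => only
      // No files at all: nothing to read.
      case Seq() =>
        throw new IOException("No input path specified for libsvm data")
      // Two or more files: rejected by this check.
      case _ =>
        throw new IOException("Multiple input paths are not supported for libsvm data.")
    }
}
```

With this sketch, `SingleInputCheck.resolveSinglePath(Seq("part-00000"))` returns the single path, while an empty list or a list of two part files throws an `IOException`.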

Files stored on a distributed file system such as HDFS are usually split into multiple pieces, so I think this limit is unnecessary. I'm not sure whether this is a bug or whether I am using it incorrectly.

Thanks a lot.


 

Re: LibSVM should have just one input file

颜发才(Yan Facai)
Hi, yaphet.
It seems that the code you pasted comes from the LibSVM data source rather than from SVM.
Am I misunderstanding?

For LibSVMDataSource:
1. If numFeatures is unspecified, only a single file is valid input.
val df = spark.read.format("libsvm")
  .load("data/mllib/sample_libsvm_data.txt")
2. Otherwise, multiple files are OK.
val df = spark.read.format("libsvm")
  .option("numFeatures", "780")
  .load("data/mllib/sample_libsvm_data.txt")
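
If the right value for numFeatures is not known in advance, it can be worked out from the data itself. The following is a rough sketch in plain Scala (no Spark dependency), assuming the usual libsvm line layout `label index:value index:value ...` with one-based indices, so that the feature count equals the largest index seen. The names `LibSvmFeatureCount`, `maxIndexInLine`, and `numFeatures` are made up for this example and are not part of Spark.

```scala
object LibSvmFeatureCount {
  /** Largest feature index on one line, or 0 if the line carries no features. */
  def maxIndexInLine(line: String): Int = {
    // First token is the label; the remaining tokens are "index:value" pairs.
    val tokens = line.trim.split("\\s+").drop(1)
    tokens
      .filter(_.contains(":"))
      .map(_.takeWhile(_ != ':').toInt)
      .foldLeft(0)((a, b) => math.max(a, b))
  }

  /** Feature count over all lines; blank lines and '#' comment lines are skipped. */
  def numFeatures(lines: Iterator[String]): Int =
    lines
      .map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("#"))
      .map(maxIndexInLine)
      .foldLeft(0)((a, b) => math.max(a, b))
}
```

The resulting number could then be passed as `.option("numFeatures", n.toString)`, after which `load` can be given several paths at once, since `DataFrameReader.load` accepts a variable number of path arguments.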



On Mon, Jun 12, 2017 at 11:46 AM, darion.yaphet <[hidden email]> wrote: