This could be a good optimization. But can it be done without changing any APIs or slowing anything else down? If so, this could be worth a pull request.

On Sun, Jul 1, 2018 at 9:21 PM Vincent Wang <[hidden email]> wrote:

>

> Hi there,

>

> I'm using GBTClassifier to do some classification jobs and found that the performance of the scoring stage is not quite satisfying. The trained model has about 160 trees, and the input feature vectors are sparse with a size of about 20+.

>

> After some digging, I found that the model repeatedly and randomly accesses features of the SparseVector when predicting an input vector, which eventually calls breeze.linalg.SparseVector#apply. That function uses a binary search to locate the corresponding index, so the complexity is O(log numNonZero).
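Roughly, each lookup does a binary search over the stored non-zero indices. A minimal Python sketch of that access pattern (illustrative names only, not the actual Breeze code):

```python
from bisect import bisect_left

def sparse_apply(indices, values, i):
    """Look up element i of a sparse vector stored as parallel
    (indices, values) arrays -- the layout Breeze's SparseVector uses.
    bisect_left is a binary search, so each access costs
    O(log numNonZero)."""
    pos = bisect_left(indices, i)
    if pos < len(indices) and indices[pos] == i:
        return values[pos]
    return 0.0  # an absent index is an implicit zero

# A sparse vector with non-zeros {1: 2.0, 4: 5.0, 7: 9.0}
indices, values = [1, 4, 7], [2.0, 5.0, 9.0]
print(sparse_apply(indices, values, 4))  # 5.0
print(sparse_apply(indices, values, 5))  # 0.0
```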

>

> Then I tried converting my feature vectors to dense vectors before inference, and the result shows that the inference stage speeds up by about 2~3x. (Random access in a DenseVector is O(1).)
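The conversion itself is a one-time O(size) expansion, after which every tree-node feature lookup is a plain array index. A rough sketch (hypothetical helper, standing in for something like Spark's SparseVector.toDense):

```python
def to_dense(size, indices, values):
    """Expand a sparse (indices, values) representation into a flat
    array of length `size`. The expansion is O(size), paid once per
    vector; afterwards each feature lookup is an O(1) index instead
    of a binary search."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense

dense = to_dense(10, [1, 4, 7], [2.0, 5.0, 9.0])
print(dense[4])  # 5.0 -- constant-time access, no binary search
```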

>

> So my question is: why not use breeze.linalg.HashVector when randomly accessing values in a SparseVector? According to Breeze's documentation its random-access complexity is O(1), much better than SparseVector in such a case.
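The idea behind a hash-backed vector is to index the non-zeros in a hash table, trading some memory and per-lookup hashing overhead for average O(1) access. A minimal sketch of that trade-off (illustrative code, not Breeze's HashVector implementation):

```python
def to_hash(indices, values):
    """Index the non-zeros of a sparse vector in a hash map, the idea
    behind a hash-backed vector: average O(1) random access, at the
    cost of extra memory and hashing overhead per lookup."""
    return dict(zip(indices, values))

h = to_hash([1, 4, 7], [2.0, 5.0, 9.0])
print(h.get(4, 0.0))  # 5.0 -- average O(1), no binary search
print(h.get(5, 0.0))  # 0.0 -- absent index is an implicit zero
```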

>

> Thanks,

> Vincent
