GraphFrames 0.5.0 - critical bug fix + other improvements

Joseph Bradley
Hi Spark community,

I'd like to announce a new release of GraphFrames, a Spark Package for DataFrame-based graphs!

We strongly encourage all users to use this latest release for the bug fix described below.

Critical bug fix
This release fixes a bug in vertex indexing. The bug may have affected your results if both of the following are true:
* your graph uses non-Integer IDs, and
* you use ConnectedComponents or other algorithms that are wrappers around GraphX.
The bug occurs when the input DataFrame is non-deterministic. For example, in previous releases, running an algorithm on a DataFrame just loaded from disk should be fine, but running it on a DataFrame produced by shuffles, unions, and similar operators can give incorrect results. This release fixes the issue.
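
To make the failure mode concrete, here is a toy model in plain Python. It is not GraphFrames or Spark code, and it does not claim to show how the fix was actually implemented. It assumes that non-Integer IDs get mapped to integer indices, and that a non-deterministic input may yield its rows in a different order each time it is evaluated. All names below (`index_by_position`, `index_by_sorted_id`, `round_trip`) are hypothetical.

```python
# Toy model only: plain Python, not the GraphFrames implementation.

def index_by_position(ids):
    """Assign an index from row position. Depends on evaluation order."""
    return {vid: i for i, vid in enumerate(ids)}

def index_by_sorted_id(ids):
    """Assign an index from the sorted distinct IDs. Independent of row order."""
    return {vid: i for i, vid in enumerate(sorted(set(ids)))}

def round_trip(edges, ids_seen_by_edges, ids_seen_by_vertices, make_index):
    """Encode edge endpoints with one evaluation's index, then decode them
    with the index built from another evaluation of the same data."""
    encode = make_index(ids_seen_by_edges)
    decode = {i: vid for vid, i in make_index(ids_seen_by_vertices).items()}
    return [(decode[encode[src]], decode[encode[dst]]) for src, dst in edges]

# The same four vertex IDs, returned in two different orders, standing in
# for two evaluations of a non-deterministic DataFrame.
first_evaluation = ["alice", "bob", "carol", "dave"]
second_evaluation = ["carol", "alice", "dave", "bob"]
edges = [("alice", "bob"), ("carol", "dave")]

# Position-based indexing silently rewires the graph.
broken = round_trip(edges, second_evaluation, first_evaluation, index_by_position)

# An order-independent index gives the original edges back.
fixed = round_trip(edges, second_evaluation, first_evaluation, index_by_sorted_id)

print(broken)
print(fixed)
```

In this model the position-based index turns the edge alice-bob into bob-dave without raising any error, which is why results can be wrong rather than failing outright. A DataFrame read straight from disk behaves like a single stable ordering, so the two evaluations agree and the problem does not appear.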

New features
* Python API for aggregateMessages for building custom graph algorithms
* Scala API for parallel personalized PageRank, wrapping the GraphX implementation. This is only available when using GraphFrames with Spark 2.1+.
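
For readers new to aggregateMessages, the sketch below models the message-passing idea in plain Python so it can be run without Spark. It is an illustration of the semantics as we understand them, not the GraphFrames API: the function `aggregate_messages` and its parameters `send_to_src`, `send_to_dst`, and `agg` are hypothetical stand-ins, and the sample graph is made up. Each edge may send one message to its source and one to its destination, and the messages arriving at each vertex are then combined.

```python
# Toy model only: plain Python, not the GraphFrames API.
from collections import defaultdict

def aggregate_messages(vertices, edges, send_to_src=None, send_to_dst=None, agg=sum):
    """For every edge, optionally send a message to the source and/or the
    destination vertex, then combine each vertex's messages with `agg`.
    Vertices that receive no message are absent from the result."""
    inbox = defaultdict(list)
    for src_id, dst_id in edges:
        src, dst = vertices[src_id], vertices[dst_id]
        if send_to_src is not None:
            inbox[src_id].append(send_to_src(src, dst))
        if send_to_dst is not None:
            inbox[dst_id].append(send_to_dst(src, dst))
    return {vid: agg(msgs) for vid, msgs in inbox.items()}

# Made-up sample graph: vertex attributes keyed by ID, directed edges as pairs.
vertices = {"a": {"age": 34}, "b": {"age": 36}, "c": {"age": 30}}
edges = [("a", "b"), ("b", "c")]

# Sum the ages of each vertex's neighbors: the source receives the
# destination's age and the destination receives the source's age.
summed_ages = aggregate_messages(
    vertices,
    edges,
    send_to_src=lambda src, dst: dst["age"],
    send_to_dst=lambda src, dst: src["age"],
)

print(summed_ages)
```

Custom graph algorithms can be built by repeating a step like this, feeding each round's aggregated values back into the vertex attributes.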

Support for Spark 1.6, 2.0, and 2.1

Special thanks to Felix Cheung for his work as a new committer for GraphFrames!

Thanks to all contributors and to the community for feedback!


Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.