Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet

Ben Roling
I tried this on the users mailing list but didn't get traction.  It's probably more appropriate here anyway.

I've noticed that Dataset.sqlContext is public in Scala but the equivalent (DataFrame._sc) in PySpark is named as if it should be treated as private.

Is this intentional?  If so, what's the rationale?  If not, then it feels like a bug and DataFrame should have some form of public access back to the context/session.  I'm happy to log the bug but thought I would ask here first.  Thanks!

Re: Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet

zero323
Hi Ben,

Please note that `_sc` is not a SQLContext. It is a SparkContext, which
is used primarily for internal calls.

SQLContext is exposed through `sql_ctx`
(https://github.com/apache/spark/blob/8bfaa62f2fcc942dd99a63b20366167277bce2a1/python/pyspark/sql/dataframe.py#L80)
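To make the distinction concrete, here is a minimal sketch of reaching the session and the SparkContext from a DataFrame through public attributes only, without touching `_sc`. The helper names `session_of`, `spark_context_of`, and `demo` are hypothetical. The sketch assumes that the SQLContext behind `sql_ctx` carries a `sparkSession` attribute, as it does in the PySpark code of that period as far as I can tell, and that newer PySpark releases expose `DataFrame.sparkSession` directly; check both against the version you run.

```python
def session_of(df):
    """Return the SparkSession behind a PySpark DataFrame via public attributes."""
    # Assumption: newer PySpark releases expose DataFrame.sparkSession directly.
    session = getattr(df, "sparkSession", None)
    if session is not None:
        return session
    # At the commit linked above, the public route is DataFrame.sql_ctx
    # (a SQLContext), which is assumed to hold the session as .sparkSession.
    return df.sql_ctx.sparkSession


def spark_context_of(df):
    """Return the SparkContext without reading the private DataFrame._sc."""
    return session_of(df).sparkContext


def demo():
    """Small local run; needs PySpark installed, so it is not called on import."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("ctx-demo").getOrCreate()
    try:
        df = spark.range(3)
        # Both helpers should hand back the objects the DataFrame came from.
        print(type(session_of(df)).__name__)
        print(spark_context_of(df) is spark.sparkContext)
    finally:
        spark.stop()
```

Calling `demo()` with PySpark installed exercises both helpers against a real local session. The helpers only use attribute access, so they can also be tried against simple stand-in objects.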

On 3/17/20 5:53 PM, Ben Roling wrote:

> I've noticed that Dataset.sqlContext is public in Scala but the
> equivalent (DataFrame._sc) in PySpark is named as if it should be
> treated as private.
--
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
Keybase: https://keybase.io/zero323
Gigs: https://www.codementor.io/@zero323
PGP: C095AA7F33E6123A




Re: Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet

Ben Roling
Ah, yes, I clearly didn't read closely enough!  It was right there in front of me...

Thanks!

On Wed, Mar 18, 2020, 12:36 PM Maciej Szymkiewicz <[hidden email]> wrote:
> Hi Ben,
>
> Please note that `_sc` is not a SQLContext. It is a SparkContext, which
> is used primarily for internal calls.
>
> SQLContext is exposed through `sql_ctx`
> (https://github.com/apache/spark/blob/8bfaa62f2fcc942dd99a63b20366167277bce2a1/python/pyspark/sql/dataframe.py#L80)
