Re: Oracle JDBC - Spark SQL - Key Not Found: Scale


Re: Oracle JDBC - Spark SQL - Key Not Found: Scale

Takeshi Yamamuro
-user +dev
cc: xiao

Hi, ayan,

I made a PR to fix the issue you reported, but none of the releases I checked (e.g., v1.6, v2.0, v2.1) hit it. Could you describe your environment and conditions in more detail?

You first reported that you used v1.6, but I checked and found that the exception does not occur there. Am I missing anything?

// maropu



On Fri, Jan 27, 2017 at 11:10 AM, ayan guha <[hidden email]> wrote:
Hi

I will do a little more testing and will let you know. It did not work with INT and Number types, for sure.

While writing, everything is fine :)

On Fri, Jan 27, 2017 at 1:04 PM, Takeshi Yamamuro <[hidden email]> wrote:
How about this?
Or, how about using Double or something instead of Numeric?

// maropu
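For what it's worth, the casting idea above can be pushed all the way down to Oracle, because the `table` argument of `read.jdbc` also accepts a parenthesized subquery. A minimal sketch (the helper name is hypothetical; `agtest1`/`PID`/`DES` are the columns from this thread):

```python
# Hedged workaround sketch: the "key not found: scale" error only appears for
# NUMBER/NUMERIC columns, so hand Spark a subquery that casts those columns
# before Spark ever inspects their metadata. TO_CHAR keeps the value as a
# string; CAST(... AS BINARY_DOUBLE) keeps it numeric.

def cast_numbers_subquery(table, number_cols, other_cols, numeric=False):
    """Build a JDBC-pushdown subquery that casts Oracle NUMBER columns."""
    if numeric:
        casts = ["CAST(%s AS BINARY_DOUBLE) %s" % (c, c) for c in number_cols]
    else:
        casts = ["TO_CHAR(%s) %s" % (c, c) for c in number_cols]
    cols = ", ".join(casts + list(other_cols))
    # Oracle accepts a bare alias ("t") on an inline view; no AS keyword.
    return "(SELECT %s FROM %s) t" % (cols, table)

dbtable = cast_numbers_subquery("agtest1", ["PID"], ["DES"])
print(dbtable)  # (SELECT TO_CHAR(PID) PID, DES FROM agtest1) t

# Then read via the subquery instead of the raw table name, e.g.:
# df = sqlContext.read.jdbc(url=url, table=dbtable,
#                           properties={"user": user, "password": password,
#                                       "driver": driver})
```

Passing the returned string as `table=` makes Oracle perform the cast, so Spark only sees VARCHAR2 (or BINARY_DOUBLE) columns and never takes the NUMBER branch of the dialect.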

On Fri, Jan 27, 2017 at 10:25 AM, ayan guha <[hidden email]> wrote:
Okay, it is working with varchar columns only. Is there any way to work around this?

On Fri, Jan 27, 2017 at 12:22 PM, ayan guha <[hidden email]> wrote:
hi

I thought so too, so I created a table with INT and VARCHAR columns:

desc agtest1

Name Null Type          
---- ---- ------------- 
PID       NUMBER(38)    
DES       VARCHAR2(100) 

url="jdbc:oracle:thin:@mpimpclu1-scan:1521/DEVAIM"
table = "agtest1"
user = "bal"
password= "bal"
driver="oracle.jdbc.OracleDriver"
df = sqlContext.read.jdbc(url=url,table=table,properties={"user":user,"password":password,"driver":driver})


Still the issue persists. 

On Fri, Jan 27, 2017 at 11:19 AM, Takeshi Yamamuro <[hidden email]> wrote:
Hi,

I think you got this error because you used `NUMERIC` types in your schema (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala#L32). So, IIUC avoiding the type is a workaround.

// maropu
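To make that failure mode concrete, here is a rough plain-Python analogue of the lookup that throws (names are hypothetical; the real code is the Scala `OracleDialect.getCatalystType` linked above):

```python
# Illustrative sketch only: for java.sql.Types.NUMERIC columns the Oracle
# dialect reads the "scale" key from the column metadata map, so a metadata
# map that lacks that key raises a key-not-found error.
NUMERIC = 2  # value of java.sql.Types.NUMERIC

def get_catalyst_type(sql_type, size, metadata):
    """Rough analogue of the dialect's NUMERIC handling (not Spark's code)."""
    if sql_type == NUMERIC:
        scale = metadata["scale"]  # <- raises KeyError when "scale" is absent
        if scale == 0 and size <= 10:
            return "IntegerType"
        return "DecimalType(%d,%d)" % (size, scale)
    return None  # defer to the generic JDBC type mapping

# A NUMBER(38) column whose metadata carries no "scale" entry reproduces
# the reported failure mode:
try:
    get_catalyst_type(NUMERIC, 38, {})
except KeyError as e:
    print("key not found: %s" % e.args[0])  # key not found: scale
```

This is why VARCHAR2-only tables read fine: the NUMERIC branch, and therefore the `scale` lookup, is never reached.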


On Fri, Jan 27, 2017 at 8:18 AM, ayan guha <[hidden email]> wrote:
Hi

I am facing the exact issue with Oracle/Exadata mentioned here. Any ideas? I could not figure it out, so I am sending this to the group hoping someone has seen it (and solved it).

Spark Version: 1.6
pyspark command:

pyspark --driver-class-path /opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/kvclient.jar:/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/ojdbc7.jar:/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/ojdbc7-orig.jar:/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/oracle-hadoop-sql.jar:/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/ora-hadoop-common.jar:/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/ora-hadoop-common-orig.jar:/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/orahivedp.jar:/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/orahivedp-orig.jar:/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/orai18n.jar:/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/orai18n-orig.jar:/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/oraloader.jar:/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/oraloader-orig.jar   --conf spark.jars=/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/oracle-hadoop-sql.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/ora-hadoop-common.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/orahivedp.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/oraloader.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/ojdbc7.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/orai18n.jar/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/kvclient.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/ojdbc7.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/ojdbc7-orig.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/oracle-hadoop-sql.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/ora-hadoop-common.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/ora-hadoop-common-orig.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/orahivedp.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/orahivedp-orig.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/orai18n.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/orai18n-orig.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/oraloader.jar,/opt/oracle/bigdatasql/bdcell-12.1/jlib-bds/oraloader-orig.jar


Here is my code:

url="jdbc:oracle:thin:@mpimpclu1-scan:1521/DEVAIM"
table = "HIST_FORECAST_NEXT_BILL_DGTL"
user = "bal"
password= "bal"
driver="oracle.jdbc.OracleDriver"
df = sqlContext.read.jdbc(url=url,table=table,properties={"user":user,"password":password,"driver":driver})


Error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p2001.2081/lib/spark/python/pyspark/sql/readwriter.py", line 289, in jdbc
    return self._df(self._jreader.jdbc(url, table, jprop))
  File "/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p2001.2081/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p2001.2081/lib/spark/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p2001.2081/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o40.jdbc.
: java.util.NoSuchElementException: key not found: scale
        at scala.collection.MapLike$class.default(MapLike.scala:228)
        at scala.collection.AbstractMap.default(Map.scala:58)
        at scala.collection.MapLike$class.apply(MapLike.scala:141)
        at scala.collection.AbstractMap.apply(Map.scala:58)
        at org.apache.spark.sql.types.Metadata.get(Metadata.scala:108)
        at org.apache.spark.sql.types.Metadata.getLong(Metadata.scala:51)
        at org.apache.spark.sql.jdbc.OracleDialect$.getCatalystType(OracleDialect.scala:33)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:140)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
        at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:222)
        at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)



--
Best Regards,
Ayan Guha



--
---
Takeshi Yamamuro




Re: Oracle JDBC - Spark SQL - Key Not Found: Scale

Takeshi Yamamuro
Hi,

Thanks for your report!
Sean has already explained why this happened, here:
https://issues.apache.org/jira/browse/SPARK-19392

// maropu
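For readers hitting this on older builds, the shape of the defensive fix is simply to guard the metadata lookup instead of assuming the key exists (an illustrative sketch only, not the actual SPARK-19392 patch):

```python
# Hypothetical helper: read "scale" from the column metadata only when the
# JDBC driver actually supplied it, falling back to a default instead of
# failing with a key-not-found error.
def get_scale(metadata, default=0):
    """Return the 'scale' metadata entry, or a default when it is absent."""
    return metadata["scale"] if "scale" in metadata else default

print(get_scale({"scale": 2}))  # 2
print(get_scale({}))            # 0
```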

On Mon, Feb 27, 2017 at 1:19 PM, ayan guha <[hidden email]> wrote:
Hi

I am using the CDH 5.8 build of Spark 1.6, so some patches may have been applied.

Environment: Oracle Big Data Appliance, comes with CDH 5.8. 
I am using following command to launch:

pyspark --driver-class-path $BDA_ORACLE_AUXJAR_PATH/kvclient.jar:$BDA_ORACLE_AUXJAR_PATH/ojdbc7.jar   --conf spark.jars=$BDA_ORACLE_AUXJAR_PATH/kvclient.jar,$BDA_ORACLE_AUXJAR_PATH/ojdbc7.jar

>>> df = sqlContext.read.jdbc(url='jdbc:oracle:thin:@hostname:1521/DEVAIM',table="Table",properties={"user":"user","password":"password","driver":"oracle.jdbc.OracleDriver"})

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p2001.2081/lib/spark/python/pyspark/sql/readwriter.py", line 289, in jdbc
    return self._df(self._jreader.jdbc(url, table, jprop))
  File "/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p2001.2081/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p2001.2081/lib/spark/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p2001.2081/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o40.jdbc.
: java.util.NoSuchElementException: key not found: scale
        at scala.collection.MapLike$class.default(MapLike.scala:228)
        at scala.collection.AbstractMap.default(Map.scala:58)
        at scala.collection.MapLike$class.apply(MapLike.scala:141)
        at scala.collection.AbstractMap.apply(Map.scala:58)
        at org.apache.spark.sql.types.Metadata.get(Metadata.scala:108)
        at org.apache.spark.sql.types.Metadata.getLong(Metadata.scala:51)
        at org.apache.spark.sql.jdbc.OracleDialect$.getCatalystType(OracleDialect.scala:33)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:140)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
        at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:222)
        at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)

Table structure:

Name                      Type                Null
------------------------  ------------------  ----
SURROGATE_KEY_ID          NUMBER(19,0)        No
SOURCE_KEY_PART_1         VARCHAR2(255 BYTE)  No
SOURCE_KEY_PART_2         VARCHAR2(255 BYTE)  Yes
SOURCE_KEY_PART_3         VARCHAR2(255 BYTE)  Yes
SOURCE_KEY_PART_4         VARCHAR2(255 BYTE)  Yes
SOURCE_KEY_PART_5         VARCHAR2(255 BYTE)  Yes
SOURCE_KEY_PART_6         VARCHAR2(255 BYTE)  Yes
SOURCE_KEY_PART_7         VARCHAR2(255 BYTE)  Yes
SOURCE_KEY_PART_8         VARCHAR2(255 BYTE)  Yes
SOURCE_KEY_PART_9         VARCHAR2(255 BYTE)  Yes
SOURCE_KEY_PART_10        VARCHAR2(255 BYTE)  Yes
SOURCE_SYSTEM_NAME        VARCHAR2(50 BYTE)   Yes
SOURCE_DOMAIN_NAME        VARCHAR2(50 BYTE)   Yes
EFFECTIVE_FROM_TIMESTAMP  DATE                No
EFFECTIVE_TO_TIMESTAMP    DATE                No
SESS_NO                   NUMBER(19,0)        No
HTH, but please feel free to let me know if I can help in any other way.

Best
Ayan

