Stream Stream joins with update and complete mode

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Stream Stream joins with update and complete mode

sandeep_katta
As per the documentation
http://spark.apache.org/docs/2.3.2/structured-streaming-programming-guide.html#stream-stream-joins
, only append mode is supported

*As of Spark 2.3, you can use joins only when the query is in Append output
mode. Other output modes are not yet supported.*

But as per the code there is no check done for the output mode

// output mode checked is missed (UnsupportedOperationChecker.scala)
case LeftOuter =>
              if (!left.isStreaming && right.isStreaming) {
                throwError("Left outer join with a streaming
DataFrame/Dataset " +
                  "on the right and a static DataFrame/Dataset on the left
is not supported")
              } else if (left.isStreaming && right.isStreaming) {
                val watermarkInJoinKeys =
StreamingJoinHelper.isWatermarkInJoinKeys(subPlan)

                val hasValidWatermarkRange =
                  StreamingJoinHelper.getStateValueWatermark(
                    left.outputSet, right.outputSet, condition,
Some(1000000)).isDefined

                if (!watermarkInJoinKeys && !hasValidWatermarkRange) {
                  throwError("Stream-stream outer join between two streaming
DataFrame/Datasets " +
                    "is not supported without a watermark in the join keys,
or a watermark on " +
                    "the nullable side and an appropriate range condition")
                }
              }

If the documentation is correct, then I can raise the PR to fix the code




--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]