TextSocketMicroBatchReader no longer supports nc utility

Jungtaek Lim
Hi devs,

I'm not sure I'll hear back soon since Spark Summit is just around the corner, but I wanted to post this and wait.

While playing with Spark 2.4.0-SNAPSHOT, I found that the nc command exits before any actual data is read, so the query also exits with an error.

The cause is that a temporary reader is launched to read the schema, then closed, and then the reader is re-opened. A reliable socket server should handle this without any issue, but the nc command normally can't handle multiple connections and simply exits when the temporary reader is closed.

I would like to file an issue and contribute a fix if we think this is a bug (otherwise we need to replace the nc utility with another one, maybe our own implementation?), but I'm not sure we are happy to apply a workaround for a specific source.

I would like to hear opinions before giving it a shot.

Thanks,
Jungtaek Lim (HeartSaVioR)
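
The failure mode described in this message can be reproduced without Spark. The following is a minimal sketch, assuming a server that serves one client and then stops listening, as nc is described as doing above. The names `one_shot_server` and `second_connection_outcome` are hypothetical and are not part of Spark or nc.

```python
import socket
import threading


def one_shot_server(listener: socket.socket) -> None:
    """Serve a single client, then stop listening (a stand-in for nc)."""
    conn, _ = listener.accept()
    with conn:
        # recv() returns b"" once the client closes its side.
        while conn.recv(1024):
            pass
    listener.close()  # no further connections are accepted


def second_connection_outcome() -> str:
    """Probe once, close, then try to reconnect to the same server."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=one_shot_server, args=(listener,))
    server.start()

    # First connection: plays the role of the temporary schema reader.
    probe = socket.create_connection(("127.0.0.1", port))
    probe.close()
    server.join()  # the server exits as soon as the probe disconnects

    # Second connection: plays the role of the real reader.
    try:
        socket.create_connection(("127.0.0.1", port), timeout=2).close()
        return "connected"
    except OSError:
        return "refused"


if __name__ == "__main__":
    print(second_connection_outcome())
```

Because the server has already shut down by the time the second connection is attempted, the reconnect is expected to fail, which mirrors the query exiting with an error.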

Re: TextSocketMicroBatchReader no longer supports nc utility

Joseph Torres
I tend to agree that this is a bug. It's kinda silly that nc does this, but a socket connector that doesn't work with netcat will surely seem broken to users. It wouldn't be a huge change to defer opening the socket until a read is actually required.
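
The idea of deferring the socket until a read is required can be sketched as follows. This is an illustration only, not Spark's implementation: `LazySocketReader`, its fixed `SCHEMA`, and `demo` are hypothetical names, and the sketch assumes the schema is known without contacting the server.

```python
import socket
import threading
from typing import List, Optional, Tuple


class LazySocketReader:
    """Hypothetical reader that connects only when data is first requested."""

    SCHEMA: Tuple[str, ...] = ("value",)  # assumed fixed, so no connection needed

    def __init__(self, host: str, port: int) -> None:
        self.host = host
        self.port = port
        self._sock: Optional[socket.socket] = None  # nothing opened yet

    def schema(self) -> Tuple[str, ...]:
        # Answering a schema request never touches the network.
        return self.SCHEMA

    def read_lines(self) -> List[str]:
        # The one and only connection is opened here, on first read.
        if self._sock is None:
            self._sock = socket.create_connection((self.host, self.port))
        with self._sock.makefile("r", encoding="utf-8") as stream:
            return [line.rstrip("\n") for line in stream]

    def close(self) -> None:
        if self._sock is not None:
            self._sock.close()
            self._sock = None


def demo() -> List[str]:
    """Read from a server that accepts exactly one connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve_once() -> None:
        conn, _ = listener.accept()
        with conn:
            conn.sendall(b"hello\nworld\n")
        listener.close()

    server = threading.Thread(target=serve_once)
    server.start()

    reader = LazySocketReader("127.0.0.1", port)
    reader.schema()              # does not consume the server's single connection
    lines = reader.read_lines()  # first and only connection
    reader.close()
    server.join()
    return lines


if __name__ == "__main__":
    print(demo())
```

With this shape, a single-connection server is contacted exactly once, so the schema lookup no longer uses up the only connection it will accept.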

On Sun, Jun 3, 2018 at 9:55 PM, Jungtaek Lim <[hidden email]> wrote:
Hi devs,

I'm not sure I'll hear back soon since Spark Summit is just around the corner, but I wanted to post this and wait.

While playing with Spark 2.4.0-SNAPSHOT, I found that the nc command exits before any actual data is read, so the query also exits with an error.

The cause is that a temporary reader is launched to read the schema, then closed, and then the reader is re-opened. A reliable socket server should handle this without any issue, but the nc command normally can't handle multiple connections and simply exits when the temporary reader is closed.

I would like to file an issue and contribute a fix if we think this is a bug (otherwise we need to replace the nc utility with another one, maybe our own implementation?), but I'm not sure we are happy to apply a workaround for a specific source.

I would like to hear opinions before giving it a shot.

Thanks,
Jungtaek Lim (HeartSaVioR)

Re: TextSocketMicroBatchReader no longer supports nc utility

Jungtaek Lim
Yeah, that's why I started this thread. The socket source in particular is expected to be used in the examples in the official documentation and in quick experiments, where we tend to simply use netcat.

I'll file an issue and provide the fix.

On Tue, Jun 5, 2018 at 1:48 AM, Joseph Torres <[hidden email]> wrote:
I tend to agree that this is a bug. It's kinda silly that nc does this, but a socket connector that doesn't work with netcat will surely seem broken to users. It wouldn't be a huge change to defer opening the socket until a read is actually required.

Re: TextSocketMicroBatchReader no longer supports nc utility

Jungtaek Lim

On Tue, Jun 5, 2018 at 11:30 AM, Jungtaek Lim <[hidden email]> wrote:
Yeah, that's why I started this thread. The socket source in particular is expected to be used in the examples in the official documentation and in quick experiments, where we tend to simply use netcat.

I'll file an issue and provide the fix.
