I am creating a Jupyter notebook to illustrate how to use the new Druid catalog. As part of that task, I submit an MSQ ingestion task, wait for the Overlord to report task completion, then query the table. Each ingestion uses REPLACE and usually creates a new datasource.
When running queries, I occasionally (about 20% of the time) get an error saying that there is no such table. Yet, if I wait a few seconds, and try again, the query succeeds. The reason is clear: MSQ reported success as soon as ingestion is complete. It takes a while for the new segments to be loaded onto my one historical node. During that time, the Broker knows nothing about the new table.
To be very specific:
1. No segments for the target table exist.
2. Call /sql/task to submit an MSQ REPLACE query.
3. Poll the Overlord, waiting for the task to be marked as completed.
4. Immediately issue a /sql query against that same table.
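The four steps above can be sketched as follows. This is a minimal illustration against a local Druid router using the standard SQL and task-status endpoints; the base URL, helper names, and polling interval are my assumptions, not part of the issue.

```python
# Sketch of the reproduction steps (assumed router URL and helper names).
import json
import time
import urllib.request

ROUTER = "http://localhost:8888"  # assumption: Druid router listens here
TERMINAL_STATES = {"SUCCESS", "FAILED"}

def _http_json(url, payload=None):
    """POST payload as JSON (or GET when payload is None) and decode the reply."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def submit_replace(sql):
    """Step 2: submit the MSQ REPLACE statement via /druid/v2/sql/task."""
    return _http_json(f"{ROUTER}/druid/v2/sql/task", {"query": sql})["taskId"]

def wait_for_task(task_id, poll_secs=2.0):
    """Step 3: poll the Overlord until the task reaches a terminal state."""
    while True:
        report = _http_json(f"{ROUTER}/druid/indexer/v1/task/{task_id}/status")
        state = report["status"]["status"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_secs)

def query(sql):
    """Step 4: the immediate SELECT -- the call that can fail with
    'no such table' while the new segments are still loading."""
    return _http_json(f"{ROUTER}/druid/v2/sql", {"query": sql})
```

Even when `wait_for_task` returns SUCCESS, the final `query` call can still fail, which is exactly the race described below.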
This creates a race condition. Druid reports that the ingest is done, but it is not really done. The client has to be smart enough to know that the resulting query error is due to a race condition, not to one of possibly many other problems. This puts the burden on the client. Or, in my case, I have to add extra verbiage that says "if this query fails, wait a while and try again", which doesn't scream "easy to use."
The MSQ ITs (and now the Jupyter notebook) use a two-part wait loop: first wait for segment load, then wait for a simple SQL query to succeed. This approach works, but it means that every client (the Druid console, the Jupyter notebook, custom clients) must discover the issue, discover the workaround, and code up the workaround every place that an MSQ query is run followed by a SELECT query. Again, this is not "easy to use."
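The two-part workaround can be factored into one small retry helper so each client at least doesn't reinvent the loop. A sketch, with the helper name and parameters being mine; part 1 would poll something like the Coordinator's loadstatus endpoint for the datasource, and part 2 would retry a trivial SELECT:

```python
# Generic "retry until true" helper for the two-part wait (names are mine).
import time

def wait_until(check, timeout_secs=300.0, poll_secs=2.0,
               clock=time.monotonic, sleep=time.sleep):
    """Repeatedly call check() until it returns True or the timeout expires.
    clock and sleep are injectable so the loop can be tested without waiting."""
    deadline = clock() + timeout_secs
    while clock() < deadline:
        if check():
            return True
        sleep(poll_secs)
    return False

# Part 1: wait for segment load, e.g. poll the Coordinator's
#   /druid/coordinator/v1/datasources/{datasource}/loadstatus
# until it reports 100%.
# Part 2: wait for a simple SELECT against the table to succeed.

# Stubbed demo: the check succeeds on the third poll.
_calls = iter([False, False, True])
assert wait_until(lambda: next(_calls), sleep=lambda s: None) is True
```

The injectable `clock`/`sleep` arguments are only there to make the loop testable; in a real client the defaults suffice.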
The ask is for MSQ to wait for segments to be loaded before declaring completion. That way, a client that waits for MSQ task completion can be assured that, when the task is complete, the table is actually ready to be queried. If we don't feel that such a check is generally useful, then provide an option to do the wait when requested (say, with a query context parameter).
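If the opt-in wait were exposed as a query context parameter, the /sql/task request might look like the sketch below. The parameter name is invented for illustration; no such flag exists in the version this issue is filed against.

```python
# Hypothetical request payload for /druid/v2/sql/task with an opt-in wait.
# "waitUntilSegmentsLoad" is an invented name, not an existing Druid flag here.
payload = {
    "query": "REPLACE INTO my_table OVERWRITE ALL SELECT ...",
    "context": {
        "waitUntilSegmentsLoad": True,  # ask MSQ to hold SUCCESS until segments load
    },
}
```

With such a flag, the task would only report SUCCESS once the Broker can actually serve queries against the new segments, and the client-side wait loop disappears.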
Affected Version
26.0.0-SNAPSHOT