MSQ controller: Support in-memory shuffles; towards JVM reuse. #16168

Merged: 21 commits, May 1, 2024

Commits on Mar 19, 2024

  1. MSQ controller: Support in-memory shuffles; towards JVM reuse.

    This patch contains two controller changes that make progress towards a
    lower-latency MSQ.
    
    First, support for in-memory shuffles. The main feature of in-memory shuffles,
    as far as the controller is concerned, is that they are not fully buffered. That
    means that whenever a producer stage uses in-memory output, its consumer must run
    concurrently. The controller determines which stages run concurrently, and when
    they start and stop.
    
    "Leapfrogging" allows any chain of sort-based stages to use in-memory shuffles
    even if we can only run two stages at once. For example, in a linear chain of
    stages 0 -> 1 -> 2 where all do sort-based shuffles, we can use in-memory shuffling
    for each one while only running two at once. (When stage 1 is done reading input
    and about to start writing its output, we can stop 0 and start 2.)
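
The leapfrogging schedule described above can be sketched as follows for a linear chain of sort-based stages with at most two stages running at once. This is a minimal illustration, not Druid code: the class `LeapfrogSketch` and its `schedule` method are hypothetical names, and the actual controller kernel deals with general stage graphs rather than a simple chain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Druid code.
class LeapfrogSketch
{
  /**
   * For a linear chain 0 -> 1 -> ... -> (numStages - 1), returns the pairs of
   * stages that run together at each step when only two stages may run at once.
   */
  static List<List<Integer>> schedule(final int numStages)
  {
    final List<List<Integer>> steps = new ArrayList<>();
    if (numStages <= 0) {
      return steps;
    }
    if (numStages == 1) {
      // A single stage has no consumer to pair with.
      steps.add(List.of(0));
      return steps;
    }
    for (int producer = 0; producer + 1 < numStages; producer++) {
      // The producer writes to memory while its consumer reads. Once the
      // consumer is done reading input, the producer stops and the next
      // stage in the chain can start.
      steps.add(List.of(producer, producer + 1));
    }
    return steps;
  }

  public static void main(String[] args)
  {
    System.out.println(schedule(3)); // prints [[0, 1], [1, 2]]
  }
}
```

For the chain 0 -> 1 -> 2, the sketch yields two steps: stages 0 and 1 run together, then stages 1 and 2, which matches the "stop 0 and start 2" handoff described above.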
    
    1) New OutputChannelMode enum attached to WorkOrders that tells workers
       whether stage output should be in memory (MEMORY), or use local or durable
       storage.
    
    2) New logic in the ControllerQueryKernel to determine which stages can use
       in-memory shuffling (ControllerUtils#computeStageGroups) and to launch them
       at the appropriate time (ControllerQueryKernel#createNewKernels).
    
    3) New "doneReadingInput" method on Controller (passed down to the stage kernels)
       which allows stages to transition to POST_READING even if they are not
       gathering statistics. This is important because it enables "leapfrogging"
       for HASH_LOCAL_SORT shuffles, and for GLOBAL_SORT shuffles with 1 partition.
    
    4) Moved result-reading from ControllerContext#writeReports to a new QueryListener
       interface, which ControllerImpl feeds results to row-by-row while the query
       is still running. This is important because it lets us read query results
       from the final stage using an in-memory channel.
    
    5) New class ControllerQueryKernelConfig holds configs that control kernel
       behavior (such as whether to pipeline, maximum number of concurrent stages,
       etc.). It is generated by the ControllerContext.
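
To make item 1 more concrete, here is a rough sketch of how an output channel mode might be chosen for a stage. Only the `MEMORY` value comes from the commit message; the enum name `ChannelModeSketch`, the other two constant names, and the `choose` method are assumptions made for illustration. The rule it encodes is taken from the text above: in-memory output is only possible when the consumer runs concurrently with the producer. Preferring `MEMORY` whenever that holds is itself an assumption.

```java
// Hypothetical illustration only; apart from MEMORY, the names are assumptions.
enum ChannelModeSketch
{
  MEMORY,          // not fully buffered; the consumer must run concurrently
  LOCAL_STORAGE,   // assumed name for output written to local storage
  DURABLE_STORAGE; // assumed name for output written to durable storage

  /**
   * Picks a mode for a producer stage. In-memory output is chosen only when
   * the consumer stage runs at the same time as the producer.
   */
  static ChannelModeSketch choose(
      final boolean consumerRunsConcurrently,
      final boolean durableStorageEnabled
  )
  {
    if (consumerRunsConcurrently) {
      return MEMORY;
    }
    return durableStorageEnabled ? DURABLE_STORAGE : LOCAL_STORAGE;
  }

  public static void main(String[] args)
  {
    System.out.println(choose(true, false));  // prints MEMORY
    System.out.println(choose(false, true));  // prints DURABLE_STORAGE
  }
}
```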
    
    Second, a refactor towards running workers in persistent JVMs that are able to
    cache data across queries. This is helpful because I believe we'll want to reuse
    JVMs and cached data for latency reasons.
    
    1) Move creation of WorkerManager and TableInputSpecSlicer to the
       ControllerContext, rather than ControllerImpl. This allows managing workers and
       work assignment differently when JVMs are reusable.
    
    2) Lift the Controller Jersey resource out from ControllerChatHandler to a
       reusable resource.
    
    3) Move memory introspection to a MemoryIntrospector interface, and introduce
       ControllerMemoryParameters that uses it. This makes it easier to run MSQ in
       process types other than Indexer and Peon.
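
The idea in item 3, hiding memory lookups behind an interface so that different process types can supply their own answers, might look roughly like the sketch below. All names here (`MemoryIntrospectorSketch`, `RuntimeIntrospector`, `FixedIntrospector`, `ControllerMemorySketch`) and the per-query division are assumptions for illustration; they are not the methods of Druid's MemoryIntrospector or ControllerMemoryParameters.

```java
// Hypothetical illustration only; none of these names are Druid's API.
class ControllerMemorySketch
{
  /** Splits the reported memory evenly across the queries sharing the JVM. */
  static long perQueryBytes(final MemoryIntrospectorSketch introspector)
  {
    return introspector.totalMemoryBytes() / Math.max(1, introspector.numQueriesInJvm());
  }

  public static void main(String[] args)
  {
    // Value depends on the JVM's configured maximum heap.
    System.out.println(perQueryBytes(new RuntimeIntrospector(2)));
    System.out.println(perQueryBytes(new FixedIntrospector(1_000L, 4))); // prints 250
  }
}

interface MemoryIntrospectorSketch
{
  long totalMemoryBytes();

  int numQueriesInJvm();
}

// Reads the limit from the running JVM, as a task-style process might.
class RuntimeIntrospector implements MemoryIntrospectorSketch
{
  private final int numQueries;

  RuntimeIntrospector(final int numQueries)
  {
    this.numQueries = numQueries;
  }

  @Override
  public long totalMemoryBytes()
  {
    return Runtime.getRuntime().maxMemory();
  }

  @Override
  public int numQueriesInJvm()
  {
    return numQueries;
  }
}

// Returns fixed values, which is convenient for tests or other process types.
class FixedIntrospector implements MemoryIntrospectorSketch
{
  private final long totalBytes;
  private final int numQueries;

  FixedIntrospector(final long totalBytes, final int numQueries)
  {
    this.totalBytes = totalBytes;
    this.numQueries = numQueries;
  }

  @Override
  public long totalMemoryBytes()
  {
    return totalBytes;
  }

  @Override
  public int numQueriesInJvm()
  {
    return numQueries;
  }
}
```

Because the calculation only depends on the interface, the same code can run with a fixed introspector in tests and a runtime-backed one in a real process.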
    
    Both of these areas will have follow-ups that make similar changes on the
    worker side.
    gianm committed Mar 19, 2024 (6b7d766)
  2. Address static checks.

    gianm committed Mar 19, 2024 (75376ce)
  3. Address static checks.

    gianm committed Mar 19, 2024 (1d80f02)
  4. Fixes.

    gianm committed Mar 19, 2024 (08b4671)
  5. Commit c30b1cc
  6. Commit 7a2693f
  7. Report writer tests.

    gianm committed Mar 19, 2024 (292b143)

Commits on Apr 4, 2024

  1. Commit 456436a

Commits on Apr 9, 2024

  1. Commit 57ba025
  2. Commit fc6d846
  3. Commit 7e99c13
  4. Adjustments.

    gianm committed Apr 9, 2024 (23b0cdf)

Commits on Apr 15, 2024

  1. Commit f26931e
  2. Commit 279ab58

Commits on Apr 16, 2024

  1. Fix reports.

    gianm committed Apr 16, 2024 (6cce08a)
  2. Commit 5da7200

Commits on Apr 23, 2024

  1. Review updates.

    gianm committed Apr 23, 2024 (1d43eb2)
  2. Commit 120a917

Commits on Apr 26, 2024

  1. Adjust name.

    gianm committed Apr 26, 2024 (87d6ff5)

Commits on Apr 29, 2024

  1. Commit 62be98e
  2. Small changes.

    gianm committed Apr 29, 2024 (05ec634)