jobsprofiler: enable requesting a job's execution details #105384

Merged
merged 1 commit into from
Jul 11, 2023

@adityamaru adityamaru commented Jun 22, 2023

Similar to statement bundles this change introduces the
infrastructure to request, collect and read the execution
details for a particular job.
Right now, the execution details will only contain the
latest DSP diagram for a job, but going forward this will
give us a place to dump raw files such as:

  • cluster-wide job traces
  • cpu profiles
  • trace-driven aggregated stats
  • raw payload and progress protos

Downloading some or all of these execution details will be
exposed in a future patch in all of the places where
statement bundles are today:

  • DBConsole
  • CLI shell
  • SQL shell

This change introduces a builtin that allows the caller
to request the collection and persistence of a job's
current execution details.

This change also introduces a new endpoint on the status
server to read the data corresponding to the execution details
persisted for a job. The next set of
PRs will add the necessary components to allow downloading
the files from the DBConsole.

Informs: #105076

Release note: None

@adityamaru adityamaru requested a review from dt June 22, 2023 20:01
@adityamaru adityamaru requested review from a team as code owners June 22, 2023 20:01
@adityamaru adityamaru requested a review from a team June 22, 2023 20:01
@adityamaru adityamaru requested review from a team as code owners June 22, 2023 20:01
@cockroach-teamcity

This change is Reviewable

@adityamaru adityamaru requested a review from a team as a code owner June 22, 2023 21:33
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Jun 24, 2023
This change adds a new component to the `Profiler` tab
of the job details page that supports collecting and viewing
job profiler bundles. The component has a button to collect
job profiler bundles. These bundles are then listed in a sorted
table with the ability to download each bundle.

The above operations are backed by the infrastructure added
in cockroachdb#105384.

Note, the `Profiler` tab is currently disabled for CC but this
change allows for a future project to enable the collection of
bundles through the CC console as well.

Informs: cockroachdb#105076
Release note (ui change): collect and download job profiler
bundles from the `Profiler` tab on the job details page.
@adityamaru adityamaru force-pushed the bundle-one branch 2 times, most recently from 682e9f7 to 3b49e79 Compare June 24, 2023 21:17
dt commented Jun 27, 2023

Overall looks good to me. One question I had though is if we even need to request/persist/fetch the generated bundle to job_info, or if we could just have the bundle fetch endpoint generate it on the fly since it is generated from job state that is already persisted, isn't it?

pkg/sql/jobs_profiler_bundle.go (outdated)
adityamaru commented Jun 28, 2023

if we even need to request/persist/fetch the generated bundle to job_info, or if we could just have the bundle fetch endpoint generate it on the fly since it is generated from job state that is already persisted

Not all the information in the bundle is going to be persisted to job state. For example, the active tracing spans of a job or the goroutine stacks at the time the bundle was collected. Separating the request/persist from the fetch allows us to download older bundles later if we want to see the state of the job at different points in time.

https://www.loom.com/share/4d0ff8ffe53b4e09bf8f0de1009c066e?sid=7b1bb121-b8b4-4215-8b5c-12f103f482df is a prototype of how I want the bundles to be listed. When you request a bundle it shows up in the table, it can then be downloaded at any point in the future.
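
As a rough illustration of why request/persist is kept separate from fetch, here is a minimal sketch. It assumes a hypothetical in-memory `bundleStore` keyed by job ID and collection time; none of these names come from the PR, and the real implementation persists to job_info rather than to a map.

```go
package main

import (
	"fmt"
	"sort"
)

// bundleStore is a hypothetical in-memory stand-in for persisted bundles,
// keyed by job ID and then by collection time (unix nanos).
type bundleStore struct {
	bundles map[int64]map[int64][]byte
}

func newBundleStore() *bundleStore {
	return &bundleStore{bundles: make(map[int64]map[int64][]byte)}
}

// Request captures the job's current (possibly transient) state and persists
// a copy of it under the collection time.
func (s *bundleStore) Request(jobID, collectedAt int64, snapshot []byte) {
	if s.bundles[jobID] == nil {
		s.bundles[jobID] = make(map[int64][]byte)
	}
	s.bundles[jobID][collectedAt] = append([]byte(nil), snapshot...)
}

// List returns the collection times available for a job, oldest first.
func (s *bundleStore) List(jobID int64) []int64 {
	times := make([]int64, 0, len(s.bundles[jobID]))
	for t := range s.bundles[jobID] {
		times = append(times, t)
	}
	sort.Slice(times, func(i, j int) bool { return times[i] < times[j] })
	return times
}

// Fetch reads a previously persisted bundle without touching live job state.
func (s *bundleStore) Fetch(jobID, collectedAt int64) ([]byte, bool) {
	b, ok := s.bundles[jobID][collectedAt]
	return b, ok
}

func main() {
	s := newBundleStore()
	s.Request(1, 100, []byte("goroutine stacks at t=100"))
	s.Request(1, 200, []byte("goroutine stacks at t=200"))
	old, _ := s.Fetch(1, 100)
	fmt.Println(s.List(1), string(old))
}
```

The older snapshot stays readable after a newer one is requested, which an on-the-fly fetch endpoint could not offer for transient state.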

@adityamaru adityamaru requested a review from dt June 28, 2023 00:44
@adityamaru adityamaru force-pushed the bundle-one branch 2 times, most recently from 1e70b8e to 5cde323 Compare June 28, 2023 22:01
@adityamaru adityamaru changed the title from "jobsprofiler: introduce collection of job bundles" to "jobsprofiler: enable requesting a job's execution details" Jun 28, 2023
@adityamaru

friendly ping @dt with the updated approach discussed offline

pkg/sql/jobs_profiler_bundle.go (outdated)
return errors.Wrapf(err, "failed to compress chunk for file %s", filename)
}

// On listing we want the info_key of each chunk to sort after the
Member

A couple nits:

  1. %d formatted monotonic ints don't sort monotonically
  2. unixnano isn't using the monotonic clock

I wonder if we should just use a loop counter that starts at zero and goes up, and I wonder if we should give the last chunk a well-known name so that the reader can verify they got all chunks.

I might just say use a loop counter that starts at 0 and then print them with %04d in MakeProfilerBundleChunkKey
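
To make the first nit concrete, here is a small sketch comparing `%d` and `%04d` keys under lexicographic ordering. The `chunk-` prefix and the `chunkKeys` helper are illustrative assumptions, not the actual info_key layout or `MakeProfilerBundleChunkKey`.

```go
package main

import (
	"fmt"
	"sort"
)

// chunkKeys builds n keys from a zero-based loop counter using the given
// format verb. The "chunk-" prefix is hypothetical.
func chunkKeys(format string, n int) []string {
	keys := make([]string, 0, n)
	for i := 0; i < n; i++ {
		keys = append(keys, fmt.Sprintf("chunk-"+format, i))
	}
	return keys
}

// sortedMatchesInsertion reports whether lexicographic order is the same as
// counter order for the given keys.
func sortedMatchesInsertion(keys []string) bool {
	return sort.StringsAreSorted(keys)
}

func main() {
	// With %d, "chunk-10" sorts before "chunk-2", so order is lost.
	fmt.Println(sortedMatchesInsertion(chunkKeys("%d", 12)))
	// With %04d, zero padding keeps lexicographic and numeric order aligned.
	fmt.Println(sortedMatchesInsertion(chunkKeys("%04d", 12)))
}
```

Note that a width of 4 only preserves ordering for up to 10,000 chunks; beyond that the padding would need to be wider.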

Contributor Author

Nice %04d was what I had forgotten about. Changed to a chunk counter, and I prefix the last chunk with _final.
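
A reader can use the final-chunk marker to check completeness. The sketch below assumes a hypothetical layout in which each chunk suffix is a `%04d` counter and the last one is additionally prefixed with `_final`; the exact key format in the PR may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// finalPrefix is a hypothetical marker for the last chunk. In ASCII, '_'
// sorts after the digits, so the final chunk also lists last.
const finalPrefix = "_final"

// makeChunkSuffixes returns n chunk suffixes; the last one carries finalPrefix.
func makeChunkSuffixes(n int) []string {
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		s := fmt.Sprintf("%04d", i)
		if i == n-1 {
			s = finalPrefix + s
		}
		out = append(out, s)
	}
	return out
}

// verifyChunks checks that the listed suffixes form a complete run 0..n-1
// that ends in a final-marked chunk.
func verifyChunks(suffixes []string) error {
	if len(suffixes) == 0 {
		return fmt.Errorf("no chunks")
	}
	sorted := append([]string(nil), suffixes...)
	sort.Strings(sorted)
	for i, s := range sorted {
		want := fmt.Sprintf("%04d", i)
		if i == len(sorted)-1 {
			want = finalPrefix + want
		}
		if s != want {
			return fmt.Errorf("chunk %d: got %q, want %q", i, s, want)
		}
	}
	return nil
}

func main() {
	all := makeChunkSuffixes(3)
	fmt.Println(verifyChunks(all))     // complete set
	fmt.Println(verifyChunks(all[:2])) // final chunk missing
}
```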

Also caught a potential txn retry bug where we were mutating data inside the closure. Now we take a copy and operate on that.
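
The retry hazard can be sketched as follows. `runWithRetry` is a hypothetical stand-in for a transaction runner that may invoke its closure more than once; it is not the CockroachDB API, and `writeChunks` only illustrates the "take a copy" pattern described above.

```go
package main

import "fmt"

// runWithRetry is a hypothetical helper that calls fn up to attempts times,
// stopping at the first success.
func runWithRetry(attempts int, fn func(attempt int) error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(i); err == nil {
			return nil
		}
	}
	return err
}

// writeChunks consumes data chunk by chunk inside the closure. Each attempt
// works on its own copy of the slice header, so a retry starts from the full
// input instead of from wherever the failed attempt stopped.
func writeChunks(data []byte, chunkSize int, failFirst bool) (int, error) {
	var written int
	err := runWithRetry(2, func(attempt int) error {
		remaining := data // per-attempt copy; data itself is never re-sliced
		written = 0       // reset per-attempt state
		for len(remaining) > 0 {
			n := chunkSize
			if n > len(remaining) {
				n = len(remaining)
			}
			remaining = remaining[n:]
			written += n
			if failFirst && attempt == 0 {
				return fmt.Errorf("simulated retryable error")
			}
		}
		return nil
	})
	return written, err
}

func main() {
	n, err := writeChunks([]byte("abcdefghij"), 4, true)
	fmt.Println(n, err)
}
```

If the closure instead re-sliced `data` directly, the second attempt would silently skip the chunk consumed by the first.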

@adityamaru

TFTR!

bors r=dt

craig bot commented Jul 11, 2023

Build succeeded:

@craig craig bot merged commit 89d6fdd into cockroachdb:master Jul 11, 2023
2 checks passed
@adityamaru adityamaru deleted the bundle-one branch July 11, 2023 19:46
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Jul 11, 2023
In cockroachdb#105384
we added infrastructure to request and store execution details
for a job. This currently only includes the DistSQL diagram
generated during a job execution. Going forward this will
include several files such as traces, goroutines, profiles etc.

This change introduces an endpoint that allows listing all such
files that are available for consumption. This list will be displayed
on the job details page allowing the user to download any subset of
the files collected during job execution.

Informs: cockroachdb#105076
Release note: None
craig bot pushed a commit that referenced this pull request Jul 13, 2023
106629: sql,server: add endpoint to list a job's execution details r=dt a=adityamaru

In #105384 we added infrastructure to request and store execution details for a job. This currently only includes the DistSQL diagram generated during a job execution. Going forward this will include several files such as traces, goroutines, profiles etc.

This change introduces an endpoint that allows listing all such files that are available for consumption. This list will be displayed on the job details page allowing the user to download any subset of the files collected during job execution.

Informs: #105076
Release note: None

Co-authored-by: adityamaru <adityamaru@gmail.com>
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Jul 17, 2023
In cockroachdb#105384 and cockroachdb#106629 we added support to collect
and list files that had been collected as part of
a job's execution details. These files are meant
to provide improved observability into the state
of a job.

This change is the first of a few that exposes these
endpoints on the DBConsole job details page. This change
only adds support for listing files that have been
requested as part of a job's execution details.
A future change will add support to request these files,
sort them and download them from the job details page.

This page is not available on the Cloud Console as it
is meant for advanced debugging.

Informs: cockroachdb#105076

Release note (ui change): add table in the Profiler job
details page that lists all the available files describing
a job's execution details
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Jul 17, 2023
This change teaches the job resumer to fetch and write
its trace recording before finishing its tracing span.
These traces will be a part of the execution detail files
introduced in cockroachdb#105384. These traces will be
valuable in understanding a job's execution characteristics
during each resumption, even if the job has reached a terminal
state.

Currently, this behaviour is opt-in and has been enabled for
backups, restore, import and physical replication jobs.

Informs: cockroachdb#102794
Release note: None
craig bot pushed a commit that referenced this pull request Jul 17, 2023
105368: backupccl: add unit tests for FileSSTSink r=rhu713 a=rhu713

Backfill unit tests for the basic functionality of FileSSTSink with additional test cases involving inputs of keys with many entries in its revision history.

Epic: CRDB-27758

Release note: None

105624: jobsprofiler: dump trace recording on job completion r=dt a=adityamaru

This change teaches the job resumer to fetch and write its trace recording before finishing its tracing span. These traces will be consumed by the job profiler bundle that is being introduced in #105384. These traces will be valuable in understanding a job's execution characteristics during each resumption, even if the job has reached a terminal state.

Currently, this behaviour is opt-in and has been enabled for backups, restore, import and physical replication jobs.

Informs: #102794
Release note: None

106515: DEPS: bump across etcd-io/raft#81 and disable conf change validation r=erikgrinaker a=tbg

We don't want raft to validate conf changes, since that causes issues due to false positives (the check is above raft, but needs to be below raft to always work correctly). We are
taking responsibility for carrying out only valid conf changes, as we always
have.

See also etcd-io/raft#80.

Fixes #105797.
Epic: CRDB-25287
Release note (bug fix): under rare circumstances, a replication change could get
stuck when proposed near lease/leadership changes (and likely under overload),
and the replica circuit breakers could trip. This problem has been addressed.
Note to editors: this time it's really addressed (fingers crossed); a previous
attempt with an identical release note had to be reverted.


106939: changefeedccl: fix flake in TestParquetRows r=miretskiy a=jayshrivastava

changefeedccl: fix flake in TestParquetRows
Previously, this test would flake when rows were not emitted
in the exact order they were inserted/modified. This change
makes the test resilient to different ordering.

Epic: None
Fixes: #106911
Release note: None
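
One common way to make such a test resilient to ordering is to compare sorted copies of the emitted and expected rows. The sketch below is only an illustration of that idea; `sameRows` is a hypothetical helper and not the code used in TestParquetRows.

```go
package main

import (
	"fmt"
	"sort"
)

// sameRows reports whether got and want contain the same rows, ignoring the
// order in which they were emitted. Inputs are not modified.
func sameRows(got, want []string) bool {
	if len(got) != len(want) {
		return false
	}
	g := append([]string(nil), got...)
	w := append([]string(nil), want...)
	sort.Strings(g)
	sort.Strings(w)
	for i := range g {
		if g[i] != w[i] {
			return false
		}
	}
	return true
}

func main() {
	emitted := []string{"row-b", "row-a", "row-c"}
	expected := []string{"row-a", "row-b", "row-c"}
	fmt.Println(sameRows(emitted, expected))
}
```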

---

util/parquet: make metadata transparent in tests
Previously, users of the library would need to explicitly
call `NewWriterWithReaderMetadata()` to configure the parquet
writer to add metadata required to use reader utils in
`pkg/util/parquet/testutils.go`. This led to a lot of
unnecessary code duplication. This moves the logic to decide if
metadata should be written to `NewWriter()` so callers do not
need to do the extra work.

Epic: None
Release note: None

Co-authored-by: Rui Hu <rui@cockroachlabs.com>
Co-authored-by: adityamaru <adityamaru@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Jayant Shrivastava <jayants@cockroachlabs.com>
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Jul 17, 2023
In cockroachdb#105384 and cockroachdb#106629 we added support to collect
and list files that had been collected as part of
a job's execution details. These files are meant
to provide improved observability into the state
of a job.

This change is the first of a few that exposes these
endpoints on the DBConsole job details page. This change
only adds support for listing files that have been
requested as part of a job's execution details.
A follow-up change will add support to request these files,
sort them and download them from the job details page.

This page is not available on the Cloud Console as it
is meant for advanced debugging.

This change also renames the `Profiler` tab to
`Advanced Debugging` as the users of this tab are
going to be internal CRDB support and engineering
for the time being.

Informs: cockroachdb#105076

Release note (ui change): add table in the Profiler job
details page that lists all the available files describing
a job's execution details
craig bot pushed a commit that referenced this pull request Jul 24, 2023
106879: jobs: add table to display execution details r=maryliag a=adityamaru

In #105384 and #106629 we added support to collect and list files that had been collected as part of
a job's execution details. These files are meant
to provide improved observability into the state
of a job.

This change is the first of a few that exposes these endpoints on the DBConsole job details page. This change only adds support for listing files that have been requested as part of a job's execution details.
A future change will add support to request these files, sort them and download them from the job details page.

This page is not available on the Cloud Console as it is meant for advanced debugging.

Informs: #105076

Release note (ui change): add table in the Profiler job details page that lists all the available files describing a job's execution details
<img width="1505" alt="Screenshot 2023-07-18 at 2 26 50 PM" src="https://github.com/cockroachdb/cockroach/assets/13837382/aebe18a6-9c25-4c9a-ad7c-a94e2e4c97ff">
<img width="1510" alt="Screenshot 2023-07-18 at 2 27 03 PM" src="https://github.com/cockroachdb/cockroach/assets/13837382/da9b3a21-8dc6-47ca-ac02-24d8bb7d09e7">



107236: sql: use txn.NewBatch instead of &kv.Batch{} r=fqazi a=rafiss

This will make these requests properly pass along the admission control headers.

informs #79212
Epic: None
Release note: None

107447: sql: fix CREATE MATERIALIZED VIEW AS schema change job description r=fqazi a=ecwall

Fixes #107445

This changes the CREATE MATERIALIZED VIEW AS schema change job description SQL syntax. For example
```
CREATE VIEW "v" AS "SELECT t.id FROM movr.public.t";
```
becomes
```
CREATE MATERIALIZED VIEW defaultdb.public.v AS SELECT t.id FROM defaultdb.public.t WITH DATA;
```

Release note (bug fix): Fix CREATE MATERIALIZED VIEW AS schema change job description SQL syntax.

Co-authored-by: adityamaru <adityamaru@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Evan Wall <wall@cockroachlabs.com>