Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL Smith: Measure Coverage #7825

Open
2 tasks
jon-chuang opened this issue Feb 10, 2023 · 9 comments
Open
2 tasks

SQL Smith: Measure Coverage #7825

jon-chuang opened this issue Feb 10, 2023 · 9 comments

Comments

@jon-chuang
Copy link
Contributor

jon-chuang commented Feb 10, 2023

Is your feature request related to a problem? Please describe.

I noticed that the plans generated often evaluate to very idiosyncratic queries. Hence, I am worried about the actual coverage. I believe the coverage for frontend/optimizer side may be ok, but am more worried about the e2e testing (batch and stream part).

As an anecdote, a bunch of errors were caught in the frontend due to const eval but were not caught in the batch/stream executors.

Describe the solution you'd like

I am suggested several ideas to get a better understanding of the e2e coverage:

  • Test how many columns/table are in the underlying tables referenced in the generated SQL v.s. columns/tables in the final optimized query plan's output.
    • If a column does not appear in the final query or in a condition, it is very likely to be pruned away, so we do not get coverage.
    • This metric should take into account the fact that certain columns may reference columns via input ref. As long as these input refs can be traced to an underlying table/MV, that table/MV's column should be counted as being referenced.
    • If a column is derived from aggregating InputRef, then it should be considered referenced.
  • fraction of expressions that reference input variables after they have been optimized in frontend.
    • Another issue is that with enough expressions, a condition always ends up being either false or true. Hence I would suggest not putting too many literals in expressions. Const exprs are not very interesting compared to expressions that need to be evaluated in our executors with row data as their input.

Describe alternatives you've considered

No response

Additional context

No response

@xxchan
Copy link
Member

xxchan commented Feb 10, 2023

We want to implement coverage-guided fuzzing? 😲

@jon-chuang
Copy link
Contributor Author

jon-chuang commented Feb 10, 2023

@xxchan not sure. The first step is to measure how good our coverage is before determining if we need to improve the fuzz test generation.

The automatic improvement (coverage-guided) part is optional, tuning it manually is also an option @kwannoel

The measurement part is definitely the first step to implement coverage-guided fuzzing, and we can define more useful metrics as we go along, especially if it helps us catch actual bugs in the e2e setting as a result.

@xxchan xxchan changed the title SQL Smith Coverage SQL Smith: Measure Coverage Feb 10, 2023
@kwannoel
Copy link
Contributor

A simple thing to do is enable codecov: https://about.codecov.io/.
We currently already use it in our ci pipeline. We can upload a separate set of reports for sqlsmith codecov (deterministic test).

@neverchanje
Copy link
Contributor

How do we measure the coverage in general now? We used to have similar discussions and the final conclusion was that the coverage is hard to be calculated if taking e2e tests into account. E2e is not covered by general Rust coverage tools.

@kwannoel
Copy link
Contributor

kwannoel commented Feb 16, 2023

How do we measure the coverage in general now? We used to have similar discussions and the final conclusion was that the coverage is hard to be calculated if taking e2e tests into account. E2e is not covered by general Rust coverage tools.

AFAIK, for unit tests we use codecov for reprots and llvm-cov via nextest: https://nexte.st/book/test-coverage.html. E2E test coverage might be possible, if we run tests with all-in-one binary / deterministic-simulation.

I don't think there are currently other things measuring coverage apart from unit test.

@jon-chuang
Copy link
Contributor Author

E2e is not covered by general Rust coverage tools.

I think we can still get coverage via deterministic test. Secondly, we can use LLVM to instrument the deterministic test all in one binary to get a better coverage.

@xxchan
Copy link
Member

xxchan commented Feb 16, 2023

the final conclusion was that the coverage is hard to be calculated

#32 I didn't find any conclusion. It seems that there are just no follow-ups. 🤪

@fuyufjh
Copy link
Member

fuyufjh commented Mar 22, 2023

Shall we close this?

@fuyufjh fuyufjh removed this from the release-0.18 milestone Mar 22, 2023
@kwannoel
Copy link
Contributor

Still needed eventually I think. Can just remove from milestone but leave open to track.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants