Unnecessary Scans/Reads -> Extra I/O #112

Closed
Tracked by #283
queryproc opened this issue Apr 10, 2021 · 1 comment

Comments

@queryproc
Contributor

Consider the following query:

MATCH (a)-e1->(b)
RETURN e1.time;

Currently, we consider two possible query vertex orderings (QVOs), [a, b] and [b, a], and therefore generate two plans:

P1: Scan(a) → Extend(b) → ScanEdgeProperty(e1.time) → Project([e1.time])

OR

P2: Scan(b) → Extend(a) → ScanEdgeProperty(e1.time) → Project([e1.time])

The Extend step in each plan is there because we blindly cover all of the query edges and vertices, regardless of what the query actually requests. The two plans should instead be:

P1: Scan(a) → ScanEdgeProperty(e1.time) → Project([e1.time])

OR

P2: Scan(b) → ScanEdgeProperty(e1.time) → Project([e1.time])
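To make the intended rule concrete, here is a minimal sketch of a plan builder that adds Extend only when the RETURN clause references the other vertex. It is a toy, not our planner: `build_plan` and the `ScanVertexProperty` operator name are hypothetical and exist only for illustration.

```python
def build_plan(scan_vertex, other_vertex, edge, returned):
    """Build a linear plan string for a single-edge pattern.

    `returned` is a list of 'variable.property' strings from the RETURN clause.
    Hypothetical helper: Extend is added only if `other_vertex` is referenced.
    """
    # Collect the query variables that the RETURN clause actually touches.
    referenced = {expr.split(".", 1)[0] for expr in returned}

    plan = [f"Scan({scan_vertex})"]
    if other_vertex in referenced:
        # Only pay for the extension when the neighbour is really needed.
        plan.append(f"Extend({other_vertex})")

    for expr in returned:
        var = expr.split(".", 1)[0]
        if var == edge:
            plan.append(f"ScanEdgeProperty({expr})")
        else:
            # Assumed operator name, not taken from the issue.
            plan.append(f"ScanVertexProperty({expr})")

    plan.append(f"Project([{', '.join(returned)}])")
    return " → ".join(plan)


print(build_plan("a", "b", "e1", ["e1.time"]))
print(build_plan("b", "a", "e1", ["e1.time"]))
```

With `RETURN e1.time` neither call emits an Extend. Adding something like `b.age` (an invented property) to the RETURN list brings `Extend(b)` back for the plan that scans `a`.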

In other cases, where we don’t need any properties at all, we should simply read the list size from the Lists header and so avoid pinning and unpinning pages unnecessarily. The query:

MATCH (a)-e1->(b)
RETURN count(*);

Should generate:

P1: Scan(a) → GetListSize(e1) → GROUP BY (COUNT(*))

OR

P2: Scan(b) → GetListSize(e1) → GROUP BY (COUNT(*))
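A rough sketch of why GetListSize is cheaper, assuming each adjacency list keeps its size in a header that can be read without pinning the list's pages. `ToyAdjLists`, `count_by_extend`, and `count_by_list_size` are hypothetical names, and the pin counter only simulates buffer-manager activity.

```python
class ToyAdjLists:
    """Hypothetical stand-in for an adjacency-list store with per-vertex headers."""

    def __init__(self, adjacency):
        self._lists = adjacency  # vertex id -> list of neighbour ids
        # Assumption: the header records the list size, so no data page is needed.
        self.headers = {v: len(nbrs) for v, nbrs in adjacency.items()}
        self.pins = 0            # number of simulated page pins

    def read_list(self, v):
        self.pins += 1           # reading list contents pins a page (simulated)
        return self._lists[v]


def count_by_extend(store):
    """Count edges by reading every adjacency list, as an Extend would."""
    return sum(len(store.read_list(v)) for v in store.headers)


def count_by_list_size(store):
    """Count edges from the headers alone, never touching list contents."""
    return sum(store.headers.values())


store = ToyAdjLists({0: [1, 2], 1: [2], 2: []})
print(count_by_list_size(store), store.pins)
print(count_by_extend(store), store.pins)
```

Both functions return the same count, but only `count_by_extend` increments the pin counter, once per vertex in this toy.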

We need to be more aware of which properties a query actually needs. An ID is just another property, one that we happen to join on.

@andyfengHKU
Contributor

Solved in PR #1329
