Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5.3] Fix to prevent recursive where clause addition for chunkById #16498

Closed
wants to merge 3 commits into from

Conversation

soundsgoodsofar
Copy link

Noticed during a recent meetup presentation that when running chunkById, the id where clause gets appended over and over again. (Ie you get queries like `SELECT * FROM my_table WHERE id > 50 AND id > 100 AND id > 150 AND ...).

For smaller datasets this isn't a big deal, but as the whole point of chunkById is dealing with increasingly larger datasets, this seems problematic. Eventually you'll run into query length limits somewhere. Even if you don't, transmitting so much useless data hurts my sense of performance.

This change is intended to "tag" the chunking where clause so it gets replaced each iteration instead of appended. I couldn't find anywhere that relies on the Builder where member variable to be numerically indexed, so I don't believe this will be an issue. Still, I'm not intimately familiar with this framework.

Also, this prevents the usage of more exotic id columns with chunkById. Frankly I think that's desirable--I would restrict it further to only allowing use of indexed columns if I could. If there's disagreement with this, another option would be to make the where function a wrapper around an internal function that also accepts a tag parameter.

@taylorotwell
Copy link
Member

Can you show me some code I can use to recreate the issue?

@vlakoff
Copy link
Contributor

vlakoff commented Nov 22, 2016

I can reproduce this issue, good catch.

@vlakoff
Copy link
Contributor

vlakoff commented Nov 22, 2016

Alternatively, maybe the $this->wheres property could be "pre-filtered", like it's done above for the $this->orders property. Did you try it?

@taylorotwell
Copy link
Member

Can someone just share the code I need to reproduce this?

@vlakoff
Copy link
Contributor

vlakoff commented Nov 22, 2016

Just use this on any table you have available:

\DB::enableQueryLog();

$count = 0;
\DB::table('users')->chunkById(1, function () use (&$count) {
    if (++$count > 2) return false;
});

var_dump( \DB::getQueryLog() );

@soundsgoodsofar
Copy link
Author

Thanks vlakoff, yes that's the issue. Especially bad with low chunk sizes.

Sure, there are many ways to solve this so up to you guys how you prefer to tackle it. I figured I'd just offer one of the simpler solutions. I hesitate on running the where clauses through a pre-filter though--eliminating basic redundancies wouldn't be too bad, but I could see that quickly snow balling into some crazy finite state machine to optimize wheres. The SQL engine query optimizer already does that sort of stuff anyway, and it seemed a little overkill for this situation.

Also the chunkById issue is a bug, but I can see other cases where the ability to tag where clauses would be a useful feature and not a matter of optimizing away redundancy. For example, it might be nice to cycle through a set of enumerated values in a where clause without creating an entirely new query each loop.

Another option would also be to change the existing where function to be a wrapper to another that accepts a tag parameter (default null), allowing any variety of where clauses to be labeled for future modification/removal. Not sure if this aligns with the product vision though, so I didn't go that route.

@GrahamCampbell GrahamCampbell changed the title Fix to prevent recursive where clause addition for chunkById [5.3] Fix to prevent recursive where clause addition for chunkById Nov 22, 2016
@taylorotwell
Copy link
Member

I don't understand why we're having to fix these problems now. Did chunk always have this problem. Was the problem just introduced? If so, how and why and can we revert that change instead of introducing some new key into the wheres clause? Who knows what other consequences that will have.

@vlakoff
Copy link
Contributor

vlakoff commented Nov 23, 2016

I think some people just gained interest in this previously rarely used method.

My PRs were about usability (#16180, #16283) and adding tests (#16263, #16359), the main target being laravel/ideas#103.

But here we have a real bug. It's probably present since the first implementation of chunkById.

@vlakoff
Copy link
Contributor

vlakoff commented Nov 23, 2016

Just in case, ping @mzur, as you have authored #14499 and #14711 maybe you have some insight.

Also ping @nunomazer as you have authored #13604 (didn't you encounter the same issue with the "where" clause?)

@taylorotwell
Copy link
Member

So this big under discussion in this PR was not introduced with any recent PR, but has existed since chunk was introduced?

@vlakoff
Copy link
Contributor

vlakoff commented Nov 23, 2016

It seems so. Not chunk but the later chunkById. (refs initial implementation: #12861)

@mzur
Copy link
Contributor

mzur commented Nov 23, 2016

Yes, this where clause exists since v5.2.27 when chunkById was introduced.

*
* @param string $column
* @param int $value
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing return annotation

@@ -548,6 +548,24 @@ public function where($column, $operator = null, $value = null, $boolean = 'and'
}

/**
* Adds an id limit to assist with Eloquent chunkById.
* Subsequent calls override existing limit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this one means

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graham meant there should be an empty line between the short description and additional comments, like I think too it's the case here.

@@ -1498,9 +1516,9 @@ public function forPageAfterId($perPage = 15, $lastId = 0, $column = 'id')
return $order['column'] === $column;
})->values()->all();

return $this->where($column, '>', $lastId)
->orderBy($column, 'asc')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please restore this whitespace.

@derekmd
Copy link
Contributor

derekmd commented Nov 26, 2016

This chunkById method behavior since 5.2 wouldn't have been noticed because its output is idempotent.

  • Chunk 1: WHERE id > 0
  • Chunk 2: WHERE id > 0 AND id > 100
  • Chunk 3: WHERE id > 0 AND id > 100 AND id > 200
  • Chunk 4: WHERE id > 0 AND id > 100 AND id > 200 AND id > 300
  • Chunk 5: WHERE id > 0 AND id > 100 AND id > 200 AND id > 300 AND id > 400

The result set for each chunk will be the same after the fix when only the last clause in the above iterations is applied.

  • Chunk 1: WHERE id > 0
  • Chunk 2: WHERE id > 100
  • Chunk 3: WHERE id > 200
  • Chunk 4: WHERE id > 300
  • Chunk 5: WHERE id > 400

Its severity is mild as the fix only affects query performance once 100+ chunks (with 100+ WHERE conditionals) are hit.

@soundsgoodsofar
Copy link
Author

@derekmd no argument with the mild statement. This will indeed not affect the majority of users and it is idempotent (aside from when you exceed buffer limits and start to fatal).

However I wouldn't want laravel to become known for "doesn't scale". These types of issues will contribute towards that reputation.

@vlakoff
Copy link
Contributor

vlakoff commented Nov 28, 2016

Don't forget about a different fix I have suggested above. I bet it would be a better solution:

Alternatively, maybe the $this->wheres property could be "pre-filtered", like it's done above for the $this->orders property.

@soundsgoodsofar
Copy link
Author

@vlakoff it seems to me that a filter at that lower level should be general purpose. And I would be worried about trying to write a general purpose optimizer to accurately recognize and eliminate redundant where clauses. Taking that to the extreme you'd essentially be writing a SQL query optimizer. It will also certainly be worse performance-wise.

Possible I misunderstand what you mean though.

Copy link
Author

@soundsgoodsofar soundsgoodsofar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should all be good now

@vlakoff
Copy link
Contributor

vlakoff commented Nov 29, 2016

I disagree when you say it's about optimizing the query, no, what we have here is a bug.

About my fix, I think it integrates better in the codebase. Your fix is introducing named where clauses. Though it would be great to get some feedback about the possible fixes.

@taylorotwell
Copy link
Member

@vlakoff if you have a better implementation of a fix then submit it.

@soundsgoodsofar
Copy link
Author

@vlakoff but how would you implement a general-purpose order-by filter at the lower level? Are you talking about inspecting each order by clause to see if it is subsumed by another one? That feels an awful lot like an NP-complete FSM waiting to happen if you do it in a way that isn't just a very naive approach targeting this specific case.

Maybe I misunderstand, as @taylorotwell said maybe send a PR?

@taylorotwell
Copy link
Member

@soundsgoodsofar possible to write a test for this?

@taylorotwell
Copy link
Member

Fixed: 53f97a0

Would appreciate if y'all can confirm.

@taylorotwell
Copy link
Member

I will make same fix to Eloquent query builder.

@jralph
Copy link

jralph commented Jan 12, 2017

I noticed this has been fixed, will this be released into illuminate/database at some point, as this is still at 5.3.23? Could really do with this fix in Lumen!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants