Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5.5] Fix flaws in how search_path is set on connection #22414

Closed
wants to merge 1 commit into from

Conversation

cbj4074
Copy link
Contributor

@cbj4074 cbj4074 commented Dec 13, 2017

TL;DR
Currently, Laravel conflates "schema" with "search_path" and prepares statement parameter bindings incorrectly when setting the value, which could have significant negative implications for PostgreSQL-backed applications. This commit corrects the terminology and configuration, with backward compatibility.
/TL;DR

This commit seeks to address two fundamental issues: 1) incorrect PostgreSQL terminology and application of concepts; 2) ineffective "search_path" values set on connection due to incorrect prepared statement parameter bindings (entire comma-separated string of schemas is quoted when each schema must be quoted separately).

In the existing code, "schema" is conflated with "search_path", which has significant negative implications for PostgreSQL-backed applications.

This commit corrects the terminology within the source so that the code reflects more accurately its actual behavior, but more importantly, it changes the connection configuration variable name that is used to set the "search_path" from "schema" to "searchpath".

The change allows for "decoupling" in that "schema" can still be used for its intended purpose (to prefix object references for schemas not in the "search_path", or to disambiguate objects of the same name that exist in more than one schema in the "search_path"), while "searchpath" can now be used for its intended purpose. In other words, "schema" and "searchpath" are now two separate and unrelated directives, which can be configured independently.

Note that the "schema" connection property is no longer used when configuring the connection, but may still be accessed as needed via getConnection()->getConfig('schema').

The change is backward-compatible in that if the "searchpath" key is not defined on the connection, the "schema" key will be used to set the search_path instead. If "schema" is not defined, PostgreSQL's default schema ("public") will be used.

Note further that "searchpath" may be specified as a string (which allows for a single schema only), or an array (which allows for any number of schemas, excluding only dynamic schemas that have special meaning to PostgreSQL, such as '$user'; support for $-style variable notation may be added in a future revision).

For an exhaustive discussion, see: laravel/ideas#918

return ":$item";
}, $schemas);

return implode(', ', array_combine($preparedSchemas, $schemas));
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this else statement is unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not if we want the existing tests to pass...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe what he means is that you could keep the code inside of the else block but remove the else keyword itself and the functionality would remain the same. eg:

if (is_array($searchPath)) {
    ...
    return implode(', ', array_combine($preparedSchemas, $schemas));
}

return implode(', ', [":$searchPath" => '"'.$searchPath.'"']);

@cbj4074

Copy link
Contributor Author

@cbj4074 cbj4074 Dec 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlett Aha! Of course that's what he meant. Been a long few days... thanks! Is there a "Laravel Style Guide" that articulates these preferences, beyond what PSR-2 requires? Or are they merely personal preferences? Because as far as I know, PSR-2 makes no recommendation in this regard.

Copy link
Contributor

@poppabear8883 poppabear8883 Dec 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly the reason to provide more clear suggestions and examples. Thanks for clearing that up @dlett . I do agree that this style makes more sense.

However, this is the style preference from the original ....

/**
     * Format the schema for the DSN.
     *
     * @param  array|string  $schema
     * @return string
     */
    protected function formatSchema($schema)
    {
        if (is_array($schema)) {
            return '"'.implode('", "', $schema).'"';
        } else {
            return '"'.$schema.'"';
        }
    }

Reference: https://github.com/laravel/framework/blob/5.5/src/Illuminate/Database/Connectors/PostgresConnector.php#L109

Not sure why this isn't merged yet to be honest.

Any serious discussion in regards to why this should not be merged ??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things take a little time to get merged sometimes. There is no specific reason. It's just nature of open source 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taylorotwell sure, I understand that completely.

My concern was the hair splitting discussions around personal style preferences with no relevant discussions on actual concerns with the ability to merge this proposal. I have no issues making changes to satisfy any concerns, but they should be justified.

I don't mean to offend anyone involved. Just my 2 cents ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is precisely why I left it alone: #22425

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully, everybody is happy now. You win, @jmarcher and @carusogabriel . 👏

if (isset($config['schema'])) {
$schema = $this->formatSchema($config['schema']);
if (isset($config['searchpath']) || isset($config['schema'])) {
$searchPath = $this->formatSearchPath($config['searchpath'] ?? $config['schema'] ?? '"public"');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line looks a bit weird with the chain of ??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry can you elaborate on "looks weird" ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a new protected method would work better than chaining the null coalesce operator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on "better"? Hehe... seriously, though, this seems like the "simplest" solution to me. If there is some worthwhile advantage to be gained by adding a new method, then feel free to propose the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to sound smug but, "work better" ? It would work no differently then what is here ? If you are referring to readability, then the change is merely an option. This implementation does not violate any coding principles or practices. Please be more clear with your comments and maybe contribute with an example.

Thanks in advance

@jmarcher

@cbj4074 cbj4074 force-pushed the Branch_v5.5.0 branch 5 times, most recently from 5d8ed01 to f04f0ec Compare December 14, 2017 17:00
if (isset($config['schema'])) {
$schema = $this->formatSchema($config['schema']);
if (isset($config['searchpath']) || isset($config['schema'])) {
$searchPath = $this->formatSearchPath($config['searchpath'] ?? $config['schema'] ?? '"public"');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbj4074 since you introduced ->formatSearchPath(), doesn't this mean the last null-coalesce default value should be simply 'public' instead of '"public"', as the method will take care of it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may be right. It will be a few minutes but I'll fire up the laptop and confirm this.

Copy link
Contributor

@poppabear8883 poppabear8883 Dec 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfn Tests do check out in either case. Regarding tests they explicitly check a literal string which tells me in both cases the string is formatted correctly.

tests/Database/DatabaseConnectorTest.php

$connection->shouldReceive('prepare')->once()->with('set search_path to "public"')->andReturn($connection);

Unless I am over looking something here ;)

Copy link
Contributor

@poppabear8883 poppabear8883 Dec 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact we could remove the ?? '"public"' altogether and tests will pass as we explicitly check that either schema OR searchpath isset . However, to your point if '"public"' was in fact used in this instant it would likely fail to an improper formatted string. So removing this as its unnecessary is likely to be the solution here.

@GrahamCampbell GrahamCampbell changed the title [5.5.] Fix flaws in how search_path is set on connection [5.5] Fix flaws in how search_path is set on connection Dec 16, 2017
This commit seeks to address two fundamental issues: 1) incorrect PostgreSQL terminology and application of concepts; 2) ineffective "search_path" values set on connection due to incorrect prepared statement parameter bindings (entire comma-separated string of schemas is quoted when each schema must be quoted separately).

In the existing code, "schema" is conflated with "search_path", which has significant negative implications for PostgreSQL-backed applications.

This commit corrects the terminology within the source so that the code reflects more accurately its actual behavior, but more importantly, it changes the connection configuration variable name that is used to set the "search_path" from "schema" to "searchpath".

The change allows for "decoupling" in that "schema" can still be used for its intended purpose (to prefix object references for schemas not in the "search_path", or to disambiguate objects of the same name that exist in more than one schema in the "search_path"), while "searchpath" can now be used for *its* intended purpose. In other words, "schema" and "searchpath" are now two separate and unrelated directives, which can be configured independently.

Note that the "schema" connection property is no longer used when configuring the connection, but may still be accessed as needed via getConnection()->getConfig('schema').

The change is backward-compatible in that if the "searchpath" key is not defined on the connection, the "schema" key will be used to set the search_path instead. If "schema" is not defined, PostgreSQL's default schema ("public") will be used.

Note further that "searchpath" may be specified as a string (which allows for a single schema only), or an array (which allows for any number of schemas, excluding only dynamic schemas that have special meaning to PostgreSQL, such as '$user'; support for $-style variable notation may be added in a future revision).

For an exhaustive discussion, see: laravel/ideas#918
@cbj4074
Copy link
Contributor Author

cbj4074 commented Dec 16, 2017

Merged upstream... again. Are we waiting on something specific here?

@cbj4074
Copy link
Contributor Author

cbj4074 commented Dec 16, 2017

Several of us have performed additional research regarding this subject and would like to revise the PR. This one wouldn't have made the status quo any worse, but the consensus is that we can address the relevant concerns more simply. We'll open another one when we're ready for feedback.

@cbj4074 cbj4074 closed this Dec 16, 2017
Copy link
Contributor

@mfn mfn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess I'm too late to the game as it was already closed in the meantime 😄

I'm definitely pro fixing this (postgres user here, though not using non-standard search_path at the moment).

return ":$item";
}, $schemas);

return implode(', ', array_combine($preparedSchemas, $schemas));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand your use of the key=>value approach together with implode.

In my book, the keys when performing implode are ignored?

 $ php -v
PHP 7.1.11-1…
…
$ php -r 'var_dump(implode(",", ["a"=>"b", "c" => "d"]));'
string(3) "b,d"

I understand you try to use something with prepared statements here but IMHO it ends up being not used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, you are spot-on!

{
if (is_array($schema)) {
return '"'.implode('", "', $schema).'"';
if (is_array($searchPath)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small suggestion; instead of having is_array / else (like the old code), I think we can get rid of nesting the whole method if you actually convert the string into an array:

$searchPath = (array) $searchPath;
// the code always just has to handle arrays

See:

$ php -r 'var_dump( (array) "foo");'
array(1) {
  [0]=>
  string(3) "foo"
}

$ php -r 'var_dump( (array) ["foo"]);'
array(1) {
  [0]=>
  string(3) "foo"
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Precisely. That is one of the things we are seeking to change.

{
if (isset($config['schema'])) {
$schema = $this->formatSchema($config['schema']);
if (isset($config['searchpath']) || isset($config['schema'])) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why the config should be named searchpath when the postgres run-time parameter is called search_path?

E.g. the same class also deals with application_name which is named that way in the configuration and in postgres.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None at all! I had overlooked the fact that application_name contains an underscore, and concur that it should perhaps be search_path, for the sake of consistency. Nice catch!

@cbj4074 cbj4074 deleted the Branch_v5.5.0 branch December 16, 2017 18:54
@cbj4074
Copy link
Contributor Author

cbj4074 commented Dec 16, 2017

@mfn Where were you three days ago?! 🤣 Seriously, though, your insights are succinct, helpful, and very much appreciated. This PR simply grew to be too messy and confusing, so we're paring-down and working to present the perceived limitations and proposed changes as straightforwardly as possible. We'll keep you in the loop and would love for you to take a look as soon as it's posted.

@mfn
Copy link
Contributor

mfn commented Dec 16, 2017

Thanks, appreciated being kept in the loop!

One more thing: by accident I've came over \Illuminate\Database\Schema\PostgresBuilder::parseSchemaAndTable:

    protected function parseSchemaAndTable($table)
    {
        $table = explode('.', $table);

        if (is_array($schema = $this->connection->getConfig('schema'))) {
            if (in_array($table[0], $schema)) {
                return [array_shift($table), implode('.', $table)];
            }

            $schema = head($schema);
        }

        return [$schema ?: 'public', implode('.', $table)];
    }

This too looks like it need to be adapted.

@poppabear8883
Copy link
Contributor

@mfn is there any tests that touch that method ? I can't seem to find any. I am assuming that logically this method would work with our new implementation as tests are all passing. So are you referring to updating it to be consistent with naming specifically ?

@mfn
Copy link
Contributor

mfn commented Dec 16, 2017

I might have missed something, but the pasted code uses:

$this->connection->getConfig('schema')

And given the attempt in this PR, yes, schema would still be supported but "the new way" would be the search_path variable which isn't covered in this codepath and might yield unexpected results therefore, methinks.

@poppabear8883
Copy link
Contributor

Excellent point with the getConfig('schema'). I totally over looked that.

You are right to assume that, it in fact would need to be updated.

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants