Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add topological sorting for dumped views via TSort #398

Closed
wants to merge 12 commits into from

Conversation

calebhearth
Copy link
Contributor

Rebase of #337

@edwardloveall
Copy link
Contributor

Hello! We (me and @thorncp) have been trying this code, but it hasn't worked for us. Here's exactly what we tried:

  • pin scenic to 1431c49ec6d6063212aa718db73cc6158faa1461
  • create view as
  • migrate
  • This added as view at the top of the views in db/schema (alphabetical sort)
  • create view bs
  • migrate
  • added bs view after as in the db/schema (alphabetical sort)
  • replace_view as with select * from bs to generate dependent order
    • We also tried it with update_view and got the same outcome
  • migrate
  • as replaced the view and did not move it below bs

At this point, our schema can't be loaded into the database, because it will try to create as using bs but bs does not yet exist.

rails db:schema:load

ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR:  relation "bs" does not exist
LINE 2:      FROM bs;
                  ^
/path/to/db/schema.rb:4416:in `block in <main>'
/path/to/db/schema.rb:13:in `<main>'

Caused by:
PG::UndefinedTable: ERROR:  relation "bs" does not exist
LINE 2:      FROM bs;
                  ^

We also tried this with our existing views. Before we dumped the schema they were in a valid, topological order. After we dumped, they were in an invalid non-topological order.

It this expected? As it stands, our app would break if this got merged.

@derekprior
Copy link
Contributor

This is not expected. The idea of this PR is to provide both correct ordering (which we have today) and stable ordering (which we do not).

Your manual steps are a good test case for us to consider including in our suite today (that bs must come before as) to prevent regression. I'm a bit surprised we don't have something like that already though...

@edwardloveall
Copy link
Contributor

edwardloveall commented Nov 19, 2023

Thanks for the confirmation.

FWIW, I just wrote a test against this commit and it passes:

it "sorts dependency order over alphabetical order" do
  conn.create_view "as", sql_definition: <<-SQL
    SELECT 1;
  SQL
  conn.create_view "bs", sql_definition: <<-SQL
    SELECT 2;
  SQL
  conn.update_view "as", sql_definition: <<-SQL
    SELECT * FROM bs;
  SQL
  
  expect(views).to eq(%w[bs as])
end

I also tried conn.replace_view but that doesn't take a sql_definition and I didn't research further.

Maybe it's our setup somehow? Maybe there's some untested behavior if you migrate in between creating and updating dependent views?

@edwardloveall
Copy link
Contributor

Update: I tried this on a brand new app and it did put bs before as. So I'm thinking it has something to do with our particular app. Hopefully I'll get some more info for you.

@edwardloveall
Copy link
Contributor

edwardloveall commented Nov 20, 2023

We (@heatherbooker and I) figured it out today. Our database views are not in the public schema, they exist in a different schema set via the config/database.yml that looks something like this:

production:
  database: app_production
  schema_search_path: "abc"

The SQL that generates the list of dependent views does not include the schema name, but the code to generate view names does include the non-public schema name.

This breaks the logic that prepares the views to be TSorted.

We updated the select portion of the SQL and now our example case passes and puts as below bs:

-      SELECT distinct dependent_ns.nspname AS dependent_schema
-      , dependent_view.relname AS dependent_view
-      , source_ns.nspname AS source_schema
-      , source_table.relname AS source_table
+      SELECT DISTINCT
+        dependent_ns.nspname AS dependent_schema,
+        CASE
+          WHEN dependent_ns.nspname = 'public' THEN dependent_view.relname
+          ELSE (dependent_ns.nspname || '.' || dependent_view.relname)
+        END AS dependent_view,
+        source_ns.nspname AS source_schema,
+        CASE
+          WHEN source_ns.nspname = 'public' THEN source_table.relname
+          ELSE (source_ns.nspname || '.' || source_table.relname)
+        END AS source_table
       FROM pg_depend
       JOIN pg_rewrite ON pg_depend.objid = pg_rewrite.oid
       JOIN pg_class as dependent_view ON pg_rewrite.ev_class = dependent_view.oid
       JOIN pg_class as source_table ON pg_depend.refobjid = source_table.oid
       JOIN pg_namespace dependent_ns ON dependent_ns.oid = dependent_view.relnamespace
       JOIN pg_namespace source_ns ON source_ns.oid = source_table.relnamespace
       WHERE dependent_ns.nspname = ANY (current_schemas(false)) AND source_ns.nspname = ANY (current_schemas(false))
       AND source_table.relname != dependent_view.relname
       AND source_table.relkind IN ('m', 'v') AND dependent_view.relkind IN ('m', 'v')
-      ORDER BY dependent_view.relname;
+      ORDER BY dependent_view;

This should replicate the current scenic logic.

I also created a test, but it's failing when the new test just before it (the test added in this PR on line 152) runs before it. Not sure why:

it "sorts dependency order when views exist in a schema" do
  conn.execute("CREATE SCHEMA IF NOT EXISTS scenic; SET search_path TO scenic")
  conn.execute("CREATE VIEW apples AS SELECT 1;")
  conn.execute("CREATE VIEW bananas AS SELECT 2;")

  conn.execute("CREATE OR REPLACE VIEW apples AS SELECT * FROM bananas;")

  expect(views).to eq(%w[scenic.bananas scenic.apples])

  conn.execute("DROP SCHEMA IF EXISTS scenic CASCADE;")
end

@calebhearth
Copy link
Contributor Author

At my jobby job we've been using this since 10/26 and have made several updates to views (though none that change the dependency structure) and have not seen any unnecessary reordering of views, so I'm inclined after ~2.5 months to call this a win.

@edwardloveall I wonder if #401 helps with your use case at all? Also if you want to write up a PR to add specs that exercise view ordering I'd appreciate it.

@derekprior
Copy link
Contributor

It's good enough for me. Merge when ready.

When we had talked about this problem originally (years ago), we said we would make this a "breaking change" for 2.0 since it would likely change everyone's generated structure.sql once. Do you still feel that's appropriate?

@calebhearth
Copy link
Contributor Author

Unclear - I think we'd re-order the schema.rb one more time whenever it's next generated (with a view change?) which has already been happening, then it should stop. That may not be a breaking change.

@calebhearth calebhearth added this to the 2.0 milestone Jan 20, 2024
edwardloveall added a commit to edwardloveall/scenic that referenced this pull request Jan 22, 2024
Upcoming changes like [topological sorting][1] have the potential to
write views to `db/schema.rb` in an order that does not respect
dependencies. This adds a test to ensure that views in all schemas are
both included, and in the correct dependency order.

[1]: scenic-views#398
@edwardloveall
Copy link
Contributor

@edwardloveall I wonder if #401 helps with your use case at all?

It does, but only in combination with the changes here.

Also if you want to write up a PR to add specs that exercise view ordering I'd appreciate it.

Done: #403

calebhearth pushed a commit that referenced this pull request Feb 23, 2024
Upcoming changes like [topological sorting][1] have the potential to
write views to `db/schema.rb` in an order that does not respect
dependencies. This adds a test to ensure that views in all schemas are
both included, and in the correct dependency order.

[1]: #398
@calebhearth
Copy link
Contributor Author

I'm seeing a failure on one of the specs here after @edwardloveall's PR adding sort order specs was merged. I'm fairly sure that the spec is correct and I'm not sure why the failure is happening.

@edwardloveall
Copy link
Contributor

I'm seeing a failure on one of the specs here after @edwardloveall's PR adding sort order specs was merged. I'm fairly sure that the spec is correct and I'm not sure why the failure is happening.

Looking at the branch, it's similar to my previous comment: the SQL that generates the dependent views does not contain the schema, but Scenic's views do. tsorted_views then sorts the non-schema-qualified views correctly and ignores the schema-qualified views. Then the non-schema-qualified views get filtered out.

Here's one possible fix:

  def tsorted_views(views_names)
    views_hash = TSortableHash.new
  
    ::Scenic.database.execute(DEPENDENT_SQL).each do |relation|
-     source_v = relation["source_table"]
-     dependent = relation["dependent_view"]
+     source_v = [relation["source_schema"], relation["source_table"]].join(".")
+     dependent = [relation["dependent_schema"], relation["dependent_view"]].join(".")
      views_hash[dependent] ||= []
      views_hash[source_v] ||= []
      views_hash[dependent] << source_v
      views_names.delete(source_v)
      views_names.delete(dependent)
    end
  
    # after dependencies, there might be some views left
    # that don't have any dependencies
    views_names.sort.each { |v| views_hash[v] = [] }
    binding.irb
  
    views_hash.tsort
  end

@Unknown-Guy
Copy link

I'm seeing a failure on one of the specs here after @edwardloveall's PR adding sort order specs was merged. I'm fairly sure that the spec is correct and I'm not sure why the failure is happening.

Looking at the branch, it's similar to my previous comment: the SQL that generates the dependent views does not contain the schema, but Scenic's views do. tsorted_views then sorts the non-schema-qualified views correctly and ignores the schema-qualified views. Then the non-schema-qualified views get filtered out.

Here's one possible fix:

  def tsorted_views(views_names)
    views_hash = TSortableHash.new
  
    ::Scenic.database.execute(DEPENDENT_SQL).each do |relation|
-     source_v = relation["source_table"]
-     dependent = relation["dependent_view"]
+     source_v = [relation["source_schema"], relation["source_table"]].join(".")
+     dependent = [relation["dependent_schema"], relation["dependent_view"]].join(".")
      views_hash[dependent] ||= []
      views_hash[source_v] ||= []
      views_hash[dependent] << source_v
      views_names.delete(source_v)
      views_names.delete(dependent)
    end
  
    # after dependencies, there might be some views left
    # that don't have any dependencies
    views_names.sort.each { |v| views_hash[v] = [] }
    binding.irb
  
    views_hash.tsort
  end

Guess my comment on original PR was lost, mentioned in here #337 (review), we have been running with those changes since then without issues in apps that use cross schema views.

- Fix some dependency conflicts / unneeded version locks
- Fix some Rails 7.1 issues
calebhearth pushed a commit that referenced this pull request May 2, 2024
Upcoming changes like [topological sorting][1] have the potential to
write views to `db/schema.rb` in an order that does not respect
dependencies. This adds a test to ensure that views in all schemas are
both included, and in the correct dependency order.

[1]: #398
@calebhearth
Copy link
Contributor Author

Superseded by #416

@calebhearth calebhearth closed this May 2, 2024
calebhearth pushed a commit that referenced this pull request May 15, 2024
Upcoming changes like [topological sorting][1] have the potential to
write views to `db/schema.rb` in an order that does not respect
dependencies. This adds a test to ensure that views in all schemas are
both included, and in the correct dependency order.

[1]: #398
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants