Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync issue on child tables #569

Open
gitbute opened this issue Aug 14, 2024 · 5 comments
Open

Sync issue on child tables #569

gitbute opened this issue Aug 14, 2024 · 5 comments

Comments

@gitbute
Copy link

gitbute commented Aug 14, 2024

PGSync version: 3.2.0

Postgres version: 16.4.0

Elasticsearch/OpenSearch version: 7.16.3

Redis version: 7.4.0

Python version: 3.9

Problem Description:

Possibly related to #568, maybe also #552

Problem Description:

Take this schema:

[
  {
    "database": "db",
    "index": "document",
    "nodes": {
      "table": "document",
      "columns": [
        "title"
      ],
      "children": [
        {
          "table": "content",
          "columns": ["content"],
          "relationship": {
            "variant": "object",
            "type": "one_to_one",
            "foreign_key": {
              "child": ["document_id"],
              "parent": ["uuid"]
            }
          }
        }
      ]
    }
  },
  {
    "database": "db",
    "index": "content",
    "nodes": {
      "table": "content",
      "columns": [
        "content"
      ]
    }
  }
]

Lets prepopulate the document table with 20 documents and no content, and let pgsync index. While pgsync is down, i populate the content table with 20 rows. After pgsync starts, only one document in index "document" has content, while in index "content" i have 20 objects.
So pgsync in my opinion seems to have a more general syncing issue related to child tables.

Error Message (if any):



@toluaina
Copy link
Owner

  • Can you please share your db schema (roughly).
  • Also, does the same happen if you exclude the 2nd node in the schema with index content

@toluaina
Copy link
Owner

  • I also see you are manually specifying the foreign_key between the document and content.
  • I'm guessing this means Content.document_id is of type UUID
  • Document.uuid is the primary key and of type UUID?
  • There is no foreign key defined on content relation for document_id?
  • is this correct?

@gitbute
Copy link
Author

gitbute commented Aug 16, 2024

  • I also see you are manually specifying the foreign_key between the document and content.

I tried manually specifying foreign keys but also automatically, the issue stays the same

  • I'm guessing this means Content.document_id is of type UUID

Correct

  • Document.uuid is the primary key and of type UUID?

Yes, also correct

  • There is no foreign key defined on content relation for document_id?

Foreign key is defined in database relation, (content.document_id) -> (document.uuid)

  • Also, does the same happen if you exclude the 2nd node in the schema with index content

Yes, it happens aswell

@toluaina
Copy link
Owner

This has now been resolved on the main branch. Can you please try it out now?

@gitbute
Copy link
Author

gitbute commented Aug 26, 2024

Thanks so much for working on this issue. Sadly, in my testing, the change seemed to make no difference. Im currently trying to debug three issues with pgsync all relating to child relations:

  • losing data updated/insert in child tables while pgsync is not running
  • losing data updated/insert in child tables while pgsync IS running with > 1 redis read chunk size
  • pgsync doing lots of updates to the index even if only a low number of inserts / updates are happening

We probably will be looking for another indexing solution for now, but will reconsider if those issues are fixed. I dont understand the pgsync code enough to help with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants