Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle schema references #195

Closed
lornajane opened this issue Apr 15, 2021 · 31 comments
Closed

Handle schema references #195

lornajane opened this issue Apr 15, 2021 · 31 comments
Labels
bug Something isn't working

Comments

@lornajane
Copy link
Contributor

I ran into an error using https://github.com/riferrei/srclient which is a Go client for Kafka with Confluent Schema Registry support. That (since v5.5 I believe) has support for schema references, however Karapace sees this as an unknown field and gives an error:

422 Unprocessable Entity: Unrecognized field: references

Looking at what the client is sending, it's an empty [] field named references at the same level as schema and schemaType.

What I expected: Karapace should ignore/tolerate this field, even if it doesn't have support for schema references quite yet.

@lornajane
Copy link
Contributor Author

Here's what the client sent to Karapace:

{
    "schema": "{     \"namespace\": \"io.aiven.example\",     \"type\": \"record\",     \"name\": \"MachineSensor\",     \"fields\": [         {\"name\": \"machine\", \"type\": \"string\", \"doc\": \"The machine whose sensor this is\"},         {\"name\": \"sensor\", \"type\": \"string\", \"doc\": \"Which sensor was read\"},         {\"name\": \"value\", \"type\": \"float\", \"doc\": \"Sensor reading\"},         {\"name\": \"units\", \"type\": \"string\", \"doc\": \"Measurement units\"}     ] } ",
    "schemaType": "AVRO",
    "references": []
}

(in case an example is useful)

@lornajane lornajane added the bug Something isn't working label May 17, 2021
@alpreu
Copy link

alpreu commented Jul 7, 2021

Any updates? We would like to depend on karapace for license reasons but need to have feature compatibility regarding the schema-references

@hackaugusto
Copy link
Contributor

@alpreu currently we are not working on this feature. PRs are welcome :)

@gklijs
Copy link

gklijs commented Jul 26, 2021

For https://crates.io/crates/schema_registry_converter references are also supported for almost a year now. It would be nice if they were supported by karapace as well.

@perj
Copy link

perj commented Apr 26, 2022

confluent-kafka-python also sends "references": [] for json or protobuf schemas.

confluent_kafka.schema_registry.error.SchemaRegistryError: Unrecognized field: references (HTTP status code 422, SR code 422)

@ChristosMantzouranisMT
Copy link

Hi, a bit confused here. My understanding is that there is no support for schema references, so is there an alternative in order to avoid redundant embedded schemas?

@gklijs
Copy link

gklijs commented May 6, 2022

@ChristosMantzouranisMT you can use Confluent Schema registry, if that's an option and you want to use references. Or you could build some tooling so you can embed them dynamically.

@ChristosMantzouranisMT
Copy link

ChristosMantzouranisMT commented May 9, 2022

@ChristosMantzouranisMT you can use Confluent Schema registry, if that's an option and you want to use references. Or you could build some tooling so you can embed them dynamically.

@gklijs I am using aiven's hosted service so, using Confluent Schema registry is not an option ,thanks for your answer.

@sujayinstaclustr
Copy link

sujayinstaclustr commented May 9, 2022

We at Instaclustr have commenced work to support Protobuf Schema references. Here is the draft plan of PR's we are working on.

The implementation plan consists of the following PRs to Aiven Karapace repository.

Question to @hackaugusto or @lornajane - Can the PRs be conducted in official Aiven’s Karapace GitHub repository instead of Instaclustr's fork? For example, PR1 is not feature-complete functionality, it is required to implement other PRs? PR2, PR3, where as PR4 can be considered feature-complete once implemented.

Protobuf Schema References Implementation in Karapace

PR 1

Add references support to all the required endpoints. The functionality is about 90% common to all the protocols (i.e. Avro, JSON, Protobuf, etc.). All the requests, responses, and ways of storing data in Kafka should be 1-to-1 compatible with Schema Registry:

  • GET: return references data
  • POST: Store references data to Kafka in 1-to-1
  • Implement new endpoints specific to references. (REFERENCED BY: return list of schema IDs which reference a schema)
  • DELETE: don’t allow to delete schemas that are referenced by others, etc.
  • Tests
PR 2

Support references (imports) in Protobuf schemas.

  • Resolve imported entities using references
  • POST: Validate Protobuf schema with references
  • POST: Compatibility check with references
  • Tests
PR 3

Support Known Dependencies (KD) in Protobuf schemas

  • Research an approach to storing and resolving KD
  • Resolve imported entities using KD
  • Tests
PR 4

Serialize and deserialize using Protobuf schemas with imports

  • Resolve references by protoc to build Python modules
  • Resolve KD by protoc to build Python modules
  • Tests

@hackaugusto
Copy link
Contributor

Question to @hackaugusto or @lornajane - Can the PRs be conducted in official Aiven’s Karapace GitHub repository instead of Instaclustr's fork?

Should be fine as long as the tests are passing and the code can be released :)

@sujayinstaclustr
Copy link

Hi

We have a PR ready in our forked repository. Thought before we raise the PR to upstream repository, we can get some feedback and a review from you all.

PR - instaclustr#21.

PR 1 consists of the following changes.

Add references support to all the required endpoints. The functionality is about 90% common to all the protocols (i.e. Avro, JSON, Protobuf, etc.). All the requests, responses, and ways of storing data in Kafka should be 1-to-1 compatible with Schema Registry:

  • GET: return references data
  • POST: Store references data to Kafka in 1-to-1
  • Implement new endpoints specific to references. (REFERENCED BY: return list of schema IDs which reference a schema)
  • DELETE: don’t allow to delete schemas that are referenced by others, etc.
  • Tests

Cheers,
Sujay

@sujayinstaclustr
Copy link

sujayinstaclustr commented May 24, 2022

@hackaugusto let me know if it is ok to review the PR on instaclustr/karapace fork? or should we raise the PR to avien/karapace upstream? We prefer the reviewing on our fork before we raise the PR to upstream, but ok with raising it to the upstream as well. The PR is ready for review as mentioned above.

Cheers,
Sujay

@hackaugusto
Copy link
Contributor

@sujayinstaclustr
Copy link

Hello,
Hope everyone is doing well.

Just following up on the PR review for instaclustr#21? If anyone of you have time to provide some feedback? We are continuing to work on PR # 2 as per the plan above.

@sujayinstaclustr
Copy link

Hi @hackaugusto following up again on the PR #1 review. Sorry to follow up so many times, but I am working with a deadline on this project, it will be helpful if we can get the PR review done.

Thanks in advance.

@hackaugusto
Copy link
Contributor

Hi @sujayinstaclustr, no worries. @jjaakola-aiven is doing the review :)

@jjaakola-aiven
Copy link
Contributor

@sujayinstaclustr What is the plan for the Avro and JSON support?

@sujayinstaclustr
Copy link

@jjaakola-aiven based on when protobuf support is complete, we will come up with a plan on working to support Avro and JSON. Is there a preference for which one to support, Avro or JSON? I am asking because I have limited resources and budget I am working with.

Cheers

@jjaakola-aiven
Copy link
Contributor

@sujayinstaclustr I would prefer Avro, it is the default format.

@sujayinstaclustr
Copy link

@jjaakola-aiven We have the second PR also ready for your review - instaclustr#23

Support Known Dependencies in Protobuf schemas

@sujayinstaclustr
Copy link

sujayinstaclustr commented Aug 10, 2022

Hi @jjaakola-aiven
We have the PR# 23 which now combines the PR # 21 which you have already reviewed and we fixed the comments. We had to combine this because there were a lot of changes made in the Aiven upstream and when we resynced our fork it create lots of conflicts. In order for us to avoid resolving each of the PRs, we combined them into PR# 23

Let me know if you can start reviewing and hope there are not a lot of changes required.

@sujayinstaclustr
Copy link

sujayinstaclustr commented Aug 17, 2022

@jjaakola-aiven Gentle reminder for reviewing PR# 23

@sujayinstaclustr
Copy link

@jjaakola-aiven @hackaugusto reminder to review PR# 23

Thanks in advance

@encima
Copy link
Contributor

encima commented Oct 3, 2022

@jjaakola-aiven @hackaugusto could this be reviewed? or could someone else take it to move it forward?

@sujayinstaclustr
Copy link

@jjaakola-aiven @hackaugusto could this be reviewed? or could someone else take it to move it forward?

@encima this is being reviewed and there was some feedback on the PR from @jjaakola-aiven. We finished some of the review feedback changes.

@RyanSkraba
Copy link
Contributor

Hello! I hope everyone is doing well -- I have been working on cleaning up the commit history and I got pretty far along in organising the work done in #467, #473 and #504

I had put together a PR on my personal repo before leaving for vacation: RyanSkraba#1 that mostly works. This has little new code: it's all rebasing and reorganising the work that has diverged in the 3 PRs I mentioned. I especially made an effort to retain all authorship for who wrote what, and hopefully I didn't make any mistakes. IIRC the broken bits were related to different python versions.

I'd like to take this back up when I get back into the office in about a week -- it's gotten big and unwieldy enough that merge conflicts are a real pain, and it looks like there's some new conflicts since my last effort!

@drognisep
Copy link

drognisep commented Mar 13, 2023

@jjaakola-aiven
Any updates on this? It looks like this issue is the only thing keeping us from adopting Karapace schema registry.

@libretto
Copy link
Contributor

I created PR with References/Dependencies few month ago but it was not reviewed by Aiven and was deprecated because main branch development go forward and branch which was created for references/dependencies PR nobody keep in sync with main branch. Now I synced my code with main branch and created new pull request #560 I hope somebody can review it.

@RyanSkraba
Copy link
Contributor

Hey thanks @libretto ! I know it couldn't have been easy to redo this with main being a moving target (there were a number of tricky merge conflicts that I worked through as well). I propose moving forward on this last PR #560 and closing the others (#467, #473 and #504).

I've launched the workflows for your PR, and it'll be my priority!

@drognisep Have you taken a look at the proposition? This first pass supports Protobuf only.

@drognisep
Copy link

@drognisep Have you taken a look at the proposition? This first pass supports Protobuf only.

@RyanSkraba That happens to work perfectly for us. 😁

@tvainika
Copy link
Contributor

Karapace 3.5.0 was a first release with support for references

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests