Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bazel: Migrate to bazel 7; use bzlmod for dependencies #284

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

minor-fixes
Copy link

@minor-fixes minor-fixes commented Jan 15, 2024

This change performs a large cleanup, attempting to migrate this repo to building using the latest major version of bazel, as well as to using bzlmod for dependencies instead of WORKSPACE.

In order to demonstrate that builds still work and the repo is more useful for downstream users, an example submodule project is added, that contains:

  • sample MODULE.bazel boilerplate
  • sample proto generation targets for C++, Go, and Java
  • sample applications that import gRPC generated code for C++, Go, and Java

In pursuit of the above, this change:

  • modifies how protos are built:
    • rules are inlined into the BUILD file next to each proto, instead of residing in a subdir
    • different rules are imported, depending on language
    • switched_rules WORKSPACE macros are removed; using bzlmod should make dependency resolution easier, relaxing the need for such a macro
  • updates go.mod to the format supported by the latest Go toolchain
  • removes unused gRPC bazel rules
  • adopts a patch for googleapis based on how to upgrade to 0.41.0 via bzlmod only rules_go#3685 (comment)

Tested:

  • cd examples/sample_project && bazel build //... works
  • bazel build //... in top-level directory works

@mortenmj
Copy link

@minor-fixes Adding "examples" to .bazelignore at the root would avoid Bazel recursing into that directory

@minor-fixes minor-fixes marked this pull request as ready for review January 19, 2024 18:01
/bazel-out
/bazel-remote-apis
/bazel-testlogs
**/bazel-*
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assume a sufficiently up-to-date git client version?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a reasonable assumption. If we find that this breaks someone, it's easy to revert back to the explicit list of directories.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, how's this different from just "bazel-*"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just double-checked to confirm) This ignores bazel-generated symlinks in repo subdirs, not just the toplevel, which is significant because there now exist example workspaces where one can cd to and run builds.

Clarified the comment above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that **/bazel-* would indeed ignore that. But why can't we just use bazel-*? So without a leading slash, without a leading anything.

@minor-fixes
Copy link
Author

@minor-fixes Adding "examples" to .bazelignore at the root would avoid Bazel recursing into that directory

Thanks! I don't know why I expected the presence of a WORKSPACE file in a subdir to stop bazel, but the .bazelignore fix works great!

Copy link
Collaborator

@bergsieker bergsieker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! It's a big cleanup!

We've actually had a goal for some time to remove the generated bindings from this repository and make it a reference only. Projects using these APIs would need to create their own bindings, rather than importing the ones from this repository, which I think saves us a lot of maintenance burden around generating bindings for multiple languages, supporting multiple versions of Bazel, etc.

I'm wondering if this change is the right time for that--move the generated bindings into your sample application, and let that stand as a reference for how downstream users should do things. WDYT?

@minor-fixes
Copy link
Author

I started with creating bindings in the sample application actually - and reversed course after I realized that I'm not sure what the long-term ramifications of doing specifically that for C++ and ODR violations (say one depends on two projects that each generate, build, and link against their own bindings in their respective libraries - does that work? I only know enough C++ to know that I don't know C++ - and that was back in 2016).

As far as that is an issue, the easiest way to not have that issue is to have one set of bindings after dependency resolution, and then compile those once to be linked in where needed.

It'd be a shame, too - bazel is finally giving us the tools to manage dependencies sanely, instead of the WORKSPACE shenanigans we've had to deal with up until this point. As much as I'd like to be mad about having to learn bzlmod, it seems to work well for the modules in the registry (indeed, most of the issues are with the modules not in the registry!) and it does look to be more concise and tractable than WORKSPACE.

My gut reaction is to go the other way, and put some maintenance elbow grease in (which I'm happy to do! Maybe others will join in over time) because this is a fairly slim repo, and we can work to keep it slim and maximally useful. Hopefully this puts us on the path to being an entry in the Bazel module registry, which will make it easier to import, and the examples should help new projects get spun up more quickly.


I might not be aware of the reasons for avoiding binding generation, though - looking at the ecosystem(s) Google is pushing, it's not clear what "best practice" is - maybe a Bazel expert can weigh in here? What technique (doing binding gen vs. not) builds the best ecosystem at large?

@jasonschroeder-sfdc
Copy link

I love this PR, thank you for the bzlmod support! One request before this is merged.. Can we rename the java_proto_library rules? I'd like to reserve the _java_grpc suffix to be generated by java_grpc_library(). Thank you!

Example:

java_proto_library(
-    name = "remote_asset_java_grpc",
+    name = "remote_asset_java_proto",
     deps = [":remote_asset_proto"],
)

@bergsieker
Copy link
Collaborator

I love this PR, thank you for the bzlmod support! One request before this is merged.. Can we rename the java_proto_library rules? I'd like to reserve the _java_grpc suffix to be generated by java_grpc_library(). Thank you!

Example:

java_proto_library(
-    name = "remote_asset_java_grpc",
+    name = "remote_asset_java_proto",
     deps = [":remote_asset_proto"],
)

+1

Copy link
Collaborator

@bergsieker bergsieker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable, given the position that "the ship has sailed" on including go bindings. I consulted with the Bazel team, and we did not find a better method for managing bindings, so we'll go with this for now.

I don't really have any idea how to test this, so we may find that it's necessary to roll back if it turns out to break someone.

/bazel-out
/bazel-remote-apis
/bazel-testlogs
**/bazel-*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a reasonable assumption. If we find that this breaks someone, it's easy to revert back to the explicit list of directories.

Copy link
Author

@minor-fixes minor-fixes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename the java_proto_library rules? I'd like to reserve the _java_grpc suffix to be generated by java_grpc_library()

Done!

Though you raise an excellent point - the Java targets, best I can tell, don't generate gRPC stubs, whereas the targets for Go and C++ do.

This is an unfortunate difference that maybe someone with bazel + Java ecosystem knowledge can resolve more quickly than I, or at least shouldn't block this PR (no gRPC stubs are generated for Java currently, so this is not a regression). In any event, I've added TODOs on all the targets, as well as the example Java code calling out the missing stub instantiation.

/bazel-out
/bazel-remote-apis
/bazel-testlogs
**/bazel-*
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just double-checked to confirm) This ignores bazel-generated symlinks in repo subdirs, not just the toplevel, which is significant because there now exist example workspaces where one can cd to and run builds.

Clarified the comment above.

@minor-fixes
Copy link
Author

I don't really have any idea how to test this, so we may find that it's necessary to roll back if it turns out to break someone.

Can you expound on what we would consider "breaking"? Between the bazel client version update, bzlmod, and possibly changing target names, I do expect users to need to adapt when updating to latest; though, I also don't expect orgs with business-critical builds to be pulling tip of main in production builds.

@jasonschroeder-sfdc
Copy link

@bergsieker @EdSchouten Would you mind reviewing and merging? Thank you in advance!!

@EdSchouten
Copy link
Collaborator

@bergsieker @EdSchouten Would you mind reviewing and merging? Thank you in advance!!

I don't have permissions to merge, so I guess @bergsieker will need to do that. I just looked through the diff, and I can't see anything out of the ordinary. That said, I still haven't found any time to play around with bzlmod. So not seeing anything wrong doesn't mean it's right... 😆 So maybe just ship it?

"""remote-apis dependencies"""

module(
name = "bazel_remote_apis",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name = "bazel_remote_apis",
name = "remoteapis",

bazel itself uses this name, not sure which should be preferred

@keith
Copy link
Member

keith commented Mar 27, 2024

I stacked a few commits on this over here #293

@Wyverald
Copy link
Member

Wyverald commented Jul 12, 2024

I've just come across this PR (disclaimer: I work on Bzlmod). Thank you for your contribution! And sorry about the long delay -- I'm dedicating time to fixing the googleapis + grpc + remoteapis situation for Bzlmod in Q3, so my hope is that we can land some concrete improvements within the next month.

My main question regarding this PR is the choice to do this:

rules are inlined into the BUILD file next to each proto, instead of residing in a subdir

Putting all language rules in the same BUILD file presents the challenge that we'll need to load all those rulesets when we need just one of them. Having each language be in a subdir at least makes it so that a user project only needs to load the rulesets they need. ("load" here meaning to evaluate those .bzl files, creating targets for the binding targets, etc.)

But there's a deeper problem -- the fact that this module depends on rules_{cc,java,go} at all already means that we're pulling down more than we need. ("pull down" here meaning to actually fetch the ruleset modules -- which happens because of toolchain registration.) Ideally, this module contains only protos, and users of language-specific bindings can somehow only use the languages they need.

I've only just started looking into this area, so I don't have a concrete suggestion as to how that should happen. A naive idea is to have a sub-Bazel-module for each language binding -- so remoteapis as the base module that contains only protos, and then remoteapis-java for Java bindings, remoteapis-go for Go bindings, etc. That's clean but may be too much maintenance burden. Another setup I'm considering (assuming that the binding generation is just calling a binary) is to generate bindings on the fly using module extensions and repo rules.

I'm talking to the googleapis team (https://github.com/googleapis/googleapis) whose project setup has largely the same problem (and googleapis is in fact a dependency of remote-apis). We have a meeting next Monday, so I'll report back after that and coordinate some sort of effort to fix these together.

@minor-fixes
Copy link
Author

Thanks, I'm excited to see it land! While working on this, I ran into some fundamental proto+bazel ecosystem questions that maybe you can help clarify:

  • Should binding generation always happen in the module with the protos, or always happen in a dependent module? (I thought I ran into an assertion in generating protobuf bindings from a separate module at one point)
    • If always done from the same module with protos - how do we avoid maintainers of proto modules having to add targets for languages that they might not care about (but downstream users do)?
    • If always done from a different module - how are ODR-type issues resolved, if two different targets exist in the dependency graph that generate bindings from the same proto?
  • Is a single repo with multiple modules a valid option as well?

@moroten
Copy link
Contributor

moroten commented Jul 16, 2024

If there are no extra arguments needed, an aspect on top of the proto_library should be able to define the generating rules for each language. The problem comes when certain annotations are needed, where should they be defined?

@Wyverald
Copy link
Member

Should binding generation always happen in the module with the protos, or always happen in a dependent module?

In my mind, the ideal state would be that the binding generation happens in a separate module, but one that's shared by everyone using that language. So remote-apis has only protos, and remote-apis-java has Java bindings that all Java people use, remote-apis-py has Python bindings, etc. This avoids ODR violations.

Is a single repo with multiple modules a valid option as well?

Yes -- we can have subdirectories with their own MODULE.bazel files. Then we either build custom release archives, or have everyone use the same archive (containing the whole repo) and rely on strip_prefix.

If there are no extra arguments needed, an aspect on top of the proto_library should be able to define the generating rules for each language. The problem comes when certain annotations are needed, where should they be defined?

Could you give an example of what annotations you're referring to?

@moroten
Copy link
Contributor

moroten commented Jul 16, 2024

If there are no extra arguments needed, an aspect on top of the proto_library should be able to define the generating rules for each language. The problem comes when certain annotations are needed, where should they be defined?

Could you give an example of what annotations you're referring to?

I was thinking about for example importpath in go_proto_library. That is solved with individual modules for each language as you suggest.

@philwo
Copy link
Member

philwo commented Aug 13, 2024

I can't believe we didn't merge this or any of the other "bump version of dependency X" PRs for over half a year, so now I just wasted 2 days on making my own version of this, because I didn't see this great PR deep down in the open pull request list. 😿

Here's my take on it: #307

It's simpler in various ways and for what I know follows all current best practices / "agreed upon workarounds" for things like the go-genproto / googleapis / cloud.google.com migration situation, and it doesn't require any patches of external repos. A lot of this is probably just due to Bazel 7.x and the bzlmod ecosystem maturing more since then.

This PR adds tests / examples, which is really nice, though.

I don't mind which one gets merged, as long as we make progress on getting the ecosystem fixed by migrating more things to bzlmod and cleaning up technical debt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants