Incorrect behavior in k8s.bash? #73

Open
m10k opened this issue Nov 25, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@m10k

m10k commented Nov 25, 2022

Describe The Bug

Hello everyone,

I am currently trying to set up peridot on a multi-node Kubernetes cluster, but I'm stuck at the step where the instructions say to run hack/setup_base_internal_services.
The output of the command is something like the following.

[...]
parse error: Invalid literal at line 1, column 13
Error from server (NotFound): namespaces "registry-secret" not found
error: no objects passed to apply
Error from server (BadRequest): error when creating "hydra/deploy/public/003-deployment.yaml": Deployment in version "v1" cannot be handled as a Deployment: strict decoding error: unknown field "spec.template.spec.containers[0].ports[0].expose", unknown field "spec.template.spec.containers[0].ports[0].external", unknown field "spec.template.spec.containers[0].ports[1].expose", unknown field "spec.template.spec.containers[0].ports[1].external"
[...]

What caught my eye is that the parse error looks a lot like what jq or yq prints when given input that isn't valid JSON or YAML, so I dug a bit deeper into the script. The output seems to come from rules_resf/internal/k8s/k8s.bash, which is executed by the first bazel command, bazel run --platforms @io_bazel_rules_go//go/toolchain:linux_"$ARCH" //hydra/deploy/public:public.apply.
The problematic pipe is the following.

COPY_TO_NS=$(echo "{$(cat ${i} | grep "namespace" | head -n 1)}" | jq -r '.namespace' | tr -d '\n')

The value of $i is the path of one of the four YAML files in bazel-bin/hydra/deploy/public, and I'm guessing the call is attempting to parse the namespace from the YAML files. Now, grepping for "namespace" in any of those files will likely return a line like

  namespace: "foobar"

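which is not valid JSON. Wrapping it in braces as the script does ({  namespace: "foobar"}) doesn't help, because the key is still unquoted, so the jq call cannot succeed.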
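A minimal way to see what jq actually receives is to rebuild the string by hand. The sketch below uses a hypothetical manifest whose namespace line is indented by two spaces (an assumption about how the generated YAML looks); it runs the same grep | head -n 1 step as k8s.bash and stops short of calling jq.

```shell
#!/usr/bin/env bash
# Hypothetical manifest; the two-space indentation of "namespace:" is an
# assumption about what the generated files in bazel-bin look like.
manifest=$(mktemp)
cat >"$manifest" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
  namespace: "foobar"
EOF

# Same construction as in k8s.bash, just without the final jq call.
candidate="{$(grep "namespace" "$manifest" | head -n 1)}"
printf '%s\n' "$candidate"

# Print the 13th character, i.e. the column named in the jq error.
printf '%s' "$candidate" | cut -c13

rm -f "$manifest"
```

With this input the first line printed is {  namespace: "foobar"}: the bare word namespace is not a valid JSON token, and the 13th character is the colon right after it, which is consistent with "Invalid literal at line 1, column 13" in the log.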
I simplified the command and changed it to use yq instead, which seems to solve at least one of the problems (there should also be a cleaner solution that does not need grep).

COPY_TO_NS=$(grep -m 1 "namespace:" "$i" | yq -r '.namespace')
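For the grep-free variant, the namespace can also be read with awk alone, so neither jq nor yq is needed. This is only a sketch: first_namespace is a hypothetical helper, not part of rules_resf, and it assumes that the key and its value sit on one line and that the first namespace: key in the file is the one that matters (the same assumption the original grep | head -n 1 makes). Unlike the original grep "namespace", it only matches lines that start with the key, so something like namespaceSelector: is skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rules_resf): print the value of the first
# line that starts with "namespace:" in the given manifest, without jq or yq.
first_namespace() {
  awk '/^[ \t]*namespace:/ {
         sub(/^[ \t]*namespace:[ \t]*/, "")  # drop indentation and the key
         sub(/[ \t\r]*$/, "")                # drop trailing blanks and CR
         print
         exit                                # stop after the first match
       }' "$1" | tr -d "\"'"                 # strip YAML quote characters
}

# Possible use in k8s.bash instead of the grep | jq pipe:
# COPY_TO_NS=$(first_namespace "$i")
```

Trailing YAML comments on the namespace line are not handled; a real YAML parser remains the more robust choice if one is guaranteed to be installed.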

However, even with that line fixed, the script does not succeed because it cannot query a secret from kubectl. The problematic line is the following.

kubectl -n "registry-secret${STABLE_STAGE}" get secret registry -o json | jq ".metadata.namespace=\"${COPY_TO_NS}\"" | kubectl apply --force -f -

This command attempts to fetch the secret called registry from the namespace registry-secret${STABLE_STAGE}, which in my case resolves to plain registry-secret (see the NotFound error above). There is no such namespace in my cluster, and there is no secret called registry in any other namespace either. I do have a secret called mlbuild-secret in the default namespace. Maybe the script is supposed to query this secret instead? My username is mlbuild, and there is also a namespace called mlbuild-dev, so this would make sense.
On the other hand, I can't rule out that the namespaces and secrets in my cluster were set up incorrectly. Could anybody please shed some light on this?
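The log in the report is consistent with how this pipeline fails: when kubectl get cannot find the namespace, jq receives no input, and the final kubectl apply reports "error: no objects passed to apply". The sketch below (copy_registry_secret is a hypothetical name, not a function in rules_resf) checks that the source namespace and the registry secret exist before copying, and passes the target namespace to jq with --arg instead of splicing it into the filter string. It assumes STABLE_STAGE is set the same way k8s.bash sets it.

```shell
#!/usr/bin/env bash
# Sketch, not the rules_resf implementation: copy the "registry" secret into
# the namespace given as $1, but only after confirming the source exists.
copy_registry_secret() {
  local src_ns="registry-secret${STABLE_STAGE:-}"
  local dst_ns="$1"

  # Fail with one clear message instead of a chain of kubectl/jq errors.
  if ! kubectl get namespace "$src_ns" >/dev/null 2>&1; then
    echo "source namespace '$src_ns' does not exist" >&2
    return 1
  fi
  if ! kubectl -n "$src_ns" get secret registry >/dev/null 2>&1; then
    echo "secret 'registry' not found in '$src_ns'" >&2
    return 1
  fi

  # jq --arg passes the namespace as a string, avoiding shell-quoting issues.
  kubectl -n "$src_ns" get secret registry -o json \
    | jq --arg ns "$dst_ns" '.metadata.namespace = $ns' \
    | kubectl apply --force -f -
}
```

In k8s.bash this would stand in for the single kubectl line quoted above, called as copy_registry_secret "$COPY_TO_NS".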

Thank you!

Reproduction Steps

  1. Set up a Kubernetes cluster
  2. Follow the installation instructions until the step where it says to execute hack/setup_base_internal_services

Expected Behavior

The script completes without errors.

Version and Build Information

HEAD is at 8222ab2f43a330bf200017f9f77205983f46de9c

Additional context

No response

@m10k m10k added the bug Something isn't working label Nov 25, 2022
@NeilHanlon
Member

Hi @m10k - Thank you for the report. The setup process is a bit of a pain point right now, but we're working on porting in some changes we use on another project which allow for a single-command setup of the development environment. We're hoping to merge that change in the next couple of months.

However, for now, let's see if we can get your setup running. I think it is complaining that you don't have a secret for hydra. You can create one as follows:

kubectl -n "$USER-dev" create secret generic server --from-literal=hydra-secret="$(export LC_CTYPE=C; cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)" --from-literal=byc-secret="$(export LC_CTYPE=C; cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)"
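The secret values in the command above come from /dev/urandom filtered down to 32 alphanumeric characters. The same idea can be wrapped in a small helper so the kubectl call stays readable; random_secret is a hypothetical name, and LC_ALL=C plays the role of LC_CTYPE=C in the original, making tr treat the input as raw bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: 32 random characters from [a-zA-Z0-9].
# tr discards every byte outside the set; head stops after 32 characters.
random_secret() {
  LC_ALL=C tr -dc 'a-zA-Z0-9' </dev/urandom | head -c 32
}

# Equivalent to the command above:
# kubectl -n "$USER-dev" create secret generic server \
#   --from-literal=hydra-secret="$(random_secret)" \
#   --from-literal=byc-secret="$(random_secret)"
```

head -c 32 replaces fold -w 32 | head -n 1; both yield 32 characters, and the missing trailing newline makes no difference inside $(...).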

@m10k
Author

m10k commented Dec 2, 2022

Hey @NeilHanlon, thank you for your response!

I tried running the command that you posted, but unfortunately setup_base_internal_services still fails with the same error.

I noticed that I already have a secret for hydra in my mlbuild-dev namespace, though.
To be honest, I don't quite understand what the script does, but I get the impression that it copies secrets from one namespace to another. Is it necessary to copy this secret to the default namespace?

I have the following secrets in mlbuild-dev

mlbuild@k8s:~/peridot$ kubectl -n mlbuild-dev get secrets
NAME     TYPE     DATA   AGE
env      Opaque   1      7d17h
hydra    Opaque   2      9d
server   Opaque   2      23h

And these are in the default namespace

mlbuild@k8s:~/peridot$ kubectl get secrets
NAME                               TYPE                                  DATA   AGE
hydra                              Opaque                                2      21h
minio                              Opaque                                3      9d
mlbuild-secret                     kubernetes.io/service-account-token   3      10d
postgres-postgresql                Opaque                                1      9d
sh.helm.release.v1.localstack.v1   helm.sh/release.v1                    1      9d
sh.helm.release.v1.localstack.v2   helm.sh/release.v1                    1      9d
sh.helm.release.v1.minio.v1        helm.sh/release.v1                    1      9d
sh.helm.release.v1.postgres.v1     helm.sh/release.v1                    1      9d
sh.helm.release.v1.temporal.v1     helm.sh/release.v1                    1      9d
temporal-default-store             Opaque                                1      9d
temporal-visibility-store          Opaque                                1      9d

Is there any other information I can provide that might help figure out what's going on?

@warthog9

warthog9 commented Feb 7, 2023

I'll chime in that I'm hitting this as well while following the instructions for docker-desktop, with the latest top-of-tree peridot git. I ran the command suggested in #73 (comment), but it doesn't seem to get picked up by the bazel public or deploy steps.
