TF operator UI not showing jobs #836

Closed
jamborta opened this issue Oct 3, 2018 · 7 comments


jamborta commented Oct 3, 2018

I am using Kubeflow v0.3.0-rc3. I created a TFJob but am unable to see it in the UI (going through Ambassador):

[Screenshot: 2018-10-03 at 17:45:07]

It seems that the namespace is set to test, and the tf job I created is in kubeflow, but I am unable to change the namespace in the UI.

Contributor

jlewi commented Oct 5, 2018

Which namespace is set to test?

Previous issues with namespaces: kubeflow/kubeflow#1397 and #754

Contributor

jlewi commented Oct 5, 2018

The developer console shows me 500s when it fetches
https://gh-demo-1003.endpoints.kubecon-gh-demo-1.cloud.goog/tfjobs/api/tfjob/

Contributor

jlewi commented Oct 5, 2018

The cluster role looks correct:

~/git_kubeflow-website$ kubectl get clusterrole -o yaml tf-job-dashboard
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    ksonnet.io/managed: '{"pristine":"H4sIAAAAAAAA/4ySP2/jMAzF9/sYHA+Og2wHrzfcfkOXIgNl0YkSWRRIyv0T5LsXctMWiBEkky0+4qf3SJ0Ac3gi0cAJOhCHfYvF9izhHS1wao9/tA28njaODDfQwDEkDx38jUWN5D9HggZGMvRoCN0JIjqKWv8wZ+jAhtWB3cqj7h2jeDg3kHCkG5KUSArd82ztn3DJ9QRGSVmGyC8ty676KI6+j9sGhJSL9PTZPRzYaS1PJG4u/YbtubmCYg70WsmBk16iLlh9UePxq+RpCCnU0dzHq7Hgjm6BL3IfUZXu0xxav19AHsq5zMRpCLsRs0IDmX39KMkUqt4AJZ85JJvV+jrUKNnEsYzVbhjnpokuHex1Hbkupe5VM/YPxME8X/4z/4VJTzny2zjfck3bnn99AAAA//8BAAD//6XLUD+8AgAA"}'
  creationTimestamp: 2018-10-04T05:18:39Z
  labels:
    app: tf-job-dashboard
    app.kubernetes.io/deploy-manager: ksonnet
  name: tf-job-dashboard
  resourceVersion: "2711"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/tf-job-dashboard
  uid: f3cc43ef-c794-11e8-93e6-42010a8e0003
rules:
- apiGroups:
  - tensorflow.org
  - kubeflow.org
  resources:
  - tfjobs
  verbs:
  - '*'
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - '*'
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  verbs:
  - '*'
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - configmaps
  - pods
  - services
  - endpoints
  - persistentvolumeclaims
  - events
  - pods/log
  - namespaces
  verbs:
  - '*'
- apiGroups:
  - apps
  - extensions
  resources:
  - deployments
  verbs:
  - '*'

Contributor

jlewi commented Oct 5, 2018

Dashboard logs show the following error

ERROR: logging before flag.Parse: W1005 05:26:22.181074       1 api_handler.go:127] failed to list TFJobs under all namespace(s): tfjobs.kubeflow.org is forbidden: User "system:serviceaccount:kubeflow:tf-job-dashboard" cannot list tfjobs.kubeflow.org at the cluster scope: Unknown user "system:serviceaccount:kubeflow:tf-job-dashboard"

Contributor

jlewi commented Oct 5, 2018

Looks like a bug in the ClusterRoleBinding for the TFJob dashboard:

kubectl get clusterrolebinding -o yaml tf-job-dashboard
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    ksonnet.io/managed: '{"pristine":"H4sIAAAAAAAA/3yOsU7DQBBEez5jascoHXIHFPRBokEUe+c12fhye9rbCxKW/x1FQAVJN9LT07wFVOSFrYpmDLBAsafmezX5JBfN/XxXe9Hb0zaw0xYdZskjBjymVp1tp4kfJI+S39HhyE4jOWFYkChwqudFpWCAT5uDhs1IdR+UbMTaIdORLyDTxDuevn15Mm3lWuC/Ybj6UFs4cPSK4XX5lZ/ZThL5PkZt2f/4WtjI1X5ALRTPdG6Bp6QfWN/Wmy8AAAD//wEAAP//lKVwVVMBAAA="}'
  creationTimestamp: 2018-10-04T05:18:39Z
  labels:
    app: tf-job-dashboard
    app.kubernetes.io/deploy-manager: ksonnet
  name: tf-job-dashboard
  resourceVersion: "2716"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/tf-job-dashboard
  uid: f3fdf76e-c794-11e8-93e6-42010a8e0003
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tf-job-dashboard
subjects:
- kind: ServiceAccount
  name: tf-job-operator
  namespace: kubeflow

We're binding the wrong service account.
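
For illustration, here is a minimal sketch of what a corrected binding might look like. It assumes the dashboard runs as the tf-job-dashboard service account in the kubeflow namespace, which is the user named in the dashboard error log; the actual fix was made in the ksonnet package and may differ in detail.

```yaml
# Sketch only: the same binding as above, with the subject changed to the
# service account reported in the dashboard error log.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tf-job-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tf-job-dashboard
subjects:
- kind: ServiceAccount
  name: tf-job-dashboard   # was tf-job-operator
  namespace: kubeflow
```

With a subject like this, the tf-job-dashboard ClusterRole shown earlier (which grants all verbs on tfjobs) would apply to the account the dashboard is actually using.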

jlewi added a commit to jlewi/kubeflow that referenced this issue Oct 5, 2018
* The UI isn't able to list jobs in all namespaces because the
  cluster role binding isn't using the correct role

Related to kubeflow/training-operator#836
k8s-ci-robot pushed a commit to kubeflow/kubeflow that referenced this issue Oct 7, 2018
* Fix the TFJobs Dashboard UI

* The UI isn't able to list jobs in all namespaces because the
  cluster role binding isn't using the correct role

Related to kubeflow/training-operator#836

* Fix test.
leoncamel pushed a commit to leoncamel/kubeflow that referenced this issue Oct 14, 2018
* Fix the TFJobs Dashboard UI

* The UI isn't able to list jobs in all namespaces because the
  cluster role binding isn't using the correct role

Related to kubeflow/training-operator#836

* Fix test.
jlewi added a commit to jlewi/kubeflow that referenced this issue Oct 29, 2018
* The UI isn't able to list jobs in all namespaces because the
  cluster role binding isn't using the correct role

Related to kubeflow/training-operator#836
k8s-ci-robot pushed a commit to kubeflow/kubeflow that referenced this issue Nov 1, 2018
* Fix the TFJobs Dashboard UI

* The UI isn't able to list jobs in all namespaces because the
  cluster role binding isn't using the correct role

Related to kubeflow/training-operator#836

* Fix test.
@gaocegege
Member

Is this issue fixed in kubeflow/kubeflow#1717?

Could we close it?


GitHub007Dra commented May 13, 2019

I have the same problem, please help.
[image]

saffaalvi pushed a commit to StatCan/kubeflow that referenced this issue Feb 11, 2021
* Fix the TFJobs Dashboard UI

* The UI isn't able to list jobs in all namespaces because the
  cluster role binding isn't using the correct role

Related to kubeflow/training-operator#836

* Fix test.