fix: adding logic to apply trustyai prometheus #173

Merged: 3 commits merged into rhoai-2.6 from trusty_GA on Jan 28, 2024

zdtsw commented Jan 24, 2024

As part of https://issues.redhat.com/browse/RHOAIENG-99, the rules were already defined in commit #161, but the operator still needs the logic to apply them.

This PR also tunes those rules relative to the original commit.

New (live build quay.io/wenzhou/rhods-operator-catalog:v2.6.173), with updated tests:

Screenshot from 2024-01-24 13-48-44

Old: tested on a live build and confirmed the config has been applied for TrustyAI.
Screenshot from 2024-01-24 10-36-38

zdtsw (Author) commented Jan 24, 2024

This will need a backport to ODH.

zdtsw (Author) commented Jan 26, 2024

And when I killed the TrustyAI pod, I got:

Screenshot from 2024-01-26 19-05-27

zdtsw (Author) commented Jan 26, 2024

This needs red-hat-data-services/trustyai-service-operator#13 to go in together with it; that PR fixed the port.
This PR matches on the job label instead of the instance label, which is very long and does not seem to work: instance="trustyai-service-operator-controller-manager-metrics-service.redhat-ods-applications.svc:8443/metrics"

Screenshot from 2024-01-26 19-12-58
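The switch from matching on the `instance` label to matching on the `job` label can be illustrated with a minimal Go sketch that builds the two PromQL selectors side by side. This is not the rule text shipped in this PR: the `instance` value is the one quoted above, while `trustyJob` and the helper `upSelector` are hypothetical names chosen for illustration, and the real `job` value depends on how the TrustyAI operator metrics Service is scraped.

```go
package main

import "fmt"

// trustyInstance is the instance label value quoted in this PR.
// trustyJob is a HYPOTHETICAL job label value: the real one depends on how
// the TrustyAI operator metrics Service is scraped.
const (
	trustyInstance = "trustyai-service-operator-controller-manager-metrics-service.redhat-ods-applications.svc:8443/metrics"
	trustyJob      = "trustyai-service-operator" // hypothetical
)

// upSelector (hypothetical helper) returns a PromQL selector on the built-in
// `up` metric filtered by a single label matcher, for example up{job="x"}.
func upSelector(label, value string) string {
	// %q wraps the value in double quotes and escapes embedded quotes and
	// backslashes, which matches PromQL string literals for plain label values.
	return fmt.Sprintf("up{%s=%q}", label, value)
}

func main() {
	// Before: match on the long, target-specific instance label.
	fmt.Println(upSelector("instance", trustyInstance))
	// After: match on the job label instead.
	fmt.Println(upSelector("job", trustyJob))
}
```

Matching on `job` keeps the expression short and independent of the scrape target's host, port, and path; whether `job` is the right key for a given deployment still depends on its scrape configuration.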

zdtsw (Author) commented Jan 27, 2024

To summarize the latest test results:
TrustyAI reports 1/1 active.
Screenshot from 2024-01-27 18-01-44
After I killed the pod, the alert returned a value.

Screenshot from 2024-01-27 18-06-38
Then it went back to normal once the pod was reconciled.
Screenshot from 2024-01-27 18-08-53
Screenshot from 2024-01-27 18-09-02

zdtsw merged commit e172aed into red-hat-data-services:rhoai-2.6 on Jan 28, 2024
4 checks passed
zdtsw deleted the trusty_GA branch on January 28, 2024 13:02
zdtsw (Author) commented Jan 28, 2024

Merging this into 2.6 first; if we hit more problems on this topic, we can fix them before backporting to main.

zdtsw added a commit to zdtsw-forking/rhods-operator that referenced this pull request Feb 8, 2024
…#173)

* update(trustyai): adding logic to monitoring

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix(trustyai): prometheus rules for probe

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update(trusty): prometheus to use job instead of instance name for
record rules

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
VaishnaviHire added a commit that referenced this pull request Feb 9, 2024