Expose Additional Realtime Ingestion Metrics #11685

Merged: 10 commits, Sep 27, 2023

Contributor

@suddendust suddendust commented Sep 26, 2023

  • Expose invalidRealtimeRowsDropped to measure decoding errors in the stream.
  • Expose incompleteRealtimeRowsConsumed to measure transformation errors when continueOnError is set to true.
  • Expose rowsWithErrors to cover any error while processing a row (a catch-all that covers both cases above).
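
To make the relationship between the three meters concrete, here is a minimal sketch. It is not Pinot's ingestion code: the `IngestionCounters` class, the `decode` and `transformedCleanly` helpers, and the string-based rows are all hypothetical. It assumes that a row that fails decoding is dropped, that a row with a transformation error is still consumed when continueOnError is true, and that rowsWithErrors is incremented in both cases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

public class IngestionCounters {
  // Counter names mirror the meters in this PR; the class itself is hypothetical.
  final AtomicLong invalidRealtimeRowsDropped = new AtomicLong();
  final AtomicLong incompleteRealtimeRowsConsumed = new AtomicLong();
  final AtomicLong rowsWithErrors = new AtomicLong();

  // Hypothetical decoder: an empty result means the message could not be decoded.
  static Optional<String> decode(String raw) {
    return raw.startsWith("{") ? Optional.of(raw) : Optional.empty();
  }

  // Hypothetical transformer: false means a transform failed but the row continues
  // (the continueOnError = true case).
  static boolean transformedCleanly(String row) {
    return !row.contains("bad");
  }

  void consume(List<String> messages) {
    for (String raw : messages) {
      Optional<String> row = decode(raw);
      if (!row.isPresent()) {
        // Decoding error: the row is dropped.
        invalidRealtimeRowsDropped.incrementAndGet();
        rowsWithErrors.incrementAndGet();
        continue;
      }
      if (!transformedCleanly(row.get())) {
        // Transformation error: the row is still consumed, but incomplete.
        incompleteRealtimeRowsConsumed.incrementAndGet();
        rowsWithErrors.incrementAndGet();
      }
    }
  }

  public static void main(String[] args) {
    IngestionCounters counters = new IngestionCounters();
    counters.consume(Arrays.asList("{\"a\":1}", "garbage", "{\"bad\":1}"));
    System.out.println("invalidRealtimeRowsDropped=" + counters.invalidRealtimeRowsDropped.get());
    System.out.println("incompleteRealtimeRowsConsumed=" + counters.incompleteRealtimeRowsConsumed.get());
    System.out.println("rowsWithErrors=" + counters.rowsWithErrors.get());
  }
}
```

Under these assumptions, rowsWithErrors is always at least the sum of the other two counters, which is what makes it usable as a single alerting signal.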

JConsole output:

[Three JConsole screenshots, 2023-09-26]

Regex test:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatcher {
  public static void main(String[] args) {
    // Define the regex patterns
    String[] regexPatterns = {
        "pinot.server.([^\\.]*?)_(OFFLINE|REALTIME)\\-(.+)\\-(\\w+).(invalidRealtimeRowsDropped|incompleteRealtimeRowsConsumed|rowsWithErrors|realtimeRowsConsumed)"
    };

    // Input strings to test against the regex patterns
    String[] inputStrings = {
        "pinot.server.airlineStats_REALTIME-flights-realtime-8.invalidRealtimeRowsDropped",
        "pinot.server.airlineStats_OFFLINE-flights-realtime-8.invalidRealtimeRowsDropped",
        "pinot.server.githubEvents_REALTIME-githubEvents-0.incompleteRealtimeRowsConsumed",
        "pinot.server.githubEvents_OFFLINE-githubEvents-0.incompleteRealtimeRowsConsumed",
        "pinot.server.meetupRsvp_REALTIME-meetupRSVPEvents-1.rowsWithErrors",
        "pinot.server.meetupRsvp_OFFLINE-meetupRSVPEvents-1.rowsWithErrors",
        "pinot.server.meetupRsvp_REALTIME-meetupRSVPEvents-1.realtimeRowsConsumed",
        "pinot.server.meetupRsvp_OFFLINE-meetupRSVPEvents-1.realtimeRowsConsumed"
    };

    // Iterate over each regex pattern
    for (String regexPattern : regexPatterns) {
      // Compile the regex pattern into a Pattern object
      Pattern pattern = Pattern.compile(regexPattern);

      // Iterate over each input string
      for (String inputString : inputStrings) {
        // Create a Matcher object to match the pattern against the input string
        Matcher matcher = pattern.matcher(inputString);

        // Perform the matching and check if it matches
        if (matcher.matches()) {
          System.out.println("Input string '" + inputString + "' matches the regex pattern: " + regexPattern);
        } else {
          System.out.println("Input string '" + inputString + "' does not match the regex pattern: " + regexPattern);
        }
      }
    }
  }
}

Output:

Input string 'pinot.server.airlineStats_REALTIME-flights-realtime-8.invalidRealtimeRowsDropped' matches the regex pattern: pinot.server.([^\.]*?)_(OFFLINE|REALTIME)\-(.+)\-(\w+).(invalidRealtimeRowsDropped|incompleteRealtimeRowsConsumed|rowsWithErrors|realtimeRowsConsumed)
Input string 'pinot.server.airlineStats_OFFLINE-flights-realtime-8.invalidRealtimeRowsDropped' matches the regex pattern: pinot.server.([^\.]*?)_(OFFLINE|REALTIME)\-(.+)\-(\w+).(invalidRealtimeRowsDropped|incompleteRealtimeRowsConsumed|rowsWithErrors|realtimeRowsConsumed)
Input string 'pinot.server.githubEvents_REALTIME-githubEvents-0.incompleteRealtimeRowsConsumed' matches the regex pattern: pinot.server.([^\.]*?)_(OFFLINE|REALTIME)\-(.+)\-(\w+).(invalidRealtimeRowsDropped|incompleteRealtimeRowsConsumed|rowsWithErrors|realtimeRowsConsumed)
Input string 'pinot.server.githubEvents_OFFLINE-githubEvents-0.incompleteRealtimeRowsConsumed' matches the regex pattern: pinot.server.([^\.]*?)_(OFFLINE|REALTIME)\-(.+)\-(\w+).(invalidRealtimeRowsDropped|incompleteRealtimeRowsConsumed|rowsWithErrors|realtimeRowsConsumed)
Input string 'pinot.server.meetupRsvp_REALTIME-meetupRSVPEvents-1.rowsWithErrors' matches the regex pattern: pinot.server.([^\.]*?)_(OFFLINE|REALTIME)\-(.+)\-(\w+).(invalidRealtimeRowsDropped|incompleteRealtimeRowsConsumed|rowsWithErrors|realtimeRowsConsumed)
Input string 'pinot.server.meetupRsvp_OFFLINE-meetupRSVPEvents-1.rowsWithErrors' matches the regex pattern: pinot.server.([^\.]*?)_(OFFLINE|REALTIME)\-(.+)\-(\w+).(invalidRealtimeRowsDropped|incompleteRealtimeRowsConsumed|rowsWithErrors|realtimeRowsConsumed)
Input string 'pinot.server.meetupRsvp_REALTIME-meetupRSVPEvents-1.realtimeRowsConsumed' matches the regex pattern: pinot.server.([^\.]*?)_(OFFLINE|REALTIME)\-(.+)\-(\w+).(invalidRealtimeRowsDropped|incompleteRealtimeRowsConsumed|rowsWithErrors|realtimeRowsConsumed)
Input string 'pinot.server.meetupRsvp_OFFLINE-meetupRSVPEvents-1.realtimeRowsConsumed' matches the regex pattern: pinot.server.([^\.]*?)_(OFFLINE|REALTIME)\-(.+)\-(\w+).(invalidRealtimeRowsDropped|incompleteRealtimeRowsConsumed|rowsWithErrors|realtimeRowsConsumed)

Process finished with exit code 0
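
One detail the test above does not exercise: the dots in `pinot.server.` and the dot before the metric name are unescaped, so each one matches any character rather than only a literal dot. The sketch below compares the pattern as written with a variant that escapes them. The `DotEscapeSketch` class, the escaped variant, and the `odd` input are hypothetical illustrations and are not part of the PR description.

```java
import java.util.regex.Pattern;

public class DotEscapeSketch {
  static final String NAMES =
      "(invalidRealtimeRowsDropped|incompleteRealtimeRowsConsumed|rowsWithErrors|realtimeRowsConsumed)";

  // The pattern from the PR description, unchanged.
  static final Pattern AS_WRITTEN = Pattern.compile(
      "pinot.server.([^\\.]*?)_(OFFLINE|REALTIME)\\-(.+)\\-(\\w+)." + NAMES);

  // Hypothetical stricter variant: the separator dots are escaped.
  static final Pattern ESCAPED = Pattern.compile(
      "pinot\\.server\\.([^.]*?)_(OFFLINE|REALTIME)-(.+)-(\\w+)\\." + NAMES);

  public static void main(String[] args) {
    String real = "pinot.server.meetupRsvp_REALTIME-meetupRSVPEvents-1.rowsWithErrors";
    // Made-up name with 'X' where the separator dots should be.
    String odd = "pinotXserverXmeetupRsvp_REALTIME-meetupRSVPEvents-1XrowsWithErrors";

    System.out.println("as written, real: " + AS_WRITTEN.matcher(real).matches());
    System.out.println("as written, odd:  " + AS_WRITTEN.matcher(odd).matches());
    System.out.println("escaped, real:    " + ESCAPED.matcher(real).matches());
    System.out.println("escaped, odd:     " + ESCAPED.matcher(odd).matches());
  }
}
```

Real metric names are unlikely to look like the `odd` input, so this is a matter of strictness rather than a functional bug.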


@suddendust suddendust changed the title Expose metric for invalidRealtimeRowsDropped Expose Additional Realtime Ingestion Metrics Sep 26, 2023
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Please refer to the definition for realtimeRowsConsumed, where we can properly extract the topic and partition.

Out of scope for this PR: I just realized we use a different format for Gauge (table name at the end), and I am not sure if that is intentional. Ideally we should use the same format for Meter, Timer, and Gauge.

@Jackie-Jiang
Contributor

We may also consider updating the regex to match all meters that follow this format, so that we don't need to keep updating it whenever a new meter is added.

@suddendust
Contributor Author

> We may also consider updating the regex to match all meters following this format so that we don't need to keep updating it whenever adding a new meter

I can add that, but I'm not sure of the regex precedence rules: a metric could match multiple regexes in that case. Any ideas on this?
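
One way to reason about the overlap question is to look at how an ordered rule list behaves. The sketch below assumes first-match-wins evaluation (rules are tried top to bottom and processing stops at the first pattern that matches), which is how the Prometheus JMX exporter describes its rules list; whether that is the evaluator used with this config is an assumption. The `RuleOrderSketch` class, the rule names, the generic pattern, and the `someNewMeter` metric are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class RuleOrderSketch {
  // LinkedHashMap keeps insertion order, standing in for an ordered rule list.
  static final Map<String, Pattern> RULES = new LinkedHashMap<>();

  static {
    // Specific rule first: only the listed meters.
    RULES.put("specific", Pattern.compile(
        "pinot\\.server\\.([^.]*?)_(OFFLINE|REALTIME)-(.+)-(\\w+)\\.(realtimeRowsConsumed|rowsWithErrors)"));
    // Hypothetical generic rule last: any meter name in the same position.
    RULES.put("generic", Pattern.compile(
        "pinot\\.server\\.([^.]*?)_(OFFLINE|REALTIME)-(.+)-(\\w+)\\.(\\w+)"));
  }

  // Returns the name of the first rule whose pattern matches the whole metric name.
  static String firstMatch(String metricName) {
    for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
      if (rule.getValue().matcher(metricName).matches()) {
        return rule.getKey();
      }
    }
    return "none";
  }

  public static void main(String[] args) {
    String[] names = {
        "pinot.server.meetupRsvp_REALTIME-meetupRSVPEvents-1.rowsWithErrors",
        "pinot.server.meetupRsvp_REALTIME-meetupRSVPEvents-1.someNewMeter",
        "pinot.controller.someOtherMetric"
    };
    for (String name : names) {
      System.out.println(name + " -> " + firstMatch(name));
    }
  }
}
```

Under that assumption, a metric that matches several patterns is handled by whichever rule appears first, so a generic rule placed after the specific ones would only pick up meters the specific rules miss.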

@suddendust
Contributor Author

> Please refer to the definition for realtimeRowsConsumed, where we can properly extract the topic and partition

Done

table: "$1"
tableType: "$2"
topic: "$3"
partition: "$5"
Contributor

The partition label seems incorrect here. When I run your code, I get the following:

Input string 'pinot.server.meetupRsvp_OFFLINE-meetupRSVPEvents-1.realtimeRowsConsumed' matches the regex pattern: pinot.server.([^\.]*?)_(OFFLINE|REALTIME)\-(.+)\-(\w+).(invalidRealtimeRowsDropped|incompleteRealtimeRowsConsumed|rowsWithErrors|realtimeRowsConsumed)
metricName: pinot.server.meetupRsvp_OFFLINE-meetupRSVPEvents-1.realtimeRowsConsumed
Group 0: pinot.server.meetupRsvp_OFFLINE-meetupRSVPEvents-1.realtimeRowsConsumed
Group 1: meetupRsvp
Group 2: OFFLINE
Group 3: meetupRSVPEvents
Group 4: 1
Group 5: realtimeRowsConsumed

So the partition should be $4 and not $5, right?
And the metric name should contain $5 instead of $4.
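
The group numbering can be checked with a short sketch like the one below. It reuses the regex from the PR description verbatim. The `LabelExtractor` class, the `labels` helper, and the `name` key are hypothetical; the other label keys follow the snippet under review, with the partition taken from group 4 as suggested above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelExtractor {
  // Regex copied from the PR description.
  static final Pattern METRIC = Pattern.compile(
      "pinot.server.([^\\.]*?)_(OFFLINE|REALTIME)\\-(.+)\\-(\\w+)."
          + "(invalidRealtimeRowsDropped|incompleteRealtimeRowsConsumed|rowsWithErrors|realtimeRowsConsumed)");

  // Maps capture groups to labels; returns an empty map when the name does not match.
  static Map<String, String> labels(String metricName) {
    Map<String, String> labels = new LinkedHashMap<>();
    Matcher m = METRIC.matcher(metricName);
    if (!m.matches()) {
      return labels;
    }
    labels.put("table", m.group(1));
    labels.put("tableType", m.group(2));
    labels.put("topic", m.group(3));
    labels.put("partition", m.group(4)); // group 4, not group 5
    labels.put("name", m.group(5));      // group 5 is the metric name
    return labels;
  }

  public static void main(String[] args) {
    System.out.println(labels(
        "pinot.server.airlineStats_REALTIME-flights-realtime-8.invalidRealtimeRowsDropped"));
    System.out.println(labels(
        "pinot.server.meetupRsvp_OFFLINE-meetupRSVPEvents-1.realtimeRowsConsumed"));
  }
}
```

Because group 3 is greedy, a topic that itself contains hyphens, such as flights-realtime, stays in the topic label and only the final hyphen-separated token becomes the partition.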

Contributor Author

Changed as discussed

Contributor

@KKcorps KKcorps left a comment

LGTM!

@Jackie-Jiang Jackie-Jiang merged commit b1cc3e3 into apache:master Sep 27, 2023
2 checks passed