Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: BatcherImplTest.testThrottlingBlocking to check actual wait time #2285

Merged
merged 9 commits into from
Dec 6, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions gax-java/gax/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,14 @@
</excludes>
</configuration>
</plugin>
<plugin>
<!-- Troubleshooting a flaky test in https://github.com/googleapis/sdk-platform-java/issues/1931 -->
Copy link
Member Author

@suztomo suztomo Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep this just in case the problem occurs again and we have the logging. This surefire (unit test) configuration does not affect the released libraries to users.

<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<argLine>-Djava.util.logging.SimpleFormatter.format="%1$tY %1$tl:%1$tM:%1$tS.%1$tL %2$s %4$s: %5$s%6$s%n"</argLine>
</configuration>
</plugin>
</plugins>
</build>
</project>
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,8 @@
@RunWith(JUnit4.class)
public class BatcherImplTest {

private static final Logger logger = Logger.getLogger(BatcherImplTest.class.getName());

private static final ScheduledExecutorService EXECUTOR =
Executors.newSingleThreadScheduledExecutor();

Expand Down Expand Up @@ -871,6 +873,7 @@ public void testThrottlingBlocking() throws Exception {
public void run() {
batcherAddThreadHolder.add(Thread.currentThread());
batcher.add(1);
logger.info("Called batcher.add(1)");
}
});

Expand All @@ -885,20 +888,38 @@ public void run() {
} while (batcherAddThreadHolder.isEmpty()
|| batcherAddThreadHolder.get(0).getState() != Thread.State.WAITING);

long beforeGetCall = System.currentTimeMillis();
executor.submit(
() -> {
try {
Thread.sleep(throttledTime);
logger.info("Calling flowController.release");
blakeli0 marked this conversation as resolved.
Show resolved Hide resolved
flowController.release(1, 1);
logger.info("Called flowController.release");
} catch (InterruptedException e) {
}
});

try {
logger.info("Calling future.get(10 ms)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with Batcher or this test, but what is the significance of 10ms here? If a TimeoutException is expected, why can't we lower the future.get() block time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the log (in the pull request description), the problem is that the TimeoutException isn't always thrown. Lowering the 10 ms to smaller number won't help.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I read the original comment might not throw TimeoutException in a timely manner. which made me think that this was it wasn't thrown fast enough because .get(...) mau return in sub 10ms.

For my understanding, do you know what causes this: thread calling "future.get" was not being run in a timely manner. Is this just potentially a scheduling issue on a machine that has a single CPU/ few CPUs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just potentially a scheduling issue on a machine that has a single CPU/ few CPUs?

Yes.

Copy link
Contributor

@lqiu96 lqiu96 Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, from my understanding it would be either case:

  • future.get() takes more than 10ms to return -> TimeoutException
  • future.get() returns a result -> Check that the actual waitTime was at least 10ms

I think that should suffice due to the variance in scheduling. CC: @blakeli0 Do you know of another option to handle this flakiness?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

future.get() returns a result within 10ms

No, I didn't observe that. What I observed in #1931 was "future.get() took more than 10ms to return normally, without TimeoutException".

Copy link
Contributor

@lqiu96 lqiu96 Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, updated the comment above. The docs for Future.get() seem to suggest that it will throw the exception once 10ms passes, but I'm guessing that is just best effort for limited CPU.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct.

future.get(10, TimeUnit.MILLISECONDS);
assertWithMessage("adding elements to batcher should be blocked by FlowControlled").fail();
long afterGetCall = System.currentTimeMillis();
long actualWaitTimeMs = afterGetCall - beforeGetCall;

logger.info("future.get(10 ms) unexpectedly returned. Wait time: " + actualWaitTimeMs);
// In a flaky test troubleshooting
// (https://github.com/googleapis/sdk-platform-java/issues/1931), we observed that
// "future.get" method did not throw TimeoutException in this multithreaded test.
// It's because the thread calling "future.get" was not being run (i.e. in the wait queue of
// CPUs) in a timely manner.
// To avoid the flakiness, as long as the "future.get" does not return before the specified
// timeout, this assertion is considered as good.
assertWithMessage("adding elements to batcher should be blocked by FlowControlled")
.that(actualWaitTimeMs)
.isAtLeast(10);
} catch (TimeoutException e) {
// expected
logger.info("future.get(10 ms) timed out expectedly.");
}

try {
Expand Down
Loading