Unable to stop load from web UI with 1.0.3 #1442

Closed
venkikonda opened this issue Jun 19, 2020 · 22 comments

venkikonda commented Jun 19, 2020

Hi team,

Could you please help with the issue below?

Description of issue:

Since upgrading to Locust 1.0.3 I often have difficulty stopping Locust from the web UI when running distributed. I click the Stop button, but the load doesn't actually stop. At best, it dips briefly but then resumes.

Expected behavior:
Load stops almost immediately.

Actual behavior:

It seems the slaves stop their load, then rejoin the pool and immediately have load sent to them again.

Steps to reproduce (for bug reports):

Run Locust distributed with multiple slaves
Begin a large load using the web UI
Wait several minutes after all users hatched
Attempt to stop load using the web UI

  • OS: ubuntu 16.0
  • Python version: 3.8
  • Locust version: 1.0.3
  • Locust command line that you ran:
  • Locust file contents (anonymized if necessary):

import random
from locust import HttpUser, task, SequentialTaskSet, between, tag

class IntranetTasks(SequentialTaskSet):
    @tag('login')
    @task
    def on_start(self):
        self.client.post("login.action?login=true", {"username": "***", "password": "***"})

    @tag('NewGuy')
    @task
    def testPage(self):
        self.client.get("display/00001/2019/07/01/Hello%2C+I%27m+the+New+Guy")

    @tag('Display')
    @task
    def testBlogPost(self):
        self.client.get("display/00008")

    @tag('HomePage')
    @task
    def testIndex(self):
        self.client.get("")

    @tag('NewCFO')
    @task
    def testCoronaPage(self):
        self.client.get("pages/viewpage.action?pageId=98895875")

    @tag('logout')
    @task
    def on_stop(self):
        self.client.post("logout.action?logout=true", {"username": "*******", "password": "****"})

class WebUser(HttpUser):
    wait_time = between(5, 9)
    tasks = {IntranetTasks: 1}

venkikonda added the bug label Jun 19, 2020
@venkikonda (Author)

Hi team,

Can anybody help with this issue?
The stop feature seems to be available in 1.0.3, but I am still not able to stop the test and bring the users down to zero.

Let me know if I need to modify my Python script above.

@venkikonda (Author)

Hi team,

Is anybody looking into this?

cyberw (Collaborator) commented Jun 23, 2020

Hi! I ran into a similar (but maybe unrelated) issue. Can you try running the locust branch called dont-wait-for-greenlets-to-die-if-forcing? (Just check out that branch and do pip install -e . in the repo directory.)

cyberw commented Jun 23, 2020

Hmm... I don't think my attempt was very good. I'd look into it more, but I'm super busy atm...

The issue I have (not sure it is the same one as this) was most easily reproduced by just calling self.environment.runner.quit() in a user. It hangs indefinitely on line 178 in users.py, trying to kill the greenlet.

Probably this error was introduced in the refactoring for 1.0.
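
A minimal sketch of that reproduction (illustrative only; the class and task names here are invented) would look roughly like this:

from locust import HttpUser, task, between

class QuittingUser(HttpUser):
    host = "http://localhost"  # placeholder host
    wait_time = between(1, 2)

    @task
    def stop_everything(self):
        # On the affected versions, this call could hang while trying to
        # kill the running user greenlets instead of shutting down cleanly.
        self.environment.runner.quit()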

cyberw commented Jun 23, 2020

Any ideas @heyman ? I think my attempted fix would do more harm than good :)

cyberw commented Jun 23, 2020

I made a better fix (#1448) for my issue (unfortunately I think it is unrelated to this)

cyberw commented Jun 30, 2020

Perhaps the issue is related to the use of tags? (a relatively new feature)

Do you still get the problem if you don't use tags?

Trouler commented Jul 3, 2020

I am seeing the same problem, running Locust distributed in AWS on EC2 instances.
Edit: I am NOT using tags.

cyberw commented Jul 3, 2020

We need to narrow it down.

Have you tried a super-basic test plan? Are you also using SequentialTaskSet?

Trouler commented Jul 3, 2020

No, I'm using a normal TaskSet.
As I'm currently swamped at work, with a major release coming up, I'm afraid I might not be able to do any specific testing right now. I'll answer any questions you might have, though.

@venkikonda (Author)

Hello team,

Any update on the stop feature issue in 1.0.3?

cyberw commented Jul 16, 2020

As I said, we need to narrow it down. It works for me (at least with a basic test plan), so someone who can actually reproduce the issue needs to figure out when it does and doesn't work...

Trouler commented Jul 21, 2020

Alright. I just got some time to test this out.

For me, it seems to be tied to using a sleep timer in my on_stop() in my TaskSet.
I'm using a normal time.sleep() here, and for some reason the master seems to wait for a worker to report back here before telling the next worker to shut down.

So shutting down my users is just taking longer than expected. I expect them all to stop what they're doing and enter their on_stop right away, effectively stopping their tasks (which does not seem to be the case). This behaviour might actually be expected(?), as I recently added the sleep timer. I've been swamped lately, so it's hard to remember exactly when I started seeing this issue.

It might just have been me adding this sleep and then noticing the issue. Can't speak for venkikonda, though.

This is happening in a normal local run as well, so it's not only distributed.

The reason I'm running a sleep is that my users are sharing a global variable tied to the worker, and I do not want this to reset with the first worker going down if other workers are in the middle of a task.
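
A rough sketch of that pattern (illustrative only; the shared dictionary and all names here are invented) would look something like this:

import time
from locust import HttpUser, TaskSet, task, between

# Hypothetical state shared by every user running on the same worker
shared_state = {"token": None}

class SharedStateBehaviour(TaskSet):
    @task
    def do_work(self):
        self.client.get("/")

    def on_stop(self):
        # Pause so the first user to stop doesn't reset the shared state
        # while other users on this worker are still mid-task.
        time.sleep(0.5)
        shared_state["token"] = None

class SharedStateUser(HttpUser):
    host = "http://localhost"  # placeholder host
    wait_time = between(1, 5)
    tasks = [SharedStateBehaviour]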

cyberw commented Jul 21, 2020

Interesting, @Trouler. Can you share the most basic plan that reproduces the issue and show your locust command line? I'm particularly interested in whether you are using --stop-timeout.

Trouler commented Jul 21, 2020

There's probably a lot of unnecessary code in here, but I didn't want to strip it down more than I had to.
I'm using the events to make it easier to see the issue in the web UI, since I can see them being fired after having pressed stop.

import time
from locust import TaskSet, task, User, between, events

class MyCustomBehaviour(TaskSet):
    def on_stop(self):
        time.sleep(0.5)

    @task(1)
    def test_some_task(self):
        try:
            assert 1 == 1
            self.user.fire_success(
                request_type="Test",
                name="OnlyTest",
                response_time=0,
                response_length=0,
            )
        except:
            self.user.fire_failure(
                request_type="Test",
                name="OnlyTest",
                response_time=0,
                exception="RandomException",
                response_length=0,
            )


class MyCustomUser(User):
    #This is the abstract User class which should be subclassed.
    abstract = True

    def __init__(self, *args, **kwargs):
        super(MyCustomUser, self).__init__(*args, **kwargs)

    def fire_success(self, request_type, name, response_time, response_length):
        self.environment.events.request_success.fire(
            request_type=request_type,
            name=name,
            response_time=response_time,
            response_length=response_length
        )

    def fire_failure(self, request_type, name, response_time, exception, response_length):
        self.environment.events.request_failure.fire(
            request_type=request_type,
            name=name,
            response_time=response_time,
            exception=exception,
            response_length=response_length
        )


class TestUser(MyCustomUser):
    tasks = [MyCustomBehaviour]
    host = "http://localhost:8089/"
    wait_time = between(1, 5)

I am not using --stop-timeout. Testing locally, I was not passing any extra arguments; I only ran locust -f filepath/file.py.
In my distributed tests I am only using the extra --host localhost --master for my master and --worker --master-host=MASTER_IP for my workers.

cyberw commented Aug 29, 2020

Hi! Sorry for the lack of response. Is this still an issue in the latest version (1.2.3)?

That test plan is very "special". Can you reproduce it using a more normal one, based on, for example, HttpUser?

I'm guessing it does work for you in headless? (as stopping from web ui is what this ticket is about)
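
For comparison, a bare-bones plan of the kind suggested above (illustrative only, with a placeholder host) could be as small as:

from locust import HttpUser, task, between

class BasicUser(HttpUser):
    host = "http://localhost"  # placeholder host
    wait_time = between(1, 5)

    @task
    def index(self):
        self.client.get("/")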

Trouler commented Sep 8, 2020

Hi again!

Just came back from vacation. Our service was released during my vacation and as of now we are not doing any kind of load testing.
I did some quick tests to see if I could reproduce the issue, and the small testing I did seemed to work just fine, with the workers stopping as expected.

I can't tell you whether it's fixed or not, as I would have to test more, and we don't want to load test in any of our environments at the moment.

cyberw commented Sep 20, 2020

I did see one strange thing in the original locustfile: on_stop is decorated with @task. I added a warning for this in 9c8c73c

Not sure it relates to the issue, but we can hope :)
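
As an illustration (not from the original locustfile), the lifecycle hooks would normally be left undecorated, roughly like this:

from locust import HttpUser, SequentialTaskSet, task, between

class IntranetTasks(SequentialTaskSet):
    def on_start(self):
        # Runs once when the TaskSet starts executing; no @task decorator
        self.client.post("login.action?login=true", {"username": "***", "password": "***"})

    @task
    def testIndex(self):
        self.client.get("")

    def on_stop(self):
        # Runs once when the TaskSet stops; no @task decorator
        self.client.post("logout.action?logout=true", {"username": "***", "password": "***"})

class WebUser(HttpUser):
    wait_time = between(5, 9)
    tasks = [IntranetTasks]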

Closing this due to lack of activity.

cyberw closed this as completed Sep 20, 2020
cyberw added the invalid label Sep 20, 2020
@james4388

@cyberw I still get this issue on 1.3.1. I am using SequentialTaskSet, if that helps.

cyberw commented Oct 26, 2020

Interesting. Can you share your locustfile and command line?

@mbuotidem

I just ran into this on 1.4.1. It's probably an issue with my code, but on the off chance that it isn't, here is a locustfile. The command line is locust --host=endpoint-ats.iot.us-west-2.amazonaws.com -f locustfile.py. I ran this with 1 user and a hatch rate of 1. Clicking stop in the web UI did not stop it; I had to do a keyboard interrupt with Ctrl + C.

cyberw commented Nov 18, 2020

@mbuotidem In your case it may be related to MQTT, which may not be gevent-friendly (thus blocking the entire Python process, so it cannot be stopped). Have you ever seen this issue when testing plain HTTP?
