
Locust slaves eat all available memory when working with a failing service #816

Closed · lu4 opened this issue Jun 7, 2018 · 7 comments

lu4 commented Jun 7, 2018

I'm using Locust in a master/slave scenario: one master and ten slaves running on 3 machines (one machine for the master and two machines with 5 slaves each). This setup produces a load of ~100 requests per second. When the tested service is turned off, the slave machines start consuming memory very rapidly. I suspect the issue is that failures are logged into memory without ever being released.

@cgoldberg (Member)

Please fill out the fields in the issue template.

@AnotherDevBoy (Contributor)

I experienced the same issue. In my case it was worse, because my orchestrator (I run Locust in a cloud environment) kills instances when they hit RAM or disk limits.

This bug makes it impossible to perform a long-running soak test.


lu4 commented Jun 8, 2018

Sorry, here it is:

Description of issue / feature request

LocustIO eats up all available memory when running against a failing service.

Expected behavior

LocustIO should tolerate long test runs against a failing service.

Actual behavior

LocustIO rapidly consumes all available memory when the target service returns non-200 responses.

Environment settings (for bug reports)

  • OS: Ubuntu 16.04.4 LTS (GNU/Linux 4.4.0-1060-aws x86_64)
  • Python version: 2.7.12
  • Locust version: Locust 0.8.1

Steps to reproduce (for bug reports)

  1. Run LocustIO in a master/slave setup (one master and ten slaves).
  2. Start the test against a working service.
  3. Make the service return a non-200 response (500, for example); a stub sketch of such a service follows these steps.
  4. Observe the slaves' memory usage.

Alternative steps:

  1. Run LocustIO in a master/slave setup (one master and ten slaves).
  2. Start the test against a working service.
  3. Turn the service off.
  4. Observe the slaves' memory usage.
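
For step 3, one way to simulate the failing service is a tiny stub that answers every request with HTTP 500. This is a hypothetical helper, not part of the original report; it assumes Python 2.7 (as in the environment above) and the port used by the locustfile that follows:

# Hypothetical stand-in for a failing service: every request gets an HTTP 500,
# which is enough to make Locust record a failure on each request.
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer

class AlwaysFailingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(500)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write('{"error": "simulated failure"}')

    def do_POST(self):
        # Drain the request body, then fail the same way
        # (the locustfile below also POSTs results).
        self.rfile.read(int(self.headers.getheader('content-length', 0)))
        self.do_GET()

if __name__ == '__main__':
    HTTPServer(('127.0.0.1', 7000), AlwaysFailingHandler).serve_forever()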

I'm not sure whether it will be helpful, but here is my locustfile.py. It won't run without my server, but I hope it gets the point across:

import os
import json
import time

from locust import HttpLocust, TaskSet, task

application_json = {'content-type': 'application/json'}

base_url = os.environ.get('HOST', 'http://127.0.0.1:7000')
machine_id = os.environ.get('MACHINE_ID', '00000000-0000-0000-0000-000000000001')
retryTimeout = float(os.environ.get('RETRY_TIMEOUT', '10'))

print(machine_id)

class UserBehavior(TaskSet):
    jobs = []
    results = []

    jobTypes = None
    crawlers = None

    request_params = None
    request_headers = None

    def on_start(self):
        self.init()

    def init(self):
        job_types_url = base_url + "/api/job-types"

        response = self.client.get(job_types_url, name=job_types_url)

        while response.status_code != 200:
            time.sleep(10)

            response = self.client.get(job_types_url, name=job_types_url)

        self.jobTypes = json.loads(response.content)

        self.request_params = {}
        self.request_headers = {}

        for jobType in self.jobTypes:
            request_params = ''
            request_headers = {}

            credentials = None

            for crawler in jobType['crawlers']:
                if crawler['name'] == machine_id:
                    credentials = crawler

            if credentials is not None:
                for param in credentials['params']:
                    request_params += param
                for header in credentials['headers']:
                    request_headers[header['name']] = header['value']

            self.request_params[jobType['name']] = request_params
            self.request_headers[jobType['name']] = request_headers

    @task(1)          # pick one of the pages randomly
    def work(self):
        while len(self.jobs) < 1:
            response = self.client.get(base_url + '/api/jobs')

            while response is None or response.content is None:
                time.sleep(retryTimeout)

                response = self.client.get(base_url + '/api/jobs')

            try:
                self.jobs = json.loads(response.content)
            except:
                self.jobs = []

            if len(self.jobs) < 1:
                time.sleep(retryTimeout)

        if 50 < len(self.results):
            response = self.client.post(base_url + '/api/jobs', headers = application_json, data = json.dumps(self.results))

            while response is None or response.content is None:
                time.sleep(retryTimeout)

                response = self.client.post(
                    base_url + '/api/jobs',
                    headers=application_json,
                    data=json.dumps(self.results)
                )

            self.results = []

        job = self.jobs.pop()
        job_type = self.jobTypes[job['type']]

        response = self.client.get(job['url'] + self.request_params[job_type['name']], name=job_type['name'], headers=self.request_headers[job_type['name']])

        if response.content is None:
            self.init()
        else:
            try:
                self.results.append({'job': job, 'json': json.loads(response.content)})
            except:
                self.results.append({'job': job, 'json': response.content})



class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    min_wait = 64
    max_wait = 256


lu4 commented Jun 8, 2018

Forgot to mention you, Cory (@cgoldberg).

Thank you in advance!

@cgoldberg (Member)

self.results.append

hmm.. I think the memory usage you are seeing is because you are appending the response from every request to your results list. In that case I'd expect it to continuously eat memory... and the problem becomes worse when the server is not responding, because requests are generating errors very quickly.

If you don't store all results in your list, does the problem go away? If it does, let's close this issue. If it doesn't, please try to reproduce this with a minimal locustfile that doesn't store responses in each instance.
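
A minimal locustfile along those lines might look like the sketch below. This is only illustrative: it uses the Locust 0.8-era HttpLocust/TaskSet API from the report above, the host is a placeholder, and nothing is stored on the TaskSet between requests, so any remaining memory growth against a failing service would have to come from Locust itself rather than user code.

from locust import HttpLocust, TaskSet, task

class MinimalBehavior(TaskSet):
    @task
    def hit_endpoint(self):
        # Issue a request and discard the response; nothing is appended to any
        # list, so user code holds no references to response data.
        self.client.get("/api/jobs", name="/api/jobs")

class MinimalUser(HttpLocust):
    task_set = MinimalBehavior
    min_wait = 64
    max_wait = 256
    host = "http://127.0.0.1:7000"  # placeholder; point this at the failing service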


lu4 commented Jun 11, 2018

@cgoldberg But before I make a request I check whether the number of results is greater than 50, and if so, self.results is sent to the server and cleared... So self.results can never grow far beyond 50 entries, and it shouldn't be able to eat up the memory, because the list is always cleared...

@cgoldberg (Member)

If you create a much simpler reproduction example, I can look some more.. otherwise, I have no idea.
