
Performance test is not working in Workload benchmark test #4298

Closed
pro-akim opened this issue Jul 7, 2023 · 11 comments · Fixed by #4780

@pro-akim

pro-akim commented Jul 7, 2023

Description

Performing: Release 4.6.0 - Pre-Alpha1 - Workload benchmarks metrics
The performance test is not functioning correctly in the Workload benchmark test.

Current behavior

When the test is triggered by the pipeline, the following issue occurs:

performance/test_cluster/test_cluster_performance/test_cluster_performance.py::test_cluster_performance FAILED

>           pytest.fail(f"Stats could not be retrieved, '{artifacts_path}' path may not exist, it is empty or it may not"
                        f" follow the proper structure.")
E           Failed: Stats could not be retrieved, '/mnt/efs/tmp/CLUSTER-Workload_benchmarks_metrics/B_263' path may not exist, it is empty or it may not follow the proper structure.

Expected behavior

The performance test should run smoothly without encountering any path-related problems.

@javiersanchz

UPDATE

  • I ran the test manually with the same parameters used in the pipeline (the problem seems to come from the way the path is searched).
  • I'm still looking for a solution.

damarisg added the qa_known label (Issues that are already known by the QA team) Dec 13, 2023
@javiersanchz

UPDATE

  • I reviewed the .groovy file for test_cluster to see whether the error could be coming from there.
  • I ran some experiments on the test and made modifications.


GGP1 commented Feb 26, 2024

Reopening

Failed again in 4.8.0-beta2. Results: wazuh/wazuh#22126 (comment)

GGP1 reopened this Feb 26, 2024

GGP1 commented Feb 27, 2024

Closing

Was able to execute manually after parsing the cluster logs.

GGP1 closed this as completed Feb 27, 2024

nico-stefani commented Jul 24, 2024

Reopened due to the failure encountered in wazuh/wazuh#24894

workload-4.9.0-alpha3-artifacts.zip

I tried to execute it manually, without success:

python3 -m pytest test_cluster_performance.py --artifacts_path='/home/nstefani/Downloads/workload-4.9.0-alpha3-artifacts' --n_workers=25 --n_agents=50000 --html=report.html --self-contained-html
============================= test session starts ==============================
platform linux -- Python 3.9.16, pytest-7.1.2, pluggy-1.5.0
rootdir: /home/nstefani/git/wazuh-qa/tests, configfile: pytest.ini
plugins: html-3.1.1, metadata-3.1.1, testinfra-5.0.0
collected 1 item

test_cluster_performance.py F                                            [100%]

=================================== FAILURES ===================================
___________________________ test_cluster_performance ___________________________

artifacts_path = '/home/nstefani/Downloads/workload-4.9.0-alpha3-artifacts'
n_workers = 25, n_agents = 50000

    def test_cluster_performance(artifacts_path, n_workers, n_agents):
        """Check that a cluster environment did not exceed certain thresholds.
    
        This test obtains various statistics (mean, max, regression coefficient) from CSVs with
        data generated in a cluster environment (resources used and duration of tasks). These
        statistics are compared with thresholds established in the data folder.
    
        Args:
            artifacts_path (str): Path where CSVs with cluster information can be found.
            n_workers (int): Number of workers folders that are expected inside the artifacts path.
            n_agents (int): Number of agents in the cluster environment.
        """
        if None in (artifacts_path, n_workers, n_agents):
            pytest.fail("Parameters '--artifacts_path=<path> --n_workers=<n_workers> --n_agents=<n_agents>' are required.")
    
        # Check if there are threshold data for the specified number of workers and agents.
        selected_conf = f"{n_workers}w_{n_agents}a"
        if selected_conf not in configurations:
            pytest.fail(f"This is not a supported configuration: {selected_conf}. "
                        f"Supported configurations are: {', '.join(configurations.keys())}.")
    
        # Check if path exists and if expected number of workers matches what is found inside artifacts.
        try:
            cluster_info = ClusterEnvInfo(artifacts_path).get_all_info()
        except FileNotFoundError:
            pytest.fail(f"Path '{artifacts_path}' could not be found or it may not follow the proper structure.")
    
        if cluster_info.get('worker_nodes', 0) != int(n_workers):
            pytest.fail(f"Information of {n_workers} workers was expected inside the artifacts folder, but "
                        f"{cluster_info.get('worker_nodes', 0)} were found.")
    
        # Calculate stats from data inside artifacts path.
        data = {'tasks': ClusterCSVTasksParser(artifacts_path).get_stats(),
                'resources': ClusterCSVResourcesParser(artifacts_path).get_stats()}
    
        if not data['tasks'] or not data['resources']:
>           pytest.fail(f"Stats could not be retrieved, '{artifacts_path}' path may not exist, it is empty or it may not"
                        f" follow the proper structure.")
E           Failed: Stats could not be retrieved, '/home/nstefani/Downloads/workload-4.9.0-alpha3-artifacts' path may not exist, it is empty or it may not follow the proper structure.

test_cluster_performance.py:68: Failed
- generated html file: file:///home/nstefani/git/wazuh-qa/tests/performance/test_cluster/test_cluster_performance/report.html -
=========================== short test summary info ============================
FAILED test_cluster_performance.py::test_cluster_performance - Failed: Stats ...
============================== 1 failed in 0.57s ===============================


Rebits commented Aug 6, 2024

@rafabailon we need to review why no binary data was collected in build https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/590/console
This could be related to recent changes introduced in the pipeline: https://github.com/wazuh/wazuh-jenkins/pull/6608


rafabailon commented Aug 6, 2024

Update

I've looked through the code and it seems that some files are missing. The error occurs when ClusterCSVResourcesParser tries to read the following files: ['wazuh_clusterd', 'integrity_sync', 'wazuh_clusterd_child_1', 'wazuh_clusterd_child_2']. I was able to verify that the only file that exists is 'integrity_sync'. The parser also expects the columns ['USS(KB)', 'CPU(%)', 'FD'], which do not exist in that one file.

The missing files are not referenced in the pipeline logs and there is no error in the artifacts indicating that something went wrong.

The changes in https://github.com/wazuh/wazuh-jenkins/pull/6608 should not affect this, as that option is not checked during pipeline execution.

I have launched the pipeline to continue investigating: CLUSTER-Workload_benchmarks_metrics/604/

Note: the pipeline requires 5000 agents and 25 managers (too much for a test).

@rafabailon

Update

The root cause: before 4.9.0, the apid process was called wazuh-apid. Since 4.9.0, it is called wazuh_apid. The Jenkins pipeline still lists it as wazuh-apid; since that process name does not exist in 4.9.0, the monitoring script fails and does not generate the .csv files.

There are two possibilities to fix this error:

  • Add both options in the pipeline. In this case, no code changes are needed, but care must be taken when launching the pipeline to use the correct apid name for the Wazuh version under test.

  • Validate in the code. I have created a PR with the necessary changes: before the monitoring script is executed, the Wazuh version is checked and the apid process name is set accordingly.

I ran the monitoring script locally to confirm this is the error, and I also ran the pipeline with the code changes to verify that the necessary .csv files now appear in the artifacts.
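The second option boils down to a small version check. A minimal sketch, assuming a plain `major.minor.patch` version string; the function name is illustrative and the actual pipeline code may obtain and compare the version differently:

```python
def apid_process_name(wazuh_version):
    """Return the API daemon process name for a given Wazuh version string.

    Per the analysis above: before 4.9.0 the process was 'wazuh-apid';
    from 4.9.0 on it is 'wazuh_apid'.
    """
    major, minor, *_ = (int(part) for part in wazuh_version.split('.'))
    return 'wazuh_apid' if (major, minor) >= (4, 9) else 'wazuh-apid'
```

Selecting the name this way keeps a single pipeline definition working for both pre- and post-4.9.0 builds, which is why it was preferred over hardcoding either value as a Jenkins parameter.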

@rafabailon

Update

Before 4.9.0, the process name was wazuh-apid. Since 4.9.0-beta1, it is wazuh_apid, but the Jenkins pipeline parameter is still wazuh-apid. Simply changing the parameter in Jenkins would not be a solution, since it would then break executions for versions prior to 4.9.0. Instead, I modified the pipeline code to select the process name depending on the Wazuh version used.

Build: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/615/
Artifacts: artifacts.zip

@rafabailon

Update

I've made the suggested changes and created a new PR with the correct branch nomenclature.

@jseg380

jseg380 commented Aug 8, 2024

LGTM!

juliamagan removed the qa_known label (Issues that are already known by the QA team) Aug 8, 2024