
Performance test is not working in Workload benchmark test #4298

Closed
pro-akim opened this issue Jul 7, 2023 · 11 comments · Fixed by #4780

@pro-akim

pro-akim commented Jul 7, 2023

Description

Performing: Release 4.6.0 - Pre-Alpha1 - Workload benchmarks metrics
The performance test is not functioning correctly in the Workload benchmark test.

Current behavior

When the test is triggered by the pipeline, the following issue occurs:

performance/test_cluster/test_cluster_performance/test_cluster_performance.py::test_cluster_performance FAILED

>           pytest.fail(f"Stats could not be retrieved, '{artifacts_path}' path may not exist, it is empty or it may not"
                        f" follow the proper structure.")
E           Failed: Stats could not be retrieved, '/mnt/efs/tmp/CLUSTER-Workload_benchmarks_metrics/B_263' path may not exist, it is empty or it may not follow the proper structure.

Expected behavior

The performance test should run smoothly without encountering any path-related problems.

@javiersanchz

UPDATE

  • I ran the test manually with the same parameters used in the pipeline (the problem seems to come from the way the path is searched).
  • I'm still looking for a solution.

damarisg added the qa_known label (Issues that are already known by the QA team) Dec 13, 2023
@javiersanchz

UPDATE

  • I reviewed the .groovy file for test_cluster to see whether the error could be coming from there.
  • I ran some experiments on the test and made modifications.


GGP1 commented Feb 26, 2024

Reopening

Failed again in 4.8.0-beta2. Results: wazuh/wazuh#22126 (comment)

GGP1 reopened this Feb 26, 2024

GGP1 commented Feb 27, 2024

Closing

Was able to execute manually after parsing the cluster logs.

GGP1 closed this as completed Feb 27, 2024

nico-stefani commented Jul 24, 2024

Reopened due to the failure encountered in wazuh/wazuh#24894

workload-4.9.0-alpha3-artifacts.zip

I tried to execute it manually, without success:

python3 -m pytest test_cluster_performance.py --artifacts_path='/home/nstefani/Downloads/workload-4.9.0-alpha3-artifacts' --n_workers=25 --n_agents=50000 --html=report.html --self-contained-html
============================= test session starts ==============================
platform linux -- Python 3.9.16, pytest-7.1.2, pluggy-1.5.0
rootdir: /home/nstefani/git/wazuh-qa/tests, configfile: pytest.ini
plugins: html-3.1.1, metadata-3.1.1, testinfra-5.0.0
collected 1 item

test_cluster_performance.py F                                            [100%]

=================================== FAILURES ===================================
___________________________ test_cluster_performance ___________________________

artifacts_path = '/home/nstefani/Downloads/workload-4.9.0-alpha3-artifacts'
n_workers = 25, n_agents = 50000

    def test_cluster_performance(artifacts_path, n_workers, n_agents):
        """Check that a cluster environment did not exceed certain thresholds.
    
        This test obtains various statistics (mean, max, regression coefficient) from CSVs with
        data generated in a cluster environment (resources used and duration of tasks). These
        statistics are compared with thresholds established in the data folder.
    
        Args:
            artifacts_path (str): Path where CSVs with cluster information can be found.
            n_workers (int): Number of workers folders that are expected inside the artifacts path.
            n_agents (int): Number of agents in the cluster environment.
        """
        if None in (artifacts_path, n_workers, n_agents):
            pytest.fail("Parameters '--artifacts_path=<path> --n_workers=<n_workers> --n_agents=<n_agents>' are required.")
    
        # Check if there are threshold data for the specified number of workers and agents.
        selected_conf = f"{n_workers}w_{n_agents}a"
        if selected_conf not in configurations:
            pytest.fail(f"This is not a supported configuration: {selected_conf}. "
                        f"Supported configurations are: {', '.join(configurations.keys())}.")
    
        # Check if path exists and if expected number of workers matches what is found inside artifacts.
        try:
            cluster_info = ClusterEnvInfo(artifacts_path).get_all_info()
        except FileNotFoundError:
            pytest.fail(f"Path '{artifacts_path}' could not be found or it may not follow the proper structure.")
    
        if cluster_info.get('worker_nodes', 0) != int(n_workers):
            pytest.fail(f"Information of {n_workers} workers was expected inside the artifacts folder, but "
                        f"{cluster_info.get('worker_nodes', 0)} were found.")
    
        # Calculate stats from data inside artifacts path.
        data = {'tasks': ClusterCSVTasksParser(artifacts_path).get_stats(),
                'resources': ClusterCSVResourcesParser(artifacts_path).get_stats()}
    
        if not data['tasks'] or not data['resources']:
>           pytest.fail(f"Stats could not be retrieved, '{artifacts_path}' path may not exist, it is empty or it may not"
                        f" follow the proper structure.")
E           Failed: Stats could not be retrieved, '/home/nstefani/Downloads/workload-4.9.0-alpha3-artifacts' path may not exist, it is empty or it may not follow the proper structure.

test_cluster_performance.py:68: Failed
- generated html file: file:///home/nstefani/git/wazuh-qa/tests/performance/test_cluster/test_cluster_performance/report.html -
=========================== short test summary info ============================
FAILED test_cluster_performance.py::test_cluster_performance - Failed: Stats ...
============================== 1 failed in 0.57s ===============================


Rebits commented Aug 6, 2024

@rafabailon we need to review why no binary data was collected in build https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/590/console
This could be related to recent changes introduced in the pipeline: https://github.com/wazuh/wazuh-jenkins/pull/6608


rafabailon commented Aug 6, 2024

Update

I've looked through the code and it seems that some files are missing. The error occurs when ClusterCSVResourcesParser tries to read the following files: ['wazuh_clusterd', 'integrity_sync', 'wazuh_clusterd_child_1', 'wazuh_clusterd_child_2']. I was able to verify that the only file that exists is 'integrity_sync'. The parser also expects the columns ['USS(KB)', 'CPU(%)', 'FD'], which do not exist in that one file.

The missing files are not referenced in the pipeline logs and there is no error in the artifacts indicating that something went wrong.

The changes in https://github.com/wazuh/wazuh-jenkins/pull/6608 should not affect this, as that option is not checked during pipeline execution.

I have launched the pipeline to continue investigating: CLUSTER-Workload_benchmarks_metrics/604/

Note: the pipeline requires 5000 agents and 25 managers (too much for a test).

@rafabailon

Update

The root cause: before 4.9.0, the apid process was called wazuh-apid. Since 4.9.0, it is called wazuh_apid. The Jenkins pipeline still lists it as wazuh-apid; since that process name does not exist in 4.9.0, the monitoring script fails and does not generate the .csv files.

There are two possibilities to fix this error:

  • Add both options in the pipeline. In this case, no code changes are needed, but care must be taken when launching the pipeline to use the correct apid name for the Wazuh version under test.

  • Validate in the code. I have created a PR with the necessary changes: before the monitoring script is executed, the Wazuh version is checked and the apid process name is set accordingly.

I ran the monitoring script locally to confirm this is the error, and I also ran the pipeline with the code changes to verify that the necessary .csv files now appear in the artifacts.
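The second option boils down to a small version check. A minimal sketch, assuming a plain `major.minor.patch` version string; the function name is illustrative and the actual pipeline code may obtain and compare the version differently:

```python
def apid_process_name(wazuh_version):
    """Return the API daemon process name for a given Wazuh version string.

    Per the analysis above: before 4.9.0 the process was 'wazuh-apid';
    from 4.9.0 on it is 'wazuh_apid'.
    """
    major, minor, *_ = (int(part) for part in wazuh_version.split('.'))
    return 'wazuh_apid' if (major, minor) >= (4, 9) else 'wazuh-apid'
```

Selecting the name this way keeps a single pipeline definition working for both pre- and post-4.9.0 builds, which is why it was preferred over hardcoding either value as a Jenkins parameter.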

@rafabailon

Update

Before 4.9.0, the process name was wazuh-apid. Since 4.9.0-beta1, it is wazuh_apid, but the Jenkins pipeline parameter is still wazuh-apid. Simply changing the parameter in Jenkins would not be a solution, since it would then break executions for versions prior to 4.9.0. Instead, I modified the pipeline code to select the process name depending on the Wazuh version used.

Build: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/615/
Artifacts: artifacts.zip

@rafabailon

Update

I've made the suggested changes and created a new PR with the correct branch nomenclature.

@jseg380

jseg380 commented Aug 8, 2024

LGTM!

juliamagan removed the qa_known label (Issues that are already known by the QA team) Aug 8, 2024