Fix test cluster performance #4780

javiersanchz · 2023-12-19T12:15:08Z

Related issue
#4298

Description

This PR closes #4298 (Performance test is not working in Workload benchmark test).
The error thrown by the performance test in the Workload benchmark test has been resolved.
The syntax of the list of files to be loaded has been changed: the files had been modified, so the test could no longer find them.
The Workload benchmark metrics artifacts mentioned in the issue were used for verification (Release 4.6.0 - Pre-Alpha1 - Workload benchmarks metrics wazuh#17716).
The error that was being thrown:

performance/test_cluster/test_cluster_performance/test_cluster_performance.py::test_cluster_performance FAILED

>           pytest.fail(f"Stats could not be retrieved, '{artifacts_path}' path may not exist, it is empty or it may not"
                        f" follow the proper structure.")
E           Failed: Stats could not be retrieved, '/mnt/efs/tmp/CLUSTER-Workload_benchmarks_metrics/B_263' path may not exist, it is empty or it may not follow the proper structure.
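For context on why this message is raised: the test only fails this way when one of the parsers returns no stats, which is what happens if none of the expected CSV files are found. Below is a minimal sketch, assuming the files are located with glob patterns relative to artifacts_path. The `find_csv_files` helper, the directory layout and both patterns are hypothetical and are not the actual file list changed in this PR.

```python
import glob
import os
import tempfile


def find_csv_files(artifacts_path, patterns):
    """Return the sorted list of files under artifacts_path matching any glob pattern."""
    found = []
    for pattern in patterns:
        found.extend(glob.glob(os.path.join(artifacts_path, pattern)))
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <artifacts>/<node>/data/<process>.csv
    os.makedirs(os.path.join(root, 'master', 'data'))
    open(os.path.join(root, 'master', 'data', 'wazuh-clusterd.csv'), 'w').close()

    stale = find_csv_files(root, ['*/logs/*.csv'])  # outdated pattern: nothing matches
    fixed = find_csv_files(root, ['*/data/*.csv'])  # updated pattern: the CSV is found

if not stale:
    # Same situation that makes the test call pytest.fail("Stats could not be retrieved ...")
    print('Stats could not be retrieved with the outdated pattern')
print([os.path.basename(path) for path in fixed])
```

In other words, a stale pattern produces the same "path may not exist, it is empty..." failure even when artifacts_path itself is correct.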

Once the changes were applied:

Test_cluster_performance

(qa) wazuh@javier:~/Git/wazuh-qa/wazuh-qa/tests/performance/test_cluster$ python3 -m pytest test_cluster_performance --artifacts_path='/home/wazuh/Descargas/artifacts' --n_workers=25 --n_agents=50000 --html=report.html --self-contained-html
=========================================================================================== test session starts ============================================================================================
platform linux -- Python 3.9.16, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/wazuh/Git/wazuh-qa/wazuh-qa/tests, configfile: pytest.ini
plugins: metadata-2.0.4, testinfra-5.0.0, html-3.1.1
collected 1 item                                                                                                                                                                                           

test_cluster_performance/test_cluster_performance.py F                                                                                                                                               [100%]

================================================================================================= FAILURES =================================================================================================
_________________________________________________________________________________________ test_cluster_performance _________________________________________________________________________________________

artifacts_path = '/home/wazuh/Descargas/artifacts', n_workers = 25, n_agents = 50000

    def test_cluster_performance(artifacts_path, n_workers, n_agents):
        """Check that a cluster environment did not exceed certain thresholds.
    
        This test obtains various statistics (mean, max, regression coefficient) from CSVs with
        data generated in a cluster environment (resources used and duration of tasks). These
        statistics are compared with thresholds established in the data folder.
    
        Args:
            artifacts_path (str): Path where CSVs with cluster information can be found.
            n_workers (int): Number of workers folders that are expected inside the artifacts path.
            n_agents (int): Number of agents in the cluster environment.
        """
        if None in (artifacts_path, n_workers, n_agents):
            pytest.fail("Parameters '--artifacts_path=<path> --n_workers=<n_workers> --n_agents=<n_agents>' are required.")
    
        # Check if there are threshold data for the specified number of workers and agents.
        selected_conf = f"{n_workers}w_{n_agents}a"
        if selected_conf not in configurations:
            pytest.fail(f"This is not a supported configuration: {selected_conf}. "
                        f"Supported configurations are: {', '.join(configurations.keys())}.")
    
        # Check if path exists and if expected number of workers matches what is found inside artifacts.
        try:
            cluster_info = ClusterEnvInfo(artifacts_path).get_all_info()
        except FileNotFoundError:
            pytest.fail(f"Path '{artifacts_path}' could not be found or it may not follow the proper structure.")
    
        if cluster_info.get('worker_nodes', 0) != int(n_workers):
            pytest.fail(f"Information of {n_workers} workers was expected inside the artifacts folder, but "
                        f"{cluster_info.get('worker_nodes', 0)} were found.")
    
        # Calculate stats from data inside artifacts path.
        data = {'tasks': ClusterCSVTasksParser(artifacts_path).get_stats(),
                'resources': ClusterCSVResourcesParser(artifacts_path).get_stats()}
    
        if not data['tasks'] or not data['resources']:
            pytest.fail(f"Stats could not be retrieved, '{artifacts_path}' path may not exist, it is empty or it may not"
                        f" follow the proper structure.")
    
        # Compare each stat with its threshold.
        for data_name, data_stats in data.items():
            for phase, files in data_stats.items():
                for file, columns in files.items():
                    for column, nodes in columns.items():
                        for node_type, stats in nodes.items():
                            for stat, value in stats.items():
                                th_value = configurations[selected_conf][data_name][phase][file][column][node_type][stat]
                                if value[1] >= th_value:
                                    exceeded_thresholds.append({'value': value[1], 'threshold': th_value, 'stat': stat,
                                                                'column': column, 'node': value[0], 'file': file,
                                                                'phase': phase})
    
        try:
>           assert not exceeded_thresholds, 'Some thresholds were exceeded:\n- ' + '\n- '.join(
                '{stat} {column} {value} >= {threshold} ({node}, {file}, {phase})'.format(**item) for item in
                exceeded_thresholds)
E               AssertionError: Some thresholds were exceeded:
E                 - reg_cof FD 0.023233327721228512 >= 0.02 (worker_8, wazuh-clusterd, setup_phase)
E                 - mean FD 117.15853658536585 >= 103.4 (master, wazuh-clusterd, setup_phase)
E                 - reg_cof FD 0.5827280708755685 >= 0.33 (worker_16, wazuh-clusterd, stable_phase)
E                 - mean FD 70.8 >= 59 (master, wazuh-clusterd, stable_phase)
E                 - max FD 120.0 >= 70.5 (master, wazuh-clusterd, stable_phase)
E               assert not [{'column': 'FD', 'file': 'wazuh-clusterd', 'node': 'worker_8', 'phase': 'setup_phase', ...}, {'column': 'FD', 'file':...ase': 'stable_phase', ...}, {'column': 'FD', 'file': 'wazuh-clusterd', 'node': 'master', 'phase': 'stable_phase', ...}]

test_cluster_performance/test_cluster_performance.py:85: AssertionError
------------------------------------------------------------------------------------------- Captured stdout call -------------------------------------------------------------------------------------------

Setup phase took 0:20:03s (2023/07/06 14:54:59 - 2023/07/06 15:15:02).
Stable phase took 0:04:52s (2023/07/06 15:15:02 - 2023/07/06 15:19:54).
------------------------------------------------- generated html file: file:///home/wazuh/Git/wazuh-qa/wazuh-qa/tests/performance/test_cluster/report.html -------------------------------------------------
========================================================================================= short test summary info ==========================================================================================
FAILED test_cluster_performance/test_cluster_performance.py::test_cluster_performance - AssertionError: Some thresholds were exceeded:
============================================================================================ 1 failed in 0.73s =============================================================================================
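The phase durations in the captured stdout are just the difference between each phase's start and end timestamps. A small sketch reproducing that arithmetic with `datetime`, assuming the timestamps use the `YYYY/MM/DD HH:MM:SS` layout shown in the log; this is not the test's own code.

```python
from datetime import datetime

FMT = '%Y/%m/%d %H:%M:%S'  # timestamp layout seen in the captured stdout


def phase_duration(start, end):
    """Return the elapsed time between two log timestamps as a timedelta."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


setup = phase_duration('2023/07/06 14:54:59', '2023/07/06 15:15:02')
stable = phase_duration('2023/07/06 15:15:02', '2023/07/06 15:19:54')
print(f"Setup phase took {setup}s")    # Setup phase took 0:20:03s
print(f"Stable phase took {stable}s")  # Stable phase took 0:04:52s
```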

The failure now shown in the output is expected, because certain thresholds were exceeded, as mentioned in the test description.
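The comparison behind this failure can be reduced to a standalone sketch. It reuses the `selected_conf` key format and the `>=` check from the test and fills the nested dictionaries only with the stable-phase values printed in the failure above. The `'resources'` and `'workers'` keys are assumptions, and the recursive walk stands in for the test's six nested loops; it is not the test's actual code.

```python
from functools import reduce
from operator import getitem

# Nesting mirrors configurations[selected_conf][data_name][phase][file][column][node_type][stat].
# Values come from the stable-phase lines of the failure; 'resources'/'workers' keys are assumed.
n_workers, n_agents = 25, 50000
selected_conf = f"{n_workers}w_{n_agents}a"  # '25w_50000a'

configurations = {selected_conf: {'resources': {'stable_phase': {'wazuh-clusterd': {'FD': {
    'master': {'mean': 59, 'max': 70.5},
    'workers': {'reg_cof': 0.33},
}}}}}}

# Each stat holds (node, value), matching value[0] / value[1] in the test.
data = {'resources': {'stable_phase': {'wazuh-clusterd': {'FD': {
    'master': {'mean': ('master', 70.8), 'max': ('master', 120.0)},
    'workers': {'reg_cof': ('worker_16', 0.5827280708755685)},
}}}}}


def iter_stats(tree, path=()):
    """Yield (path, (node, value)) for every leaf; leaves are (node, value) tuples."""
    for key, sub in tree.items():
        if isinstance(sub, dict):
            yield from iter_stats(sub, path + (key,))
        else:
            yield path + (key,), sub


exceeded_thresholds = []
for path, (node, value) in iter_stats(data):
    # path is (data_name, phase, file, column, node_type, stat)
    th_value = reduce(getitem, path, configurations[selected_conf])
    if value >= th_value:  # as in the test, reaching the threshold already counts
        _, phase, file, column, _, stat = path
        exceeded_thresholds.append({'value': value, 'threshold': th_value, 'stat': stat,
                                    'column': column, 'node': node, 'file': file,
                                    'phase': phase})

lines = ['{stat} {column} {value} >= {threshold} ({node}, {file}, {phase})'.format(**item)
         for item in exceeded_thresholds]
print('Some thresholds were exceeded:\n- ' + '\n- '.join(lines))
```

The printed lines match the corresponding stable-phase entries of the AssertionError, which shows that the remaining failure comes from the thresholds themselves and not from missing data.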

GGP1

Even though the PR description contains a test that runs without issues, the path was passed manually, so we cannot assert that the workload benchmark metrics pipeline passes it correctly.

The error indicated that the artifacts_path was incorrect, which is something this PR doesn't change. So we should run the pipeline and validate that the script is executed as expected.

Also, please update the changelog and commit messages to comply with the check requirements.

GGP1

Even though there were no changes to the pipeline where the error occurred, the team decided that running the performance test manually was enough.

Please modify the commits so they comply with the convention.

"Change path syntax artifacts_path" should be "fix: Change path syntax artifacts_path"
"Add changes to CHANGELOG" should be "docs: Add changes to CHANGELOG"
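A quick way to check messages locally before pushing is a regex on the first line of each commit. This is a sketch only: the exact rule enforced by the repository's commit check is not stated here, so `COMMIT_RE`, its list of types and the `follows_convention` helper are hypothetical.

```python
import re

# Hypothetical rule: "<type>(<optional scope>): <description>". The type list is an assumption.
COMMIT_RE = re.compile(r'^(feat|fix|docs|style|refactor|test|chore)(\([\w-]+\))?: \S.*$')


def follows_convention(message):
    """Return True if the first line of a commit message matches COMMIT_RE."""
    first_line = message.splitlines()[0] if message else ''
    return bool(COMMIT_RE.match(first_line))


for msg in ('Change path syntax artifacts_path',
            'fix: Change path syntax artifacts_path',
            'docs: Add changes to CHANGELOG'):
    print(f'{follows_convention(msg)!s:5}  {msg}')
```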

GGP1

LGTM

Selutario

Looks good. However, the release type of this issue is patch, so this PR should target the 4.8.2 branch instead.

CHANGELOG.md

javiersanchz self-assigned this Dec 19, 2023

GGP1 self-requested a review December 19, 2023 14:46

GGP1 requested changes Dec 19, 2023

GGP1 requested changes Dec 21, 2023

javiersanchz force-pushed the feature/4298-fix-performance-test branch from c9a1139 to 756f33d December 22, 2023 10:48

javiersanchz requested a review from GGP1 December 22, 2023 11:00

GGP1 previously approved these changes Dec 22, 2023

Selutario suggested changes Jan 25, 2024

CHANGELOG.md (outdated, resolved)

javiersanchz dismissed GGP1’s stale review via a89eb90 January 26, 2024 11:19

javiersanchz force-pushed the feature/4298-fix-performance-test branch from 756f33d to a89eb90 January 26, 2024 11:19

javiersanchz changed the base branch from master to 4.8.2 January 26, 2024 11:20

Selutario previously approved these changes Jan 26, 2024

javiersanchz dismissed Selutario’s stale review via 8595425 January 29, 2024 16:31

javiersanchz force-pushed the feature/4298-fix-performance-test branch from a89eb90 to 8595425 January 29, 2024 16:31

javiersanchz changed the base branch from 4.8.2 to 4.8.0 January 29, 2024 16:31

fix: change files syntax artifacts_path

aa4a27a

javiersanchz force-pushed the feature/4298-fix-performance-test branch from 8595425 to aa4a27a January 29, 2024 17:36

rauldpm approved these changes Jan 29, 2024

davidjiglesias merged commit 578027a into 4.8.0 Feb 8, 2024
4 checks passed

davidjiglesias deleted the feature/4298-fix-performance-test branch February 8, 2024 14:19

fdalmaup mentioned this pull request Feb 23, 2024

Release 4.7.3 - Release Candidate 1 - Workload benchmarks metrics wazuh/wazuh#22023

Closed

2 tasks

fdalmaup mentioned this pull request Mar 4, 2024

Review Workload pipeline CSV parser methods #5068

Closed

javiersanchz mentioned this pull request May 24, 2024

Release 4.7.5 - Release Candidate 1 - Workload benchmarks metrics wazuh/wazuh#23598

Closed

1 task
