Create rationale for each metric type #3578

Closed
34 tasks done
Sebastiaan127001 opened this issue Mar 23, 2022 · 1 comment · Fixed by #3770

Sebastiaan127001 commented Mar 23, 2022

For each issue or warning, tools like SonarQube and CheckMarx have an informational button or tooltip that says: "why is this an issue?".
It would be helpful if the user could see, for each metric, why measuring it is important or how it contributes to the overall quality.

Proposed rationales:

  • Accessibility violations: Accessibility is essential for developers and organizations that want to create high-quality websites and web tools, and to not exclude people from using their products and services.

  • Commented out code: Programmers should not comment out code as it bloats programs and reduces readability.
    Unused code should be deleted and can be retrieved from source control history if required.

  • Complex units: Software products where more source code resides in units with high logical complexity are deemed to be harder to maintain. To maximize the rating of a product for the unit complexity property, the software producer should avoid units with high complexity.
    https://www.softwareimprovementgroup.com/wp-content/uploads/2021-SIG-TUViT-Evaluation-Criteria-Trusted-Product-Maintainability-Guidance-for-producers.pdf

  • Dependencies: Failure to update dependencies makes your product increasingly difficult to maintain and can introduce security risks.
    Optional:

    • Your product can malfunction
    • You will not be able to use new features added in the latest versions
    • You may miss out on performance improvements provided by updates
    • Security issue fixes can be missed or delayed
    • Maintenance overheads of old versions could be reduced
    • Bug fixes are often contained in the new versions
  • Duplicated lines: Most of the time, when there are lots of duplicated lines, there are few unit tests (see “Risk of regression” in table 4.1 of https://livebook.manning.com/book/sonarqube-in-action/chapter-4/69). Duplication also contributes to LOC and complexity, because having multiple copies of anything, whether data or algorithms, not only means more work when there are changes, but could also mean you end up with some outdated copies, which is the most dangerous side effect of duplication. In short, this impacts maintainability. Software products with less (textual) duplication are deemed to be easier to maintain. The software producer should avoid multiple occurrences of the same fragments of code.
    https://www.softwareimprovementgroup.com/wp-content/uploads/2021-SIG-TUViT-Evaluation-Criteria-Trusted-Product-Maintainability-Guidance-for-producers.pdf

  • Failed CI-jobs: Failed jobs or pipelines will impact the CI/CD process and overall quality of the software product [HOW/WHY WILL THIS IMPACT]

  • Issues: todo

  • Long units: Software products where more source code resides in large units are deemed to be harder to maintain. The software producer should avoid large units. https://www.softwareimprovementgroup.com/wp-content/uploads/2021-SIG-TUViT-Evaluation-Criteria-Trusted-Product-Maintainability-Guidance-for-producers.pdf

  • Manual test duration: The starting point is that the software produced is, as much as possible, tested automatically and that conscious choices have been made about the code that is not tested automatically. In order to be able to make these choices properly, it is important that the part of the code which is not covered by automated testing is relatively small, so that the required amount of manual testing remains limited and the risks of manual testing remain restricted.
    (Still, the above should say more about the duration and how that aspect is a risk to quality)

  • Manual test execution: Measuring the number of manual test cases that have not been executed recently gives an overview of the test coverage.

  • Many parameters: The software producer should avoid units with large interfaces, as units with large interfaces are deemed to be harder to maintain. The size of the interface of a unit can be quantified as the number of parameters (also known as formal arguments) that are defined in the signature or declaration of a unit.

  • Merge requests: todo

  • Metrics: todo

  • Missing metrics: When metrics are missing, important information could be lacking. This metric uses the subject type to determine which absent metrics might cause ‘blind spots’ in the quality report.

  • Performancetest duration: todo

  • Performancetest stability: The percentage of the planned test duration at which a trend break becomes noticeable. If the application starts generating faults or high response times, the percentage of the duration at which the trend break occurs is reported: 100% means the test stayed stable for its whole duration, 75% means a break at 75% of the planned duration. This only applies to endurance testing and is an indication to find memory leaks or similar resource problems.
    This measure is meant to monitor whether an application keeps running stable during an x-hour endurance test. If response times or error rates rise, the algorithm reports the percentage of the test at which the anomaly or break was detected; for example, a trend break detected at 10% of the test results in a Trend break 'stability' of 10%. This measure should always be 100%. (See the sketch after this list for an illustration of how such a percentage could be computed.)

  • Scalability: This metric tells at what percentage of the test the application 'breaks'. Usually, a test environment is not scaled to production levels, so use this metric to compare with previous tests. If this metric is low or declines over subsequent tests, the margins are low or decreasing. This measure should be <100% to actually see the margins increase or decrease over tests, and it only applies to stress testing.
    The question is: do your margins give you enough headroom for processing spikes in traffic? This metric answers that question. Its value should be compared to previous tests, so the trend is more useful to evaluate than the value itself. If the scalability is decreasing, the ability of your application to digest traffic spikes is decreasing. Be sure to create a stress test that pushes the application beyond its limits, so the application actually breaks during the stress test.

  • Security warnings: todo

  • Sentiment: The SPACE of Developer Productivity:
    Satisfaction is how fulfilled developers feel with their work, team, tools, or culture; well-being is how healthy and happy they are, and how their work impacts it. Measuring satisfaction and well-being can be beneficial for understanding productivity and perhaps even for predicting it. For example, productivity and satisfaction are correlated, and it is possible that satisfaction could serve as a leading indicator for productivity; a decline in satisfaction and engagement could signal upcoming burnout and reduced productivity.
    https://queue.acm.org/detail.cfm?id=3454124

  • Size (LOC): Size is one of the important attributes of a software product. Lines of Code (SLOC or LOC) is one of the most widely used sizing metrics in industry. The amount of effort needed to maintain a software system is related to the technical quality of the source code of that system. Estimation of effort is a complicated and challenging task in the software industry. Project size is a measure of problem complexity in terms of the effort and time required to develop the product. SLOC is typically used to predict the amount of effort that will be required to develop a program, as well as to estimate programming productivity or complexity once the software is produced. https://ieeexplore.ieee.org/document/7012875

  • Source up-to-dateness: Checks whether a source is updated regularly, so that the reported results are recent.

  • Source version: By regularly checking the version of a specific source, we make sure sources stay up to date.

  • Suppressed violations: By measuring the number of violations suppressed in the source, we can keep track of (un)wanted suppressions that are no longer visible in the tooling.

  • Test branch coverage: todo

  • Test cases: todo

  • Test line coverage: todo

  • Tests: todo

  • Time remaining: todo

  • Unmerged branches: By measuring the number of branches that have not been merged to the default branch, we keep the code base clean and prevent code from never reaching the production branch.

  • Unused CI-jobs: To keep the set of CI-jobs clean and uncluttered.

  • User story points: To report on the number of ready user stories (work stock).

  • Velocity: To keep track of the velocity of the team.

  • Violation remediation effort: todo

  • Violations: todo
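
To illustrate the Performancetest stability rationale above, here is a minimal sketch of how a trend-break 'stability' percentage could be computed. The sample format, thresholds, and function names are assumptions for illustration, not the actual performance-test tooling:

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One measurement taken during the endurance test (hypothetical format)."""
    progress: float        # fraction of the planned test duration, 0.0-1.0
    response_time: float   # seconds
    error_rate: float      # fraction of failed requests, 0.0-1.0


def stability_percentage(samples: list[Sample], max_response_time: float = 2.0,
                         max_error_rate: float = 0.01) -> int:
    """Return the percentage of the planned duration at which a trend break occurs.

    100 means the test stayed stable for its full duration; a lower value is the
    point (as a percentage of the planned duration) where response times or
    error rates first exceeded their thresholds.
    """
    for sample in sorted(samples, key=lambda s: s.progress):
        if sample.response_time > max_response_time or sample.error_rate > max_error_rate:
            return round(sample.progress * 100)
    return 100


if __name__ == "__main__":
    samples = [Sample(0.1, 0.4, 0.0), Sample(0.5, 0.6, 0.0), Sample(0.75, 3.2, 0.02)]
    print(stability_percentage(samples))  # 75: trend break at 75% of the planned duration
```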

fniessink commented Apr 13, 2022

Implementation:

  • Add an optional rationale field to the Metric meta class (components/server/src/external/data_model/meta/metric.py); see the sketch after this list for how the first two steps could look.
  • Add rationales to the metrics in the data model (components/server/src/external/data_model/metrics.py).
  • Make sure the rationale is passed from the server endpoint(s) to the frontend.
  • Add the rationales to the UI: each metric (that has a rationale) gets an additional "Rationale" popup that contains the header "Why measure {metric name}?" and the text of the rationale. URLs are clickable. Other markup is out of scope.
  • Add the rationale to the exported documentation that is published on ReadTheDocs (docs/src/create_metrics_and_sources_md.py).
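
A minimal sketch of the first two steps, assuming the data model meta classes are pydantic models; apart from the new rationale field, the field names and the example metric below are illustrative, not the actual Metric meta class or data model:

```python
from pydantic import BaseModel


class Metric(BaseModel):
    """Meta model for a metric type (simplified; illustrative fields only)."""

    name: str
    description: str
    rationale: str = ""  # new, optional: why it is useful to measure this metric


# Example of a metric definition in the data model with a rationale added:
SIZE_LOC = Metric(
    name="Size (LOC)",
    description="The size of the software in lines of code.",
    rationale=(
        "The amount of effort needed to maintain a software system is related to its size, "
        "so lines of code is widely used to estimate maintenance effort."
    ),
)

print(SIZE_LOC.rationale)
```

Making the field optional with an empty default means existing metric definitions without a rationale keep working, and the UI can simply hide the "Rationale" popup when the rationale is empty.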
