BiCGStab OMP issue #1563

Open
blegouix opened this issue Mar 8, 2024 · 14 comments
Comments

@blegouix

blegouix commented Mar 8, 2024

Hello,

This issue is the continuation of #1561, with more details. We use BiCGStab to solve a quasi-tridiagonal pds problem with different executors (Serial, OMP and CUDA), and we observe a numerical issue with OMP. In our test bench built upon those solvers (which involves more than just Ginkgo), we measure an error of about 1e-16 with Serial or CUDA, but about 1e-10 with OMP, even with OMP_NUM_THREADS=1.

The problem disappears just by replacing the OMP executor with a Serial one.

I link the matrix and the rhs which are problematic. Note that the size of the matrix has an impact (e.g. there is no problem with n=10 or n=100).

csr.txt
rhs.txt

Please let me know if it is enough for you to reproduce the problem.

I tried with gcc and clang, with Ginkgo 1.7.0 and current develop branch.

Valgrind does not reveal anything, and both cases (with the Serial or OMP executor) "converge" in 3 iterations.

Issue from our side: CExA-project/ddc#332
Problematic ginkgo apply call (all ginkgo stuff is in this file): https://github.com/CExA-project/ddc/blob/cc2942283213cc700b3bd8c09fb00346621997e3/include/ddc/kernels/splines/matrix_sparse.hpp#L207

(create_gko_exec<Kokkos::Serial>() is a gko::ReferenceExecutor::create(), while gko_exec is a gko::OmpExecutor::create().)

Regards

@MarcelKoch
Member

From the code you linked, I could see that you are using a block Jacobi preconditioner with block size 32 for OMP and reference. Have you tried using a block size of 1 to see if any differences appear? Nevertheless, it could still hint at some issue in our OMP implementation.
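For reference, a minimal sketch of what that change could look like with Ginkgo's factory API (the executor variable exec, the value/index types and the iteration limit are assumptions, not taken from the linked code):

// Sketch only: restrict the block Jacobi preconditioner to 1x1 blocks
// (i.e. scalar Jacobi) and plug it into the BiCGStab factory.
auto solver_factory =
    gko::solver::Bicgstab<double>::build()
        .with_preconditioner(gko::preconditioner::Jacobi<double, int>::build()
                                 .with_max_block_size(1u)
                                 .on(exec))
        .with_criteria(
            gko::stop::Iteration::build().with_max_iters(1000u).on(exec))
        .on(exec);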

@blegouix
Author

blegouix commented Mar 8, 2024

It does indeed fix the issue.

@MarcelKoch
Member

I've run our benchmarks with the provided matrices. If I use the same residual reduction as in your code (1e-19), I also get a difference in the final residual:

  • reference/cuda: 1e-16
  • omp: 1e-13

Perhaps just for clarification: by specifying only the reduction factor, the iteration will be stopped when ||r|| < eps ||b||, where b is the right-hand side. In my experience, eps = 1e-19 is very small and not reachable in double precision. Maybe you want to use the absolute norm criterion ||r|| < eps, by adding .with_baseline(gko::stop::mode::absolute).

Interestingly, the difference disappears if mode::initial_resnorm is used, which corresponds to ||r|| < eps ||r_0||.
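For completeness, a minimal sketch of these two criteria with Ginkgo's factory API (the executor exec and the tolerance are assumptions; 1e-14 is used here only because 1e-19 is not reachable in double precision):

// Sketch only: absolute criterion, stop once ||r|| < eps
auto abs_criterion = gko::stop::ResidualNorm<double>::build()
                         .with_reduction_factor(1e-14)
                         .with_baseline(gko::stop::mode::absolute)
                         .on(exec);

// Sketch only: relative to the initial residual, stop once ||r|| < eps ||r_0||
auto rel_criterion = gko::stop::ResidualNorm<double>::build()
                         .with_reduction_factor(1e-14)
                         .with_baseline(gko::stop::mode::initial_resnorm)
                         .on(exec);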

@blegouix
Author

blegouix commented Mar 8, 2024

Ok, thank you. I indeed thought I was using an absolute norm criterion, even though the "residual reduction" terminology was quite clear.

I'll let you close the issue, depending on whether you think it requires an improvement on your side or not.

Thank you for the quick answer.

@blegouix
Author

@MarcelKoch how did you "run your benchmarks with the provided matrices"? I see you have some benchmarks using SuiteSparse, but I see no way to use local data. Should I upload my matrix to SuiteSparse? Or did you do that already?

@MarcelKoch
Member

No, I didn't upload anything, and that is not necessary to run our benchmarks.

Our benchmarks are only documented as part of the reproducibility material for a paper. How to use them individually is not really documented. To give you a broad summary:

First you need to define the input data in a json file. The json file has to consist of exactly one array, and within that array the test cases are defined. For example:

[
  {
    "filename": "path/to/your/matrix",
    "rhs": "path/to/your/rhs"
  },
  { ... }
]

The files have to be in matrix market format.

Some benchmarks require some extra fields. For example, the solver benchmark requires the field "optimal": {"spmv": "format/such/as/csr"}.

Then you can call a benchmark and pass in the input via stdin, i.e.

solver < input.json

You can get the options that are available for a given benchmark by using --help.

The output of our benchmarks is again json, and it is printed to stdout, while our status messages are printed to stderr. So, you can store the output with

solver < input.json > output.json
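Putting the pieces together, a complete input file for your case might look like this (the paths are placeholders):

[
  {
    "filename": "/path/to/csr.mtx",
    "rhs": "/path/to/rhs.mtx",
    "optimal": {
      "spmv": "csr"
    }
  }
]

which you would then run with, e.g., solver -solvers bicgstab -executor omp < input.json.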

@blegouix
Author

Great, that will be very useful.

So, coming back to the current issue: dropping the preconditioner indeed solved the problem with the matrices I provided at first, but we still have some difficulties with OMP, so the preconditioner was only part of the problem. I made a minimal reproducer for it:

https://github.com/blegouix/gko_omp_reprod/tree/main

Note that using the benchmarks on the matrices does not reproduce the problem, maybe because of the lower precision of the numbers in the rhs.mtx file.

@blegouix
Author

blegouix commented Apr 3, 2024

@MarcelKoch sorry to ping you, but can you confirm that you have seen my message? Since I did not open a new issue and this one is quite old, it may have gone unnoticed.

@upsj
Member

upsj commented Apr 3, 2024

If you apply the following patch and export your files using write_binary instead of write, you can store the rhs vector exactly:

diff --git a/benchmark/solver/solver_common.hpp b/benchmark/solver/solver_common.hpp
index 19e718e08..541a9f662 100644
--- a/benchmark/solver/solver_common.hpp
+++ b/benchmark/solver/solver_common.hpp
@@ -288,7 +288,7 @@ struct SolverGenerator : DefaultSystemGenerator<> {
     {
         if (config.contains("rhs")) {
             std::ifstream rhs_fd{config["rhs"].get<std::string>()};
-            return gko::read<Vec>(rhs_fd, std::move(exec));
+            return gko::read_generic<Vec>(rhs_fd, std::move(exec));
         } else {
             gko::dim<2> vec_size{system_matrix->get_size()[0], FLAGS_nrhs};
             if (FLAGS_rhs_generation == "1") {
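On the exporting side, a minimal sketch of the corresponding change (the function, the vector type gko::matrix::Dense<double> and the output file name are assumptions about the calling code):

#include <fstream>
#include <ginkgo/ginkgo.hpp>

// Sketch only: write the right-hand side in Ginkgo's binary format,
// which stores the double values exactly, instead of the text writer.
void export_rhs(gko::matrix::Dense<double>* rhs)
{
    std::ofstream out{"rhs_binary", std::ios::binary};
    gko::write_binary(out, rhs);  // previously: gko::write(out, rhs);
}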

@blegouix
Author

blegouix commented Apr 3, 2024

Great. I can now reproduce the problem in your benchmark:

~/ginkgo/build/benchmark/solver$ ./solver -solvers bicgstab -executor omp < matrix.json
This is Ginkgo 1.8.0 (develop)                                                            
    running with core module 1.8.0 (develop)                                              
    the reference module is  1.8.0 (develop)                                              
    the OpenMP    module is  1.8.0 (develop)                                              
    the CUDA      module is  1.8.0 (develop)                                              
    the HIP       module is  not compiled                                                 
    the DPCPP     module is  not compiled                                                 
Running on omp(0)                                                                         
Running with 2 warm iterations and 1 running iterations                                
The random seed for right hand sides is 42                                                
Running bicgstab with 1000 iterations and residual goal of 1.000000e-06                                                                                                              
The number of right hand sides is 1                                                       
Running test case /home/cart3sianbear/gko_omp_reprod/csr.mtx
Matrix is of size (10, 10)
        Running solver: bicgstab      
[                                      
    {                     
        "filename": "/home/cart3sianbear/gko_omp_reprod/csr.mtx",
        "rhs": "/home/cart3sianbear/gko_omp_reprod/build/rhs_binary",     
        "optimal": {                                                                      
            "spmv": "csr"                                                                 
        },                                                                                
        "solver": {                                                                       
            "bicgstab": {                                                                 
                "recurrent_residuals": [                                                  
                    0.14234837634095868,                                                  
                    0.14234837634095868   
                ],                                                                                                                                                                   
                "true_residuals": [                                                       
                    0.14234837634095868,                                                  
                    0.11108041181890674                                                   
                ],                                                                        
                "implicit_residuals": [                                                   
                    0.14234837634095868,                                                  
                    0.14234837634095868                                                   
                ],                                                                        
                "iteration_timestamps": [                                                 
                    0.000335548,                                                          
                    0.000767963                                                           
                ],                                                                        
                "rhs_norm": 2.23606797749979,
                "generate": {                                                             
                    "components": {                                                       
                        "generate(gko::solver::Bicgstab<double>::Factory)": 9.521e-06,
                        "generate(gko::matrix::IdentityFactory<double>)": 3.2340000000000003e-06,
                        "free": 4.942e-06,                                                
                        "overhead": 2.7524000000000003e-05
                    },                                                                    
                    "time": 2.0208e-05                                                    
                },                                                                        
                "apply": {                                                                
                    "components": {                                                       
                        "apply(gko::solver::Bicgstab<double>)": 1.929e-06,
                        "iteration": 0.000106693,
                        "allocate": 8.380000000000001e-06,
                        "dense::fill": 3.3087000000000004e-05,
                        "bicgstab::initialize": 1.4282000000000001e-05,
                        "advanced_apply(gko::matrix::Csr<double, int>)": 1.793e-05,
                        "csr::advanced_spmv": 1.8948000000000002e-05,
                        "dense::compute_norm2_dispatch": 0.00012472,
                        "free": 2.176e-06,                                                
                        "copy(gko::matrix::Dense<double>,gko::matrix::Dense<double>)": 1.3353e-05,
                        "dense::copy": 2.245e-05,
                        "dense::compute_conj_dot_dispatch": 0.000152076,
                        "check(gko::stop::Combined)": 1.4499000000000001e-05,
                        "check(gko::stop::ResidualNorm<double>)": 1.5558000000000002e-05,
                        "residual_norm::residual_norm": 8.1724e-05,
                        "check(gko::stop::Iteration)": 1.838e-06,
                        "bicgstab::step_1": 1.7100000000000002e-05,
                        "apply(gko::matrix::Identity<double>)": 6.95e-06,
                        "apply(gko::matrix::Csr<double, int>)": 6.9960000000000004e-06,
                        "csr::spmv": 1.6373e-05,
                        "bicgstab::step_2": 2.0842e-05,
                        "bicgstab::finalize": 1.4678e-05,
                        "overhead": 7.525500000000001e-05
                    },
                    "iterations": 0,
                    "time": 0.000532362
                },
                "preconditioner": {},
                "forward_error": 0.1359763191472606,
                "residual_norm": 0.12028732996140494,
                "repetitions": 1,
                "completed": true
            }
        },
        "rows": 10,
        "cols": 10
    }
]

I have also updated the reproducer to export the binary file.

We have noticed that the problem seems absent with GMRES.

@blegouix
Author

blegouix commented Apr 3, 2024

rhs_binary.txt
csr.txt

@MarcelKoch
Member

Hi @blegouix, I just wanted to let you know that I'm able to reproduce your issue with the files you provided, but I might not find time this week to investigate this further.

@blegouix
Author

Hello, is there any progress on this issue?

@blegouix
Author

blegouix commented Jul 30, 2024

Another example of the bug, this time with a 4x4 diagonal matrix. It works with OMP+CG and with CUDA+BICGSTAB, but does not work with OMP+BICGSTAB:
mat.txt
rhs.txt

It tends to converge but stops too early. Does that suggest the problem is in the stopping criterion?

(screenshot attached)
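One way to check that hypothesis would be to attach a convergence logger and compare the iteration count and final residual the stopping criterion actually saw on the OMP and reference executors. A minimal sketch (solver, b and x are assumptions for an already-generated solver and the usual vectors):

// Sketch only: log what the stopping criterion observed.
std::shared_ptr<const gko::log::Convergence<double>> logger =
    gko::log::Convergence<double>::create();
solver->add_logger(logger);
solver->apply(b, x);
// Number of iterations performed before the criterion fired:
std::cout << "iterations: " << logger->get_num_iterations() << "\n";
// Residual norm seen by the criterion (a 1x1 dense matrix):
gko::write(std::cout, gko::as<gko::matrix::Dense<double>>(logger->get_residual_norm()));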
