
Scaling in cplex interface #116

Closed
cdiener opened this issue Jul 19, 2017 · 5 comments

@cdiener
Member

cdiener commented Jul 19, 2017

Hi, I noticed that modifying the objective in the cplex interface scales badly.

For a moderately sized model, setting the objective already takes about half of the total optimization time (see the "cumtime" column, which gives the time spent in the function call and all its sub-calls):

 1174604 function calls (1050578 primitive calls) in 0.698 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.698    0.698 {built-in method builtins.exec}
        1    0.000    0.000    0.698    0.698 <string>:1(<module>)
        1    0.000    0.000    0.698    0.698 parsimonious.py:27(pfba)
        1    0.003    0.003    0.698    0.698 parsimonious.py:138(_pfba_optlang)
        1    0.000    0.000    0.471    0.471 parsimonious.py:102(add_pfba)
        2    0.000    0.000    0.415    0.207 cplex_interface.py:713(objective)
        4    0.000    0.000    0.295    0.074 _subinterfaces.py:3852(set_linear)

However, for a model with ~200,000 variables it now takes 95% of the time:

         40679344 function calls (36037663 primitive calls) in 257.649 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  257.649  257.649 {built-in method builtins.exec}
        1    0.000    0.000  257.649  257.649 <string>:1(<module>)
        1    0.000    0.000  257.649  257.649 community.py:275(optimize)
        2    0.015    0.007  249.575  124.788 cplex_interface.py:713(objective)
        4    0.009    0.002  244.498   61.125 _subinterfaces.py:3852(set_linear)

I did not observe the same effect with the other solver interfaces.

@KristianJensen
Contributor

I cannot replicate this. Can you provide a snippet?

@cdiener
Member Author

cdiener commented Aug 9, 2017

Sure. I hope it is OK to use cobrapy; it is much easier to set up the test case that way...

In [1]: from cobra.test import create_test_model

In [2]: from cobra.io import read_sbml_model

In [3]: from cobra.flux_analysis import pfba

In [4]: import cProfile

In [5]: small = create_test_model("textbook")

In [6]: small.solver = "cplex"

In [7]: large = read_sbml_model("/home/cdiener/Downloads/recon_2.2.xml")
cobra/io/sbml.py:235 UserWarning: M_h_c appears as a reactant and product RE3453C
cobra/io/sbml.py:235 UserWarning: M_h_c appears as a reactant and product RE3459C
cobra/io/sbml.py:235 UserWarning: M_h_x appears as a reactant and product FAOXC24C22x
cobra/io/sbml.py:235 UserWarning: M_h_c appears as a reactant and product HAS1
cobra/io/sbml.py:235 UserWarning: M_h2o_x appears as a reactant and product PROFVSCOAhc

In [8]: large.solver = "cplex"

In [9]: cProfile.run("pfba(small)", sort="cumtime")
         43637 function calls (39034 primitive calls) in 0.035 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.035    0.035 {built-in method builtins.exec}
        1    0.000    0.000    0.035    0.035 <string>:1(<module>)
        1    0.000    0.000    0.035    0.035 parsimonious.py:27(pfba)
        1    0.000    0.000    0.035    0.035 parsimonious.py:138(_pfba_optlang)
        1    0.000    0.000    0.021    0.021 parsimonious.py:102(add_pfba)
        2    0.000    0.000    0.019    0.010 cplex_interface.py:713(objective)
        5    0.002    0.000    0.012    0.002 basic.py:405(atoms)
        1    0.000    0.000    0.012    0.012 solver.py:96(set_objective)
        1    0.000    0.000    0.012    0.012 model.py:978(__exit__)
        1    0.000    0.000    0.012    0.012 context.py:32(reset)
        1    0.000    0.000    0.012    0.012 solver.py:158(reset)
        4    0.000    0.000    0.009    0.002 _subinterfaces.py:3927(set_linear)
       ...

In [10]: cProfile.run("pfba(large)", sort="cumtime")
         3445392 function calls (3067877 primitive calls) in 2.891 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.891    2.891 {built-in method builtins.exec}
        1    0.000    0.000    2.891    2.891 <string>:1(<module>)
        1    0.000    0.000    2.890    2.890 parsimonious.py:27(pfba)
        1    0.001    0.001    2.890    2.890 parsimonious.py:138(_pfba_optlang)
        2    0.001    0.000    2.103    1.051 cplex_interface.py:713(objective)
        1    0.001    0.001    1.761    1.761 parsimonious.py:102(add_pfba)
        4    0.001    0.000    1.737    0.434 _subinterfaces.py:3927(set_linear)
      ...

The set_linear function takes up about a quarter of the time for the small model (100 reactions) but half of the time for the large model (10,000 reactions). For a model with 100,000 reactions it takes up more than 90% of the call. I once read in the cplex docs that they don't recommend using named variables for large models since it will be slower; I don't know whether that is related, though...

This might not be relevant for a large part of the user base, so no high priority here. However, I'm working with microbial community models and, due to this scaling problem, can't run pFBA with cplex on those in a reasonable time.

@KristianJensen
Contributor

OK, that helped. Now I see the problem: it's actually the cplex function that's slow. As far as I can tell the issue is, as you pointed out, that referencing variables by name is slow. Most of the time in set_linear is spent in get_indices. It seems cplex doesn't keep a mapping between names and indices, since the time it takes to look up the index of a name scales roughly linearly with its position.
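To illustrate the scaling (a plain-Python sketch, not optlang or cplex code): if each name lookup is a linear scan over the variable list, resolving all n names in an objective costs O(n^2) in total, while a dict built once makes the whole batch O(n):

```python
# Hypothetical illustration of linear-scan vs. dict-based name lookup.
names = ["v%d" % i for i in range(1000)]

def index_by_scan(name):
    # O(n) per call: scans the list from the front until the name matches,
    # so later names take proportionally longer to find.
    return names.index(name)

# Build the name -> index mapping once; each lookup is then O(1).
name_to_index = {name: i for i, name in enumerate(names)}

scan_result = [index_by_scan(n) for n in names]  # O(n^2) overall
dict_result = [name_to_index[n] for n in names]  # O(n) overall
assert scan_result == dict_result
```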

To get around this, I guess we could use the name -> index mapping that's already maintained by the .variables Container object. I'm not quite sure, though, whether there are cases where the two indices wouldn't be the same...
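A minimal sketch of that idea, using a hypothetical stand-in for the Container (the class and method names here are illustrative, not optlang's actual API): resolve names to integer indices locally, then hand the solver (index, coefficient) pairs so it never has to search by name.

```python
# Hypothetical stand-in for a variable container that keeps its own
# name -> index mapping in sync as variables are added.
class VariableContainer:
    def __init__(self):
        self._names = []
        self._index = {}  # name -> position, maintained on every add

    def add(self, name):
        self._index[name] = len(self._names)
        self._names.append(name)

    def index(self, name):
        return self._index[name]  # O(1), no linear scan in the solver

container = VariableContainer()
for i in range(5):
    container.add("v%d" % i)

# Objective coefficients keyed by variable name.
coefficients = {"v1": 2.0, "v3": -1.0}

# Translate names to integer indices before the solver call; the actual
# cplex call would then receive these pairs instead of names.
pairs = [(container.index(name), coef) for name, coef in coefficients.items()]
assert pairs == [(1, 2.0), (3, -1.0)]
```

This only works if the container's ordering and the solver's internal column ordering never diverge, which is exactly the open question.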

@cdiener
Member Author

cdiener commented Aug 10, 2017

Maybe the indices would change if you delete a variable? I agree that it might be better to have optlang manage the variable -> index mapping, as in the glpk interface. However, I think it's not high priority for now, since it would be quite a lot of work...
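A small sketch of exactly that concern: when a variable is removed, the solver-internal indices of all later variables shift down by one, so a cached name -> index mapping goes stale unless it is rebuilt or updated on every removal.

```python
# Illustration of cache staleness after deletion (plain Python, not cplex).
names = ["a", "b", "c", "d"]
cached = {name: i for i, name in enumerate(names)}  # a:0, b:1, c:2, d:3

names.remove("b")  # the solver would drop column 1 here

assert names.index("c") == 1  # the true index has shifted down...
assert cached["c"] == 2       # ...but the cached mapping still says 2
```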

@KristianJensen
Contributor

I've managed to fix this specific issue with a relatively simple change. Modifying the objective should now scale linearly with the number of variables in the objective instead of quadratically. The get_indices issue might also be a problem in other cases where a function is called with a large number of variables/constraints, but changing the objective is probably where this happens most frequently.

Setting a pFBA-like objective on a model with 200,000 variables now takes a few seconds, comparable to GLPK.
