
Scaling in cplex interface #116

Closed
cdiener opened this issue Jul 19, 2017 · 5 comments

@cdiener
Member

cdiener commented Jul 19, 2017

Hi, I noticed that modifying the objective in the cplex interface scales badly.

For a moderately sized model, setting the objective already takes about half of the total optimization time (see the "cumtime" column, which gives the time spent in the function call and all its sub-calls):

 1174604 function calls (1050578 primitive calls) in 0.698 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.698    0.698 {built-in method builtins.exec}
        1    0.000    0.000    0.698    0.698 <string>:1(<module>)
        1    0.000    0.000    0.698    0.698 parsimonious.py:27(pfba)
        1    0.003    0.003    0.698    0.698 parsimonious.py:138(_pfba_optlang)
        1    0.000    0.000    0.471    0.471 parsimonious.py:102(add_pfba)
        2    0.000    0.000    0.415    0.207 cplex_interface.py:713(objective)
        4    0.000    0.000    0.295    0.074 _subinterfaces.py:3852(set_linear)

However, for a model with ~200,000 variables it now takes 95% of the time:

         40679344 function calls (36037663 primitive calls) in 257.649 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  257.649  257.649 {built-in method builtins.exec}
        1    0.000    0.000  257.649  257.649 <string>:1(<module>)
        1    0.000    0.000  257.649  257.649 community.py:275(optimize)
        2    0.015    0.007  249.575  124.788 cplex_interface.py:713(objective)
        4    0.009    0.002  244.498   61.125 _subinterfaces.py:3852(set_linear)

I did not observe the same effect with the other solver interfaces.

@KristianJensen
Contributor

I cannot replicate this. Can you provide a snippet?

@cdiener
Member Author

cdiener commented Aug 9, 2017

Sure. I hope it is OK to use cobrapy; it is much easier to set up the test case that way...

In [1]: from cobra.test import create_test_model

In [2]: from cobra.io import read_sbml_model

In [3]: from cobra.flux_analysis import pfba

In [4]: import cProfile

In [5]: small = create_test_model("textbook")

In [6]: small.solver = "cplex"

In [7]: large = read_sbml_model("/home/cdiener/Downloads/recon_2.2.xml")
cobra/io/sbml.py:235 UserWarning: M_h_c appears as a reactant and product RE3453C
cobra/io/sbml.py:235 UserWarning: M_h_c appears as a reactant and product RE3459C
cobra/io/sbml.py:235 UserWarning: M_h_x appears as a reactant and product FAOXC24C22x
cobra/io/sbml.py:235 UserWarning: M_h_c appears as a reactant and product HAS1
cobra/io/sbml.py:235 UserWarning: M_h2o_x appears as a reactant and product PROFVSCOAhc

In [8]: large.solver = "cplex"

In [9]: cProfile.run("pfba(small)", sort="cumtime")
         43637 function calls (39034 primitive calls) in 0.035 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.035    0.035 {built-in method builtins.exec}
        1    0.000    0.000    0.035    0.035 <string>:1(<module>)
        1    0.000    0.000    0.035    0.035 parsimonious.py:27(pfba)
        1    0.000    0.000    0.035    0.035 parsimonious.py:138(_pfba_optlang)
        1    0.000    0.000    0.021    0.021 parsimonious.py:102(add_pfba)
        2    0.000    0.000    0.019    0.010 cplex_interface.py:713(objective)
        5    0.002    0.000    0.012    0.002 basic.py:405(atoms)
        1    0.000    0.000    0.012    0.012 solver.py:96(set_objective)
        1    0.000    0.000    0.012    0.012 model.py:978(__exit__)
        1    0.000    0.000    0.012    0.012 context.py:32(reset)
        1    0.000    0.000    0.012    0.012 solver.py:158(reset)
        4    0.000    0.000    0.009    0.002 _subinterfaces.py:3927(set_linear)
       ...

In [10]: cProfile.run("pfba(large)", sort="cumtime")
         3445392 function calls (3067877 primitive calls) in 2.891 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.891    2.891 {built-in method builtins.exec}
        1    0.000    0.000    2.891    2.891 <string>:1(<module>)
        1    0.000    0.000    2.890    2.890 parsimonious.py:27(pfba)
        1    0.001    0.001    2.890    2.890 parsimonious.py:138(_pfba_optlang)
        2    0.001    0.000    2.103    1.051 cplex_interface.py:713(objective)
        1    0.001    0.001    1.761    1.761 parsimonious.py:102(add_pfba)
        4    0.001    0.000    1.737    0.434 _subinterfaces.py:3927(set_linear)
      ...

The set_linear function takes up about a quarter of the time for the small model (100 reactions) but half of the time for the large model (10,000 reactions). For a model with 100,000 reactions it takes up more than 90% of the call. I once read in the cplex docs that they don't recommend using named variables for large models since it will be slower; I don't know whether that is related, though...

This might not be relevant for a large part of the user base, so no high priority here. However, I'm working with microbial community models and, due to this scaling problem, can't run pFBA with cplex on those in a reasonable time.

@KristianJensen
Contributor

OK, that helped. Now I see the problem: it's actually the cplex function that's slow. As far as I can tell the issue is, as you pointed out, that referencing variables by name is slow. Most of the time in set_linear is spent in get_indices. It seems cplex doesn't keep a mapping between names and indices, since the time it takes to look up the index of a name scales roughly linearly with its position.
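To illustrate the scaling (a plain-Python sketch, not optlang or cplex code): if each name lookup is a linear scan over the variable list, resolving all n names in an objective costs O(n^2) in total, while a dict built once makes the whole batch O(n):

```python
# Hypothetical illustration of linear-scan vs. dict-based name lookup.
names = ["v%d" % i for i in range(1000)]

def index_by_scan(name):
    # O(n) per call: scans the list from the front until the name matches,
    # so later names take proportionally longer to find.
    return names.index(name)

# Build the name -> index mapping once; each lookup is then O(1).
name_to_index = {name: i for i, name in enumerate(names)}

scan_result = [index_by_scan(n) for n in names]  # O(n^2) overall
dict_result = [name_to_index[n] for n in names]  # O(n) overall
assert scan_result == dict_result
```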

To get around this, I guess we could use the name -> index mapping that's already maintained by the .variables Container object. I'm not quite sure, though, whether there are cases where the two indices wouldn't be the same...
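A minimal sketch of that idea, using a hypothetical stand-in for the Container (the class and method names here are illustrative, not optlang's actual API): resolve names to integer indices locally, then hand the solver (index, coefficient) pairs so it never has to search by name.

```python
# Hypothetical stand-in for a variable container that keeps its own
# name -> index mapping in sync as variables are added.
class VariableContainer:
    def __init__(self):
        self._names = []
        self._index = {}  # name -> position, maintained on every add

    def add(self, name):
        self._index[name] = len(self._names)
        self._names.append(name)

    def index(self, name):
        return self._index[name]  # O(1), no linear scan in the solver

container = VariableContainer()
for i in range(5):
    container.add("v%d" % i)

# Objective coefficients keyed by variable name.
coefficients = {"v1": 2.0, "v3": -1.0}

# Translate names to integer indices before the solver call; the actual
# cplex call would then receive these pairs instead of names.
pairs = [(container.index(name), coef) for name, coef in coefficients.items()]
assert pairs == [(1, 2.0), (3, -1.0)]
```

This only works if the container's ordering and the solver's internal column ordering never diverge, which is exactly the open question.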

@cdiener
Member Author

cdiener commented Aug 10, 2017

Maybe the indices would change if you delete a variable? I agree that it might be better to have optlang manage the variable -> index mapping, as in the glpk interface. However, I think it's not high priority for now, since it would be quite a lot of work...
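A small sketch of exactly that concern: when a variable is removed, the solver-internal indices of all later variables shift down by one, so a cached name -> index mapping goes stale unless it is rebuilt or updated on every removal.

```python
# Illustration of cache staleness after deletion (plain Python, not cplex).
names = ["a", "b", "c", "d"]
cached = {name: i for i, name in enumerate(names)}  # a:0, b:1, c:2, d:3

names.remove("b")  # the solver would drop column 1 here

assert names.index("c") == 1  # the true index has shifted down...
assert cached["c"] == 2       # ...but the cached mapping still says 2
```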

@KristianJensen
Contributor

I've managed to fix this specific issue with a relatively simple change. Modifying the objective should now scale linearly with the number of variables in the objective instead of quadratically. The get_indices issue might also be a problem in other cases where a function is called with a large number of variables/constraints, but changing the objective is probably where this happens most frequently.

Setting a pFBA-like objective on a model with 200,000 variables now takes a few seconds, comparable to GLPK.
