-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Degeneracy & Multiple TS #1055
Degeneracy & Multiple TS #1055
Conversation
Before this branch it was virtually impossible to pick up the propane low T ketohydroperoxide feedback loop without a library due to a multiple TS issue with a key chain branching reaction. I can confirm that this pull request enables RMG to now pick up this key pathway at appropriate temperatures and feasible tolerances. |
rmgpy/molecule/molecule.py
Outdated
atom.id = i | ||
if start == -1: | ||
# use entire range of integers to label atoms | ||
atom.id = randint(-2**15,2**15) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to be able to generate entirely random atom ID's rather than just increment them from a set value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and then if they're not unique (not self.atomIDValid()
) we make them sequentially increasing from a random start? (Suggesting sequential is fine and it needn't be random?). Just using sequential sounds simpler to me, unless Matt and I are both missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. Generating IDs from a starting value (like previously going 1 to N) is almost certainly going to cause clashes of atom IDs after two labeled atoms are merged in generateReactions. Generating IDs over the entire range of integer values is much more likely to reduce the clashing IDs when merging two molecules (which is what random IDs seeks to accomplish). When a clash occurs, we would need to reassign IDs properly for all resonance structures (possibly remaking resonance structures which is quite time intensive)
One alternative to completely random numbers could be to label starting from -2^-15 till 2^15 and then loop back over after labeling ~65,000 atoms (not resetting after each call). If you know that this would reduce conflicts and/or computational time, let me know.
The atomIDs cannot be set post-merging since this can result in incorrect degeneracy values when trying to generate reactions when multiple resonance structures are present. Lets say you have two reactants reacting together, and each has multiple resonance structures. In generateReactions, all combinations of resonance structures will be merged and the possible reactions output. Then findDegeneracies
would use the atomIDs for all the sets of products to find the degeneracy. If different resonance structures are labeled differently (which can happen when relabeling after the merging of structures), the degeneracy could be higher than expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make a class variable that tracks the entire range of integers to avoid overlap, and then all the atom IDs can be added sequentially. That seems more intuitive than random numbers and probably will save computational time by not checking atomIDValid
when setting the IDs. Do you think that would be preferable? Any other ideas on how to avoid conflicting IDs when merging without random Numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I understand what the IDs are doing, I like @goldmanm's last suggestion of a class variable that gradually increments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only skimmed through the changes and made a few minor comments. I'm very excited to try it out.
Have you tested to see if pdep is affected at all? I ran into some issues with it when I tried changing generateReactions and deflate/inflate to work with species objects.
rmgpy/reaction.py
Outdated
else: | ||
degeneracyRatio = (new*1.0) / self._degeneracy | ||
self.kinetics.changeRate(degeneracyRatio) | ||
logging.debug('did not modify A factor when modifying degeneracy since the reaction rate was not set') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this line be indented less one level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put an else statement there in squashed commit.
rmgpy/kinetics/arrhenius.pyx
Outdated
@@ -423,10 +423,10 @@ cdef class PDepArrhenius(PDepKineticsModel): | |||
|
|||
""" | |||
|
|||
def __init__(self, pressures=None, arrhenius=None, highPlimit=None, Tmin=None, Tmax=None, Pmin=None, Pmax=None, comment=''): | |||
def __init__(self, pressures=None, arrhenius=[], highPlimit=None, Tmin=None, Tmax=None, Pmin=None, Pmax=None, comment=''): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's generally undesirable to have mutable default arguments. Was there a reason for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an old commit from awhile ago (which I fixed), I am not sure how it got in here. I will remove this commit from the PR.
rmgpy/molecule/molecule.py
Outdated
else: | ||
atom.id = i + start | ||
if not self.atomIDValid(): | ||
self.assignAtomIDs(start=randint(-2**15,2**14)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upper limit should be 2**15
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put it lower to ensure it could still count up, but this will be replaced with a more intuitive method.
|
||
def test_degeneracy_keeps_track_of_both_rate_rules_from_resonance_isomers(self): | ||
""" | ||
rxns that have multiple resonance structures hitting different rate rules should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfinished comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed comment in squashed commit.
rmgpy/data/kinetics/family.py
Outdated
elif isinstance(products[0],Species): | ||
products = [product.molecule for product in products] | ||
else: | ||
raise TypeError('products input to __generateReactions must be Species or Reaction Objects') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this say Species or Molecule Objects
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! Fixed in squashed commit.
rmgpy/rmg/react.py
Outdated
sameTemplate = False | ||
for rxn in rxnList1: | ||
isomorphic = rxn0.isIsomorphic(rxn,checkIdentical=False) | ||
identical = rxn0.isIsomorphic(rxn,checkIdentical=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can check identical only if isomorphic is true or check isomorphic only if identical is false. Either way should reduce the number of calls made.
- If two reactions are identical, then they are isomorphic.
- If two reactions are not isomorphic, then they are not identical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea! This would be an awesome speedup. Added this in a squash commit (for the second case under the assumption that not isomorphic
is more common than is identical
)
elif comparison_method(list1[0], list2[1]) \ | ||
and comparison_method(list1[1], list2[0]): | ||
return True | ||
elif len(list1) == len(list2) == 3: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be much faster and shorter to iterate through the lists and pop out entries that are the same from both lists as you go. I believe the below code would reduce the maximum calls to comparison_method to 6 rather than the current 18.
list1Temp = []
list2Temp = []
indexList = []
k = 0
while list1 != []:
entry = list1[k]
for i,entry2 in enumerate(list2):
if comparison_method(entry,entry2):
temp = list1.pop(k)
list1temp.append(temp)
temp = list2.pop(i)
list2temp.append(temp)
indexList.append(i)
break
else: #loop doesn't break, something is missing, templist2 is incomplete
for j in xrange(0,len(list2)):
if not j in indexList:
indexList.append(j)
list1 = list1Temp+list1
list2 = list2Temp+list2
list2 = [list2[i] for i in indexList]
return False
k += 1
#loop didn't break, list1 is fine, resort list2 so it is unchanged
list2 = [list2Temp[i] for i in indexList]
list1 = list1Temp
return True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or wait you don't even need to avoid copying the entries as long as you don't deepcopy them do you?
list1Temp = list1[:]
list2Temp = list2[:]
for entry in list1Temp:
for entry2 in list2Temp:
if comparison_method(entry,entry2):
list2Temp.remove(entry2)
break
else: #loop doesn't break, something is missing
return False
#all loops completed
return True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something must be wrong with this last example. Although I'm not 100% sure what it's trying to achieve, will the while
loop ever end? You never remove anything fromlist1Temp
. In any case, the alternative is not obviously faster, with lots of list manipulation, creation, popping, etc. I have some thoughts on probable efficiency gains, but before sharing them can anyone tell me if this matters? Does it get called a lot? (sorry for not being aware of the context - the discussion caught my eye).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes, sorry lost that in translation, I think you could actually just do "for entry in list1Temp" since you don't need k anymore. I thought this would be faster under the assumption that most of our time is spent in the isomorphism checks this avoids (maximum for 2-2 lists drops from 4 to 3, maximum for 3-3 lists drops from 18 to 6). Is that not the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rwest, this method is called every time Reaction isomorphism is checked, which is done in degeneracy calculations, adding reactions to the edge, etc. I think it is relatively commonly called (though this is somewhat subjective without a profiling comparison). This method modularized what already existed in reaction.isIsomorphic
, though it would be awesome to get it working faster too.
@KEHANG, Is there any listing of previous profiling to see how important the Reaction.isIsomorphic
method is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this notebook profiles the recommendations to isisomorphic by @mjohnson541 for H_abstraction reactions. It seems to be faster most of the time by about 10%, though sometimes it is slower (which could be due to other processes demanding resources). There should likely be more benefit if we are looking at 3x3 reactions. Feel free to play around with the notebook, and let me know if you'd like the suggested algorithm implemented here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also avoid defining the function by passing the function pointer instead
if checkIdentical:
comparison_method = molecule.isIdentical
else:
comparison_method = molecule.isIsomorphic
I'm really not sure how significant the time savings would be, but the above timings suggest we do have some noticeable overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new approach, but I don't think the time savings would be as significant as you would expect. See the table below for a comparison of the number of function calls. Note that list.remove
performs an isomorphism comparison with each item in the list until it finds the one it needs to remove. Using enumerate
along with del
instead of list.remove
might help a little bit.
reactants | case | if/elif method | loop method |
---|---|---|---|
1 | 1 comparison call | 1 comparison call 1 list.remove call |
|
2 | best | 2 comparison calls | 2 comparison calls 2 list.remove calls |
2 | worst | 4 comparison calls | 3 comparison calls 2 list.remove calls |
3 | best | 3 comparison calls | 3 comparison calls 3 list.remove calls |
3 | worst | 18 comparison calls | 6 comparison calls 3 list.remove calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implemented the updated deletion algorithm, in the ipynb
and still no improvement in efficiency :(
rmgpy/rmg/react.py
Outdated
""" | ||
# These duplicates should be combined and the reaction degeneracy should be increased | ||
# Reaction degeneracy is only increased if two reaction are isomorphic but resulted | ||
# from different transition states. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I came across this comment randomly in the code without reading the function I would have thought the above comments implied that multiple TS's were simply treated as degeneracies for purposes of this function especially considering the docstring doesn't mention multiple TS's being added as duplicate reactions. Can you make that a bit more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was my comment from earlier changes, before we decided to tackle multiple transition states. In this comment, different transition states refers to ones that are the effectively identical but involving a different set of atoms, which i guess you could also call degenerate transition states.
It should probably be changed now that we're also dealing with actually different transition states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated in the most recent commit.
from rmgpy.reaction import ReactionError | ||
|
||
for rxn in reactionList: | ||
if _isomorphicSpeciesList(rxn.reactants, reactants): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we can't add some boolean property to the reaction object during rate estimation (such as "fromReverse") so we already know if it's the reverse template or not and we don't have to do all of these isomorphism checks?
to allow independent testing of various unittests, the database must be removed after each testing set, to avoid multiple modular level variables from existing. Solvation parameters were also cleared theend of modules using solvation refactored rmgTest and modelTest to reduce database generation
this commit checks to see that the IDs are already properly named before renaming the atom ids. It also modified the numbering of assignAtomID to assign random numbers
Analogous to Species.isIsomorphic() Calls Molecule.isIdentical()
separated out the adding of kinetics step from the enlarge method, to enable easier testing of the method. Also moved the adding of kinetics attribute till after degeneracy is modified to avoid multiple counting of degeneracy.
modified the protocol for setting degeneracy to get RMG to update the kinetics rate constant to keep track of the degeneracy and reaction rates. Also moved setting degeneracy in copy methods to before setting kinetics as this would double-count degeneracy.
9f345c9
to
6186300
Compare
degeneracy calculation was removed from __generateReactions since it is moving to a standalone method.
separated method to compare isomorphism of multiple species to reduce code repitition. Added option to use isIdentical instead of isIsomorphic.
This commit makes the test for multiple deterministic kinetics function. and activates it for the testing suite.
changed default resonance structure generation to keep isomorphic structures since they are needed to proper degeneracy and symmetry values
separated setting reverse reactions for ownReverse family, so it can be called after degeneracy has been found (reducing work load), and for easier testing (+ wrote a test method for this reaction). this required adding H_Abstraction to kineticsTest and removing forbidden structures from kinetics test so all reactions can be found. added tests that test that findDegeneracies returns multiple transition states and that it is able to deal with aromatic structures
Modified the method to remove reactions with the same template and family, including reverse direction for ownReverse families. It should keep reactions with multiple transition states (unless an identical template was found). Also wrote three test methods to test for keeping separate transition states or removing the redundant reaction
Allow input of species objects to the products variable of __generateReactions since this would reduce resonanceIsomer generation
994c820
to
e5fc727
Compare
This commit recreates the degeneracy algorithm in react.py. It now can: 1. take into account atom IDs when determining degeneracy 1. keep multiple transition states for the same reaction 1. use isomorphic resonance structures when comparing reactions Max Liu created the base for this method.
Method written in react.py to ensure that given two reactants, the IDs on all of their atoms are different. This is necessary to trace atom IDs in products (after the structures are merged).
When reactions are found with the reverse template, they often have the degeneracy of the reverse template. This method ensures the reaction has the degeneracy of the forward template by using the family.calculateDegeneracy method.
reactants which are identical should have half the reaction rate when using the degeneracy method, since the translational component of their rates is already taken into account in their concentration, so them swapping places doesn't have a real effect on reaction rate. The algorithm to reduce degeneracy was placed in a separate method in the react module to be applied separately from finding degeneracies.
refactored the react module to lump together reactions based on species as opposed to molecule. This is necessary for obtaining accurate degeneracy values for reactions with multiple resonance isomers. deflating now also takes species objects
modified calculatDegeneracy to account for changes in degeneracy: * uses species objects if possible * looks at templates when multiple transition states present * calls the updated degeneracy algorithm * contains option to not reduce same reactant degeneracy (since this should only occur once).
This PR solves two related problems in RMG: reaction degeneracy values that take into account resonance structures and allows multiple transition states to be stored in RMG's output file. This PR will solve underterminable kinetics problems (#494) , muptiple TS (#323, #142), and proper degeneracy values (#876, #833, #619, #552) . These two problems both involve updates to the degeneracy-finding algorithm, which is why they are in one PR. The major modification for this pull request is the order in which the generate reactions steps take place. The reactions of all resonance structures are now generated before algorithms like degeneracy or finding reverse kinetics are run. In addition, the degeneracy algorithm was redone to take into account which atoms have actually been switched (this correct the excessive degneracy from certain classes # ) and the templates that are matched (to allow multiple transition states). In addition, the adding of new reactions to CoreEdgeReactionModel also checks the family and template, and marks duplicate reactions when they exist (allowing for multiple TS and removing underterminatee kinetics problems)
This PR is ready for code review. Testing this could involve merging it and running a job of interest, though I think RMG-Tests is already doing that. If it is merged, PR #323 can be removed.