Degeneracy & Multiple TS #1055

goldmanm · 2017-06-16T12:08:52Z

This PR solves two related problems in RMG: reaction degeneracy values that take into account resonance structures and allows multiple transition states to be stored in RMG's output file. This PR will solve underterminable kinetics problems (#494) , muptiple TS (#323, #142), and proper degeneracy values (#876, #833, #619, #552) . These two problems both involve updates to the degeneracy-finding algorithm, which is why they are in one PR. The major modification for this pull request is the order in which the generate reactions steps take place. The reactions of all resonance structures are now generated before algorithms like degeneracy or finding reverse kinetics are run. In addition, the degeneracy algorithm was redone to take into account which atoms have actually been switched (this correct the excessive degneracy from certain classes # ) and the templates that are matched (to allow multiple transition states). In addition, the adding of new reactions to CoreEdgeReactionModel also checks the family and template, and marks duplicate reactions when they exist (allowing for multiple TS and removing underterminatee kinetics problems)

redefines the degeneracy variable as a private float. When degeneracy is updated, the kinetic rate (if set) is too.
modularizes parts of generateReactions - finding rxn.reverse and degeneracy are separate methods
runs reactions betwween all resonance structures of species before finding degeneracy
utilizes reaction templates to find and keep multiple transition states (as separate reactions)
uses atomIDs to identify when a reaction path has already been added to degeneracy
adding proper cleanup to many test methods (including solvation database cleanup)
some code cleanup and other modularization of methods.

This PR is ready for code review. Testing this could involve merging it and running a job of interest, though I think RMG-Tests is already doing that. If it is merged, PR #323 can be removed.

mjohnson541 · 2017-06-16T16:08:54Z

Before this branch it was virtually impossible to pick up the propane low T ketohydroperoxide feedback loop without a library due to a multiple TS issue with a key chain branching reaction. I can confirm that this pull request enables RMG to now pick up this key pathway at appropriate temperatures and feasible tolerances.

mjohnson541 · 2017-06-16T16:26:24Z

rmgpy/molecule/molecule.py

-            atom.id = i
+            if start == -1:
+                # use entire range of integers to label atoms
+                atom.id = randint(-2**15,2**15)


Why do we want to be able to generate entirely random atom ID's rather than just increment them from a set value?

...and then if they're not unique (not self.atomIDValid()) we make them sequentially increasing from a random start? (Suggesting sequential is fine and it needn't be random?). Just using sequential sounds simpler to me, unless Matt and I are both missing something.

Good question. Generating IDs from a starting value (like previously going 1 to N) is almost certainly going to cause clashes of atom IDs after two labeled atoms are merged in generateReactions. Generating IDs over the entire range of integer values is much more likely to reduce the clashing IDs when merging two molecules (which is what random IDs seeks to accomplish). When a clash occurs, we would need to reassign IDs properly for all resonance structures (possibly remaking resonance structures which is quite time intensive)

One alternative to completely random numbers could be to label starting from -2^-15 till 2^15 and then loop back over after labeling ~65,000 atoms (not resetting after each call). If you know that this would reduce conflicts and/or computational time, let me know.

The atomIDs cannot be set post-merging since this can result in incorrect degeneracy values when trying to generate reactions when multiple resonance structures are present. Lets say you have two reactants reacting together, and each has multiple resonance structures. In generateReactions, all combinations of resonance structures will be merged and the possible reactions output. Then findDegeneracies would use the atomIDs for all the sets of products to find the degeneracy. If different resonance structures are labeled differently (which can happen when relabeling after the merging of structures), the degeneracy could be higher than expected.

I can make a class variable that tracks the entire range of integers to avoid overlap, and then all the atom IDs can be added sequentially. That seems more intuitive than random numbers and probably will save computational time by not checking atomIDValid when setting the IDs. Do you think that would be preferable? Any other ideas on how to avoid conflicting IDs when merging without random Numbers?

Now that I understand what the IDs are doing, I like @goldmanm's last suggestion of a class variable that gradually increments.

mliu49

I only skimmed through the changes and made a few minor comments. I'm very excited to try it out.

Have you tested to see if pdep is affected at all? I ran into some issues with it when I tried changing generateReactions and deflate/inflate to work with species objects.

mliu49 · 2017-06-16T16:14:35Z

rmgpy/reaction.py

+            else:
+                degeneracyRatio = (new*1.0) / self._degeneracy
+            self.kinetics.changeRate(degeneracyRatio)
+            logging.debug('did not modify A factor when modifying degeneracy since the reaction rate was not set')


Should this line be indented less one level?

put an else statement there in squashed commit.

mliu49 · 2017-06-16T16:19:44Z

rmgpy/kinetics/arrhenius.pyx

@@ -423,10 +423,10 @@ cdef class PDepArrhenius(PDepKineticsModel):

    """

-    def __init__(self, pressures=None, arrhenius=None, highPlimit=None, Tmin=None, Tmax=None, Pmin=None, Pmax=None, comment=''):
+    def __init__(self, pressures=None, arrhenius=[], highPlimit=None, Tmin=None, Tmax=None, Pmin=None, Pmax=None, comment=''):


I think it's generally undesirable to have mutable default arguments. Was there a reason for this change?

This was an old commit from awhile ago (which I fixed), I am not sure how it got in here. I will remove this commit from the PR.

mliu49 · 2017-06-16T16:22:26Z

rmgpy/molecule/molecule.py

+            else:
+                atom.id = i + start
+        if not self.atomIDValid():
+            self.assignAtomIDs(start=randint(-2**15,2**14))


Upper limit should be 2**15?

I put it lower to ensure it could still count up, but this will be replaced with a more intuitive method.

mliu49 · 2017-06-16T18:13:24Z

rmgpy/data/kinetics/kineticsTest.py

+
+    def test_degeneracy_keeps_track_of_both_rate_rules_from_resonance_isomers(self):
+        """
+        rxns that have multiple resonance structures hitting different rate rules should 


Unfinished comment?

fixed comment in squashed commit.

mliu49 · 2017-06-16T18:15:08Z

rmgpy/data/kinetics/family.py

+            elif isinstance(products[0],Species):
+                products = [product.molecule for product in products]
+            else:
+                raise TypeError('products input to __generateReactions must be Species or Reaction Objects')


Should this say Species or Molecule Objects?

Yup! Fixed in squashed commit.

mliu49 · 2017-06-16T18:26:25Z

rmgpy/rmg/react.py

+                sameTemplate = False
+                for rxn in rxnList1:
+                    isomorphic = rxn0.isIsomorphic(rxn,checkIdentical=False)
+                    identical = rxn0.isIsomorphic(rxn,checkIdentical=True)


I think you can check identical only if isomorphic is true or check isomorphic only if identical is false. Either way should reduce the number of calls made.

If two reactions are identical, then they are isomorphic.

If two reactions are not isomorphic, then they are not identical.

Great idea! This would be an awesome speedup. Added this in a squash commit (for the second case under the assumption that not isomorphic is more common than is identical)

mjohnson541 · 2017-06-16T19:16:35Z

rmgpy/reaction.py

+        elif comparison_method(list1[0], list2[1]) \
+                    and comparison_method(list1[1], list2[0]):
+            return True
+    elif len(list1) == len(list2) == 3:


Wouldn't it be much faster and shorter to iterate through the lists and pop out entries that are the same from both lists as you go. I believe the below code would reduce the maximum calls to comparison_method to 6 rather than the current 18.

list1Temp = [] list2Temp = [] indexList = [] k = 0 while list1 != []: entry = list1[k] for i,entry2 in enumerate(list2): if comparison_method(entry,entry2): temp = list1.pop(k) list1temp.append(temp) temp = list2.pop(i) list2temp.append(temp) indexList.append(i) break else: #loop doesn't break, something is missing, templist2 is incomplete for j in xrange(0,len(list2)): if not j in indexList: indexList.append(j) list1 = list1Temp+list1 list2 = list2Temp+list2 list2 = [list2[i] for i in indexList] return False k += 1 #loop didn't break, list1 is fine, resort list2 so it is unchanged list2 = [list2Temp[i] for i in indexList] list1 = list1Temp return True

Or wait you don't even need to avoid copying the entries as long as you don't deepcopy them do you?

list1Temp = list1[:] list2Temp = list2[:] for entry in list1Temp: for entry2 in list2Temp: if comparison_method(entry,entry2): list2Temp.remove(entry2) break else: #loop doesn't break, something is missing return False #all loops completed return True

Something must be wrong with this last example. Although I'm not 100% sure what it's trying to achieve, will the while loop ever end? You never remove anything fromlist1Temp. In any case, the alternative is not obviously faster, with lots of list manipulation, creation, popping, etc. I have some thoughts on probable efficiency gains, but before sharing them can anyone tell me if this matters? Does it get called a lot? (sorry for not being aware of the context - the discussion caught my eye).

Oh, yes, sorry lost that in translation, I think you could actually just do "for entry in list1Temp" since you don't need k anymore. I thought this would be faster under the assumption that most of our time is spent in the isomorphism checks this avoids (maximum for 2-2 lists drops from 4 to 3, maximum for 3-3 lists drops from 18 to 6). Is that not the case?

@rwest, this method is called every time Reaction isomorphism is checked, which is done in degeneracy calculations, adding reactions to the edge, etc. I think it is relatively commonly called (though this is somewhat subjective without a profiling comparison). This method modularized what already existed in reaction.isIsomorphic, though it would be awesome to get it working faster too.

@KEHANG, Is there any listing of previous profiling to see how important the Reaction.isIsomorphic method is?

this notebook profiles the recommendations to isisomorphic by @mjohnson541 for H_abstraction reactions. It seems to be faster most of the time by about 10%, though sometimes it is slower (which could be due to other processes demanding resources). There should likely be more benefit if we are looking at 3x3 reactions. Feel free to play around with the notebook, and let me know if you'd like the suggested algorithm implemented here.

You could also avoid defining the function by passing the function pointer instead

if checkIdentical: comparison_method = molecule.isIdentical else: comparison_method = molecule.isIsomorphic

I'm really not sure how significant the time savings would be, but the above timings suggest we do have some noticeable overhead.

I like the new approach, but I don't think the time savings would be as significant as you would expect. See the table below for a comparison of the number of function calls. Note that list.remove performs an isomorphism comparison with each item in the list until it finds the one it needs to remove. Using enumerate along with del instead of list.remove might help a little bit.

reactants case if/elif method loop method

1 1 comparison call 1 comparison call
1 list.remove call

2 best 2 comparison calls 2 comparison calls
2 list.remove calls

2 worst 4 comparison calls 3 comparison calls
2 list.remove calls

3 best 3 comparison calls 3 comparison calls
3 list.remove calls

3 worst 18 comparison calls 6 comparison calls
3 list.remove calls

implemented the updated deletion algorithm, in the ipynb and still no improvement in efficiency :(

mjohnson541 · 2017-06-19T20:27:18Z

rmgpy/rmg/react.py

+    """
+    # These duplicates should be combined and the reaction degeneracy should be increased
+    # Reaction degeneracy is only increased if two reaction are isomorphic but resulted
+    # from different transition states.


So if I came across this comment randomly in the code without reading the function I would have thought the above comments implied that multiple TS's were simply treated as degeneracies for purposes of this function especially considering the docstring doesn't mention multiple TS's being added as duplicate reactions. Can you make that a bit more clear?

This was my comment from earlier changes, before we decided to tackle multiple transition states. In this comment, different transition states refers to ones that are the effectively identical but involving a different set of atoms, which i guess you could also call degenerate transition states.

It should probably be changed now that we're also dealing with actually different transition states.

updated in the most recent commit.

mjohnson541 · 2017-06-19T20:37:52Z

rmgpy/rmg/react.py

+    from rmgpy.reaction import ReactionError
+
+    for rxn in reactionList:
+        if _isomorphicSpeciesList(rxn.reactants, reactants):


Is there a reason we can't add some boolean property to the reaction object during rate estimation (such as "fromReverse") so we already know if it's the reverse template or not and we don't have to do all of these isomorphism checks?

to allow independent testing of various unittests, the database must be removed after each testing set, to avoid multiple modular level variables from existing. Solvation parameters were also cleared theend of modules using solvation refactored rmgTest and modelTest to reduce database generation

this commit checks to see that the IDs are already properly named before renaming the atom ids. It also modified the numbering of assignAtomID to assign random numbers

Analogous to Species.isIsomorphic() Calls Molecule.isIdentical()

separated out the adding of kinetics step from the enlarge method, to enable easier testing of the method. Also moved the adding of kinetics attribute till after degeneracy is modified to avoid multiple counting of degeneracy.

modified the protocol for setting degeneracy to get RMG to update the kinetics rate constant to keep track of the degeneracy and reaction rates. Also moved setting degeneracy in copy methods to before setting kinetics as this would double-count degeneracy.

degeneracy calculation was removed from __generateReactions since it is moving to a standalone method.

separated method to compare isomorphism of multiple species to reduce code repitition. Added option to use isIdentical instead of isIsomorphic.

This commit makes the test for multiple deterministic kinetics function. and activates it for the testing suite.

changed default resonance structure generation to keep isomorphic structures since they are needed to proper degeneracy and symmetry values

separated setting reverse reactions for ownReverse family, so it can be called after degeneracy has been found (reducing work load), and for easier testing (+ wrote a test method for this reaction). this required adding H_Abstraction to kineticsTest and removing forbidden structures from kinetics test so all reactions can be found. added tests that test that findDegeneracies returns multiple transition states and that it is able to deal with aromatic structures

Modified the method to remove reactions with the same template and family, including reverse direction for ownReverse families. It should keep reactions with multiple transition states (unless an identical template was found). Also wrote three test methods to test for keeping separate transition states or removing the redundant reaction

Allow input of species objects to the products variable of __generateReactions since this would reduce resonanceIsomer generation

This commit recreates the degeneracy algorithm in react.py. It now can: 1. take into account atom IDs when determining degeneracy 1. keep multiple transition states for the same reaction 1. use isomorphic resonance structures when comparing reactions Max Liu created the base for this method.

Method written in react.py to ensure that given two reactants, the IDs on all of their atoms are different. This is necessary to trace atom IDs in products (after the structures are merged).

When reactions are found with the reverse template, they often have the degeneracy of the reverse template. This method ensures the reaction has the degeneracy of the forward template by using the family.calculateDegeneracy method.

reactants which are identical should have half the reaction rate when using the degeneracy method, since the translational component of their rates is already taken into account in their concentration, so them swapping places doesn't have a real effect on reaction rate. The algorithm to reduce degeneracy was placed in a separate method in the react module to be applied separately from finding degeneracies.

refactored the react module to lump together reactions based on species as opposed to molecule. This is necessary for obtaining accurate degeneracy values for reactions with multiple resonance isomers. deflating now also takes species objects

modified calculatDegeneracy to account for changes in degeneracy: * uses species objects if possible * looks at templates when multiple transition states present * calls the updated degeneracy algorithm * contains option to not reduce same reactant degeneracy (since this should only occur once).

goldmanm requested a review from mjohnson541 June 16, 2017 12:09

goldmanm self-assigned this Jun 16, 2017

mjohnson541 reviewed Jun 16, 2017

View reviewed changes

mliu49 reviewed Jun 16, 2017

View reviewed changes

mjohnson541 reviewed Jun 16, 2017

View reviewed changes

mjohnson541 reviewed Jun 19, 2017

View reviewed changes

goldmanm added the Status: Ready for Review PR is complete and ready to be reviewed label Jun 23, 2017

goldmanm mentioned this pull request Jun 23, 2017

Adapting kinetics for with multiple transition states #323

Closed

goldmanm force-pushed the degeneracy branch from 4ada618 to 24b8b96 Compare June 23, 2017 19:47

goldmanm force-pushed the degeneracy branch from 24b8b96 to 891eb1c Compare June 23, 2017 19:54

goldmanm and others added 5 commits June 23, 2017 16:00

added method to reduce atomID renaming

90cce7b

this commit checks to see that the IDs are already properly named before renaming the atom ids. It also modified the numbering of assignAtomID to assign random numbers

Add isIdentical method to Species

8941bb4

Analogous to Species.isIsomorphic() Calls Molecule.isIdentical()

add species label as smiles string if species.__str__ called

7616512

modularized adding kinetics to reactions in cerm.enlarge

26510a5

separated out the adding of kinetics step from the enlarge method, to enable easier testing of the method. Also moved the adding of kinetics attribute till after degeneracy is modified to avoid multiple counting of degeneracy.

goldmanm force-pushed the degeneracy branch 2 times, most recently from 9f345c9 to 6186300 Compare June 23, 2017 20:17

goldmanm added 8 commits June 23, 2017 16:20

remove degeneracy calculations from __generateReactions

515e493

degeneracy calculation was removed from __generateReactions since it is moving to a standalone method.

refactor reaction.isIsomorphic and allow identical checking

d63b998

separated method to compare isomorphism of multiple species to reduce code repitition. Added option to use isIdentical instead of isIsomorphic.

undo work-in-progress test for rmgTest

5d21715

This commit makes the test for multiple deterministic kinetics function. and activates it for the testing suite.

make default resonance structures isomorphic

8e358b4

changed default resonance structure generation to keep isomorphic structures since they are needed to proper degeneracy and symmetry values

Allow input of species' objects to __generateReactions(products)

f9586e1

Allow input of species objects to the products variable of __generateReactions since this would reduce resonanceIsomer generation

small type & name changes in saving training reactions

e647ca9

goldmanm force-pushed the degeneracy branch 2 times, most recently from 994c820 to e5fc727 Compare June 23, 2017 20:34

goldmanm force-pushed the degeneracy branch from e5fc727 to 996ef1e Compare June 23, 2017 20:37

goldmanm added 6 commits June 23, 2017 16:38

method to ensure species list has all different IDs

7774526

Method written in react.py to ensure that given two reactants, the IDs on all of their atoms are different. This is necessary to trace atom IDs in products (after the structures are merged).

goldmanm force-pushed the degeneracy branch from 996ef1e to 7ed8a6c Compare June 23, 2017 20:38

mliu49 merged commit e117feb into master Jun 23, 2017

mliu49 deleted the degeneracy branch June 23, 2017 20:44

goldmanm mentioned this pull request Jul 28, 2017

Only core degeneracy is accurate #1113

Closed

mliu49 mentioned this pull request Oct 18, 2017

Do not consider template differences from resonance structures as multiple TS #1217

Merged

alongd mentioned this pull request Nov 16, 2017

Removed unnecessary forbidded groups ReactionMechanismGenerator/RMG-database#223

Merged

mliu49 mentioned this pull request Jun 11, 2018

New reaction filtering and trimolecular reactions #1338

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Degeneracy & Multiple TS #1055

Degeneracy & Multiple TS #1055

goldmanm commented Jun 16, 2017

mjohnson541 commented Jun 16, 2017

mjohnson541 Jun 16, 2017

rwest Jun 16, 2017

goldmanm Jun 16, 2017

goldmanm Jun 16, 2017

rwest Jun 16, 2017

mliu49 left a comment

mliu49 Jun 16, 2017

goldmanm Jun 16, 2017

mliu49 Jun 16, 2017

goldmanm Jun 16, 2017

mliu49 Jun 16, 2017

goldmanm Jun 16, 2017

mliu49 Jun 16, 2017

goldmanm Jun 16, 2017

mliu49 Jun 16, 2017

goldmanm Jun 16, 2017

mliu49 Jun 16, 2017

goldmanm Jun 16, 2017

mjohnson541 Jun 16, 2017 •

edited

Loading

mjohnson541 Jun 16, 2017 •

edited

Loading

rwest Jun 17, 2017

mjohnson541 Jun 17, 2017

goldmanm Jun 18, 2017

goldmanm Jun 19, 2017

mjohnson541 Jun 20, 2017

mliu49 Jun 20, 2017

goldmanm Jun 23, 2017

mjohnson541 Jun 19, 2017

mliu49 Jun 19, 2017

goldmanm Jun 23, 2017

mjohnson541 Jun 19, 2017

reactants	case	if/elif method	loop method
1		1 comparison call	1 comparison call 1 list.remove call
2	best	2 comparison calls	2 comparison calls 2 list.remove calls
2	worst	4 comparison calls	3 comparison calls 2 list.remove calls
3	best	3 comparison calls	3 comparison calls 3 list.remove calls
3	worst	18 comparison calls	6 comparison calls 3 list.remove calls

Degeneracy & Multiple TS #1055

Degeneracy & Multiple TS #1055

Conversation

goldmanm commented Jun 16, 2017

mjohnson541 commented Jun 16, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mliu49 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mjohnson541 Jun 16, 2017 • edited Loading

Choose a reason for hiding this comment

mjohnson541 Jun 16, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mjohnson541 Jun 16, 2017 •

edited

Loading

mjohnson541 Jun 16, 2017 •

edited

Loading