Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Degeneracy & Multiple TS #1055

Merged
merged 20 commits into from
Jun 23, 2017
Merged

Degeneracy & Multiple TS #1055

merged 20 commits into from
Jun 23, 2017

Conversation

goldmanm
Copy link
Contributor

This PR solves two related problems in RMG: reaction degeneracy values that take into account resonance structures and allows multiple transition states to be stored in RMG's output file. This PR will solve underterminable kinetics problems (#494) , muptiple TS (#323, #142), and proper degeneracy values (#876, #833, #619, #552) . These two problems both involve updates to the degeneracy-finding algorithm, which is why they are in one PR. The major modification for this pull request is the order in which the generate reactions steps take place. The reactions of all resonance structures are now generated before algorithms like degeneracy or finding reverse kinetics are run. In addition, the degeneracy algorithm was redone to take into account which atoms have actually been switched (this correct the excessive degneracy from certain classes # ) and the templates that are matched (to allow multiple transition states). In addition, the adding of new reactions to CoreEdgeReactionModel also checks the family and template, and marks duplicate reactions when they exist (allowing for multiple TS and removing underterminatee kinetics problems)

  1. redefines the degeneracy variable as a private float. When degeneracy is updated, the kinetic rate (if set) is too.
  2. modularizes parts of generateReactions - finding rxn.reverse and degeneracy are separate methods
  3. runs reactions betwween all resonance structures of species before finding degeneracy
  4. utilizes reaction templates to find and keep multiple transition states (as separate reactions)
  5. uses atomIDs to identify when a reaction path has already been added to degeneracy
  6. adding proper cleanup to many test methods (including solvation database cleanup)
  7. some code cleanup and other modularization of methods.

This PR is ready for code review. Testing this could involve merging it and running a job of interest, though I think RMG-Tests is already doing that. If it is merged, PR #323 can be removed.

@goldmanm goldmanm self-assigned this Jun 16, 2017
@mjohnson541
Copy link
Contributor

Before this branch it was virtually impossible to pick up the propane low T ketohydroperoxide feedback loop without a library due to a multiple TS issue with a key chain branching reaction. I can confirm that this pull request enables RMG to now pick up this key pathway at appropriate temperatures and feasible tolerances.

atom.id = i
if start == -1:
# use entire range of integers to label atoms
atom.id = randint(-2**15,2**15)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to be able to generate entirely random atom ID's rather than just increment them from a set value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and then if they're not unique (not self.atomIDValid()) we make them sequentially increasing from a random start? (Suggesting sequential is fine and it needn't be random?). Just using sequential sounds simpler to me, unless Matt and I are both missing something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Generating IDs from a starting value (like previously going 1 to N) is almost certainly going to cause clashes of atom IDs after two labeled atoms are merged in generateReactions. Generating IDs over the entire range of integer values is much more likely to reduce the clashing IDs when merging two molecules (which is what random IDs seeks to accomplish). When a clash occurs, we would need to reassign IDs properly for all resonance structures (possibly remaking resonance structures which is quite time intensive)

One alternative to completely random numbers could be to label starting from -2^-15 till 2^15 and then loop back over after labeling ~65,000 atoms (not resetting after each call). If you know that this would reduce conflicts and/or computational time, let me know.

The atomIDs cannot be set post-merging since this can result in incorrect degeneracy values when trying to generate reactions when multiple resonance structures are present. Lets say you have two reactants reacting together, and each has multiple resonance structures. In generateReactions, all combinations of resonance structures will be merged and the possible reactions output. Then findDegeneracies would use the atomIDs for all the sets of products to find the degeneracy. If different resonance structures are labeled differently (which can happen when relabeling after the merging of structures), the degeneracy could be higher than expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make a class variable that tracks the entire range of integers to avoid overlap, and then all the atom IDs can be added sequentially. That seems more intuitive than random numbers and probably will save computational time by not checking atomIDValid when setting the IDs. Do you think that would be preferable? Any other ideas on how to avoid conflicting IDs when merging without random Numbers?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I understand what the IDs are doing, I like @goldmanm's last suggestion of a class variable that gradually increments.

Copy link
Contributor

@mliu49 mliu49 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only skimmed through the changes and made a few minor comments. I'm very excited to try it out.

Have you tested to see if pdep is affected at all? I ran into some issues with it when I tried changing generateReactions and deflate/inflate to work with species objects.

else:
degeneracyRatio = (new*1.0) / self._degeneracy
self.kinetics.changeRate(degeneracyRatio)
logging.debug('did not modify A factor when modifying degeneracy since the reaction rate was not set')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this line be indented less one level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put an else statement there in squashed commit.

@@ -423,10 +423,10 @@ cdef class PDepArrhenius(PDepKineticsModel):

"""

def __init__(self, pressures=None, arrhenius=None, highPlimit=None, Tmin=None, Tmax=None, Pmin=None, Pmax=None, comment=''):
def __init__(self, pressures=None, arrhenius=[], highPlimit=None, Tmin=None, Tmax=None, Pmin=None, Pmax=None, comment=''):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's generally undesirable to have mutable default arguments. Was there a reason for this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an old commit from awhile ago (which I fixed), I am not sure how it got in here. I will remove this commit from the PR.

else:
atom.id = i + start
if not self.atomIDValid():
self.assignAtomIDs(start=randint(-2**15,2**14))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upper limit should be 2**15?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it lower to ensure it could still count up, but this will be replaced with a more intuitive method.


def test_degeneracy_keeps_track_of_both_rate_rules_from_resonance_isomers(self):
"""
rxns that have multiple resonance structures hitting different rate rules should
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfinished comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed comment in squashed commit.

elif isinstance(products[0],Species):
products = [product.molecule for product in products]
else:
raise TypeError('products input to __generateReactions must be Species or Reaction Objects')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say Species or Molecule Objects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! Fixed in squashed commit.

sameTemplate = False
for rxn in rxnList1:
isomorphic = rxn0.isIsomorphic(rxn,checkIdentical=False)
identical = rxn0.isIsomorphic(rxn,checkIdentical=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can check identical only if isomorphic is true or check isomorphic only if identical is false. Either way should reduce the number of calls made.

  • If two reactions are identical, then they are isomorphic.
  • If two reactions are not isomorphic, then they are not identical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea! This would be an awesome speedup. Added this in a squash commit (for the second case under the assumption that not isomorphic is more common than is identical)

elif comparison_method(list1[0], list2[1]) \
and comparison_method(list1[1], list2[0]):
return True
elif len(list1) == len(list2) == 3:
Copy link
Contributor

@mjohnson541 mjohnson541 Jun 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be much faster and shorter to iterate through the lists and pop out entries that are the same from both lists as you go. I believe the below code would reduce the maximum calls to comparison_method to 6 rather than the current 18.

list1Temp = []
list2Temp = []
indexList = []
k = 0
while list1 != []:
   entry = list1[k]
   for i,entry2 in enumerate(list2):
       if comparison_method(entry,entry2):
           temp = list1.pop(k)
           list1temp.append(temp)
           temp = list2.pop(i)
           list2temp.append(temp)
           indexList.append(i)
           break
   else: #loop doesn't break, something is missing, templist2 is incomplete
         for j in xrange(0,len(list2)):
             if not j in indexList:
                indexList.append(j)
         list1 = list1Temp+list1
         list2 = list2Temp+list2
         list2 = [list2[i] for i in indexList]
         return False
   k += 1
#loop didn't break, list1 is fine, resort list2 so it is unchanged
list2 = [list2Temp[i] for i in indexList]
list1 = list1Temp
return True

Copy link
Contributor

@mjohnson541 mjohnson541 Jun 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or wait you don't even need to avoid copying the entries as long as you don't deepcopy them do you?

list1Temp = list1[:]
list2Temp = list2[:]
for entry in list1Temp:
   for entry2 in list2Temp:
       if comparison_method(entry,entry2):
           list2Temp.remove(entry2)
           break
   else: #loop doesn't break, something is missing
         return False
#all loops completed
return True

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something must be wrong with this last example. Although I'm not 100% sure what it's trying to achieve, will the while loop ever end? You never remove anything fromlist1Temp. In any case, the alternative is not obviously faster, with lots of list manipulation, creation, popping, etc. I have some thoughts on probable efficiency gains, but before sharing them can anyone tell me if this matters? Does it get called a lot? (sorry for not being aware of the context - the discussion caught my eye).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes, sorry lost that in translation, I think you could actually just do "for entry in list1Temp" since you don't need k anymore. I thought this would be faster under the assumption that most of our time is spent in the isomorphism checks this avoids (maximum for 2-2 lists drops from 4 to 3, maximum for 3-3 lists drops from 18 to 6). Is that not the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rwest, this method is called every time Reaction isomorphism is checked, which is done in degeneracy calculations, adding reactions to the edge, etc. I think it is relatively commonly called (though this is somewhat subjective without a profiling comparison). This method modularized what already existed in reaction.isIsomorphic, though it would be awesome to get it working faster too.

@KEHANG, Is there any listing of previous profiling to see how important the Reaction.isIsomorphic method is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this notebook profiles the recommendations to isisomorphic by @mjohnson541 for H_abstraction reactions. It seems to be faster most of the time by about 10%, though sometimes it is slower (which could be due to other processes demanding resources). There should likely be more benefit if we are looking at 3x3 reactions. Feel free to play around with the notebook, and let me know if you'd like the suggested algorithm implemented here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also avoid defining the function by passing the function pointer instead

if checkIdentical:
    comparison_method = molecule.isIdentical
else:
    comparison_method = molecule.isIsomorphic

I'm really not sure how significant the time savings would be, but the above timings suggest we do have some noticeable overhead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the new approach, but I don't think the time savings would be as significant as you would expect. See the table below for a comparison of the number of function calls. Note that list.remove performs an isomorphism comparison with each item in the list until it finds the one it needs to remove. Using enumerate along with del instead of list.remove might help a little bit.

reactants case if/elif method loop method
1 1 comparison call 1 comparison call
1 list.remove call
2 best 2 comparison calls 2 comparison calls
2 list.remove calls
2 worst 4 comparison calls 3 comparison calls
2 list.remove calls
3 best 3 comparison calls 3 comparison calls
3 list.remove calls
3 worst 18 comparison calls 6 comparison calls
3 list.remove calls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented the updated deletion algorithm, in the ipynb and still no improvement in efficiency :(

"""
# These duplicates should be combined and the reaction degeneracy should be increased
# Reaction degeneracy is only increased if two reaction are isomorphic but resulted
# from different transition states.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I came across this comment randomly in the code without reading the function I would have thought the above comments implied that multiple TS's were simply treated as degeneracies for purposes of this function especially considering the docstring doesn't mention multiple TS's being added as duplicate reactions. Can you make that a bit more clear?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my comment from earlier changes, before we decided to tackle multiple transition states. In this comment, different transition states refers to ones that are the effectively identical but involving a different set of atoms, which i guess you could also call degenerate transition states.

It should probably be changed now that we're also dealing with actually different transition states.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated in the most recent commit.

from rmgpy.reaction import ReactionError

for rxn in reactionList:
if _isomorphicSpeciesList(rxn.reactants, reactants):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we can't add some boolean property to the reaction object during rate estimation (such as "fromReverse") so we already know if it's the reverse template or not and we don't have to do all of these isomorphism checks?

to allow independent testing of various unittests, the database
must be removed after each testing set, to avoid multiple modular level
variables from existing. Solvation parameters were also cleared  theend
of modules using solvation

refactored rmgTest and modelTest  to reduce database generation
goldmanm and others added 5 commits June 23, 2017 16:00
this commit checks to see that the IDs are already properly named
before renaming the atom ids. It also modified the numbering of
assignAtomID to assign random numbers
Analogous to Species.isIsomorphic()
Calls Molecule.isIdentical()
separated out the adding of kinetics step from the enlarge method,
to enable easier testing of the method. Also moved the adding of
kinetics attribute till after degeneracy is modified to avoid multiple
counting of degeneracy.
modified the protocol for setting degeneracy to get RMG to update
the kinetics rate constant to keep track of the degeneracy and
reaction rates. Also moved setting degeneracy in copy methods to before
setting kinetics as this would double-count degeneracy.
@goldmanm goldmanm force-pushed the degeneracy branch 2 times, most recently from 9f345c9 to 6186300 Compare June 23, 2017 20:17
degeneracy calculation was removed from __generateReactions since it is
moving to a standalone method.
separated method to compare isomorphism of multiple species to reduce
code repitition. Added option to use isIdentical instead of
isIsomorphic.
This commit makes the test for multiple deterministic kinetics function.
and activates it for the testing suite.
changed default resonance structure generation to keep isomorphic
structures since they are needed to proper degeneracy and symmetry
values
separated setting reverse reactions for ownReverse family, so it
can be called after degeneracy has been found (reducing work load),
and for easier testing (+ wrote a test method for this reaction).

this required adding H_Abstraction to kineticsTest and removing
forbidden structures from kinetics test so all reactions can be found.

added tests that test that findDegeneracies returns multiple transition
states and that it is able to deal with aromatic structures
Modified the method to remove reactions with the same template and
family, including reverse direction for ownReverse families. It should
keep reactions with multiple transition states (unless an identical
template was found).

Also wrote three test methods to test for keeping separate transition
states or removing the redundant reaction
Allow input of species objects to the products variable of
__generateReactions since this would reduce resonanceIsomer generation
@goldmanm goldmanm force-pushed the degeneracy branch 2 times, most recently from 994c820 to e5fc727 Compare June 23, 2017 20:34
This commit recreates the degeneracy algorithm in react.py. It now can:

1. take into account atom IDs when determining degeneracy
1. keep multiple transition states for the same reaction
1. use isomorphic resonance structures when comparing reactions

Max Liu created the base for this method.
Method written in react.py to ensure that given two reactants, the IDs
on all of their atoms are different. This is necessary to trace atom IDs
in products (after the structures are merged).
When reactions are found with the reverse template, they often have the
degeneracy of the reverse template. This method ensures the reaction has
the degeneracy of the forward template by using the
family.calculateDegeneracy method.
reactants which are identical should have half the reaction rate
when using the degeneracy method, since the translational component
of their rates is already taken into account in their concentration,
so them swapping places doesn't have a real effect on reaction
rate.

The algorithm to reduce degeneracy was placed in a separate method in
the react module to be applied separately from finding degeneracies.
refactored the react module to lump together reactions based on species
as opposed to molecule. This is necessary for obtaining accurate
degeneracy values for reactions with multiple resonance isomers.

deflating now also takes species objects
modified calculatDegeneracy to account for changes in degeneracy:

* uses species objects if possible
* looks at templates when multiple transition states present
* calls the updated degeneracy algorithm
* contains option to not reduce same reactant degeneracy (since this
should only occur once).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Ready for Review PR is complete and ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants