Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AlkylStructureFragmenter Implementation #10

Draft
wants to merge 103 commits into
base: production
Choose a base branch
from

Conversation

Mila1004
Copy link
Collaborator

@Mila1004 Mila1004 commented Jul 7, 2023

First steps for a complete alkyl structure fragmentation.
Currently only contains fragmentation logic for cyclic structures and non-cyclic side-chains.

Mila1004 and others added 30 commits April 20, 2023 15:54
@JonasSchaub JonasSchaub added the enhancement New feature or request label Apr 19, 2024
…onjugatedPiSystemFragmenter classes; WIP: test coverage, not clear why Build fails;
@JonasSchaub
Copy link
Collaborator

I guess the instance variables in your test class are creating problems.
You can try fixing that by adding a static initializer setting the default locale to English to your test class:

static {
        Locale.setDefault(Locale.of("en", "GB"));
    }

If I do that, I am "only" getting file not found exceptions because the test resources are not imported correctly.

Copy link

sonarcloud bot commented Apr 26, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
7.6% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

for (IAtomContainer tmpAC : anACSet.atomContainers()) {
AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(tmpAC);
tmpAdder.addImplicitHydrogens(tmpAC);
Kekulization.kekulize(tmpAC);
Copy link
Collaborator Author

@Mila1004 Mila1004 Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the kekulization of the resulting fragments after successful fragmentation leads to an exception originating from unset implicit hydrogens, even though they should be added by the CDKHydrogenAdder before starting kekulization. The kekulization of the expected fragments does not throw any exceptions as the aromaticity is already set by directly reading from a file.

Copy link
Collaborator

@JonasSchaub JonasSchaub Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, you have two fragments of SMILES code "*C(*)*" in your resulting fragments set. I have looked at the detected atom types and their implicit hydrogen counts. Somehow, I don't know how, they are treated differently here. In the first structure, every R-/pseudo-atom has a hydrogen count of 0, and in the second structure, it is "null", causing the issue.But the CDKHydrogenAdder should ignore all of the pseudo atoms (see here: https://github.com/cdk/cdk/blob/01d7ab0ea4f5f0f590b42cbb2c45e32e2cf41066/base/valencycheck/src/main/java/org/openscience/cdk/tools/CDKHydrogenAdder.java#L98). An interesting bug or feature: if you use the atom-specific method, it does not check for pseudo-atoms and treats them "right", i.e. assigns an implicit hydrogen count of 0 to them. Then, your test does not throw an exception but the assertions fails.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kekulization of the expected fragments does not throw any exceptions as the aromaticity is already set by directly reading from a file.

Partly true, kekulization of the expected fragments is successful because explicit bond orders can be assigned correctly. We can discuss the theoretical background on Friday.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked at the detected atom types and their implicit hydrogen counts. Somehow, I don't know how, they are treated differently here. In the first structure, every R-/pseudo-atom has a hydrogen count of 0, and in the second structure, it is "null", causing the issue.

I saw that too and was very confused by it.

An interesting bug or feature: if you use the atom-specific method, it does not check for pseudo-atoms and treats them "right", i.e. assigns an implicit hydrogen count of 0 to them.

I will test this too in the coming days.

We can discuss the theoretical background on Friday.

Gladly, I would like to know the underlying theory.

Copy link

sonarcloud bot commented Sep 5, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
5.7% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

@JonasSchaub
Copy link
Collaborator

@Mila1004, when you have time, please merge the current production branch into your branch and adjust your code to the changes made there in the meantime.

Copy link
Collaborator

@JonasSchaub JonasSchaub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete review, will continue tomorrow.

/**
* Conjugated Pi System Fragmenter
*/
private IMoleculeFragmenter ConjPiSysF;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two new fragmenter variables should go to the final class constants like the others.

TODO: (01|08|24): -implement spiro carbon detection + ring integrity setting
-overhaul test class (with new functionalities)
-large scale testing of fragmentation (regarding correct fragmentation and performance)
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finish to dos or move them to GH issues or discussion

//</editor-fold>
//
//<editor-fold desc="Public Properties Get">
@Override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code fold is not working for me (some others as well). It helps to add a line break between the fold declaration and the "'at'Override".

/**
* Default boolean value for determining which single ring detection method should be used.
*/
public static final boolean ALTERNATIVE_SINGLE_RING_DETECTION_SETTING_DEFAULT = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to discuss the setting naming

/**
* A property that has a constant boolean value defining if rings should be dissected further.
*/
private final SimpleBooleanProperty keepRingsSetting;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all these settings/properties used now? Why the "constant value" statement?

}
}
} catch (Exception anException) {
AlkylStructureFragmenter.this.logger.log(Level.WARNING,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to log

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the logger is static, so there should be no "this" inbetween.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I will stop commenting "no need to log" now.

} else {
//<editor-fold desc="RingSearch (isolated)">
//for spiro detection: set array property for each atom belonging to x rings
//--> if more than one ring --> check for connected atoms to determine if spiro config
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the setting switches between detecting isolated rings or detecting all rings again using MCB? Does this actually make a difference?

IAtomContainer tmpIsolatedMultiBondsContainer = new AtomContainer();
IAtomContainer tmpTertQuatCarbonContainer = new AtomContainer();
IAtomContainer tmpSpiroCarbonContainer = new AtomContainer();
IAtomContainer tmpSpiroResidueContainer = new AtomContainer();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spiro residue container is unused

if (!((boolean) tmpAtom.getProperty(AlkylStructureFragmenter.INTERNAL_ASF_RING_MARKER_KEY)
|| (boolean) tmpAtom.getProperty(AlkylStructureFragmenter.INTERNAL_ASF_CONJ_PI_MARKER_KEY))) {
if (tmpAtom.getProperty(AlkylStructureFragmenter.INTERNAL_ASF_TERTIARY_CARBON_PROPERTY_KEY)) {
tmpRingFragmentationContainer.addAtom(tmpAtom);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use the ring fragmentation container here when you have a dedicated container for tertiary and quaternary carbons?

if (!this.keepRingsSetting.get()) {
ArrayList<Integer> tmpList = tmpAtom.getProperty(AlkylStructureFragmenter.INTERNAL_ASF_RING_ATOM_LIST_KEY);
if (tmpList.size() > 1) {
tmpSpiroCarbonContainer.addAtom(tmpAtom);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, if the atom is part of multiple isolated rings, it must be a spiro carbon. But what if the mcb ring detection is used?

@JonasSchaub
Copy link
Collaborator

@Mila1004 where did you get the example molecules in the MOL and SD files from? There should be a source/reference given somewhere.

Copy link
Collaborator

@JonasSchaub JonasSchaub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the fragmenter, it's mostly small things but the test class needs to be improved a lot.

}
ArrayList<Integer> tmpEndList = tmpEndAtom.getProperty(AlkylStructureFragmenter.INTERNAL_ASF_RING_ATOM_LIST_KEY);
if (tmpEndList.size() > 1) {
tmpIsEndSpiro = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

//check maxchainlength
switch (tmpMaxChainLengthInteger) {
default -> {
//restrictions > 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(style) default case should be the last one in a switch

default -> {
//restrictions > 1
for (IAtomContainer tmpAtomContainer : tmpChainACSet.atomContainers()) {
tmpDissectedChainACSet.add(this.separateDisconnectedStructures(this.dissectLinearChain(tmpAtomContainer,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separating method calls into multiple lines makes it easier to debug.

case 0 -> {
//if int maxChainLength gives 0 throw IllegalArgumentException
//not happy with how this works, preferably a gui warning would be nice
AlkylStructureFragmenter.this.logger.log(Level.WARNING, "Illegal restriction argument", new IllegalArgumentException());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be defined in the constructor where you initialise the setting properties. Check the other fragmenters.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the default value is 0, right? Defaults should not raise exceptions but should be generally well-suited valaues!

IAtomContainer tmpReturnAC = new AtomContainer();
//starts at 1 for usability, see aLength: 1on1 translation of input to counter
int tmpCounter = 1;
Iterator<IBond> tmpBondIterator = anAC.bonds().iterator();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you really rely on the iterator starting at one end of the chain, traversing along it, and ending at the other end?

} catch (FileNotFoundException e) {
throw new RuntimeException(e);
} catch (URISyntaxException e) {
throw new RuntimeException(e);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, no need to catch. I will not comment this down below.


//<editor-fold desc="@Test Custom Methods">
@Test
public void detectRingsWithMCBTest() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test does n your functionality but the cycle finder.


}
@Test
public void detectRingsWithRingSearchTest() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, tests of the used CDK functionalities do not belong here.

return tmpACSet;
}
//</editor-fold>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated in the comment above from the last review, this test class needs more step-by-step examples with actual molecules (check the SugarRemovalUtility test).

tmpFragmentList = tmpFragmenter.fragmentMolecule(tmpOriginalMolecule);
SmilesGenerator tmpGenerator = new SmilesGenerator(SmiFlavor.Canonical);
for (IAtomContainer tmpFragment : tmpFragmentList) {
System.out.println(tmpGenerator.create(tmpFragment) + " "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert instead of print.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants