Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DeepDDS #53

Merged
merged 24 commits into from
Feb 3, 2022
Merged

Implement DeepDDS #53

merged 24 commits into from
Feb 3, 2022

Conversation

kkaris
Copy link
Contributor

@kkaris kkaris commented Jan 19, 2022

Closes #19

Adds the DeepDDS model implementation

  • Code passes all tests
  • Unit tests provided for these changes
  • Documentation and docstrings added for these changes

Changes

  • Add DeepDDS
  • Add new file, deepdds_examples.py containing an example

@kkaris
Copy link
Contributor Author

kkaris commented Jan 19, 2022

This branch obviously need more work, but for now I have this question:

  • There are two version of this model, one using GAT and one using GCN for the drug feature embedding. Is the idea to implement both of them in the same module and also duplicate examples, tests etc? Or do we pick one of the implementations?

Another comment: it would be nice to have the shapes/sized of the input data either available somewhere as importable constants or automatically set by the pipeline or some wrapper to the pipeline. I think @cthoyt is working on something along these lines in #52.

@benedekrozemberczki
Copy link
Contributor

What could be done is a parameter which deploys gat/gcn in the main model. That way you can test the model with various parameter settings in the tests and a single example enough. What do you think?

Copy link
Contributor

@benedekrozemberczki benedekrozemberczki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you for your contribution. The molecular encoders could be parametrically deployed and that way you could switch between GCN and GAT.

:param dropout:
"""
super(DeepDDS, self).__init__()
self.cell_mlp = MLP(input_dim=context_feature_size, hidden_dims=[2048, 512, context_output_size])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be parametrized.

chemicalx/models/deepdds.py Outdated Show resolved Hide resolved
chemicalx/models/deepdds.py Show resolved Hide resolved
chemicalx/models/deepdds.py Outdated Show resolved Hide resolved
@benedekrozemberczki
Copy link
Contributor

@kkaris thanks to @cthoyt now we have the default atomic feature count for the datasets.

@cthoyt
Copy link
Contributor

cthoyt commented Jan 19, 2022

I think the reason this is broken is the PackedGraph object has a shape of torch.Size([149868, 149868, 4]) instead of being a list of batch_size graphs. I think the solution is to apply a MaxReadout() at the end (which I think is the same as max pooling, but not sure)

@benedekrozemberczki
Copy link
Contributor

Max readout pools the vectors and creates something that is batch_size X dim. It is a scattering transform - a way to think about it is a group by aggregate where nodes are grouped together by the source graph.

https://github.com/rusty1s/pytorch_scatter

The figure here is pretty great.

@cthoyt cthoyt marked this pull request as ready for review February 3, 2022 16:42
Copy link
Contributor

@benedekrozemberczki benedekrozemberczki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. That failure of the CI on the default values is weird.


from .base import UnimplementedModel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is beautifully done.


__all__ = [
"DeepDDS",
]


class DeepDDS(UnimplementedModel):
class DeepDDS(Model):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to mention both.

):
"""Instantiate the DeepDDS model.

:param context_feature_size:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very detailed.

dataset = DrugCombDB()
model = DeepDDS(
context_feature_size=dataset.context_channels,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@cthoyt cthoyt changed the title Implementation of the DeepDDS model Implement DeepDDS Feb 3, 2022
@cthoyt
Copy link
Contributor

cthoyt commented Feb 3, 2022

The failure is because there are only a subset of special parameters (i.e., drug_channels, context_channels) that are allowed to not have a default. Fixed in 2a8e7ef

@benedekrozemberczki
Copy link
Contributor

@cthoyt Can I merge?

@cthoyt cthoyt merged commit 400a983 into AstraZeneca:main Feb 3, 2022
@kkaris kkaris deleted the deepdds-19 branch February 13, 2022 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the DeepDDS model
3 participants