Add UMAP embedding visualisation script #55
A note for later: one idea for quantifying each encoder's clustering ability would be to report each method's Dunn index and silhouette value.
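As a rough illustration of the two metrics mentioned above, here is a minimal NumPy sketch. It is not code from this PR: the function names `dunn_index` and `silhouette_value` are hypothetical, Euclidean distance is an assumption, and the Dunn index variant shown (minimum inter-cluster point distance over maximum cluster diameter) is only one of several common definitions. It assumes at least two clusters and at least one cluster with more than one point.

```python
import numpy as np


def pairwise_distances(x: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix for an (n_samples, n_dims) array."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def dunn_index(x: np.ndarray, labels: np.ndarray) -> float:
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    dist = pairwise_distances(x)
    classes = np.unique(labels)
    # Largest within-cluster distance (the cluster diameter) over all clusters.
    max_diameter = max(dist[np.ix_(labels == c, labels == c)].max() for c in classes)
    # Smallest distance between two points that belong to different clusters.
    min_separation = min(
        dist[np.ix_(labels == a, labels == b)].min()
        for i, a in enumerate(classes)
        for b in classes[i + 1:]
    )
    return float(min_separation / max_diameter)


def silhouette_value(x: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient over all samples."""
    dist = pairwise_distances(x)
    classes = np.unique(labels)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton clusters keep a score of 0 by convention
        # Mean distance to the other members of the same cluster
        # (dist[i, i] is 0, so dividing by n_own - 1 excludes the point itself).
        a = dist[i, own].sum() / (n_own - 1)
        # Mean distance to the nearest other cluster.
        b = min(dist[i, labels == c].mean() for c in classes if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


# Toy embeddings: two well-separated clusters of two points each.
toy_embeddings = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
toy_labels = np.array([0, 0, 1, 1])
print(dunn_index(toy_embeddings, toy_labels))
print(silhouette_value(toy_embeddings, toy_labels))
```

Higher values of both metrics indicate tighter, better-separated clusters. The pairwise distance matrix is quadratic in the number of samples, so for a full dataset of graph embeddings a library routine such as scikit-learn's `silhouette_score` would likely be preferable.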
LGTM so far. I'll add a property to the datamodules for mapping labels to their corresponding text where possible.
How are devices handled? Normally the …
…ix ndarray shape bug
Great point. I completely missed that the trainer wasn't being instantiated here. No wonder my initial runs of the script were so slow. In this commit, I've added GPU support for both … Overall, it looks like the …
visualise.py: a script that lets one load any pre-trained encoder checkpoint, embed an entire dataset as a collection of graph embeddings, and then plot the collected embeddings using a Gaussian flavour of UMAP. For example, running
python proteinworkshop/visualise.py ckpt_path=$MY_PT_CA_BB_CKPT_PATH plot_filepath=output_visualisations/pt_encoder_fold_superfamily.png dataset=fold_superfamily encoder=gcpnet features=ca_bb task=multiclass_graph_classification
yields a figure whose legend identifies the 20 most common superfamilies.
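One way the legend could be limited to the 20 most common classes is sketched below. This is an assumption about the approach, not the script's actual code: the helper name `top_k_legend_labels` and the `"other"` bucket are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def top_k_legend_labels(
    labels: Sequence[Hashable], k: int = 20, other: str = "other"
) -> List[Hashable]:
    """Keep the k most frequent labels and map every remaining label to `other`."""
    # Counter.most_common(k) returns the k (label, count) pairs with the highest counts.
    keep = {label for label, _ in Counter(labels).most_common(k)}
    return [label if label in keep else other for label in labels]


# Toy example with k=2: only the two most frequent labels keep their own name.
toy_labels = ["a", "a", "a", "b", "b", "c"]
print(top_k_legend_labels(toy_labels, k=2))
```

The relabelled sequence can then be used to colour the plotted embeddings, so rare classes share a single legend entry instead of crowding the figure.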