NormalizingFlow #649

Closed

Conversation

@RasmusOrsoe (Collaborator) commented on Dec 20, 2023

This PR adds support for normalizing flows to GraphNeT. Specifically, this PR does the following:

  1. Adds NormalizingFlow in graphnet.models.flows as a new generic model class from which all normalizing flows are intended to inherit. These models are assumed to output a single tensor containing both the transformed coordinates and the jacobian associated with the transformation. For book-keeping, these models have the properties coordinate_columns and jacobian_columns, which identify which columns hold the transformed coordinates and which hold the jacobians. These properties are used to split the tensor for loss computation (see diagram).
  2. Adds a normalizing flow model called INGA, which is similar to what is presented in https://arxiv.org/abs/1906.04032. It relies on splines; the spline functions currently live in graphnet.models.flows.spline_blocks and can be used both inside and outside the context of NormalizingFlows (although they are likely most relevant there).
  3. Adds an example of training INGA under examples/04_training/05_train_normalizing_flow.py, which shows the syntax for training a normalizing flow. The example trains INGA to learn a mapping from the rather complicated distribution of the pulse attributes (xyz, time, charge, etc.) to a multivariate Gaussian distribution.
  4. Introduces slight changes to the inheritance structure of our loss functions: the generic base class LossFunction no longer has an abstract _forward method; this is left to problem-specific subclasses. This change is introduced because not all machine-learning paradigms follow our usual (prediction, target) argument structure, which is indeed the case for normalizing flows.
  5. Creates a new LossFunction subclass called StandardLossFunction, from which all of our current loss functions now inherit. These loss functions all follow the (prediction, target) argument structure.
  6. Creates a new LossFunction subclass FlowLossFunction, which follows the (prediction, jacobian) argument structure.
  7. Creates MultivariateGaussianFlowLoss, a subclass of FlowLossFunction, which calculates the negative log-likelihood of a multivariate Gaussian (a sketch of this computation follows the list).
  8. Makes slight changes to StandardModel, which ties everything together. While working on this PR I noticed a few list comprehensions were hard to read, so I took the liberty of writing them out explicitly and adding a few comments here and there.
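
For illustration, here is a minimal sketch of what such a flow loss computes, assuming a standard multivariate Gaussian base distribution (hypothetical code, not the exact implementation in this PR): the negative log-likelihood of the transformed sample, corrected by the jacobian term from the change-of-variables formula.

```python
import math

import torch
from torch import Tensor


def gaussian_flow_nllh(y: Tensor, log_det_jacobian: Tensor) -> Tensor:
    """Sketch of a flow NLLH under a standard Gaussian base distribution.

    `y` holds the transformed coordinates, shape [batch, dims];
    `log_det_jacobian` holds log|det J| per sample, shape [batch].
    Change of variables: log p_X(x) = log N(y; 0, I) + log|det J|.
    """
    dims = y.shape[-1]
    log_prob_base = -0.5 * (y**2).sum(dim=-1) - 0.5 * dims * math.log(2 * math.pi)
    return -(log_prob_base + log_det_jacobian).mean()
```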

A diagram showing the new inheritance structure for loss functions is shown here:

[diagram: loss-function subclasses]

A diagram showing the overall data flow and usage of the column indices for normalizing flows can be seen here:

[diagram: inga.drawio]

After Christmas, I'll make a new PR that adds the ability to use NormalizingFlows in a generative sense, along with an example of how this is done. I decided not to include it here to avoid making this PR larger than it already is.

@Aske-Rosted (Collaborator) left a comment

I have now taken a look at the paper and the code. I cannot, with my somewhat limited understanding of the method, find any faults with the code.

I have a question regarding coordinate_columns and jacobian_columns. The way I read the code, these have to be determined by the user when defining the task. How is a user to determine these values? Are they partly or fully determined by the dimensions of the input data, or are they hyperparameters of the network?

@RasmusOrsoe (Collaborator, Author) commented

> I have now taken a look at the paper and the code. I cannot, with my somewhat limited understanding of the method, find any faults with the code.
>
> I have a question regarding coordinate_columns and jacobian_columns. The way I read the code, these have to be determined by the user when defining the task. How is a user to determine these values? Are they partly or fully determined by the dimensions of the input data, or are they hyperparameters of the network?

Hey @Aske-Rosted, thanks for taking a look. Let me clarify the role of these two sets of indices. A model architecture, such as DynEdge or INGA, has to output a single tensor in GraphNeT. This tensor is passed directly to the Task. Because a NormalizingFlow outputs both a transformed sample y and the jacobian associated with the transformation, the output of NormalizingFlow is defined as output = torch.concat([y, jacobian], dim=1).

When output is passed to the Task, it needs to be split back into its y and jacobian components in order to evaluate the loss function. These two sets of indices allow you to split the tensor correctly in the Task.
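
As a toy illustration of that convention (hypothetical shapes and index values, not code from the PR):

```python
import torch

# The flow outputs transformed coordinates and the associated jacobian
# terms concatenated into one tensor along dim=1.
y = torch.randn(128, 5)
jacobian = torch.randn(128, 5)
output = torch.cat([y, jacobian], dim=1)  # shape [128, 10]

# The flow exposes these two index lists as properties; the values here
# are purely illustrative.
coordinate_columns = [0, 1, 2, 3, 4]
jacobian_columns = [5, 6, 7, 8, 9]

# Inside the Task, the tensor is split back apart for the loss computation.
assert torch.equal(output[:, coordinate_columns], y)
assert torch.equal(output[:, jacobian_columns], jacobian)
```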

The two sets of indices are defined by the NormalizingFlow itself; by construction, a NormalizingFlow in GraphNeT exposes them as class attributes that you can expect to exist. In the training example for INGA, you can see them being used directly when defining the task:

    # Building model
    flow = INGA(
        nb_inputs=graph_definition.nb_outputs,
        n_knots=120,
        num_blocks=4,
        b=100,
        c=100,
    )
    task = StandardFlowTask(
        target_labels=graph_definition.output_feature_names,
        loss_function=MultivariateGaussianFlowLoss(),
        coordinate_columns=flow.coordinate_columns,
        jacobian_columns=flow.jacobian_columns,
    )

Let me know if this clears things up.

@RasmusOrsoe (Collaborator, Author) commented

@AMHermansen, @Aske-Rosted mentioned at our last dev meeting that he would prefer a third pair of eyes on this PR. If at all possible, could you set aside some time to give this PR a read?

Things to look for include:

  • Are there parts of the code where you'd like to see more comments / better docstrings?
  • Do you agree with the overall implementation strategy?
  • Are there unit tests you'd like to see added?
  • Obvious oversights on my part: grammatical errors, unclear variable names, etc.

I'm happy to explain any part of this PR in greater detail, either here or in a dedicated Zoom call.

@ArturoLlorente (Collaborator) left a comment

I went through the paper, and the implementation seems nice! I still have some doubts about the paper itself that maybe you can explain to me in person.


```python
from pytorch_lightning.loggers import WandbLogger
import numpy as np
import pandas as pd
```
Collaborator:

WandbLogger, numpy and pandas imports are not used.

@RasmusOrsoe (Author):

Thanks!

```python
knot_x = self._transform_to_internal_coordinates(knot_x_bins)
knot_y = self._transform_to_internal_coordinates(knot_y_bins)
knot_x = torch.nn.functional.pad(knot_x, (1, 0))
knot_x[:, 0] = -self.b - self._eps
```
Collaborator:

What is the purpose of self._eps?

@RasmusOrsoe (Author):

I have removed self._eps. In a former version of the code, this was a parameter I used to debug the code. Thank you for spotting it!

```python
k = self._find_spline_idx(knot_y, y)
assert (
    max(k) + 1 <= knot_y.shape[1]
), f"""{knot_y.shape} vs. {max(k) + 1}"""
```
Collaborator:

I think a more descriptive error here would help.

@RasmusOrsoe (Author):

This assert statement has been removed; the one above it effectively checks the same thing. Thanks for spotting it!

```diff
@@ -33,10 +33,11 @@ def __init__(self, **kwargs: Any) -> None:
     @final
     def forward(  # type: ignore[override]
         self,
-        prediction: Tensor,
+        predictions: Tensor,
         target: Tensor,
```
Collaborator:

Here, set target: Tensor = None; then it is not needed to specify the value None in the Task.
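
A sketch of the suggested signature (hypothetical; Optional is used here for type-correctness):

```python
from typing import Optional

from torch import Tensor


def forward(
    self,
    predictions: Tensor,
    target: Optional[Tensor] = None,  # default, so Tasks need not pass None
) -> Tensor:
    ...
```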

@RasmusOrsoe (Collaborator, Author) commented

@ArturoLlorente @Aske-Rosted thank you for the comments! I have now updated the PR to reflect your feedback.

@ArturoLlorente (Collaborator) commented on Mar 8, 2024

There are some CodeClimate issues that I think are easy to solve (mainly syntax ones).
Other than that, the PR looks good in my eyes.
Nice implementation!

@Aske-Rosted (Collaborator) left a comment

Looks good to me. Maybe check that this is up to date with the main branch before merging. (I think there is a limit to how long we should wait for a third pair of eyes, which was just a precaution.)

@RasmusOrsoe (Collaborator, Author) commented

Closing this in favor of #728.

@RasmusOrsoe closed this on Aug 7, 2024.