Refactor model graph and allow suppressing dim lengths #7392

wd60622 · 2024-06-26T08:51:08Z

Description

Pulled all graph-related information into two methods:

get_plates: Get plate meta information and the nodes that are associated with each plate
edges: Edges between nodes as a list[tuple[VarName, VarName]]

The get_plates method returns a list of Plate objects, which store all the variable information. That data includes:

DimInfo, which stores the dim names and lengths
NodeInfo, which stores the model variable and its NodeType in the graph (introduced in Allow customizing style of model_graph nodes #7302)
Plate, which is a collection of the DimInfo and a list[NodeInfo]
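A minimal sketch of the three containers described above, assuming plain dataclasses. The field names (names, lengths, var_name), the NodeType members, and storing a variable name instead of the PyMC variable itself are illustrative assumptions, not necessarily what pymc/model_graph.py defines.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class NodeType(str, Enum):
    # Hypothetical subset of node categories, for illustration only
    FREE_RV = "free_rv"
    OBSERVED_RV = "observed_rv"
    DETERMINISTIC = "deterministic"
    DATA = "data"


@dataclass(frozen=True)
class DimInfo:
    # frozen=True makes instances hashable, so they can serve as dict keys
    names: tuple[str | None, ...]  # dim names; None where a dim is unnamed
    lengths: tuple[int, ...]  # one length per dim


@dataclass
class NodeInfo:
    var_name: str  # stand-in for the model variable itself
    node_type: NodeType


@dataclass
class Plate:
    dim_info: DimInfo
    variables: list[NodeInfo]


plate = Plate(
    DimInfo(names=("obs", "covariate"), lengths=(10, 5)),
    variables=[
        NodeInfo("X", NodeType.DATA),
        NodeInfo("X_transform", NodeType.DETERMINISTIC),
        NodeInfo("tvp", NodeType.FREE_RV),
    ],
)
```

Keeping the dim names and lengths together in DimInfo mirrors the grouping argued for later in the review: the dims describe the plate, and the nodes are what sits on it.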

With list[tuple[VarName, VarName]] and list[Plate], a user can now make use of the exposed make_graph and make_networkx functions to create customized graphviz or networkx graphs.
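To illustrate why plates plus edges are enough to draw a graph, here is a sketch that turns them into Graphviz DOT text by hand. It does not call PyMC's make_graph; the to_dot and plate_label helpers are hypothetical, and edges are assumed to be (child, parent) pairs, matching the order yielded in the review diff below.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DimInfo:
    names: tuple[str | None, ...]
    lengths: tuple[int, ...]


@dataclass
class Plate:
    dim_info: DimInfo
    var_names: list[str]


def plate_label(dim_info: DimInfo) -> str:
    # Named dims render as "name (length)", unnamed dims as just the length
    return " x ".join(
        f"{name} ({length})" if name is not None else str(length)
        for name, length in zip(dim_info.names, dim_info.lengths)
    )


def to_dot(plates: list[Plate], edges: list[tuple[str, str]]) -> str:
    lines = ["digraph {"]
    for plate in plates:
        if plate.dim_info.lengths:
            label = plate_label(plate.dim_info)
            # Graphviz draws a box only around subgraphs named "cluster..."
            lines.append(f'  subgraph "cluster{label}" {{')
            lines.append(f'    label="{label}"')
            lines.extend(f'    "{name}"' for name in plate.var_names)
            lines.append("  }")
        else:
            # Scalars have no dims, so they sit outside any cluster
            lines.extend(f'  "{name}"' for name in plate.var_names)
    # Edges are (child, parent) pairs; arrows point from parent to child
    lines.extend(f'  "{parent}" -> "{child}"' for child, parent in edges)
    lines.append("}")
    return "\n".join(lines)


plates = [
    Plate(DimInfo(("obs",), (5,)), ["y"]),
    Plate(DimInfo((), ()), ["mu", "sigma"]),
]
edges = [("y", "mu"), ("y", "sigma")]
print(to_dot(plates, edges))
```

Because the renderer only sees plates and edges, the same inputs could just as well be fed to a networkx-based renderer.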

The previous behavior of model_to_graphviz and model_to_networkx is still maintained. However, there is a new include_dim_lengths parameter that controls whether the dim lengths are included in the plate labels.

The behavior introduced for issue #6335 has changed: unnamed dims are now labeled with dlen instead of {var_name}_dim{d}, so all variables with the same dims are placed on one plate. (See examples below.)
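A sketch of the labeling rule these two changes describe: named dims show as "{dname} ({dlen})", or just the name when lengths are suppressed, while unnamed dims fall back to their length. The create_label function and the exact suppressed format are assumptions for illustration, not PyMC's implementation.

```python
from __future__ import annotations


def create_label(
    names: tuple[str | None, ...],
    lengths: tuple[int, ...],
    include_dim_lengths: bool = True,
) -> str:
    parts = []
    for name, length in zip(names, lengths):
        if name is None:
            # Unnamed dim: the length is the only thing we can show
            parts.append(str(length))
        elif include_dim_lengths:
            parts.append(f"{name} ({length})")
        else:
            parts.append(name)
    return " x ".join(parts)


print(create_label(("obs", None), (5, 3)))  # obs (5) x 3
print(create_label(("obs", None), (5, 3), include_dim_lengths=False))  # obs x 3
```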

Related Issue

Closes ENH: Customizable plate labels #7319
Related to

Checklist

Checked that the pre-commit linting/style checks pass
Included tests that prove the fix is effective or that the new feature works
Added necessary documentation (docstrings and/or example notebooks)
If you are a pro: each commit corresponds to a relevant logical change

Type of change

📚 Documentation preview 📚: https://pymc--7392.org.readthedocs.build/en/7392/

ricardoV94 · 2024-06-26T09:00:37Z

pymc/model_graph.py

+            # parents is a set of rv names that precede child rv nodes
+            for parent in parents:
+                yield child.replace(":", "&"), parent.replace(":", "&")
+
    def make_graph(


Should make_graph and make_networkx now be functions that take plates and edges as inputs?

That would just remove calling get_plates and edges methods. Don't have much of a preference

It would make it more modular, in that if you find a way to create your own plates and edges, you can just pass it to the functions that then display it?

Yeah, sure. I think that makes sense then.
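Following the modularity argument, edges can be produced from any mapping of children to parents, independently of how they are later drawn. A sketch, assuming a compute graph shaped like {child_name: {parent_names}} and mirroring the ":" to "&" substitution in the diff above (presumably because Graphviz reads ":" in an edge endpoint as a port separator); edges_from_compute_graph is a hypothetical name.

```python
from __future__ import annotations


def edges_from_compute_graph(
    compute_graph: dict[str, set[str]],
) -> list[tuple[str, str]]:
    # One (child, parent) pair per parent, with ":" replaced by "&"
    return [
        (child.replace(":", "&"), parent.replace(":", "&"))
        for child, parents in compute_graph.items()
        for parent in sorted(parents)  # sorted only for deterministic output
    ]


print(edges_from_compute_graph({"y": {"mu", "sigma"}, "mu": set()}))
```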

The dictionary of {PlateMeta: set[NodeMeta]} is a bit weird and hard to work with, i.e. a set is not subscriptable, and looking up by a PlateMeta key is a bit tricky.

I was thinking of having another object, Plate which would be:

@dataclass
class Plate:
    plate_meta: PlateMeta
    nodes: list[NodeMeta]

and that would be the input to make_graph and make_networkx instead, making the signature: (plates: list[Plate], edges: list[tuple[str, str]], ...)

Also, does it make sense as a method still? Do you see model_to_graphviz taking this input as well?

Lost track of the specific methods we're discussing. My low resolution guess was that once we have the plates / edges we can just pass them to a function that uses those to render graphviz or networkx graphs. Let me know if you were asking about something else or see a problem (or no point) with that

Sounds good. Let me push something up and you can give feedback

Just pushed.
If a user has an arbitrary list[Plate] and list[tuple[VarName, VarName]], they can use make_graph or make_networkx to make the graphviz or networkx graph, respectively.
pm.model_to_graphviz and pm.model_to_networkx are still wrappers.
The ModelGraph class can still be used to create the plates and edges in the previous manner, if desired, with the get_plates and edges methods.

wd60622 · 2024-06-26T10:22:26Z

pymc/model_graph.py

+            # parents is a set of rv names that precede child rv nodes
+            for parent in parents:
+                yield child.replace(":", "&"), parent.replace(":", "&")
+
    def make_graph(


That would just remove calling get_plates and edges methods. Don't have much of a preference

wd60622 · 2024-06-26T10:24:51Z

pymc/model_graph.py

                # must be preceded by 'cluster' to get a box around it
+                plate_label = create_plate_label(plate_meta, include_size=include_shape_size)


I notice that plate_label actually depends on var_name in the previous "{var_name}_dim{d}" case. However, plate_label is needed before looping over all_var_names, i.e. all_var_names is assumed to have one element? Maybe that should be an explicit case?

Didn't manage to follow, can you explain again?

The graph.subgraph name, "cluster" + plate_label, depends on var_name, which used to be constructed in the get_plates method (the previous dictionary keys were the plate labels).

However, after the subgraph is constructed, all_var_names is looped over. This assumes that all_var_names has only one element, since plate_label is used in the subgraph name.
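The consequence described here can be sketched in plain Python: if the cluster name embeds the variable name for unnamed dims (approximating the old "{var_name}_dim{d}" scheme), variables with identical dims end up in separate clusters, whereas a label built from the dims alone merges them. Both label functions are hypothetical.

```python
from __future__ import annotations


def var_name_label(var_name: str, names: tuple, lengths: tuple) -> str:
    # Approximates the old scheme: unnamed dims become "{var_name}_dim{d}"
    parts = []
    for d, (name, length) in enumerate(zip(names, lengths)):
        dname = name if name is not None else f"{var_name}_dim{d}"
        parts.append(f"{dname} ({length})")
    return " x ".join(parts)


def dims_label(names: tuple, lengths: tuple) -> str:
    # Depends only on the dims, so equal dims give equal labels
    return " x ".join(
        f"{name} ({length})" if name is not None else str(length)
        for name, length in zip(names, lengths)
    )


variables = {var: (("obs", None), (5, 3)) for var in ("C", "D", "E")}
old_clusters = {"cluster" + var_name_label(v, *dims) for v, dims in variables.items()}
new_clusters = {"cluster" + dims_label(*dims) for dims in variables.values()}
print(len(old_clusters), len(new_clusters))  # 3 1
```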

ricardoV94 · 2024-06-27T09:26:16Z

pymc/model_graph.py

+        for plate in self.get_plates(var_names):
+            plate_meta = plate.meta
+            all_vars = plate.variables
+            if plate_meta.names or plate_meta.sizes:


Can we simplify? Could plate_meta be None for the scalar variables?

This logic would still be needed somewhere, likely in get_plates then.
How about a __bool__ method on the Plate class that does this logic?
Then it would act like None and read like:

if plate_meta:  # Truthy if sizes or names
    # plate_meta has sizes or names that are not empty tuples
    ...

You already have that information when you define plate.meta, no? Can't you do it immediately?

Yes. I changed it so that this happens in the get_plates method. Scalars will have Plate(meta=None, variables=[...])

I think it's enough to check for sizes? It's not possible for a plate to have names but not sizes, is it?

We should rename those to dim_names and dim_lengths. And perhaps use None for dims whose name we don't know?

So IIUC, scalars should belong to a "Plate" with dim_names = (), and dim_lengths = ()?

And now I understand your approach, and I think it was better the way you did it. The __bool__ sounds fine as well!

Sorry I got confused by the names of the things
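The __bool__ idea from this thread can be sketched as follows, assuming the renamed fields (names, lengths) and representing scalars with empty tuples instead of None. The split_scalars helper is hypothetical and only shows that a truthiness check reads like a None check while every variable keeps a dims object.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DimInfo:
    names: tuple[str | None, ...]
    lengths: tuple[int, ...]

    def __bool__(self) -> bool:
        # Falsy only for scalars, whose names and lengths are both empty
        return len(self.lengths) > 0 or len(self.names) > 0


def split_scalars(dim_infos: list[DimInfo]) -> tuple[list[DimInfo], list[DimInfo]]:
    # Non-empty dims get a plate box; scalars are drawn outside any box
    plates = [d for d in dim_infos if d]
    scalars = [d for d in dim_infos if not d]
    return plates, scalars


plates, scalars = split_scalars([DimInfo((None,), (3,)), DimInfo((), ())])
```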

ricardoV94 · 2024-06-27T10:31:47Z

pymc/model_graph.py

@@ -49,6 +49,9 @@ class PlateMeta:
    def __hash__(self):
        return hash((self.names, self.sizes))

+    def __bool__(self) -> bool:
+        return len(self.sizes) > 0 or len(self.names) > 0


What is a plate without names?

with pm.Model():
    pm.Normal("y", shape=3)

y has sizes but no dim names.
Currently creates Plate(meta=PlateMeta(names=(), sizes=(3,)), variables=[NodeMeta(var=y, node_type=...)])

There should be some test cases now that this logic is exposed; that will make it much easier to confirm.

names here are dim names

What happens for a Deterministic with dims=("test_dim", None)? Apparently we still allow None dims for things that are not RVs

Should that y have names=(None,)?

I'm thinking of pm.Deterministic("x", np.zeros((3, 3)), dims=("hello", None)) and pm.Deterministic("y", np.zeros((3, 3)), dims=(None, "hello")). We don't want to put those in the same plate, because dims can't be repeated, so they are definitely different things?

Can we add a test for that?

Added a test. I had to wrap the data in as_tensor_variable, or I'd get an error saying the data needs a name attribute.

ricardoV94 · 2024-06-27T10:35:51Z

pymc/model_graph.py

+                plate_meta = PlateMeta(
+                    names=tuple(names),
+                    sizes=tuple(sizes),
+                )


I don't understand this tbh. Are we creating one plate per variable? But a plate can contain multiple variables?

Also, names is ambiguous: is it dim_names? We should name it that way to distinguish it from var_names.
Also sizes -> dim_lengths

Ah plates are hashable... so you mutate the same thing...

Just working with what was there. The historical {str: set[VarName]} is created in a loop, which I changed to {PlateMeta: set[NodeMeta]}.
But I switched to list[Plate] ultimately.
Ideally, there could be a more straightforward path to list[Plate].

This logic feels rather convoluted to be honest. Maybe we can take a step back and see what is actually needed.
Step1: Collect the dim names and dim lengths of every variable we want to plot. This seems simple enough, and we can do in a loop
Step2: Merge variables that have identical dim_names and dim_lengths into "plates". The hashable Plate thing may be a good trick to achieve that, or just a defaultdict with keys: tuple[dim_names, dim_lengths]

Would the code be more readable if we didn't try to do both things at once?

Edit: Updated comment above
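The two steps proposed above can be sketched with a defaultdict keyed by (dim_names, dim_lengths). The variables and dims are made up, and group_into_plates is a hypothetical helper. It also shows that dims ("hello", None) and (None, "hello") give different keys, so such variables land on different plates.

```python
from __future__ import annotations

from collections import defaultdict

# Step 1: dim names and dim lengths of every variable to plot (made-up values)
var_dims: dict[str, tuple[tuple[str | None, ...], tuple[int, ...]]] = {
    "x": (("hello", None), (3, 3)),
    "y": ((None, "hello"), (3, 3)),
    "z": (("hello", None), (3, 3)),
    "mu": ((), ()),
}


def group_into_plates(
    var_dims: dict[str, tuple[tuple[str | None, ...], tuple[int, ...]]],
) -> dict[tuple[tuple[str | None, ...], tuple[int, ...]], list[str]]:
    # Step 2: variables with identical (dim_names, dim_lengths) share a plate
    plates = defaultdict(list)
    for var_name, key in var_dims.items():
        plates[key].append(var_name)
    return dict(plates)


for (dim_names, dim_lengths), names in group_into_plates(var_dims).items():
    print(dim_names, dim_lengths, names)
```

Whether the grouped result is then wrapped in a Plate class or kept as plain tuples is the design question discussed next.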

A defaultdict keyed by dim_names and dim_lengths is the same as what is currently happening, but with a wrapper class around it. Personally, I find the class helpful and more user-friendly, but I could be wrong.

For instance,

Plate(
    DimInfo(names=("obs", "covariate"), sizes=(10, 5)),
    variables=[
        NodeInfo(X, node_type=DATA),
        NodeInfo(X_transform, node_type=DETERMINISTIC),
        NodeInfo(tvp, node_type=FREE_RV),
    ],
)

over

(("obs", "covariate"), (10, 5), (X, X_transform, tvp), (DATA, DETERMINISTIC, FREE_RV))

lines up better in my mind: the first two are related objects, and the last two are related objects as well.

Sure, just thinking of how easy we make it for users to define their custom stuff. Either way seems manageable

wd60622 · 2024-06-27T10:47:07Z

The names Plate and PlateMeta come from the historical get_plates method of ModelGraph. However, get_plates also gets scalars, which were "" before and are now Plate(meta=None, variables=[...])

Is Plate still a good name? It is a collection of variables that all have the same dims. In my mind, a plate comes from Bayesian graphical models, and the scalars might deviate from that.
PlateMeta might be better named DimsMeta, since the names and sizes are the dims of the variables.

Any thoughts here on terminology?

ricardoV94 · 2024-06-27T10:49:11Z

I'm okay with Plate or Cluster. Why the Meta in it?

wd60622 · 2024-06-27T11:01:16Z

I'm okay with Plate or Cluster. Why the Meta in it?

Meta would be the information about the variables / plate needed to construct a plate label. Previously it was always " x ".join([f"{dname} ({dlen})" for ...])
Meta just provides the parts, so the label can be constructed from the components presented before.

ricardoV94 · 2024-06-27T11:06:48Z

I don't love the word meta, it's too abstract. Plate.dim_names, Plate.dim_lengths, Plate.vars? or Plate.var_names if that's what we are storing

wd60622 · 2024-06-27T11:13:08Z

I don't love the word meta, it's too abstract. Plate.dim_names, Plate.dim_lengths, Plate.vars? or Plate.var_names if that's what we are storing

I think it'd be nice to keep the names and sizes together since they are related. How about DimInfo?

ricardoV94 · 2024-06-27T11:16:26Z

Is the question whether we represent a data structure that looks like (in terms of access): ((dim_names, dim_lengths), var_names) vs (dim_names, dim_lengths, var_names)? Seems like a tiny detail. I have a slight preference for having it flat, but it's up to you.

ricardoV94 · 2024-06-27T11:19:53Z

This PR refreshed my mind that #6485 and #7048 exist.

To summarize: we can have variables that have entries in named_vars_to_dims of type tuple[str | None, ...]. We can also have variables that don't show up in named_vars_to_dims at all, which is odd: since we already allow None to represent unknown dims, all variables could conceivably have entries (or we should not allow None).

Then dims can have coords or not, but always have dim_lengths, which always work when we do the fast_eval for dim_lengths, so that's not a problem that shows up here. I think that doesn't matter here for us. Just mentioning in case I brought it up by mistake in my comments.

ricardoV94 · 2024-06-27T11:28:32Z

pymc/model_graph.py

+            plate_label = create_plate_label(
+                plate.variables[0].var.name,
+                plate.meta,
+                include_size=include_shape_size,
+            )


Should create_plate_label now take plate_formatters that among other things decides on whether to include_size?

Yeah, I think that is fair. Where do you view that being exposed?

I exposed the create_plate_label in both make_graph and make_networkx. However, left it out in the model_to_graphviz and model_to_networkx functions.

If a user defines a Callable[[DimInfo], str] function, then it can be used in the first two, more general functions.

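A sketch of passing a user-defined Callable[[DimInfo], str] to the code that names clusters. DimInfo is redefined locally, and default_label, names_only_label and cluster_names are hypothetical; the actual make_graph and make_networkx parameters may be named and shaped differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DimInfo:
    names: tuple[str | None, ...]
    lengths: tuple[int, ...]


def default_label(dim_info: DimInfo) -> str:
    # Name and length for named dims, length only otherwise
    return " x ".join(
        f"{name} ({length})" if name is not None else str(length)
        for name, length in zip(dim_info.names, dim_info.lengths)
    )


def names_only_label(dim_info: DimInfo) -> str:
    # Custom formatter that drops the lengths of named dims
    return " x ".join(
        name if name is not None else str(length)
        for name, length in zip(dim_info.names, dim_info.lengths)
    )


def cluster_names(
    dim_infos: list[DimInfo],
    create_plate_label: Callable[[DimInfo], str] = default_label,
) -> list[str]:
    # Scalars (no dims) get no cluster; the rest are prefixed with "cluster"
    return ["cluster" + create_plate_label(d) for d in dim_infos if d.lengths]


dims = [DimInfo(("obs", None), (5, 3)), DimInfo((), ())]
print(cluster_names(dims))  # ['clusterobs (5) x 3']
print(cluster_names(dims, names_only_label))  # ['clusterobs x 3']
```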
wd60622 · 2024-06-28T08:11:52Z

Is the question whether we represent a data structure that looks like (in terms of access): ((dim_names, dim_lengths), var_names) vs (dim_names, dim_lengths, var_names)? Seems like a tiny detail. I have a slight preference for having it flat, but it's up to you.

There is also the NodeType, which is why I went for the small dataclass wrapper that contains the TensorVariable and the preprocessed label. I think having a small data structure isn't the end of the world, and it also helps structure the problem a bit more. In my mind, the user can clearly see what is part of the new data structures.

wd60622 · 2024-06-28T10:17:50Z

Need to:

cover the {var_name}_dim{d} case from BUG: Deterministic variables with dims containing None break model_to_graphviz #6335 (unless the naming should change?)
fix the previous tests

The 6335 comes up with this example:

# Current main branch
import numpy as np
import pymc as pm
import pytensor.tensor as pt

coords = {
    "obs": range(5),
}
with pm.Model(coords=coords) as model:
    data = pt.as_tensor_variable(
        np.ones((5, 3)),
        name="data",
    )
    pm.Deterministic("C", data, dims=("obs", None))
    pm.Deterministic("D", data, dims=("obs", None))
    pm.Deterministic("E", data, dims=("obs", None))

pm.model_to_graphviz(model)

Result:

It makes sense that they are not on the same plate, right?

wd60622 · 2024-06-28T10:24:12Z

I just caught this bug: it comes from make_compute_graph, which creates a self loop.

import numpy as np
import pymc as pm
import pytensor.tensor as pt
from pymc.model_graph import ModelGraph

coords = {
    "obs": range(5),
}
with pm.Model(coords=coords) as model:
    data = pt.as_tensor_variable(
        np.ones((5, 3)),
        name="C",
    )
    pm.Deterministic("C", data, dims=("obs", None))

error_compute_graph = ModelGraph(model).make_compute_graph() # defaultdict(set, {"C": {"C"}})
# Visualize error:
pm.model_to_graphviz(model)

Result:

Shall I make a separate issue?
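In the reproduction above, both the data tensor and the Deterministic are named "C", and make_compute_graph returns {"C": {"C"}}. A small check like the hypothetical find_self_loops below would flag such entries in any {child: {parents}} mapping; it only detects the symptom and is not a fix for the underlying bug.

```python
from __future__ import annotations

from collections import defaultdict


def find_self_loops(compute_graph: dict[str, set[str]]) -> list[str]:
    # A variable listed among its own parents becomes a self loop when drawn
    return sorted(child for child, parents in compute_graph.items() if child in parents)


buggy = defaultdict(set, {"C": {"C"}})
healthy = {"y": {"mu", "sigma"}, "mu": set()}
print(find_self_loops(buggy))  # ['C']
print(find_self_loops(healthy))  # []
```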

ricardoV94 · 2024-06-28T10:51:39Z

I think they should be in the same plate, because in the absence of dims, the shape is used to cluster RVs?

ricardoV94 · 2024-06-28T10:52:09Z

Self loop is beautiful :)

wd60622 · 2024-06-29T06:09:25Z

I think they should be in the same plate, because in the absence of dims, the shape is used to cluster RVs?

How should the {var_name}_dim{d} be handled then to put them on the same plate?

Just "dim{d} ({dlen})"?

ricardoV94 · 2024-06-29T06:11:25Z

Just the length? How does a plate without any dims look?

I imagine the mix would be 50 x trial(30) or however the trial dim is usually displayed.

WDYT?

wd60622 · 2024-06-30T07:06:17Z

Just the length? How does a plate without any dims look?

I imagine the mix would be 50 x trial(30) or however the trial dim is usually displayed.

WDYT?

This mixing of dlen and "{dname} ({dlen})" is what I had in mind. That is the current behavior.

Here are some examples:

import numpy as np
import pymc as pm
import pytensor.tensor as pt

coords = {
    "obs": range(5),
}
with pm.Model(coords=coords) as model:
    data = pt.as_tensor_variable(
        np.ones((5, 3)),
        name="data",
    )
    C = pm.Deterministic("C", data, dims=("obs", None))
    D = pm.Deterministic("D", data, dims=("obs", None))
    E = pm.Deterministic("E", data, dims=("obs", None))

pm.model_to_graphviz(model)

# Same as above
pm.model_to_graphviz(model, include_dim_lengths=False)

And a larger example with various items:

import numpy as np
import pymc as pm
import pytensor.tensor as pt

coords = {
    "obs": range(5),
    "covariates": ["X1", "X2", "X3"],
}
with pm.Model(coords=coords) as model: 
    data1 = pt.as_tensor_variable(
        np.ones((5, 3)),
        name="data1",
    )
    data2 = pt.as_tensor_variable(
        np.ones((5, 3)),
        name="data2",
    )
    C = pm.Deterministic("C", data1, dims=("obs", None))
    CT = pm.Deterministic("CT", C.T, dims=(None, "obs"))
    D = pm.Deterministic("D", C @ CT, dims=("obs", "obs"))

    E = pm.Deterministic("E", data2, dims=("obs", None))
    beta = pm.Normal("beta", dims="covariates")
    pm.Deterministic("product", E[:, None, :] * beta[:, None], dims=("obs", None, "covariates"))

pm.model_to_graphviz(model)

codecov · 2024-07-03T12:24:10Z

Codecov Report

Attention: Patch coverage is 76.66667% with 28 lines in your changes missing coverage. Please review.

Project coverage is 92.18%. Comparing base (7af0a87) to head (e30f6d9).
Report is 19 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7392      +/-   ##
==========================================
- Coverage   92.19%   92.18%   -0.01%     
==========================================
  Files         103      103              
  Lines       17214    17249      +35     
==========================================
+ Hits        15870    15901      +31     
- Misses       1344     1348       +4

Files	Coverage Δ
pymc/model_graph.py	`87.25% <76.66%> (-0.13%)`	⬇️

... and 5 files with indirect coverage changes

ricardoV94 · 2024-07-03T12:27:02Z

Thanks @wd60622

add PlateMeta and NodeMeta (be7a7f5)

ricardoV94 reviewed Jun 26, 2024 on pymc/model_graph.py

remove dim info and add kwargs (f96502f)

ricardoV94 reviewed Jun 26, 2024

wd60622 commented Jun 26, 2024

wd60622 added 2 commits June 27, 2024 08:25:

wrap each plate in single class (962eab8)

no plate for scalars (38428de)

ricardoV94 reviewed Jun 27, 2024

pull out methods into functions (f492a03)

ricardoV94 reviewed Jun 27, 2024

wd60622 commented Jun 27, 2024 on pymc/model_graph.py

ricardoV94 reviewed Jun 27, 2024 on pymc/model_graph.py

ricardoV94 mentioned this pull request Jun 27, 2024: Reasses dims without coords #7048 (Open)

ricardoV94 reviewed Jun 27, 2024 on pymc/model_graph.py

wd60622 added 3 commits June 28, 2024 09:42:

change name and loop over at begining (d1b5390)

test none dim (6667557)

rename away from meta (559dc42)

test for scalar case (aec7ae5)

ricardoV94 reviewed Jun 28, 2024 on tests/test_model_graph.py

ricardoV94 reviewed Jun 28, 2024 on pymc/model_graph.py

ricardoV94 reviewed Jun 28, 2024 on tests/test_model_graph.py

wd60622 added 3 commits June 28, 2024 12:03:

dim info with empty tuples (6d8b2ee)

test square None dim case and change scalar expected (382a573)

change sizes to lengths (a2e9e60)

remove var_name parameter (2411da0)

workthrough mypy (633a8cc)

wd60622 mentioned this pull request Jul 1, 2024: BUG: make_compute_graph creates self loop #7397 (Open)

ricardoV94 reviewed Jul 3, 2024 on pymc/model_graph.py

ricardoV94 reviewed Jul 3, 2024 on pymc/model_graph.py

ricardoV94 marked this pull request as ready for review July 3, 2024 10:48

wd60622 added 3 commits July 3, 2024 12:49:

use inline in generator (950409d)

adjust previous tests (b9bcf92)

get rid of protocol (e30f6d9)

ricardoV94 approved these changes Jul 3, 2024

ricardoV94 merged commit f719796 into pymc-devs:main Jul 3, 2024 (22 checks passed)

ricardoV94 changed the title from "Abstract Graph Iteration" to "Refactor model graph and allow suppressing dim lengths" Jul 3, 2024

ricardoV94 added the maintenance label Jul 3, 2024

wd60622 deleted the abstract-graph-iteration branch July 3, 2024 12:29
