Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add node-removal methods linear in node degree #1083

Merged
merged 6 commits into from
Apr 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@ rand_pcg.workspace = true
rayon.workspace = true
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
smallvec = { version = "1.0", features = ["union"] }
rustworkx-core = { path = "rustworkx-core", version = "=0.15.0" }

[dependencies.pyo3]
Expand Down
10 changes: 10 additions & 0 deletions releasenotes/notes/remode-node-by-key-9ec75b5cf589319e.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
---
features:
- |
Added the :meth:`.PyDiGraph.remove_node_retain_edges_by_id` and
:meth:`~.PyDiGraph.remove_node_retain_edges_by_key` methods, which provide a node-removal that
is linear in the degree of the node, as opposed to quadratic like
:meth:`~.PyDiGraph.remove_node_retain_edges`. These methods require, respectively, that the
edge weights are referentially identical if they should be retained (``a is b``, in Python), or
that you can supply a ``key`` function that produces a Python-hashable result that is used to
do the equality matching between input and output edges.
14 changes: 12 additions & 2 deletions rustworkx/rustworkx.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ from typing import (
ValuesView,
Mapping,
overload,
Hashable,
)
from abc import ABC
from rustworkx import generators # noqa
Expand Down Expand Up @@ -1340,8 +1341,17 @@ class PyDiGraph(Generic[_S, _T]):
self,
node: int,
/,
use_outgoing: bool | None = ...,
condition: Callable[[_S, _S], bool] | None = ...,
use_outgoing: bool = ...,
condition: Callable[[_T, _T], bool] | None = ...,
) -> None: ...
def remove_node_retain_edges_by_id(self, node: int, /) -> None: ...
def remove_node_retain_edges_by_key(
self,
node: int,
/,
key: Callable[[_T], Hashable] | None = ...,
*,
use_outgoing: bool = ...,
) -> None: ...
def remove_nodes_from(self, index_list: Sequence[int], /) -> None: ...
def subgraph(
Expand Down
218 changes: 215 additions & 3 deletions src/digraph.rs
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,8 @@ use indexmap::IndexSet;

use rustworkx_core::dictmap::*;

use smallvec::SmallVec;

use pyo3::exceptions::PyIndexError;
use pyo3::gc::PyVisit;
use pyo3::prelude::*;
Expand Down Expand Up @@ -978,6 +980,13 @@ impl PyDiGraph {
/// By default the data/weight on edges into the removed node will be used
/// for the retained edges.
///
/// This function has a minimum time complexity of :math:`\mathcal O(e_i e_o)`, where
/// :math:`e_i` and :math:`e_o` are the numbers of incoming and outgoing edges respectively.
/// If your ``condition`` can be cast as an equality between two hashable quantities, consider
/// using :meth:`remove_node_retain_edges_by_key` instead, or if your ``condition`` is
/// referential object identity of the edge weights, consider
/// :meth:`remove_node_retain_edges_by_id`.
///
/// :param int node: The index of the node to remove. If the index is not
/// present in the graph it will be ingored and this function willl have
/// no effect.
Expand All @@ -993,7 +1002,7 @@ impl PyDiGraph {
///
/// would only retain edges if the input edge to ``node`` had the same
/// data payload as the outgoing edge.
#[pyo3(text_signature = "(self, node, /, use_outgoing=None, condition=None)")]
#[pyo3(text_signature = "(self, node, /, use_outgoing=False, condition=None)")]
#[pyo3(signature=(node, use_outgoing=false, condition=None))]
pub fn remove_node_retain_edges(
&mut self,
Expand Down Expand Up @@ -1039,8 +1048,181 @@ impl PyDiGraph {
for (source, target, weight) in edge_list {
self._add_edge(source, target, weight)?;
}
self.graph.remove_node(index);
self.node_removed = true;
self.node_removed = self.graph.remove_node(index).is_some();
Ok(())
}

/// Remove a node from the graph and add edges from predecessors to successors in cases where
/// an incoming and outgoing edge have the same weight by Python object identity.
///
/// This function has a minimum time complexity of :math:`\mathcal O(e_i + e_o)`, where
/// :math:`e_i` is the number of incoming edges and :math:`e_o` the number of outgoing edges
/// (the full complexity depends on the number of new edges to be created).
///
/// Edges will be added between all pairs of predecessor and successor nodes that have the same
/// weight. As a consequence, any weight which appears only on predecessor edges will not
/// appear in the output, as there are no successors to pair it with.
///
/// :param int node: The index of the node to remove. If the index is not present in the graph
/// it will be ignored and this function will have no effect.
#[pyo3(signature=(node, /))]
pub fn remove_node_retain_edges_by_id(&mut self, py: Python, node: usize) -> PyResult<()> {
// As many indices as will fit inline within the minimum inline size of a `SmallVec`. Many
// use cases of this likely only have one inbound and outbound edge with each id anyway.
const INLINE_SIZE: usize =
2 * ::std::mem::size_of::<usize>() / ::std::mem::size_of::<NodeIndex>();
Comment on lines +1072 to +1073
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this is future proof and supports 32bit platforms well. For the foreseeable future NodeIndex is a u32. We looked at making it a usize a while ago but the extra memory overhead (on 64bit platforms) to support > 4 billion nodes or edges wasn't worth it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly I just don't like magic numbers and I don't trust myself to remember what they mean haha

let new_node_list = || SmallVec::<[NodeIndex; INLINE_SIZE]>::new();
let node_index = NodeIndex::new(node);
let in_edges = {
let mut in_edges = HashMap::new();
for edge in self
.graph
.edges_directed(node_index, petgraph::Direction::Incoming)
{
in_edges
.entry(PyAnyId(edge.weight().clone_ref(py)))
.or_insert_with(new_node_list)
.push(edge.source());
}
in_edges
};
let mut out_edges = {
let mut out_edges = HashMap::new();
for edge in self
.graph
.edges_directed(node_index, petgraph::Direction::Outgoing)
{
out_edges
.entry(PyAnyId(edge.weight().clone_ref(py)))
.or_insert_with(new_node_list)
.push(edge.target());
}
out_edges
};

for (weight, in_edges_subset) in in_edges {
let out_edges_subset = match out_edges.remove(&weight) {
Some(out_edges_key) => out_edges_key,
None => continue,
};
for source in in_edges_subset {
for target in out_edges_subset.iter() {
self._add_edge(source, *target, weight.clone_ref(py))?;
}
}
}
self.node_removed = self.graph.remove_node(node_index).is_some();
Ok(())
}

/// Remove a node from the graph and add edges from predecessors to successors in cases where
/// an incoming and outgoing edge have the same weight by Python object equality.
///
/// This function has a minimum time complexity of :math:`\mathcal O(e_i + e_o)`, where
/// :math:`e_i` is the number of incoming edges and :math:`e_o` the number of outgoing edges
/// (the full complexity depends on the number of new edges to be created).
///
/// Edges will be added between all pairs of predecessor and successor nodes that have equal
/// weights. As a consequence, any weight which appears only on predecessor edges will not
/// appear in the output, as there are no successors to pair it with.
///
/// If there are multiple edges with the same weight, the exact Python object used on the new
/// edges is an implementation detail and may change. The only guarantees are that it will be
/// deterministic for a given graph, and that it will be drawn from the incoming edges if
/// ``use_outgoing=False`` (the default) or from the outgoing edges if ``use_outgoing=True``.
///
/// :param int node: The index of the node to remove. If the index is not present in the graph
/// it will be ignored and this function will have no effect.
/// :param key: A callable Python object that is called once for each connected edge, to
/// generate the "key" for that weight. It is passed exactly one argument positionally
/// (the weight of the edge), and should return a Python object that is hashable and
/// implements equality checking with all other relevant keys. If not given, the edge
/// weights will be used directly.
/// :param bool use_outgoing: If ``False`` (default), the new edges will use the weight from
/// one of the incoming edges. If ``True``, they will instead use a weight from one of the
/// outgoing edges.
#[pyo3(signature=(node, /, key=None, *, use_outgoing=false))]
pub fn remove_node_retain_edges_by_key(
&mut self,
py: Python,
node: usize,
key: Option<Py<PyAny>>,
use_outgoing: bool,
) -> PyResult<()> {
let node_index = NodeIndex::new(node);
let in_edges = {
let in_edges = PyDict::new_bound(py);
for edge in self
.graph
.edges_directed(node_index, petgraph::Direction::Incoming)
{
let key_value = if let Some(key_fn) = &key {
key_fn.call1(py, (edge.weight(),))?
} else {
edge.weight().clone_ref(py)
};
if let Some(edge_data) = in_edges.get_item(key_value.bind(py))? {
let edge_data = edge_data.downcast::<RemoveNodeEdgeValue>()?;
edge_data.borrow_mut().nodes.push(edge.source());
} else {
in_edges.set_item(
key_value,
RemoveNodeEdgeValue {
weight: edge.weight().clone_ref(py),
nodes: vec![edge.source()],
}
.into_py(py),
)?
}
}
in_edges
};
let out_edges = {
let out_edges = PyDict::new_bound(py);
for edge in self
.graph
.edges_directed(node_index, petgraph::Direction::Outgoing)
{
let key_value = if let Some(key_fn) = &key {
key_fn.call1(py, (edge.weight(),))?
} else {
edge.weight().clone_ref(py)
};
if let Some(edge_data) = out_edges.get_item(key_value.bind(py))? {
let edge_data = edge_data.downcast::<RemoveNodeEdgeValue>()?;
edge_data.borrow_mut().nodes.push(edge.target());
} else {
out_edges.set_item(
key_value,
RemoveNodeEdgeValue {
weight: edge.weight().clone_ref(py),
nodes: vec![edge.target()],
}
.into_py(py),
)?
}
}
out_edges
};

for (in_key, in_edge_data) in in_edges {
let in_edge_data = in_edge_data.downcast::<RemoveNodeEdgeValue>()?.borrow();
let out_edge_data = match out_edges.get_item(in_key)? {
Some(out_edge_data) => out_edge_data.downcast::<RemoveNodeEdgeValue>()?.borrow(),
None => continue,
};
for source in in_edge_data.nodes.iter() {
for target in out_edge_data.nodes.iter() {
let weight = if use_outgoing {
out_edge_data.weight.clone_ref(py)
} else {
in_edge_data.weight.clone_ref(py)
};
self._add_edge(*source, *target, weight)?;
}
}
}
self.node_removed = self.graph.remove_node(node_index).is_some();
Ok(())
}

Expand Down Expand Up @@ -3090,3 +3272,33 @@ where
attrs: py.None(),
}
}

/// Simple wrapper newtype that lets us use `Py` pointers as hash keys with the equality defined by
/// the pointer address. This is equivalent to using Python's `is` operator for comparisons.
/// Using a newtype rather than casting the pointer to `usize` inline lets us retrieve a copy of
/// the reference from the key entry.
struct PyAnyId(Py<PyAny>);
impl PyAnyId {
fn clone_ref(&self, py: Python) -> Py<PyAny> {
self.0.clone_ref(py)
}
}
impl ::std::hash::Hash for PyAnyId {
fn hash<H: ::std::hash::Hasher>(&self, state: &mut H) {
(self.0.as_ptr() as usize).hash(state)
}
}
impl PartialEq for PyAnyId {
fn eq(&self, other: &Self) -> bool {
self.0.as_ptr() == other.0.as_ptr()
}
}
impl Eq for PyAnyId {}

/// Internal-only helper class used by `remove_node_retain_edges_by_key` to store its data as a
/// typed object in a Python dictionary.
#[pyclass]
struct RemoveNodeEdgeValue {
weight: Py<PyAny>,
nodes: Vec<NodeIndex>,
Copy link
Member

@mtreinish mtreinish Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You used a smallvec in the identity function does the same optimization hold for this? Or because we end up storing this in Python it doesn't matter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd guess that in this case any locality effects are overshadowed by the requirement to reach through a few Python pointers to get here in the first place (and to call the Python-space key function), but I can also change it over if you prefer - I don't imagine it hurts for sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, it's fine that's a good reason to not bother. It wouldn't make a big difference if there was any in this case.

}
Loading
Loading