Skip to content

Agglomerative clustering tool for network-x graphs

License

Notifications You must be signed in to change notification settings

zjelveh/agglom_cluster

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

agglom_cluster

Agglomerative clustering tool for network-x graphs

Clustering

Implements the algorithm described by: "Fast algorithm for detecting community structure in networks" M. E. J. Newman. 2004 http://arxiv.org/pdf/cond-mat/0309508v1.pdf This is a greedy agglomerative hierarchical clustering algorithm

This implementation uses a heap to select the best pair to cluster at each iteration

  • A naive implementation considers all "n" edges in the graph (O(n))
  • A heap reduces this search dramatically (O(log(n))

Problems

  • The actual Modularity score does not exactly match the Modularity score of the example on the wikipedia page
  • Does not work for directed graphs
  • Does not work for negative graphs (TODO add this capability)

Stores the following information

  • Supergraph
    • Duplicate of the original graph. Gets manipulated during the clustering: edges and nodes are condensed and reweighted
    • Cluster degree stored in node attribute
    • Number of connections between clusters stored in edge attribute (weighted edge)
    • Implicitly keeps track of the existing nodes which is required for the heap
  • Dendrogram
    • Stores the clustering history in a tree
  • Heap
    • Stores the modularity quality difference for combining each pair of existing nodes
    • Popping returns the node pair with the largest modulairty/quality difference, then smallest id1, then smallest id2
      • Stored as a tuple (value, id1, id2) where the modularity/value is negated, and id1 is the smaller of the two
      • Processing the smaller IDs first means that smaller clusters will be chosen first during modularity tie breaking
    • Non-existing nodes are not actively removed from the heap, so pairs with non-existing nodes are ignored when popping
  • Quality History
    • Charts the Modularity score for each number of clustering

Author

Author(s): Ethan Lozano & Matthew Seal

Collaborator(s):

(C) Copyright 2013, Opengov

About

Agglomerative clustering tool for network-x graphs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%