Faster symmetry breaking #1502

Merged
merged 2 commits into from
Oct 30, 2019

Collaborator

@brahmaneya brahmaneya commented Oct 30, 2019

Description of proposed changes

Use the Munkres (Hungarian) algorithm instead of brute-force search to find the optimal permutation for breaking column symmetry. Our objective for symmetry breaking is to maximize the total probability of LFs being accurate, summed over LFs. We optimize this over combinations of permutations of the columns corresponding to labels with the same prior probability. This is equivalent to maximizing the trace of the product of the summed-probabilities matrix with the permutation matrix, which is an assignment problem that can be solved in O(n^3) time.
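
The reduction described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR, which calls the `munkres` package; it uses a small pure-Python implementation of the Hungarian algorithm so that it runs without dependencies. The function names (`min_cost_assignment`, `best_permutation`, `brute_force_permutation`, `total_score`) and the example `scores` matrix are hypothetical. Here `scores` stands in for the summed-probabilities submatrix of one group of columns, and it is negated because the solver minimizes cost while the objective is a maximum.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Hungarian algorithm, O(n^3), for a square cost matrix.

    Returns row_to_col, where row_to_col[i] is the column assigned to row i.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (n + 1)   # column potentials (1-indexed)
    p = [0] * (n + 1)     # p[j] = row matched to column j, 0 means unmatched
    way = [0] * (n + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    row_to_col = [0] * n
    for j in range(1, n + 1):
        row_to_col[p[j] - 1] = j - 1
    return row_to_col


def total_score(scores, perm):
    """Trace of scores @ P, where P maps row i to column perm[i]."""
    return sum(scores[i][perm[i]] for i in range(len(scores)))


def best_permutation(scores):
    """Maximize total_score by minimizing over the negated matrix."""
    negated = [[-x for x in row] for row in scores]
    return min_cost_assignment(negated)


def brute_force_permutation(scores):
    """O(n!) reference search, shown only for comparison."""
    n = len(scores)
    return max(permutations(range(n)), key=lambda perm: total_score(scores, perm))


# Hypothetical 3x3 summed-probabilities submatrix for one group of columns.
scores = [
    [0.9, 0.4, 0.1],
    [0.3, 0.2, 0.8],
    [0.5, 0.7, 0.6],
]
print(best_permutation(scores))  # [0, 2, 1]
```

Both searches return the same permutation on this input; the difference is that the brute-force version enumerates all n! permutations, whereas the assignment solver runs in cubic time.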

Fixes #1486

Test plan

Modified existing symmetry breaking test.

Checklist

Need help on these? Just ask!

  • I have read the CONTRIBUTING document.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • I have run tox -e complex and/or tox -e spark if appropriate.
  • All new and existing tests passed.

codecov bot commented Oct 30, 2019

Codecov Report

Merging #1502 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1502      +/-   ##
==========================================
+ Coverage    97.6%   97.61%   +<.01%     
==========================================
  Files          55       55              
  Lines        2049     2056       +7     
  Branches      335      339       +4     
==========================================
+ Hits         2000     2007       +7     
  Misses         22       22              
  Partials       27       27
Impacted Files Coverage Δ
snorkel/labeling/model/label_model.py 95.92% <100%> (+0.09%) ⬆️

snorkel/labeling/model/label_model.py (outdated)
# Set mu according to highest-scoring permutation
probs_sum = sum([mu[i : i + k] for i in range(0, self.m * k, k)]) @ P

m = Munkres()
Member

No need for such a short var name here. It's used far away from this definition and just the once.

# Compute submatrix corresponding to the group.
probs_proj = probs_sum[[[g] for g in group], group]
# Use the Munkres algorithm to find optimal permutation.
# We use minus because we want to maximize diagonal sum, not minimize,
Member

Nice comments

Contributor

@ajratner ajratner left a comment

LGTM!! Minor changes only

3 participants