Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible off-by-one bug in derive_rank_cutoff #6

Open
scyrusm opened this issue Jan 23, 2024 · 0 comments
Open

Possible off-by-one bug in derive_rank_cutoff #6

scyrusm opened this issue Jan 23, 2024 · 0 comments

Comments

@scyrusm
Copy link

scyrusm commented Jan 23, 2024

See also here

The following code from ctxcore leads to the perverse situation where setting auc_threshold to 1 will lead to an assertion error: if auc_threshold were 1, rank_cutoff will be equal to total_genes, but rank_threshold has been set to total_genes - 1. So rank_threshold can at most be total_genes - 1, but then in the final line, rank_cutoff is decremented by `. It seems that there was a double attempt to fix the 0- vs 1-indexing discrepancy between python and R, leading to an off-by-one error...

def derive_rank_cutoff(
  auc_threshold: float, total_genes: int, rank_threshold: Optional[int] = None
  ) -> int:
  """
  Get rank cutoff.
  
  :param auc_threshold: The fraction of the ranked genome to take into account for
      the calculation of the Area Under the recovery Curve.
  :param total_genes: The total number of genes ranked.
  :param rank_threshold: The total number of ranked genes to take into account when
      creating a recovery curve.
  :return Rank cutoff.
  """
  
  if not rank_threshold:
      rank_threshold = total_genes - 1
  
  assert (
      0 < rank_threshold < total_genes
  ), f"Rank threshold must be an integer between 1 and {total_genes:d}."
  assert (
      0.0 < auc_threshold <= 1.0
  ), "AUC threshold must be a fraction between 0.0 and 1.0."
  
  # In the R implementation the cutoff is rounded.
  rank_cutoff = int(round(auc_threshold * total_genes))
  assert 0 < rank_cutoff <= rank_threshold, (
      f"An AUC threshold of {auc_threshold:f} corresponds to {rank_cutoff:d} top "                                                                                                                               
      f"ranked genes/regions in the database. Please increase the rank threshold "                                                                                                                               
      "or decrease the AUC threshold."                                                                                                                                                                           
  )                                                                                                                                                                                                              
  # Make sure we have exactly the same AUC values as the R-SCENIC pipeline.                                                                                                                                      
  # In the latter the rank threshold is not included in AUC calculation.                                                                                                                                         
  rank_cutoff -= 1
  return rank_cutoff
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant