
Add exploratory data analysis, more data preprocessing and features, and more models #9

Open
wants to merge 18 commits into main

Conversation

@schance995 schance995 commented Jun 5, 2024

Here's @0mWh's and my solution so far. We plan to attempt all 3 UnitaryHack challenges. There are several changes; we look forward to any questions and feedback.

Notebooks

  • QRNG_ Classification_Main_UnitaryHack windowed.ipynb for models with more training data from data/QRNG_ Classification_Main_UnitaryHack windowed_preprocessed_df_1717557318.csv.zst (generated in the same notebook)
  • QRNG_ Classification_Main_UnitaryHack.ipynb for more models, preprocessing, and exploratory data analysis.
  • process_logical_reduction.ipynb for distribution analysis and statistical testing

Changes so far

  • Use sliding window of 100 bits to generate more training data
    • Any subsequence of 100 bits is also generated by the same quantum computer
  • Compare results against classical PRNGs
    • Impossible to classify classical PRNGs unless noise is added
  • Exploratory data analysis
    • Check frequencies of bitstrings
      • Each label has a set of unique bitstrings, so it should be possible to tell them apart
    • Mann-Whitney U test to tell distributions apart
      • We can tell quantum computer 4 apart from the rest, but computers 1, 2, and 3 are quite similar
    • Use PCA, tSNE, UMAP to determine clustering of bitstrings and features
      • The bitstrings themselves are not informative
      • Need some computed features
      • Features become more informative with larger bitstrings
  • Add more features
  • Make features usable for ML
    • remove NaNs
    • remove features with identical values
    • mean-normalize and 0-1 normalize features
    • avoid test/train leakage
    • under/oversample
  • Train/test models
    • add threads to speed up computation
    • more sklearn models
      • Naive Bayes, K-means
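The sliding-window augmentation above can be sketched as follows. This is a minimal illustration, not the notebook's implementation; the `window` and `step` parameters are assumptions (the PR uses 100-bit windows, the step size is not stated):

```python
def sliding_windows(bits, window=100, step=1):
    """Yield every length-`window` subsequence of a bitstring.

    Any 100-bit subsequence of a sample was still produced by the
    same quantum computer, so each window becomes an extra training
    example with the same label.
    """
    return [bits[i:i + window] for i in range(0, len(bits) - window + 1, step)]

# A 104-bit sample with step 1 yields 5 overlapping 100-bit windows.
sample = "01" * 52  # 104 bits
windows = sliding_windows(sample)
```

With step 1, a sample of length n yields n - 99 windows, which is where the extra training data comes from.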
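The Mann-Whitney U test used to compare per-device distributions can be run with `scipy.stats.mannwhitneyu`. The feature values below are synthetic stand-ins (the real inputs would be computed features per bitstring, e.g. per-device feature columns from the preprocessed CSV):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Hypothetical feature samples for two devices; a shifted mean stands in
# for the real difference observed between quantum computer 4 and the rest.
device_1 = rng.normal(50.0, 5.0, size=200)
device_4 = rng.normal(53.0, 5.0, size=200)

# Two-sided nonparametric test: do the two samples come from the
# same distribution? A small p-value lets us tell the devices apart.
stat, p = mannwhitneyu(device_1, device_4, alternative="two-sided")
```

When distributions largely overlap, as reported for devices 1-3, the p-value stays large and the test cannot separate them.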
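The leakage-avoidance point above boils down to fitting normalization statistics on the training split only and reusing them on the test split. A minimal sketch with hand-rolled 0-1 scaling (the notebooks may use sklearn transformers instead; the arrays here are illustrative):

```python
import numpy as np

def fit_minmax(X_train):
    """Compute 0-1 scaling parameters from the training split ONLY,
    so no test-set statistics leak into preprocessing."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard constant columns
    return lo, span

def apply_minmax(X, lo, span):
    return (X - lo) / span

X_train = np.array([[0.0, 10.0], [4.0, 30.0], [2.0, 20.0]])
X_test = np.array([[1.0, 40.0]])

lo, span = fit_minmax(X_train)
X_train_s = apply_minmax(X_train, lo, span)
X_test_s = apply_minmax(X_test, lo, span)  # may land outside [0, 1]
```

Scaled test values can fall outside [0, 1]; that is expected and preferable to refitting on the test split, which would leak information.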

Best performance so far

We got 67% accuracy on one of our models, but we caution against further interpretation until we implement more robust model testing. A limitation is that we don't have a held-out test set for a fair comparison against other project submissions.
We got 67% accuracy on one of our models, but we caution against further interpretation until we implement more robust model testing. A limitation is that we don't have a held-out test set for a fair comparison against other project submissions.

Next steps

  • Model tuning
    • balanced class weights
    • k-fold cross validation
    • hyperparameter sweep
  • Generalized quantum circuit
    • Qiskit Quantum Volume Circuit
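The first three tuning steps can be combined in a few lines of sklearn. This is a sketch of the planned setup, not the PR's code; the data, the choice of `LogisticRegression`, and the fold count are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
# Hypothetical feature matrix and 4-class device labels.
X = rng.normal(size=(200, 8))
y = rng.integers(0, 4, size=200)

# Balanced class weights counteract label imbalance; stratified k-fold
# keeps per-device label proportions similar in every fold.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
```

A hyperparameter sweep would wrap this in `GridSearchCV` with the same `cv` object, so every candidate is scored on identical folds.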

@0mWh commented Jun 14, 2024

currently sitting at 75-77% accuracy

@0mWh force-pushed the unitaryhack branch 2 times, most recently from e1e6749 to 8ef37b7 on June 20, 2024 at 22:37