Add reproduction log #2544

Merged
merged 9 commits into from
Jul 19, 2024

Conversation

MehrnazSadeghieh
Contributor

@MehrnazSadeghieh MehrnazSadeghieh commented Jul 12, 2024

I reproduced the results on macOS using the MacBook terminal. Everything worked, except that I had to use curl instead of wget to download the dataset.

Setup details:

  • Operating System: macOS
  • Environment: MacBook terminal
  • Configuration: Since the wget command did not work for me, I used curl -o collections/msmarco-passage/collectionandqueries.tar.gz https://msmarco.z22.web.core.windows.net/msmarcoranking/collectionandqueries.tar.gz instead.
    All other steps completed without any issues.
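
For readers whose machine lacks one of the two tools, the substitution above can be sketched as a small wrapper. This is only an illustration: `pick_downloader` and `fetch` are hypothetical helper names, and the `-L` flag (follow redirects) is an addition that was not part of the original command.

```shell
# Print which downloader is installed: "curl", "wget", or "none".
pick_downloader() {
  if command -v curl >/dev/null 2>&1; then
    echo curl
  elif command -v wget >/dev/null 2>&1; then
    echo wget
  else
    echo none
  fi
}

# Hypothetical wrapper: download URL ($1) to local path ($2) with whichever tool exists.
fetch() {
  case "$(pick_downloader)" in
    curl) curl -L -o "$2" "$1" ;;   # -L follows redirects (an assumption, not in the original)
    wget) wget -O "$2" "$1" ;;
    *)    echo "neither curl nor wget found" >&2; return 1 ;;
  esac
}

# Usage (not run here, since the archive is large):
# fetch https://msmarco.z22.web.core.windows.net/msmarcoranking/collectionandqueries.tar.gz \
#       collections/msmarco-passage/collectionandqueries.tar.gz
```

macOS ships curl but not wget by default, so checking with `command -v` first avoids the failure described above.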

Here is my device's information:

  • macOS version 12.7.5 (21H1222)
  • MacBook Air (Retina, 13-inch, 2018)
  • Processor 1.6 GHz Dual-Core Intel Core i5
  • Memory 8 GB 2133 MHz LPDDR3
  • Graphics Intel UHD Graphics 617 1536 MB

@MehrnazSadeghieh
Contributor Author

MehrnazSadeghieh commented Jul 12, 2024

Dear Dr. Lin,

Thank you very much for your comprehensive explanation in your guide.

For running the mentioned commands, I used the MacBook terminal again. As I mentioned previously, my operating system is macOS.

To be honest, until now I had only worked with Pandas and Elasticsearch for my academic projects in data mining and information retrieval, so it is exciting for me to become familiar with a new toolkit. By following your guidelines, everything has gone well so far, and I hope to complete the whole onboarding path without any problems.

While working through the BM25 Baselines for MS MARCO Passage Ranking in Anserini guide, I realized I had not yet installed Anserini. So, as you suggested, I followed your guide for the installation process. However, I ran into a small problem while executing the following command:
java -cp anserini-0.36.1-fatjar.jar io.anserini.search.SearchCollection \
  -index msmarco-v1-passage.splade-pp-ed \
  -topics msmarco-v1-passage.dev \
  -encoder SpladePlusPlusEnsembleDistil \
  -output run.msmarco-v1-passage-dev.splade-pp-ed-onnx.txt \
  -impact -pretokenized
My connection was interrupted and the download of the index file stalled, so I pressed Control+C to stop it. When I ran the command again, I got a checksum error. I knew the error was caused by the incomplete download, but I could not find the index file in order to delete it and download it again. None of the commands and scripts I tried helped me locate that file.

After a lot of unsuccessful searching on the web and with ChatGPT, I finally noticed that you had linked the detailed instructions in the installation guide. I don't know why I did not see that sooner, and I am sorry for that. After checking the detailed instructions page, I figured out what to do. I just wanted to thank you for your guides and share my challenge, as you asked.

I would also like to suggest that it might be helpful to include a note in the main guide about handling incomplete downloads or checksum errors, including where to find the index files so that they can be deleted and downloaded again if needed.
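
Such a note could be sketched roughly as follows. This is a minimal illustration, not the guide's actual wording: `clean_index_cache` is a hypothetical helper, and the cache location in the usage comment (`~/.cache/pyserini/indexes`) is an assumption that should be checked against the detailed installation instructions.

```shell
# Hypothetical helper: remove partially downloaded index files from a cache directory.
#   $1 = directory holding the downloaded indexes
#   $2 = fragment of the index name, e.g. msmarco-v1-passage.splade-pp-ed
clean_index_cache() {
  cache_dir="$1"
  pattern="$2"
  if [ ! -d "$cache_dir" ]; then
    echo "no cache directory: $cache_dir" >&2
    return 1
  fi
  # Remove every entry whose name contains the pattern, printing each one first.
  for path in "$cache_dir"/*"$pattern"*; do
    [ -e "$path" ] || continue   # the glob matched nothing
    echo "removing $path"
    rm -rf -- "$path"
  done
}

# Usage (the cache path is an assumption):
# clean_index_cache "$HOME/.cache/pyserini/indexes" msmarco-v1-passage.splade-pp-ed
```

After the stale files are removed, rerunning the search command should trigger a fresh download.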

Thank you very much for checking our process. I really appreciate your time and consideration.

Best regards,

@lintool
Member

lintool commented Jul 14, 2024

hi @MehrnazSadeghieh please fix conflicts here.

+ Results reproduced by [@XKTZ](https://github.com/XKTZ) on 2024-07-12 (commit [`3885b5c`](https://github.com/castorini/anserini/commit/3885b5c25178d2a88fc3b953d572b518ef0d1da6))
+ Results reproduced by [@MehrnazSadeghieh](https://github.com/MehrnazSadeghieh) on 2024-07-12 (commit [`f509bea`](https://github.com/MehrnazSadeghieh/anserini/commit/f509bea0be0483ec4373c1b8516009e4d3b059a1))
Member

Please change this to a commit on the main branch, i.e., the link should start with https://github.com/castorini/anserini/commit/ instead of https://github.com/MehrnazSadeghieh/anserini/commit/.

Contributor Author

Hi Dr. Lin,
I changed it.
Thanks a lot for your consideration.

@lintool
Member

lintool commented Jul 18, 2024

#2428

@MehrnazSadeghieh
Contributor Author

#2428

Hi Dr. Lin,

I have tried very hard to fix this issue, but the approaches I have taken did not go well. To be honest, at first, I thought the problem was because I pushed my changes to a new branch. So, I tried this approach on another repository to check if pushing to the master branch would fix the problem. I tried it, and it was okay. Then, I decided to push my changes to the master branch in this repository (on this pull request: #2549), but I still get the same error that you mentioned. Can you please guide me on what else I can do to resolve this problem?

I use the following command to get my commit ID:
git log -1 --format="%H"

@MehrnazSadeghieh
Contributor Author

Hi Dr. Lin, I think I have fixed this issue in the new pull request (#2549). Please let me know if there is still a problem. I believe the cause was that I updated the commit ID every time I pushed to a branch, so the ID pointed to one of my own commits rather than to a commit on the main branch of the repository.
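
A check along the following lines can catch this before pushing. It is a minimal sketch: `commit_on_branch` is a hypothetical helper, and the remote name `upstream` in the usage comment is an assumption (a remote pointing at castorini/anserini). The helper relies on `git merge-base --is-ancestor`, which exits with status 0 only when the commit is reachable from the given branch.

```shell
# Hypothetical helper: succeed only if commit ($1) is reachable from branch ($2).
# Must be run inside a git working tree.
commit_on_branch() {
  git merge-base --is-ancestor "$1" "$2"
}

# Usage (assumes a remote named "upstream" for castorini/anserini):
# git fetch upstream
# commit_on_branch "$(git log -1 --format=%H)" upstream/master \
#   && echo "commit is on upstream master" \
#   || echo "commit exists only on a local or fork branch"
# git rev-parse upstream/master   # a commit ID that is on the main branch
```

Note that `git log -1 --format="%H"` reports the tip of whatever branch is currently checked out, so on a feature branch it returns a commit that does not yet exist on the main branch.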

@MehrnazSadeghieh
Contributor Author

I have also just committed my changes to this branch for easier access, since I realized the problem was not what I had originally thought.

Thanks a lot again, and I apologize for my confusion.

@lintool lintool merged commit 1d863b4 into castorini:master Jul 19, 2024
1 check passed