move anserini (#473)
seanmacavaney authored Aug 23, 2024
1 parent a537597 commit d5aee13
Showing 31 changed files with 976 additions and 1,284 deletions.
68 changes: 0 additions & 68 deletions .github/workflows/test-anserini.yml

This file was deleted.

31 changes: 0 additions & 31 deletions docs/anserini.rst

This file was deleted.

4 changes: 3 additions & 1 deletion docs/conf.py
@@ -23,10 +23,12 @@
import textwrap

from extras import generate_includes
from extras import generate_extensions
if not "QUICK" in os.environ:
    generate_includes.setup()
    generate_includes.dataset_include()
    generate_includes.experiment_includes()
    generate_extensions.generate_extensions()

# -- Project information -----------------------------------------------------
import datetime
@@ -58,6 +60,7 @@
    'sphinx.ext.viewcode',
    'sphinx.ext.githubpages',
    'sphinx.ext.napoleon',
    'sphinx_tabs.tabs',
]

# Add any paths that contain templates here, relative to this directory.
@@ -195,4 +198,3 @@ def setup(app):

extensions += ["myst_parser"]
source_suffix = ['.rst', '.md']

18 changes: 18 additions & 0 deletions docs/extensions.txt
@@ -0,0 +1,18 @@
# Anserini <https://github.com/seanmacavaney/pyterrier-anserini>
# Advanced Caching <https://github.com/seanmacavaney/pyterrier-caching>
# Dense Retrieval <https://github.com/terrierteam/pyterrier_dr>
# PISA <https://github.com/terrierteam/pyterrier_pisa>
# OpenNIR <https://opennir.net/>
# T5 <https://github.com/terrierteam/pyterrier_t5>
# GenRank <https://github.com/emory-irlab/pyterrier_genrank>
# ColBERT <https://github.com/terrierteam/pyterrier_colbert>
# ChatNoir <https://github.com/chatnoir-eu/chatnoir-pyterrier>
# ANCE <https://github.com/terrierteam/pyterrier_ance>
# Doc2Query <https://github.com/terrierteam/pyterrier_doc2query>
# DeepCT <https://github.com/terrierteam/pyterrier_deepct>
# Quality <https://github.com/terrierteam/pyterrier-quality>
# Adaptive <https://github.com/terrierteam/pyterrier-adaptive>
# SPLADE <https://github.com/cmacdonald/pyt_splade>
# AutoQrels <https://github.com/seanmacavaney/autoqrels>
# Sentence Transformers <https://github.com/soldni/pyterrier_sentence_transformers>
# DeepImpact <https://github.com/terrierteam/pyterrier_deepimpact>
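Every entry above uses the `# Display Name <URL>` form, which has no package name before the `#`. The sketch below shows how a line of this file splits into a package part and a display part. `parse_extension_line` is a hypothetical helper written for illustration, not a function from this commit; it only mirrors the `split('#', 1)` logic in docs/extras/generate_extensions.py, and `some_package` is a made-up name.

```python
from typing import Tuple


def parse_extension_line(line: str) -> Tuple[str, str]:
    """Split one extensions.txt line into (package, display_name).

    Hypothetical helper: it follows the same split-on-first-'#' rule
    as generate_extensions(), but is not part of the commit.
    """
    if '#' in line:
        pkg, display_name = line.split('#', 1)
    else:
        # No '#': the whole line serves as both package and display name.
        pkg, display_name = line, line
    return pkg.strip(), display_name.strip()


# Link-only entry: the package part is empty.
print(parse_extension_line('# PISA <https://github.com/terrierteam/pyterrier_pisa>'))
# Entry with an (invented) package name before the '#'.
print(parse_extension_line('some_package # Some Package'))
```

An empty package part marks a link-only entry, while a non-empty one is what the generator would go on to look up as an installed package.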
55 changes: 55 additions & 0 deletions docs/extras/generate_extensions.py
@@ -0,0 +1,55 @@
import importlib
from pathlib import Path
from hashlib import sha1

def generate_extensions():
    Path('_includes').mkdir(parents=True, exist_ok=True)

    with Path('_includes/ext_toc.rst').open('wt') as f_ext:
        f_ext.write('''
.. toctree::
 :maxdepth: 1
 :caption: Extensions
''')
        for line in open('extensions.txt'):
            if '#' in line:
                pkg, display_name = line.split('#', 1)
            else:
                pkg, display_name = line, line
            pkg = pkg.strip()
            display_name = display_name.strip()

            if pkg == '':
                if '<' not in display_name:
                    print(f'Skipping {line!r} -- must be in the format of "# Package Name <URL>"')
                else:
                    f_ext.write(f' {display_name}\n')
                continue

            metadata = importlib.metadata.metadata(pkg)
            pkg_name = metadata['name']
            docs = importlib.resources.files(pkg).joinpath('pt_docs')
            if docs.is_dir():
                # Documentation included in the package, copy it over
                paths = [(docs, Path(f'ext/{pkg_name}'))]
                while paths:
                    src, dest = paths.pop()
                    dest.mkdir(parents=True, exist_ok=True)
                    for path in src.iterdir():
                        if path.is_dir():
                            paths.append((path, dest/path.name))
                        else:
                            if (dest/path.name).exists():
                                source_hash = sha1(path.open('rb').read()).hexdigest()
                                dest_hash = sha1((dest/path.name).open('rb').read()).hexdigest()
                                if source_hash == dest_hash:
                                    continue
                            with path.open('rb') as fin, (dest/path.name).open('wb') as fout:
                                fout.write(fin.read())
                f_ext.write(f' {display_name} <../ext/{pkg_name}/index.rst>\n')
            elif 'home-page' in metadata:
                # No documentation included in the package, but we can link to the repo
                f_ext.write(f' {display_name} <{metadata["home-page"]}>\n')
            else:
                print(f'Skipping {line!r} -- No pt_docs in package and no home-page in metadata"')
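The copy loop above skips any file whose SHA-1 digest matches the copy already at the destination, presumably so that unchanged documentation is not rewritten on every build (the commit does not state the reason). A minimal standalone sketch of that copy-if-changed step follows. `copy_if_changed` and the temporary paths are hypothetical names for illustration and do not appear in the commit.

```python
import tempfile
from hashlib import sha1
from pathlib import Path


def copy_if_changed(src: Path, dest: Path) -> bool:
    """Copy src to dest unless dest already holds identical bytes.

    Returns True when the destination was written, False when skipped.
    Hypothetical helper illustrating the hash comparison only.
    """
    data = src.read_bytes()
    if dest.exists() and sha1(dest.read_bytes()).hexdigest() == sha1(data).hexdigest():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    return True


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'pt_docs' / 'index.rst'
    dest = Path(tmp) / 'ext' / 'example' / 'index.rst'
    src.parent.mkdir(parents=True)
    src.write_text('Example\n=======\n')

    print(copy_if_changed(src, dest))  # destination missing, so it is written
    print(copy_if_changed(src, dest))  # identical content, so it is skipped
```

Comparing digests requires reading both files in full, so the saving is in avoided writes rather than avoided reads.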
14 changes: 1 addition & 13 deletions docs/index.rst
@@ -34,22 +34,10 @@ Welcome to PyTerrier's documentation!

   io
   apply
   anserini
   new
   debug

.. toctree::
   :maxdepth: 1
   :caption: Other PyTerrier plugins

   OpenNIR <https://opennir.net/>
   PyTerrier_T5 <https://github.com/terrierteam/pyterrier_t5>
   PyTerrier_GenRank <https://github.com/emory-irlab/pyterrier_genrank>
   PyTerrier_ColBERT <https://github.com/terrierteam/pyterrier_colbert>
   PyTerrier_ChatNoir <https://github.com/chatnoir-eu/chatnoir-pyterrier>
   PyTerrier_ANCE <https://github.com/terrierteam/pyterrier_ance>
   PyTerrier_doc2query <https://github.com/terrierteam/pyterrier_doc2query>
   PyTerrier_DeepCT <https://github.com/terrierteam/pyterrier_deepct>
.. include:: ./_includes/ext_toc.rst

Indices and tables
==================
9 changes: 0 additions & 9 deletions docs/installation.rst
@@ -96,15 +96,6 @@ These options adjust how the Terrier engine is loaded.
.. autofunction:: pyterrier.terrier.set_properties
.. autofunction:: pyterrier.terrier.extend_classpath

Anserini Configuration
~~~~~~~~~~~~~~~~~~~~~~

These options adjust how the Anserini engine is loaded. Note that the `pyserini` package needs to be
installed to use PyTerrier's Anserini integration.

.. autofunction:: pyterrier.anserini.set_version


Note on Deprecated Java Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

6 changes: 2 additions & 4 deletions docs/parallel.rst
@@ -49,10 +49,8 @@ What to Parallelise
===================

Only transformers that can be `pickled <https://docs.python.org/3/library/pickle.html>`_. Transformers that use native code
may not be possible to pickle. Some standard PyTerrier transformers have additional support for parallelisation:

- Terrier retrieval: pt.terrier.Retriever(), pt.terrier.FeaturesRetriever()
- Anserini retrieval: pt.anserini.AnseriniBatchRetrieve()
may not be possible to pickle. Some standard PyTerrier transformers have additional support for parallelisation,
most notably Terrier retrieval: :class:`pt.terrier.Retriever()` and :class:`pt.terrier.FeaturesRetriever()`.

Pure python transformers, such as `pt.text.sliding()` are picklable. However, parallelising only `pt.text.sliding()` may not produce
efficiency gains, due to the overheads of shuffling data back and forward.
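Since picklability is the deciding criterion here, one quick way to test a candidate object is to try serialising it. The sketch below uses only the standard library. `is_picklable` is a hypothetical helper, not part of PyTerrier, and the `threading.Lock` merely stands in for an object that wraps a native resource.

```python
import pickle
import threading


def is_picklable(obj) -> bool:
    """Return True if pickle can serialise obj, False otherwise.

    Hypothetical helper for illustration; not a PyTerrier API.
    """
    try:
        pickle.dumps(obj)
    except Exception:
        # pickle raises different exception types depending on the object
        # (PicklingError, TypeError, AttributeError), so catch broadly.
        return False
    return True


# Plain Python data serialises without trouble.
print(is_picklable({'window': 128, 'stride': 64}))
# A lock wraps an OS-level primitive and cannot be pickled.
print(is_picklable(threading.Lock()))
```

The same check could be applied to a transformer instance before attempting to parallelise it. Whether a given transformer passes depends on what it holds internally.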
3 changes: 2 additions & 1 deletion docs/requirements.txt
@@ -1,4 +1,5 @@
myst_parser
sphinx_rtd_theme
sphinx-autodoc-typehints
tabulate
tabulate
sphinx-tabs