documentation wip
seanmacavaney committed Sep 25, 2024
1 parent ce69f18 commit 10ffa8f
Showing 4 changed files with 83 additions and 1 deletion.
48 changes: 48 additions & 0 deletions docs/artifact.rst
@@ -0,0 +1,48 @@
Artifact API
------------------------------------------------

PyTerrier's Artifact API provides a powerful way to share resources, such as indexes,
cached results, and more. Re-using one another's artifacts is a great way to help achieve
green (i.e., sustainable) research [#]_.

The API is provided by the :class:`~pyterrier.Artifact` class, which includes methods
for sharing artifacts using a variety of services, such as HuggingFace Hub and Zenodo.

.. note::
    **What is an Artifact?** "Artifact" often refers to a broad range of items. For
    instance, the `ACM defines <https://www.acm.org/publications/policies/artifact-review-and-badging-current>`__
    an artifact as: "a digital object that was either created by the authors to be used as part of the study
    or generated by the experiment itself."

    In PyTerrier, we use a narrower definition. We treat artifacts as components that
    can be represented as a file or directory stored on disk. These are most frequently built indexes,
    but can also be resources such as cached pipeline results.

Working with Artifacts
=================================================

TODO

Artifact Implementations
=================================================

Here's a list of existing :class:`~pyterrier.Artifact` implementations. (If you've added one,
feel free to make a PR to this page to add it!)

.. To add to this list, edit extras/generate_includes.py
.. include:: ./_includes/artifact_list.rst

Advanced: Writing Your Own Artifact
=================================================

TODO: code, entry points

Advanced: Writing Custom Artifact URL Schemes
=================================================

TODO

----

.. [#] See: Scells, Zhuang, and Zuccon. `Reduce, Reuse, Recycle: Green Information Retrieval Research
   <https://dl.acm.org/doi/10.1145/3477495.3531766>`_. SIGIR 2022.
3 changes: 2 additions & 1 deletion docs/conf.py
@@ -24,10 +24,11 @@

from extras import generate_includes
from extras import generate_extensions
generate_includes.setup()
if "QUICK" not in os.environ:
generate_includes.setup()
generate_includes.dataset_include()
generate_includes.experiment_includes()
generate_includes.artifact_list_include()
generate_extensions.generate_extensions()

# -- Project information -----------------------------------------------------
32 changes: 32 additions & 0 deletions docs/extras/generate_includes.py
@@ -138,3 +138,35 @@ def experiment_includes():
    ).head().to_markdown(tablefmt="rst")
    with open("_includes/experiment-perq.rst", "wt") as f:
        f.write(table)


def artifact_list_include():
    table = [
        {'class': 'pyterrier.terrier.TerrierIndex', 'package': 'python-terrier', 'package_url': 'https://github.com/terrier-org/pyterrier', 'type': 'sparse_index', 'format': 'terrier'},
        {'class': 'pyterrier_pisa.PisaIndex', 'package': 'pyterrier-pisa', 'package_url': 'https://github.com/terrierteam/pyterrier_pisa', 'type': 'sparse_index', 'format': 'pisa'},
        {'class': 'pyterrier_anserini.AnseriniIndex', 'package': 'pyterrier-anserini', 'package_url': 'https://github.com/seanmacavaney/pyterrier-anserini', 'type': 'sparse_index', 'format': 'anserini'},
        {'class': 'pyterrier_adaptive.corpus_graph.NpTopKCorpusGraph', 'package': 'pyterrier-adaptive', 'package_url': 'https://github.com/terrierteam/pyterrier-adaptive', 'type': 'corpus_graph', 'format': 'np_topk'},
        {'class': 'pyterrier_ciff.CiffIndex', 'package': 'pyterrier-ciff', 'package_url': 'https://github.com/seanmacavaney/pyterrier-ciff', 'type': 'sparse_index', 'format': 'ciff'},
        {'class': 'pyterrier_dr.FlexIndex', 'package': 'pyterrier-dr', 'package_url': 'https://github.com/terrierteam/pyterrier_dr', 'type': 'dense_index', 'format': 'flex'},
        {'class': 'pyterrier_quality.QualCache', 'package': 'pyterrier-quality', 'package_url': 'https://github.com/terrierteam/pyterrier-quality', 'type': 'quality_score_cache', 'format': 'numpy'},
        {'class': 'pyterrier_caching.Lz4PickleIndexerCache', 'package': 'pyterrier-caching', 'package_url': 'https://github.com/seanmacavaney/pyterrier-caching', 'type': 'indexer_cache', 'format': 'lz4pickle'},
        {'class': 'pyterrier_caching.DbmRetrieverCache', 'package': 'pyterrier-caching', 'package_url': 'https://github.com/seanmacavaney/pyterrier-caching', 'type': 'retriever_cache', 'format': 'dbm.dumb'},
        {'class': 'pyterrier_caching.Hdf5ScorerCache', 'package': 'pyterrier-caching', 'package_url': 'https://github.com/seanmacavaney/pyterrier-caching', 'type': 'scorer_cache', 'format': 'hdf5'},
    ]
    with open("_includes/artifact_list.rst", "wt") as f:
        f.write('''
.. list-table::
    :header-rows: 1

    * - Class
      - Package
      - Type / Format
      - Links
''')
        for rec in table:
            f.write('''
    * - :class:`~{class}`
      - `{package} <{package_url}>`_
      - ``{type}``/``{format}``
      - `HuggingFace <https://huggingface.co/datasets?other=pyterrier-artifact.{type}.{format}>`__
'''.format(**rec))
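
To see what the row template in `artifact_list_include()` produces, it can help to render a single record outside the docs build. The following is a minimal sketch, not part of the commit: `ROW_TEMPLATE` and `render_row` are hypothetical names introduced here for illustration, and the indentation inside the template is an assumption about how the list-table rows are laid out.

```python
# Illustrative only: mirrors the per-record template from artifact_list_include()
# so the rendered reStructuredText can be inspected without running Sphinx.
# ROW_TEMPLATE and render_row are hypothetical names, not part of the commit.
ROW_TEMPLATE = '''
    * - :class:`~{class}`
      - `{package} <{package_url}>`_
      - ``{type}``/``{format}``
      - `HuggingFace <https://huggingface.co/datasets?other=pyterrier-artifact.{type}.{format}>`__
'''


def render_row(rec: dict) -> str:
    """Fill the row template from one record of the table."""
    # 'class' is a Python keyword, but it is still usable as a format field
    # name when the values are supplied by unpacking a dict.
    return ROW_TEMPLATE.format(**rec)


# One record copied from the table in the hunk above.
rec = {
    'class': 'pyterrier_pisa.PisaIndex',
    'package': 'pyterrier-pisa',
    'package_url': 'https://github.com/terrierteam/pyterrier_pisa',
    'type': 'sparse_index',
    'format': 'pisa',
}
row = render_row(rec)
print(row)
```

Each row links to a HuggingFace dataset search filtered by a tag of the form `pyterrier-artifact.{type}.{format}`, so the `type` and `format` values in the table determine which shared artifacts the link surfaces.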
1 change: 1 addition & 0 deletions docs/index.rst
@@ -12,6 +12,7 @@ Welcome to PyTerrier's documentation!
   experiments
   rewrite
   ltr
   artifact

.. toctree::
   :maxdepth: 1