Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use src layout and use hatch for packaging #1592

Merged
merged 6 commits into from
Feb 6, 2024

Conversation

d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Dec 7, 2023

This PR changes the layout of the repo by putting the python source code in the src directory at the root of the repo. The logic for this layout can be found in lots of places online, but the scientific python packaging guide expresses it fairly concisely. Basically putting the source code in src helps prevent running tests with the local code instead of the installed code.

This change broke some tests, specifically all the tests that were trying to treat zarr.tests as a module, which was simple to fix. Oddly, there's code in main that imports from the test suite: see here which seems pretty undesirable, if not a bug. I fixed this by reversing the direction of the imports: the relevant class is now defined in meta.py and the tests that use this class import it from zarr.

Something else I noticed is that the test suite creates a persistent directory called fixture that contains a bunch of data that outlives the tests. These tests can be found here in main. The tests put this directory in the same directory as the root of the module, which means after running tests we end up with a src/fixture directory that has a bunch of stuff in it. This seems like something we should not be doing. For example, perhaps we could create temporary directories scoped to the test session. Maybe someone who understands these tests better can shed some light on how we ended up here.

In addition to changing the layout of the source code, I modified pyproject.toml to support use of the hatch project manager. I also considered PDM and Poetry (which I have the most experience with). I chose hatch because it was used in the scientific python packaging guide, and I've never heard anyone complain about it, so it seems safe. I think a modern project manager could make development a lot easier. We can replace the requirements_dev_*.txt files with entries in pyproject.toml, define environment variables for workflows in project.toml, and manage separate environments more easily than with our current setup.

This PR is WIP because

  • It's a work in progress. I'm not done folding all our build / test / ci configuration into pyproject.toml
  • I anticipate that lots of people will want to weigh in on how packaging for the repo should be configured.

I'm very interested to hear what people think!

@d-v-b d-v-b added the V3 Affects the v3 branch label Dec 7, 2023
@d-v-b
Copy link
Contributor Author

d-v-b commented Dec 7, 2023

I think this is blocked until I can figure out how to preserve our git history.

@d-v-b d-v-b mentioned this pull request Dec 8, 2023
@d-v-b
Copy link
Contributor Author

d-v-b commented Dec 8, 2023

Git history has been restored via me using git mv instead of vanilla mv.

I opened #1601 to separate the test importing issues from the actual layout change.

Things I would appreciate help with:

  • getting CI to use hatch environments / commands
  • Setting up hatch environments for our different test contexts
  • getting dynamic versioning working
  • expressing the doc generation process via hatch

@d-v-b d-v-b added the help wanted Issue could use help from someone with familiarity on the topic label Dec 8, 2023
Copy link
Contributor

@Saransh-cpp Saransh-cpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saw the help wanted label in my GitHub feed 😄

tests/__about__.py Outdated Show resolved Hide resolved
pyproject.toml Outdated
@@ -1,42 +1,38 @@
[build-system]
requires = ["setuptools>=64.0.0", "setuptools-scm>1.5.4"]
build-backend = "setuptools.build_meta"
requires = ["hatchling"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an equivalent setuptools-scm for hatch - hatch-vcs. I feel that going from automated versioning to manual versioning would be a downgrade and an extra overhead for the maintainers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense! We definitely want to avoid manual versioning. I will look into adding hatch-vcs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is referred to in the scientific packaging guide too https://learn.scientific-python.org/development/guides/packaging-simple/#versioning

pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated
Comment on lines 134 to 137
[tool.black]
target-version = ["py38"]
line-length = 100
skip-string-normalization = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

black can now be replaced with ruff format but maybe not in this PR

# SPDX-FileCopyrightText: 2023-present Zarr Developers
#
# SPDX-License-Identifier: MIT
__version__ = "0.0.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the message about hatch-vcs above.

pyproject.toml Outdated Show resolved Hide resolved
@clbarnes
Copy link
Contributor

clbarnes commented Dec 8, 2023

What's the rationale behind the __about__.py files? Seems like an extra place to store a different version.

@d-v-b
Copy link
Contributor Author

d-v-b commented Dec 8, 2023

What's the rationale behind the __about__.py files? Seems like an extra place to store a different version.

I honestly have no clue -- when I ran hatch new zarr, they were automatically generated, and I figured there must be some logic to it.

after searching, it seems like the hatch dev answered this question here. He doesn't want to clutter the root namespace with extra importables. This seems reasonable to me, but I don't feel strongly either way.

@clbarnes
Copy link
Contributor

clbarnes commented Dec 8, 2023

__version__ is something you have a cause to look for (feature switches) and having it at least importable at the root is a pretty standard place that you'd expect to find it. But if we switch to hatch-vcs then the version will come from a different file anyway.

@clbarnes
Copy link
Contributor

Hatch has just released a new version, 1.8, with some nice features https://hatch.pypa.io/latest/blog/2023/12/11/hatch-v180/

@jhamman
Copy link
Member

jhamman commented Jan 24, 2024

@d-v-b - would love to see this move forward. Is the list of help-needed here still relevant?

@d-v-b
Copy link
Contributor Author

d-v-b commented Jan 24, 2024

Yes! Help is definitely still wanted here. I'm not sure when I will have time to advance this.

@Saransh-cpp
Copy link
Contributor

@d-v-b would it be possible for you to give me access to your fork? I might get some time this weekend to work on the hatch-vcs part. Otherwise, if you have time to work on it yourself, I could answer your questions or help you with the issues you are facing with hatch-vcs.

getting CI to use hatch environments / commands
Setting up hatch environments for our different test contexts
expressing the doc generation process via hatch

These are add-ons. Everything working in the CI with setuptools should work with hatch. I would suggest just migrating to hatch and hatch-vcs in this PR and then switching to hatch commands in the CI in the follow-up PRs.

@d-v-b
Copy link
Contributor Author

d-v-b commented Jan 24, 2024

@Saransh-cpp thank you! I added you as a collaborator to my fork of zarr-python.

@d-v-b
Copy link
Contributor Author

d-v-b commented Jan 24, 2024

These are add-ons. Everything working in the CI with setuptools should work with hatch. I would suggest just migrating to hatch and hatch-vcs in this PR and then switching to hatch commands in the CI in the follow-up PRs.

+1 to this approach

@Saransh-cpp
Copy link
Contributor

Thanks! Will work on this on weekend.

src/zarr/_storage/v3.py Outdated Show resolved Hide resolved
docs/contributing.rst Outdated Show resolved Hide resolved
src/zarr/meta.py Outdated
@@ -459,7 +459,7 @@ def _encode_storage_transformer_metadata(

@classmethod
def _decode_storage_transformer_metadata(cls, meta: Mapping) -> "StorageTransformer":
from zarr.tests.test_storage_v3 import DummyStorageTransfomer
from tests.test_storage_v3 import DummyStorageTransfomer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a class from tests imported in user-facing functionality?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't know! see #1601

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, haha! No worries, I'll rebase this once your PR is merged!

@Saransh-cpp
Copy link
Contributor

Saransh-cpp commented Jan 26, 2024

This should be ready for a review. The tests pass for me locally and the package builds correctly.

The failing mypy test looks unrelated to this PR.

I have also force-pushed because it was much easier to migrate from scratch than resolving the conflicts in this PR.

@d-v-b please feel free to remove [WIP] from the PR title.

@d-v-b d-v-b changed the title [WIP] use src layout and use hatch for packaging use src layout and use hatch for packaging Jan 26, 2024
@MSanKeys963
Copy link
Member

Thanks a lot, @Saransh-cpp.

@Saransh-cpp
Copy link
Contributor

Should be ready now

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small questions around the edges here. I think we should be good to go here soon though.

node: $Format:%H$
node-date: $Format:%cI$
describe-name: $Format:%(describe:tags=true,match=*[0-9]*)$
ref-names: $Format:%D$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why we need this file? Looking at hatch-vcs, I don't see any requirement of this configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file allows users to install Zarr from Git archives without encountering version issues. This is not a hatch-vcs thing, instead this file should've been introduced when setuptools-scm was introduced in Zarr - https://setuptools-scm.readthedocs.io/en/latest/usage/#builtin-mechanisms-for-obtaining-version-numbers. hatch-vcs uses setuptools_scm underneath.

Scientific Python also recommends adding git archival support -

You should also add these two files:

.git_archival.txt:

node: $Format:%H$
node-date: $Format:%cI$
describe-name: $Format:%(describe:tags=true,match=*[0-9]*)$
ref-names: $Format:%D$

And .gitattributes (or add this line if you are already using this file):

.git_archival.txt  export-subst

This will allow git archives (including the ones generated from GitHub) to also support versioning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks for the explanation.

@@ -8,7 +8,7 @@
from zarr.attrs import Attributes
from zarr.storage import KVStore, DirectoryStore
from zarr._storage.v3 import KVStoreV3
from .util import CountingDict, CountingDictV3
from tests.util import CountingDict, CountingDictV3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit of surprising change. Is it required? Is this what hatch suggests packages do? Generally, I prefer a relative imports within the tests directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, this was just a personal preference. Happy to revert it!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's revert it. Happy to discuss this change elsewhere but let's keep this pr scope as small as possible.

@joshmoore
Copy link
Member

There have been a number of breakages in conda land related to the fixtures/ directory. Considering the scale of the rework, this might be a time to address the root issue (if it doesn't just do it by chance!)

see:

@jhamman
Copy link
Member

jhamman commented Feb 6, 2024

@joshmoore - let's sort out CI separately. Hatch is going to help us manage our various envs so let's get right on that after merging this.

@jhamman jhamman merged commit f8874bf into zarr-developers:v3 Feb 6, 2024
5 of 6 checks passed
@Saransh-cpp
Copy link
Contributor

This will require some minor changes in the feedstock. I'll keep a watch, or if I forget, please ping me on the next version update PR in Zarr's feedstock.

@jhamman jhamman mentioned this pull request Feb 6, 2024
6 tasks
@jhamman jhamman added this to the 3.0.0.alpha milestone Apr 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Issue could use help from someone with familiarity on the topic V3 Affects the v3 branch
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants