-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Asset Paths #1312
Merged
Merged
Optimize Asset Paths #1312
Changes from all commits
Commits
Show all changes
28 commits
Select commit
Hold shift + click to select a range
efea959
Asset Paths design doc
jjnesbitt 430392b
Add asset paths models and service
jjnesbitt 444876d
Add asset path regex validator
jjnesbitt 37f723a
Add asset paths ingestion command
jjnesbitt 25dec83
Update serializers, use asset paths in views
jjnesbitt aef9ba2
Update client to use updated paths endpoint
jjnesbitt 7cee38a
Reorg asset service
jjnesbitt 648c90c
Update tests
jjnesbitt a64a567
Add asset paths on asset creation
jjnesbitt 767ce77
Update asset path add/delete tests
jjnesbitt 8e1fae6
Update asset paths when assets are updated in API
jjnesbitt 14208cd
Rename asset path service functions
jjnesbitt 5edb35d
Reorg asset paths service
jjnesbitt 0e2116d
Add check constraints against asset path field
jjnesbitt 12c64e1
Reorganize asset paths code
jjnesbitt 09410a2
Move validate_path outside of Asset model
jjnesbitt 0d23ff0
Don't inherit AssetPath models from TimeStampedModel
jjnesbitt 5079062
Protect on deletion for AssetPath.asset
jjnesbitt 0e27f5f
Use NotFound in paths endpoint
jjnesbitt 248a61b
Don't include regex in asset path validation error
jjnesbitt c171288
Handle asset path copy on version publish
jjnesbitt 4f6849b
Update asset path management command
jjnesbitt 7b5f465
Add comment
jjnesbitt c095468
Update asset paths publish test
jjnesbitt f716ae5
Fix swagger endpoint annotation
jjnesbitt b56eb29
Fix incorrect type hint
jjnesbitt fd0a644
Allow '=' in asset paths
jjnesbitt 4353b03
Order AssetPath querysets by 'path'
jjnesbitt File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,156 @@ | ||
from __future__ import annotations | ||
|
||
import os | ||
|
||
from django.db import transaction | ||
from django.db.models import Count, F, QuerySet | ||
|
||
from dandiapi.api.models import Asset, AssetPath, AssetPathRelation, Version | ||
|
||
#################################################################### | ||
# Dandiset and version deletion will cascade to asset path deletion. | ||
# Thus, no explicit action is needed for these. | ||
#################################################################### | ||
|
||
|
||
def extract_paths(path: str) -> list[str]: | ||
nodepaths: list[str] = path.split('/') | ||
for i in range(len(nodepaths))[1:]: | ||
nodepaths[i] = os.path.join(nodepaths[i - 1], nodepaths[i]) | ||
|
||
return nodepaths | ||
|
||
|
||
def get_root_paths(version: Version) -> QuerySet[AssetPath]: | ||
"""Return all root paths for a version.""" | ||
# Use prefetch_related here instead of select_related, | ||
# as otherwise the resulting join is very large | ||
qs = AssetPath.objects.prefetch_related( | ||
'asset', | ||
'asset__blob', | ||
'asset__embargoed_blob', | ||
'asset__zarr', | ||
) | ||
return ( | ||
qs.filter(version=version) | ||
.alias(num_parents=Count('parent_links')) | ||
.filter(num_parents=1) | ||
.order_by('path') | ||
) | ||
|
||
|
||
def get_path_children(path: AssetPath) -> QuerySet[AssetPath]: | ||
"""Get all direct children from an existing path.""" | ||
qs = AssetPath.objects.select_related( | ||
'asset', | ||
'asset__blob', | ||
'asset__embargoed_blob', | ||
'asset__zarr', | ||
) | ||
path_ids = ( | ||
AssetPathRelation.objects.filter(parent=path, depth=1) | ||
.values_list('child', flat=True) | ||
.distinct() | ||
) | ||
return qs.filter(id__in=path_ids).order_by('path') | ||
|
||
|
||
def search_asset_paths(query: str, version: Version) -> QuerySet[AssetPath] | None: | ||
"""Return all direct children of this path, if there are any.""" | ||
if not query: | ||
return get_root_paths(version) | ||
|
||
# Ensure no trailing slash | ||
fixed_query = query.rstrip('/') | ||
|
||
# Retrieve path | ||
path = AssetPath.objects.filter(version=version, path=fixed_query).first() | ||
if path is None: | ||
return None | ||
|
||
return get_path_children(path) | ||
|
||
|
||
# TODO: Make idempotent | ||
@transaction.atomic() | ||
def add_asset_paths(asset: Asset, version: Version): | ||
# Get or create leaf path | ||
leaf, created = AssetPath.objects.get_or_create(path=asset.path, asset=asset, version=version) | ||
if not created: | ||
return | ||
|
||
# Create absolute paths (exclude leaf node) | ||
nodepaths = extract_paths(asset.path)[:-1] | ||
|
||
# Create nodes | ||
AssetPath.objects.bulk_create( | ||
[AssetPath(path=path, version=version, asset=None) for path in nodepaths], | ||
ignore_conflicts=True, | ||
) | ||
|
||
# Retrieve all paths | ||
paths = [*AssetPath.objects.filter(version=version, path__in=nodepaths).order_by('path'), leaf] | ||
|
||
# Create relations between paths | ||
links = [] | ||
for i in range(len(paths)): | ||
links.extend( | ||
[ | ||
AssetPathRelation(parent=paths[i], child=paths[j], depth=j - i) | ||
for j in range(len(paths))[i:] | ||
] | ||
) | ||
|
||
# Create objects | ||
AssetPathRelation.objects.bulk_create(links, ignore_conflicts=True) | ||
|
||
# Get all relations (including leaf node) | ||
parent_ids = ( | ||
AssetPathRelation.objects.filter(child=leaf) | ||
.distinct('parent') | ||
.values_list('parent', flat=True) | ||
) | ||
|
||
# Update size + file count | ||
AssetPath.objects.filter(id__in=parent_ids).update( | ||
aggregate_size=F('aggregate_size') + asset.size, aggregate_files=F('aggregate_files') + 1 | ||
) | ||
|
||
|
||
@transaction.atomic() | ||
def delete_asset_paths(asset: Asset, version: Version): | ||
leaf: AssetPath = AssetPath.objects.get(asset=asset, version=version) | ||
|
||
# Fetch parents | ||
parent_ids = ( | ||
AssetPathRelation.objects.filter(child=leaf) | ||
.distinct('parent') | ||
.values_list('parent', flat=True) | ||
) | ||
parent_paths = AssetPath.objects.filter(id__in=parent_ids) | ||
|
||
# Update parents | ||
parent_paths.update( | ||
aggregate_size=F('aggregate_size') - asset.size, aggregate_files=F('aggregate_files') - 1 | ||
) | ||
|
||
# Ensure integrity | ||
leaf.refresh_from_db() | ||
assert leaf.aggregate_size == 0 | ||
assert leaf.aggregate_files == 0 | ||
|
||
# Delete leaf node and any other paths with no contained files | ||
AssetPath.objects.filter(aggregate_files=0).delete() | ||
|
||
|
||
@transaction.atomic() | ||
def update_asset_paths(old_asset: Asset, new_asset: Asset, version: Version): | ||
delete_asset_paths(old_asset, version) | ||
add_asset_paths(new_asset, version) | ||
|
||
|
||
@transaction.atomic() | ||
def add_version_asset_paths(version: Version): | ||
"""Add every asset from a version.""" | ||
for asset in version.assets.iterator(): | ||
add_asset_paths(asset, version) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
import djclick as click | ||
|
||
from dandiapi.api.asset_paths import add_version_asset_paths | ||
from dandiapi.api.models import Version | ||
|
||
|
||
@click.command() | ||
def ingest_asset_paths(): | ||
for version in Version.objects.iterator(): | ||
print(f'Version: {version}') | ||
print(f'\t {version.assets.count()} assets') | ||
add_version_asset_paths(version) |
127 changes: 127 additions & 0 deletions
127
dandiapi/api/migrations/0038_assetpath_assetpathrelation_alter_asset_path_and_more.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,127 @@ | ||
# Generated by Django 4.1.1 on 2022-10-19 16:07 | ||
|
||
from django.db import migrations, models | ||
import django.db.models.deletion | ||
|
||
import dandiapi.api.models.asset | ||
|
||
|
||
class Migration(migrations.Migration): | ||
|
||
dependencies = [ | ||
('api', '0037_alter_version_status'), | ||
] | ||
|
||
operations = [ | ||
migrations.CreateModel( | ||
name='AssetPath', | ||
fields=[ | ||
( | ||
'id', | ||
models.BigAutoField( | ||
auto_created=True, primary_key=True, serialize=False, verbose_name='ID' | ||
), | ||
), | ||
('path', models.CharField(max_length=512)), | ||
('aggregate_files', models.PositiveBigIntegerField(default=0)), | ||
('aggregate_size', models.PositiveBigIntegerField(default=0)), | ||
], | ||
), | ||
migrations.CreateModel( | ||
name='AssetPathRelation', | ||
fields=[ | ||
( | ||
'id', | ||
models.BigAutoField( | ||
auto_created=True, primary_key=True, serialize=False, verbose_name='ID' | ||
), | ||
), | ||
('depth', models.PositiveIntegerField()), | ||
], | ||
), | ||
migrations.AlterField( | ||
model_name='asset', | ||
name='path', | ||
field=models.CharField( | ||
max_length=512, validators=[dandiapi.api.models.asset.validate_asset_path] | ||
), | ||
), | ||
migrations.AddConstraint( | ||
model_name='asset', | ||
constraint=models.CheckConstraint( | ||
check=models.Q( | ||
('path__regex', '^([A-z0-9(),&\\s#+~_=-]?\\/?\\.?[A-z0-9(),&\\s#+~_=-])+$') | ||
), | ||
name='asset_path_regex', | ||
), | ||
), | ||
migrations.AddConstraint( | ||
model_name='asset', | ||
constraint=models.CheckConstraint( | ||
check=models.Q(('path__startswith', '/'), _negated=True), | ||
name='asset_path_no_leading_slash', | ||
), | ||
), | ||
migrations.AddField( | ||
model_name='assetpathrelation', | ||
name='child', | ||
field=models.ForeignKey( | ||
on_delete=django.db.models.deletion.CASCADE, | ||
related_name='parent_links', | ||
to='api.assetpath', | ||
), | ||
), | ||
migrations.AddField( | ||
model_name='assetpathrelation', | ||
name='parent', | ||
field=models.ForeignKey( | ||
on_delete=django.db.models.deletion.CASCADE, | ||
related_name='child_links', | ||
to='api.assetpath', | ||
), | ||
), | ||
migrations.AddField( | ||
model_name='assetpath', | ||
name='asset', | ||
field=models.ForeignKey( | ||
blank=True, | ||
null=True, | ||
on_delete=django.db.models.deletion.PROTECT, | ||
related_name='leaf_paths', | ||
to='api.asset', | ||
), | ||
), | ||
migrations.AddField( | ||
model_name='assetpath', | ||
name='version', | ||
field=models.ForeignKey( | ||
on_delete=django.db.models.deletion.CASCADE, | ||
related_name='asset_paths', | ||
to='api.version', | ||
), | ||
), | ||
migrations.AddConstraint( | ||
model_name='assetpathrelation', | ||
constraint=models.UniqueConstraint( | ||
fields=('parent', 'child'), name='unique-relationship' | ||
), | ||
), | ||
migrations.AddConstraint( | ||
model_name='assetpath', | ||
constraint=models.CheckConstraint( | ||
check=models.Q(('path__endswith', '/'), _negated=True), name='consistent-slash' | ||
), | ||
), | ||
migrations.AddConstraint( | ||
model_name='assetpath', | ||
constraint=models.UniqueConstraint( | ||
fields=('asset', 'version'), name='unique-asset-version' | ||
), | ||
), | ||
migrations.AddConstraint( | ||
model_name='assetpath', | ||
constraint=models.UniqueConstraint( | ||
fields=('version', 'path'), name='unique-version-path' | ||
), | ||
), | ||
] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done as two check constraints as it was difficult to create a working/efficient regex that covered both.