Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image scales catalog metadata #3521

Merged
merged 26 commits into from
Jun 22, 2022
Merged

Image scales catalog metadata #3521

merged 26 commits into from
Jun 22, 2022

Conversation

cekk
Copy link
Member

@cekk cekk commented May 4, 2022

As discussed today at Beethoven sprint.

We need a way to pass image scales to Volto without waking up objects and calculating them every time.

This is a poc to see if we can store them in catalog.
We store a PersistentDict into metadata so it is only loaded from db if accessed.

There are two new adapters:

  • for context (IImageScalesAdapter)
  • for fields (IImageScalesFieldsAdapter)

The first one iterate over context's schema and try to call the second over schema fields.
If there is an adapter registered for that field, it can return the proper scales.
This allows future customizations like for example if miniatures are stored in an external service and not in Plone.

I copied the code for getting scales from plone.restapi utils (we could move that code from restapi to here if needed).

@mister-roboto
Copy link

@cekk thanks for creating this Pull Request and helping to improve Plone!

TL;DR: Finish pushing changes, pass all other checks, then paste a comment:

@jenkins-plone-org please run jobs

To ensure that these changes do not break other parts of Plone, the Plone test suite matrix needs to pass, but it takes 30-60 min. Other CI checks are usually much faster and the Plone Jenkins resources are limited, so when done pushing changes and all other checks pass either start all Jenkins PR jobs yourself, or simply add the comment above in this PR to start all the jobs automatically.

Happy hacking!

@cekk cekk requested review from jensens, ericof and MrTango May 4, 2022 21:15
@ale-rt
Copy link
Member

ale-rt commented May 5, 2022

Is this implying that we generate all the scales whenever a new image is uploaded?

@cekk
Copy link
Member Author

cekk commented May 5, 2022

Is this implying that we generate all the scales whenever a new image is uploaded?

isn't this already made when we create a new content on Volto? When you get its data from restapi, you get also its miniatures.

@cekk
Copy link
Member Author

cekk commented May 6, 2022

@ale-rt @plone/framework-team if this is ok for you and you don't have any complaints, i will split that pr into the right packages

@cekk
Copy link
Member Author

cekk commented May 6, 2022

@jensens this pr needs also an upgrade-step because we're adding a new metadata to the catalog and catalog needs to be updated.
That could be a "problem" in large installations because for metadata we need to reindex the whole objects, not only the index and could take time.
Is it ok to do it in a Plone upgrade-step?

@ale-rt
Copy link
Member

ale-rt commented May 6, 2022

Is this implying that we generate all the scales whenever a new image is uploaded?

isn't this already made when we create a new content on Volto? When you get its data from restapi, you get also its miniatures.

IIRC correctly @datakurre made a PR to some package that enables that behavior by setting some env variable, but with a quick search I was not able to find it.

Anyway this is not the general case and I would like to keep it like that.
The big argument against creating all the scales immediately is that we could upload N images at a time and block the site while computing the scales.

Also I think that the upgrade step that adds the image_scales index lead to a lot of undesired effects.

I am not deeply in to your use case and this might be the perfect solution, but I would ask you to think more about this.

@datakurre
Copy link
Member

@datakurre
Copy link
Member

I also created a branch with async image scaling back then https://github.com/plone/plone.namedfile/tree/datakurre-image-scaling-queue (in practice it was found to lack optimistic savepoint support)

That branch generated all scale metadata immediately, but responded with a temporary redirect to the original until the scale was really available. Therefore an additional patch for Volto was required to retry the scale request as long as the response had that status code.

All the code is probably bitrotten by now, but the concept could still work. But agreed, it was a bit complex.

@cekk
Copy link
Member Author

cekk commented May 7, 2022

Ok, i've found some problems with this implementation.
The first one is that when i create a new object with image, or i update the image with a new one, when the indexer is called i don't have the final blob file in content-type field, but a temporary file.
That means that when the object is indexed, we generate a wrong set of scales that will be invalidated at first object GET call.
The catalog keeps the old ones, and restapi call returns the right ones generated from the proper blob.
When i save again the content without touching the image, everything has fixed.

Probably adding two event handlers that listen to object added and modified events can have the right blobs, but i need a way to check into my indexer (actually into the field adapter) if the current image is temporary or not.

@datakurre @ale-rt any suggestions?

I don't know if this is the perfect solution also because we have to re-index the catalog.

Let me better explain our use-case: when dealing with listing blocks (and other blocks) we could have a list of results with images.
With listing blocks (and search endpoint too) we can choose to get the fullobjects or the brains with a custom set of metadata (catalog metadata).
To speedup the queries and not have too big responses we decided to get brains by default. Most blocks needs images infos. For frontend optimizations, we need the full list of available scales for each image and their width and height.
We can get these data from the real object (that means waking up each brain) or from a metadata stored in catalog.

An alternative solution that we discussed was to create a separate catalog (maybe just a BTree into the site root) that we are going to populate with the same data that i'm trying to store in metadata, and when we serialize the brains, we'll get scales infos from there without touching real objects.

@ale-rt
Copy link
Member

ale-rt commented May 7, 2022

I do not know if I understand completely your issue and your "frontend optimizations".

When working with a brain usually the URL to access the scale looks like: ${brain/getURL}/@@images/${scale}/?t=${brain/modified}.

Knowing the scale you are pretty sure to know the width of the image but not the height given that usually the scales size look like $WIDTH:65536.

You already see that puts emphasis on the width rather than the height and in my opinion this is a good thing.
The two dimensions are not exactly equal when we talk about rendering a page: the vertical one is much more delicate and browser dependent.

That said, with a brain in your hands there are good chances that you already know what is the with width of your scale and the URL to display it.

That should be enough info to build a web page that can be rendered nicely across different browsers.

If your "frontend optimizations" have desperate need to know the height of an image I strongly doubt they are fit for the Plone core.
Can you reconsider them?

@pnicolli
Copy link

pnicolli commented May 7, 2022

@ale-rt A couple explanations are probably due here:

  • the height of the image is required for properly rendering it, the browser uses it to reserve the image space on the page, thus avoiding Cumulative Layout Shift
  • in addition to that, in Volto we don't have access to the scales configured in the backend, at this time, so an alternative to have those would need to be implemented

@datakurre
Copy link
Member

datakurre commented May 7, 2022

@cekk Ouch. The rabbit hole gets deeper.

A thing that has troubled me forever is that scale uids are not deterministic, but just random uuid4s: https://github.com/plone/plone.scale/blob/09b3598b3843363260a263c65435bd2f463249ab/plone/scale/storage.py#L198

If were able to use deterministic uids, this should be among the issues that get fixed. Like hash of the original file by its contents and full scale configuration.

(I have had a similar patch for that for JYU for years, but don't have access to the code until next year. We did it for better caching, because the builtin 24h scale invalidation on any scale modification with continuously changing scale URLs was bad.)

@ale-rt
Copy link
Member

ale-rt commented May 7, 2022

@cekk Ouch. The rabbit hole gets deeper.

A thing that has troubled me forever is that scale uids are not deterministic, but just random uuid4s: https://github.com/plone/plone.scale/blob/09b3598b3843363260a263c65435bd2f463249ab/plone/scale/storage.py#L198

If were able to use deterministic uids, this should be among the issues that get fixed. Like hash of the original file by its contents and full scale configuration.

This is super true, having the possibility to get the proper URL out of some brain metadata would be really awesome.

About the hashing, I think it is enough to consider the _p_modified timestamp of the blob rather than hashing the full file content which can be several MB.

@ale-rt
Copy link
Member

ale-rt commented May 7, 2022

@ale-rt A couple explanations are probably due here:

  • the height of the image is required for properly rendering it, the browser uses it to reserve the image space on the page, thus avoiding Cumulative Layout Shift
  • in addition to that, in Volto we don't have access to the scales configured in the backend, at this time, so an alternative to have those would need to be implemented
  1. I do not see it required for properly rendering. I grok it is a plus but it is not required by any means.
  2. That seems to me a really easy thing to solve. I guess you already have some constants about the site already defined somewhere (e.g. the portal_url). That would be just another one.

@ale-rt
Copy link
Member

ale-rt commented May 7, 2022

Knowing the scale you are pretty sure to know the width of the image but not the height given that usually the scales size look like $WIDTH:65536.

And if you want to know the height given the width you can just get it knowing the original images and doing a proportion or if you prefer you can divide it by the aspect ratio

@cekk
Copy link
Member Author

cekk commented May 9, 2022

Ok, so we could get rid of real scales when storing the data (also because i suspect that if we call the view on temp files when we are uploading new image, we'll get blobs twice).

I think that we should store something in the catalog anyway, because we need to know at least original image sizes.
Then we could manually calculate different scales sizes based on what @ale-rt suggested (width from the registry and height calculated).
And generate urls without blobs UIDS but appending modified date to invalidate cache.

The big question now is: WHERE and HOW? In catalog as proposed? In a separate tree?
In both of the options we need to make a "reindex" of the objects to populate new data.
If this is not only a Volto need, we should made it in the core.

An alternative solution could be: in summary serializer, if you are requesting an image-ish field (we need to define a way to set that list) with additional_metadata, we wake up the brain and get proper scales (as we already do when serializing the object).
Probably this will slow down a bit query performances, but doesn't add any extra data in the catalog. And we're talking about batched results, so usually we have ate least 20/30 brains each query.

@yurj
Copy link
Contributor

yurj commented May 9, 2022

The first one is that when i create a new object with image, or i update the image with a new one, when the indexer is called i don't have the final blob file in content-type field, but a temporary file.

Hi! this is a bug, because when indexing the object on save, the object should already be finalized. I think it has to do with the uncommit state of the blob:
https://github.com/zopefoundation/ZODB/blob/22d1405d413a0c210ecbf43575d6bb15b34ecdc9/src/ZODB/tests/blob_transaction.txt

can you check if the committed call over the blob fails like in https://github.com/zopefoundation/ZODB/blob/22d1405d413a0c210ecbf43575d6bb15b34ecdc9/src/ZODB/tests/blob_transaction.txt#L321?

Also, @1letter had this:

# first attempt
(Pdb++) pp blob._p_blob_committed 
None

(Pdb++) dir(blob)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__getstate__', '__hash__', '__implemented__', '__init__', '__module__', '__new__', '__providedBy__', '__provides__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_create_uncommitted_file', '_p_activate', '_p_blob_committed', '_p_blob_ref', '_p_blob_uncommitted', '_p_changed', '_p_deactivate', '_p_delattr', '_p_estimated_size', '_p_getattr', '_p_invalidate', '_p_jar', '_p_mtime', '_p_oid', '_p_serial', '_p_setattr', '_p_state', '_p_status', '_p_sticky', '_uncommitted', 'closed', 'committed', 'consumeFile', 'open', 'opened', 'readers', 'writers']

# second attempt
(Pdb++) pp blob._p_blob_committed 
u'/Plone5/py2/var/blobstorage/0x00/0x00/0x00/0x00/0x00/0xaa/0x68/0xbc/0x03d9906455146bcc.blob'

He solved with:

blobfile = obj.file
blob = blobfile._blob
blob._p_activate()

Hope this can help.

@pnicolli
Copy link

pnicolli commented May 9, 2022

@ale-rt A couple explanations are probably due here:

  • the height of the image is required for properly rendering it, the browser uses it to reserve the image space on the page, thus avoiding Cumulative Layout Shift
  • in addition to that, in Volto we don't have access to the scales configured in the backend, at this time, so an alternative to have those would need to be implemented
  1. I do not see it required for properly rendering. I grok it is a plus but it is not required by any means.
  2. That seems to me a really easy thing to solve. I guess you already have some constants about the site already defined somewhere (e.g. the portal_url). That would be just another one.

@ale-rt

  1. It is required if we want good scores on core web vitals. I realize it also looks good without it, but we should want our frontend to be the best possible one in order to have a solid product (Plone itself). Also, several customers tend to look at the lighthouse score more than they look at the actual look of the site.
    Good source of info on the topic: https://web.dev/optimize-cls/#images-without-dimensions

  2. We could indeed save the Plone scales in Volto config, but then we would need to manually keep those in sync with changes in the backend config, in the actual scales calculated by Plone. Of course, this requirement is especially valid if we get width and height of images from the catalog like suggested in the PR, because then the code sync problem can bite us anytime.

@cekk
Regarding where and how to generate image urls without uuids like ale-rt suggested, which is fine to me, I would suggest doing it directly in the frontend, that means in Volto and in Classic UI templates. The info to build the url are there, it's just a matter of using those. This is true unless we decide that we want the backend to decide how the url is built and the frontend to just use the one provided by the backend, and this option is not to be discarded probably, since it is probably more future proof.

@cekk
Copy link
Member Author

cekk commented May 9, 2022

@pnicolli we need to generate them in the backend because otherwise in brains you don't know what are the actual image sizes, and you need both if i understood correctly.

@pnicolli
Copy link

pnicolli commented May 9, 2022

@cekk I just meant the url of the image, which in the frontend could be generated like {brain.getURL}/@@images/{brain.image_field}/preview?modified={brain.modified}. Of course we would need to assume preview exists instead of knowing it from the catalog, but I believe there is no escaping from this.
Of course we still need image width and height from the backend.

@fredvd
Copy link
Sponsor Member

fredvd commented May 9, 2022

From what I learned and proposed for last friday at the Beethovensprint my current understanding is that the most useful and probably only thing to store in the catalog and what is needed to create the correct image sizes in the srcset is the width of the original image stored on the plone.namedfile field or of the images available on a context.

What you don't want to do is to generate a srcset with for example image widths (300w, 600w, 1200w, 1600w) when the stored image is only 980 pixels wide.

All other information can be generated independent of waking up the image object. And with the current knowledge for every site there will be a limited identical set of 'fixed' image widths that can be used for all images that are served and rendered in the frontend, either volto or classic.

So then you could filter the image widths that are not available (for example for hidpi usage) because it would mean the image gets upscaled when that image is pulled from the srcset.

@fredvd
Copy link
Sponsor Member

fredvd commented May 9, 2022

A thing that has troubled me forever is that scale uids are not deterministic, but just random uuid4s: https://github.com/plone/plone.scale/blob/09b3598b3843363260a263c65435bd2f463249ab/plone/scale/storage.py#L198

If were able to use deterministic uids, this should be among the issues that get fixed. Like hash of the original file by its contents and full scale configuration.

(I have had a similar patch for that for JYU for years, but don't have access to the code until next year. We did it for better caching, because the builtin 24h scale invalidation on any scale modification with continuously changing scale URLs was bad.)

One problem that we never had to deal with, but became relevant with plone.restapi and decoupled frontend is opening up the server for Denial of Service attacks by requestiong thousands of flexible image scales. When the templates are processed only server side and a uuid url is generated this is not possible.

Some stand alone image servers that were looked at for inspiration or integration (also in 2020 at the Dresden Plonetagung) allow to 'sign' a image request with public/private key, effectively generating a signed 'ticket' with which the frontend is allowed to request a certain image scale/format/quality/crop .

@datakurre I'm not saying you are proposing this: you are only discussing about deterministic scales, but if one more step would be to generate the image based on decoding the information in the url without any security checks we need to be aware of DoS.


I'm surprised to learn that scales are 24h lived, My assumption until now was that they are there permanently until the image data is refreshed or in the current implementation the scale definitions is altered. I base that on the stale image scale annotations that build up and for which we run a purge image scales script on Plone sites (https://github.com/zestsoftware/plonescripts/blob/master/purge_image_scales.py).


What I missed in a solution for problems we encountered the last few years with plone.restapi image responsse triggering generation of all image scales at once because returning the 'stable' url with uuid needs the generation of the image:

why did nobody until now propose to make the generation of the image scaled to a certain width lazy by:

  • registering the uuid in an servide side dict/list with the requested processing parameters so it is 'pre registered' for creation when restapi or a page template needs the width(s).
  • generate the image scale upon the first incoming request for this uuid when it is pre-registered and then cache it on an annotation

This avoid rendering scales that are not requested but still provide some safety measure against DoS.

@datakurre
Copy link
Member

@datakurre I'm not saying you are proposing this: you are only discussing about deterministic scales, but if one more step would be to generate the image based on decoding the information in the url without any security checks we need to be aware of DoS.

I don't propose decoding scale from the URL. Only that the scale URL remains the same as long as the input and parameters are the same.

In this thread is would fix the weird blob related issues.

Else where it would fix that when any update to scales after 24h clears old scales, the new scales would have the same URLs.

@MrTango
Copy link
Contributor

MrTango commented May 9, 2022

@jensens and I where discussing this also today and came to the same conclusion, that we would like to generate the URL in the front-end. Because than the browser will only trigger scaling for individual images and only for one scale at a time.
To archive this we where thinking about having a timestamp for each scale on the object and use it in the URL. Other than that the URL will be similar to what we had without the UID-based version.

@@images/image/TIMESTAMP/large

I also did some testing earlier today, to research what information we need.
We need the one pair of width/height for example from the original image when it's uploaded.
Would be also good to have that in the catalog metadata.
With this we have the original aspect ratio.
For the srcset to be generated on client side, we only need to put one pair, it could be even something like width="3" and height="2". The browser will adjust this to the real width for us, if height="auto" is set.

If we later set the width in CSS, that is fine.
It also works well together with max-with="100%".
The Browser will keep updating it's calculation and loads the fitting image from the list we provided.
It works well with "px" or "vw" on images, but not with "%", as the later will make the image jump around the others are not.

Asking for the huge scale on a 900x900 image would just return the same 900x900 image.
@jensens jensens changed the title [WIP] Image scales metadata Image scales catalog metadata Jun 14, 2022
@jensens
Copy link
Sponsor Member

jensens commented Jun 14, 2022

(I removed WIP from title: We have labels and draft state, its enough).

@mauritsvanrees mauritsvanrees marked this pull request as ready for review June 15, 2022 16:03
@mauritsvanrees
Copy link
Sponsor Member

I think this is ready for review.
@cekk, you started this. Can you check if this is still what you had in mind, and I did not break any expectations with my changes?
Does this implementation help Volto?

I have started Jenkins jobs for this PR together with plone/plone.namedfile#121 which uses the new catalog metadata in the @@image_scale view (not so well known I think, meant for listings of brains).

What is missing mostly is an upgrade step. Add the new metadata column. And I fear we need to wake up all objects and update the metadata. Do we have a volunteer?

Are we missing anything else?

@sneridagh
Copy link
Member

LGTM! Although I might also be missing something, I've been looking at this from the fence.
We need more feedback!

Copy link
Sponsor Member

@jensens jensens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After moving all parts to its respective packages as planned, I would give green light for merge here.

Comment on lines 1 to 23
from zope.interface import Interface


class IImageScalesAdapter(Interface):
"""
Return a list of image scales for the given context
"""

def __init__(context, request):
"""Adapts context and the request."""

def __call__():
""" """


class IImageScalesFieldAdapter(Interface):
""" """

def __init__(field, context, request):
"""Adapts field, context and request."""

def __call__():
"""Returns JSON compatible python data."""
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interfaces here need to go to plone.base before merge. @cekk put them here to simplify prototyping. It was never planned to keep them here.

Products/CMFPlone/image_scales/adapters.py Outdated Show resolved Hide resolved
@mauritsvanrees
Copy link
Sponsor Member

This is ready for review, with code in the new places.
I have added a PLIP file to coredev buildout in plips/plip-image-scales-metadata.cfg, checking out branches of four packages, with the following PRs:

  • plone.base has the interfaces.
  • plone.namedfile has the namedfile field adapter, and it uses the metadata in the @@image_scale view on the portal or navigation root.
  • Products.CMFPlone (this PR) has the remaining code: the indexer and the image scales adapter for dexterity content.
  • plone.app.upgrade adds image_scales to the portal_catalog columns and recatalogs all brains.

Please review. I think this is the last blocker for a first beta of Plone 6!

mauritsvanrees added a commit to plone/jenkins.plone.org that referenced this pull request Jun 22, 2022
Let this replace the merged image srcsets PLIP.
See plone/Products.CMFPlone#3521
@ale-rt
Copy link
Member

ale-rt commented Jun 22, 2022

I think I already wrote that somewhere else but:

  • I do not have time to review it
  • I trust the work you guys did and I even saw the cool stuff coming out of this work

That's a +1 from my point of view.

It would be cool to have some other @plone/framework-team member or core contributor to say something about that.
If that does not happen consider this work approved.

@mauritsvanrees
Copy link
Sponsor Member

Jenkins job with all PRs is green.
Meanwhile the PLIP jobs can be run, and they are green as well: 3.8 and 3.9.

@ericof
Copy link
Sponsor Member

ericof commented Jun 22, 2022

Also a 👍🏼 from me

mister-roboto pushed a commit to plone/buildout.coredev that referenced this pull request Jun 22, 2022
Branch: refs/heads/main
Date: 2022-06-21T11:44:12+02:00
Author: Maurits van Rees (mauritsvanrees) <maurits@vanrees.org>
Commit: plone/plone.base@0cfbfd2

Add images interface with IImageScalesAdapter and IImageScalesFieldAdapter.

See plone/Products.CMFPlone#3521

Files changed:
A news/3521.feature
A src/plone/base/interfaces/images.py
M src/plone/base/interfaces/__init__.py
Repository: plone.base

Branch: refs/heads/main
Date: 2022-06-22T22:13:22+02:00
Author: Jens W. Klein (jensens) <jk@kleinundpartner.at>
Commit: plone/plone.base@d8f67b6

Merge pull request #13 from plone/image_scales_metadata

Add images interface with adapters for image scales

Files changed:
A news/3521.feature
A src/plone/base/interfaces/images.py
M src/plone/base/interfaces/__init__.py
@jensens jensens merged commit 1ade707 into master Jun 22, 2022
@jensens jensens deleted the image_scales_metadata branch June 22, 2022 20:15
@jensens
Copy link
Sponsor Member

jensens commented Jun 22, 2022

I merged all except the upgrade step. See my comment there. plone/plone.app.upgrade#292

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.