Support for Non-stay_open and Non-singleton Mode in Cloud Function Environments? #190

Closed
gowy222 opened this issue Jul 14, 2024 · 2 comments

Comments

@gowy222

gowy222 commented Jul 14, 2024

Hi,

I'm using exiftool-vendored in a cloud function environment, and I'm facing some challenges with the current implementation. (There's no need to worry about the underlying Perl dependencies, as many serverless primitives can be configured to prioritize their installation.)

I'd like to request a feature or configuration option that allows for a simpler, stateless execution mode. Here's the context and rationale:

  1. Cloud Function Environment Characteristics:

    • Unpredictable lifecycle
    • Cold/hot start mechanisms
    • Function instance reuse varies between cloud providers
    • High concurrency scenarios
  2. Current Concerns:

    • The end() call can be problematic in this environment
    • Potential conflicts at lifecycle critical points in high concurrency scenarios
    • Difficulty in managing stay_open and singleton modes effectively
  3. Proposed Solution:

    • A configuration option for new ExifTool() that enables a simple, process-per-invocation mode
    • Each cloud function invocation would spawn a new, independent ExifTool process
    • No need for explicit end() calls or process/thread pool management
    • The process terminates naturally with the function invocation
  4. Rationale:

    • Safer and more predictable behavior in cloud environments
    • Eliminates concerns about proper resource cleanup
    • Process overhead is negligible for cloud functions running on hundreds or thousands of CPU clusters
    • Simplifies usage in serverless and highly concurrent scenarios

Would it be possible to add an option to new ExifTool() that enables this kind of straightforward, use-and-discard process mode? This would greatly simplify usage in cloud function environments and potentially other scenarios where managing long-lived processes is challenging.

If possible, could you provide an initialization code sample?
Thx!

@mceachen
Member

Process forking in “serverless” setups can be problematic, which in turn makes this library’s architecture problematic.

If I were to implement this, I would completely skip batch-cluster and use the already-set-up process factory (without stay_open, as you mentioned).

I would expect that per-call latency could jump to 200ms-1.5s.
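
One way to check that estimate on a given platform is to time a few spawn-run-exit cycles directly. The following is a minimal sketch, not part of exiftool-vendored: the helper names timeOneSpawn and measureSpawnLatency are hypothetical, and it only assumes Node's built-in child_process and perf_hooks modules.

```javascript
const { execFile } = require("node:child_process");
const { performance } = require("node:perf_hooks");

// Hypothetical helper: resolves with the wall-clock milliseconds that one
// spawn-run-exit cycle of `binPath` takes. Rejects if the process fails.
function timeOneSpawn(binPath, args) {
  return new Promise((resolve, reject) => {
    const start = performance.now();
    execFile(binPath, args, (err) => {
      if (err) reject(err);
      else resolve(performance.now() - start);
    });
  });
}

// Hypothetical helper: runs `runs` sequential spawns and reports the
// min, median (upper middle for even counts), and max in milliseconds.
async function measureSpawnLatency(binPath, args, runs = 5) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    samples.push(await timeOneSpawn(binPath, args));
  }
  samples.sort((a, b) => a - b);
  return {
    min: samples[0],
    median: samples[Math.floor(samples.length / 2)],
    max: samples[samples.length - 1],
  };
}
```

Calling measureSpawnLatency(pathToExiftool, ["-ver"]) inside the deployed function would give a rough lower bound, since -ver only prints the ExifTool version and does not read any file. Running it on both a cold and a warm instance should show how much the numbers differ.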

I’m not familiar with a way to simulate a serverless system within GHA, so testing it and reproducing any issues that arise would be problematic. If there is indeed a way to move forward with testing, I’d be more positive towards this feature.

@mceachen
Member

mceachen commented Jul 22, 2024

I've updated the constructors for the ReadTask and WriteTask to be public, and the .parse() methods to be public as well.

You can get the path to exiftool via await require("exiftool-vendored").exiftool.exiftoolPath(), spawn the process yourself with whatever arguments you need, and pass its stdout to ReadTask to get what you want.
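
A rough sketch of the spawn-it-yourself half of that approach might look like the following. It is an illustration under stated assumptions rather than the library's API: runJsonOnce and readTagsOnce are hypothetical names, and the output is parsed with plain JSON.parse instead of ReadTask, because the exact ReadTask constructor arguments are not given in this thread.

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");

const execFileAsync = promisify(execFile);

// Hypothetical helper: runs one short-lived process per call and parses its
// stdout as JSON. Nothing is pooled or kept alive, so there is no end() to
// call. The promise rejects on a non-zero exit code or when the timeout hits.
async function runJsonOnce(binPath, args, timeoutMs = 10000) {
  const { stdout } = await execFileAsync(binPath, args, {
    timeout: timeoutMs,
    maxBuffer: 16 * 1024 * 1024, // allow large tag dumps
  });
  return JSON.parse(stdout);
}

// Hypothetical helper: `exiftool -json <file>` prints a JSON array with one
// object per input file, so the first element holds the tags for `file`.
async function readTagsOnce(exiftoolPath, file) {
  const [tags] = await runJsonOnce(exiftoolPath, ["-json", file]);
  return tags;
}
```

Combined with the path accessor above, a call could look like readTagsOnce(await exiftool.exiftoolPath(), "photo.jpg"). Because this bypasses ReadTask, whatever post-processing ReadTask applies to the raw tags is skipped. Feeding stdout to ReadTask instead would be the way to get the library's usual tag objects.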

I'm not going to add "official" support for serverless, though, given that I don't know how to test it rigorously with GHA.
