add a batch option #94

Closed
kosloot opened this issue Mar 12, 2024 · 6 comments

@kosloot
Contributor

kosloot commented Mar 12, 2024

We should examine the possibility of running Ucto on a group of files, selected with wildcards or taken from a subdirectory.

Given the small overhead of starting Ucto over and over again, this was never an issue.
But when running a Docker instance of Ucto, it might become cumbersome to process one file at a time.

Batch mode should not really be a problem, although some options will have to be restricted.

@martinreynaert thanks for the idea

@kosloot
Contributor Author

kosloot commented Mar 13, 2024

Thinking about this: a logical next request would be to process several files in parallel.
This requires a lot of refactoring of the current implementation, but it is doable.

An unpleasant detail is how NOT to break the current working of ucto, where you can have ONE filename on the command line, for input; and optionally a SECOND one, for output.

The simplest solution might be a --batch option that changes this behavior and takes ALL files on the command line as input.
This then requires a way to determine the names of the output files automatically.
An option to set the output directory (and the input directory?) might also make Ucto easier to use.
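
To make the output naming question concrete, here is a minimal Python sketch of one possible rule: keep the input file name, append a suffix, and place the result in the output directory. The function names and the ".tok" suffix are assumptions made for illustration; nothing here reflects how Ucto itself names its output.

```python
from pathlib import Path


def output_path_for(input_file, output_dir, suffix=".tok"):
    """Map one input file to an output file inside output_dir.

    The naming rule (input name + suffix) is a hypothetical choice.
    """
    name = Path(input_file).name
    return Path(output_dir) / (name + suffix)


def plan_outputs(input_files, output_dir, suffix=".tok"):
    """Build an input -> output mapping and refuse name collisions."""
    plan = {}
    seen = {}  # output path -> input file that claimed it first
    for input_file in input_files:
        out = output_path_for(input_file, output_dir, suffix)
        if out in seen:
            raise ValueError(
                f"{input_file} and {seen[out]} would both be written to {out}"
            )
        seen[out] = input_file
        plan[input_file] = out
    return plan
```

The collision check matters as soon as inputs come from more than one directory: dir1/a.txt and dir2/a.txt would otherwise end up in the same output file.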

@proycon and @martinreynaert comments welcome!

@proycon
Member

proycon commented Mar 13, 2024

A --batch option sounds nice. Perhaps it could also detect whether each argument is a file or a directory and work from there?

@kosloot
Contributor Author

kosloot commented Mar 13, 2024

That looks good at first sight, but what if you don't want to run on ALL files in that directory?
Imagine you do something like ucto -Lnld dir1/test.txt dir2/ dir3/a*.txt
The intention is to run on 1 file in dir1, ALL files in dir2 (implicit!), and some files in dir3 (expanded by the shell).
This gets VERY complicated, and we need to find a way that is both simple to use and simple to understand.
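
For illustration, the mixed case described above could be handled roughly as in the Python sketch below: directories are expanded implicitly to the files directly inside them, and everything else is taken as a file name (the shell has already expanded any wildcards). The function name expand_inputs is hypothetical, and this is not a description of what Ucto does.

```python
from pathlib import Path


def expand_inputs(args):
    """Turn a mixed list of files and directories into a flat list of files."""
    files = []
    for arg in args:
        path = Path(arg)
        if path.is_dir():
            # A directory implicitly means: every regular file directly inside it
            # (no recursion into subdirectories in this sketch).
            files.extend(sorted(p for p in path.iterdir() if p.is_file()))
        else:
            # A plain file name, possibly one of many produced by shell expansion.
            files.append(path)
    return files
```

Even this small sketch shows where the complexity comes from: the user has no way to say "only some files" for a directory argument, and hidden or unrelated files in that directory are picked up as well.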

@martinreynaert

Hi,
I would not take things as far as suggested in the last update here.
It is far easier to restrict things to a single dir. Also, if one happens to have files in separate directories that need uctoing, one would do best to run them separately, moving the lot into the background. That way, one gets parallel processing for free.

Another thing that 'should' be possible is to also set the ID of the elements. For this I usually take the file name, stripped of its extension(s).

Thanks!
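
The ID convention mentioned in this comment (the file name stripped of its extension or extensions) could look like the Python sketch below. The function name id_from_filename is made up for this example, and cutting at the first dot is an assumption about what "stripped of its extension(s)" means.

```python
from pathlib import Path


def id_from_filename(filename):
    """Derive an ID from a file name by dropping the directory and all extensions."""
    name = Path(filename).name
    # Cut at the first dot, so "page1.tok.xml" becomes "page1".
    stem = name.split(".", 1)[0]
    # A name such as ".hidden" would leave an empty stem; fall back to the full name.
    return stem or name
```

Note that a file name with dots in its base part, such as report.v2.txt, is shortened to report under this rule, so two different files can end up with the same ID.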

@kosloot
Contributor Author

kosloot commented Apr 11, 2024

A version of Ucto that implements batch processing is now available in Git.
New options:

  • -B to enable batch mode
  • -O to name an output directory (required)
  • -I to give an input directory (optional)

Also, xml:id is now generated from the name of the input file, when possible.
The --id= option is no longer required (and is forbidden in batch mode).
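
As a rough illustration of how the rules above fit together, the Python sketch below assembles a batch command line for a wrapper script. Only the flags -B, -O, -I and --id come from this thread. The helper name build_batch_command, the argument order, and passing each directory as a separate argument after its flag are assumptions, so check the Ucto documentation before relying on them.

```python
def build_batch_command(output_dir, input_dir=None, files=(), extra_options=()):
    """Assemble an argument list for a batch run of ucto (hypothetical wrapper)."""
    if not output_dir:
        # -O is described as required in batch mode.
        raise ValueError("batch mode needs an output directory (-O)")
    for opt in extra_options:
        # --id is described as forbidden in batch mode.
        if opt == "--id" or opt.startswith("--id="):
            raise ValueError("--id is not allowed in batch mode")
    cmd = ["ucto", "-B", "-O", output_dir]
    if input_dir is not None:
        cmd += ["-I", input_dir]  # -I is optional
    cmd += list(extra_options)
    cmd += list(files)
    return cmd
```

The resulting list can be handed to subprocess.run in a wrapper script. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.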

@kosloot
Contributor Author

kosloot commented Apr 30, 2024

released

@kosloot kosloot closed this as completed Apr 30, 2024