Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

markdown file xgettext + gettext #92

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,2 +1 @@
target/
*~
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions i18n-helpers/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ description = "Plugins for a mdbook translation workflow based on Gettext."
[dependencies]
anyhow = "1.0.68"
chrono = { version = "0.4.31", default-features = false, features = ["alloc"] }
clap = "4.4.4"
mdbook = { version = "0.4.25", default-features = false }
polib = "0.2.0"
pulldown-cmark = { version = "0.9.2", default-features = false }
Expand Down
250 changes: 250 additions & 0 deletions i18n-helpers/src/bin/markdown-gettext.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,250 @@
use anyhow::{anyhow, bail, Context};
use clap::{Arg, ArgGroup, Command};
use mdbook_i18n_helpers::translate;
use polib::catalog::Catalog;
use polib::po_file;
use std::fs;
use std::fs::File;
use std::io::{BufReader, Read, Write};
use std::path::{Path, PathBuf};

fn build_catalog(lang: &str) -> anyhow::Result<Catalog> {
let pot_path = Path::new(lang);
let catalog = po_file::parse(pot_path)
.map_err(|err| anyhow!("{err}"))
.with_context(|| format!("Could not parse {:?} as PO file", pot_path))?;

Ok(catalog)
}

fn translate_files(
catalog: Catalog,
files: Vec<PathBuf>,
output_path: PathBuf,
) -> anyhow::Result<()> {
for file_path in files.iter() {
let content = File::open(file_path)
.with_context(|| format!("Failed to open file: {}", file_path.display()))?;
let mut buf_reader = BufReader::new(content);
let mut contents = String::new();
buf_reader
.read_to_string(&mut contents)
.with_context(|| format!("Failed to read file: {}", file_path.display()))?;

let translated_content = translate(&contents, &catalog);

let output_file_name = file_path.file_name().ok_or(anyhow!("Invalid file name"))?;
let output_file_path = output_path.join(output_file_name);

let mut output_file = File::create(&output_file_path)
.with_context(|| format!("Failed to create file: {}", output_file_path.display()))?;

output_file
.write_all(translated_content.as_bytes())
.with_context(|| format!("Failed to write to file: {}", output_file_path.display()))?;
friendlymatthew marked this conversation as resolved.
Show resolved Hide resolved
}
Ok(())
}

fn validate_file(path: &Path) -> bool {
path.is_file() && path.extension().map_or(false, |ext| ext == "md")
}

fn allocate_files(
dir_option: Option<&str>,
file_option: Option<&str>,
) -> anyhow::Result<Vec<PathBuf>> {
let mut valid_files = Vec::new();
match (dir_option, file_option) {
(Some(dir_path), None) => {
let full_dir = PathBuf::from(dir_path);
fs::read_dir(full_dir)?
.filter_map(Result::ok)
.filter(|entry| validate_file(&entry.path()))
.for_each(|entry| valid_files.push(entry.path()));
}
(None, Some(file_path)) => {
let full_file_path = PathBuf::from(file_path);
if !validate_file(&full_file_path) {
bail!("Markdown file does not exist: {}", full_file_path.display())
}
valid_files.push(full_file_path);
}
_ => unreachable!(),
}
Ok(valid_files)
}

fn build_output_path(output_path_option: Option<&str>) -> anyhow::Result<PathBuf> {
let output_path = output_path_option.unwrap_or("translated_md_files");
let path = PathBuf::from(output_path);
fs::create_dir_all(&path)
.with_context(|| format!("Failed to create directory: {}", path.display()))?;
Ok(path)
}

fn main() -> anyhow::Result<()> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit in doubt about the best command line interface for this...

One idea would be to allow

markdown-gettext xx.po foo.md -o foo-xx.md

which would translate foo.md using xx.po and write the output to foo-xx.md.

Here, foo.md could be left out or it could be - and then we read from stdin (like for markdown-xgettext below). Similar, -o can be left out or be - and then we write to stdout.

If there are multiple inputs, then I guess the user cannot write to stdout and must instead give an output directory:

markdown-gettext xx.po foo.md bar.md --output-dir xx

In this case it would be an error to attempt to read from stdin since we don't really know what to name the output file in that case.

Copy link
Contributor Author

@friendlymatthew friendlymatthew Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly, we want optional flags for output file title and directory.

Since markdown-gettext xx.po foo.md -o foo-xx.md, does this mean if one was to add an additional wef.md, would the command be: markdown-gettext xx.po foo.md wef.md -o foo-xx.md wef-xx.md?

Output directory makes sense to me 👍. I would just like some clarification on file naming conventions, especially when one wants to translate multiple files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly, we want optional flags for output file title and directory.

Yeah, I'm trying to figure out how we can model the behavior here to be similar to existing Unix commands.

We should also keep it simple! Less code for all of us to write and maintain 😄

I'm thinking of using cp as a basis:

  • cp can copy a single path to a new name or
  • cp can copy multiple path into a single directory

The paths which are copied can be files or directories (or a mix of both). If there is a directory, then you must add -r to do a recursive copy.

I now see

markdown-gettext xx.po source.md dest.md

as

cp source.md dest.md

but with the translation happening as well.

Similarly,

markdown-gettext xx.po foo.md bar.md dest/

as

cp foo.md bar.md dest/

Finally, the rule out -r should be implemented: if you want to recursively translate a directory, you have to add a -r flag.

This new thinking is different from what I had before where I thought we should use the flags from the GNU Gettext utilities. Back then, I hadn't realized that there isn't a command line program which mirrors what our markdown-gettext program will do 😄

Note that these are just ideas, please let me know if you have better ones!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgeisler I'll write up a doc that outlines what the CLI should do for both xgettext and gettext. Once we get everything to your liking, I'll change the binaries to follow those requirements!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great! That document could simultaneously become the manual for the tools, so your work here will be reused right away.

Copy link
Contributor Author

@friendlymatthew friendlymatthew Oct 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the writeup for gettext

markdown-gettext

This tool copies source paths to destination paths while performing translations.

Usage

For a single file translation:

md-gettext lang.po source.md dest.md

note for single file translations:

  • if source.md is not specified, the program reads from stdin
  • if dest.md is not specified, the program outputs to stdout

For multiple paths:

md-gettext lang.po foo.md bar.txt wef.md dest/

For directories, you must use the -r flag for a recursive translation:

md-gettext lang.po sourcedir/ destdir/ -r

Arguments

  1. xx.po

    • the translation file in .po format
  2. source.md foo.md bar.txt sourcedir/ etc...

    • source paths which can be a mix of files and directories
  3. dest.md, dest/

    • destination paths

Options

  1. -r
    • enables recursive translation for directories
    • must be added if you want to translate contents of a directory and its subdirectories

Please let me know what you think! @mgeisler @djmitche. I tried to be as simple as possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @friendlymatthew, sorry for the slow reply. The above looks great, thank you very much for writing it up! I think it follow the behavior from the cp manual page closely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries @mgeisler, I appreciate the opportunity to work on this 😁.

When I have some time off from official work, I'll get on this ASAP

let matches = Command::new("markdown-gettext")
.about("markdown translator binary for mdbook-i18n-helpers")
friendlymatthew marked this conversation as resolved.
Show resolved Hide resolved
.arg(Arg::new("input_dir").short('d').long("dir"))
.arg(Arg::new("file").short('f').long("file"))
.arg(Arg::new("po").short('p').long("po").required(true))
.arg(Arg::new("output_dir").short('o').long("out"))
.group(
ArgGroup::new("input_source")
.args(["input_dir", "file"])
.required(true)
.multiple(false),
)
.get_matches();

let lang = matches.get_one::<String>("po").unwrap().as_str();
let dir_option = matches.get_one::<String>("input_dir").map(|d| d.as_str());
let file_option = matches.get_one::<String>("file").map(|f| f.as_str());
let output_path_option = matches.get_one::<String>("output_dir").map(|d| d.as_str());

let files = allocate_files(dir_option, file_option)?;
let catalog = build_catalog(lang)?;
let output_path = build_output_path(output_path_option)?;

translate_files(catalog, files, output_path)?;

Ok(())
}

#[cfg(test)]
mod tests {
use super::*;
use tempfile::TempDir;

static MARKDOWN_CONTENT: &str = "# How to Foo from a specified file path\n\
\n\
The first paragraph about Foo.\n\
Still the first paragraph.*baz*\n";

static INVALID_CONTENT: &str = "<p> This is not markdown content </p>";
friendlymatthew marked this conversation as resolved.
Show resolved Hide resolved

fn create_temp_directory(
content: &str,
file_title: &str,
) -> anyhow::Result<(TempDir, PathBuf)> {
let tmp_dir = tempfile::tempdir()?;
let file_path = tmp_dir.path().join(file_title);

fs::write(&file_path, content)?;

Ok((tmp_dir, file_path))
}

#[test]
fn test_allocate_files_with_dir() -> anyhow::Result<()> {
let tmp_dir = tempfile::tempdir()?;
fs::write(tmp_dir.path().join("valid.md"), MARKDOWN_CONTENT)?;
fs::write(tmp_dir.path().join("invalid.html"), INVALID_CONTENT)?;
fs::write(tmp_dir.path().join("ibid.txt"), INVALID_CONTENT)?;

let valid_files = allocate_files(Some(tmp_dir.path().to_str().unwrap()), None)?;
assert_eq!(valid_files.len(), 1);
tmp_dir.close()?;
Ok(())
}

#[test]
fn test_allocate_files_valid_file() -> anyhow::Result<()> {
let (temp_dir, valid_file) = create_temp_directory(MARKDOWN_CONTENT, "test.md")?;
let valid_files = allocate_files(None, Some(valid_file.to_str().unwrap()))?;
assert_eq!(valid_files.len(), 1);

temp_dir.close()?;
Ok(())
}

#[test]
fn test_allocate_files_invalid_file() -> anyhow::Result<()> {
let (tmp_dir, invalid_file) = create_temp_directory(INVALID_CONTENT, "wef.html")?;
let result = allocate_files(None, Some(invalid_file.to_str().unwrap()));

assert!(result.is_err());

tmp_dir.close()?;

Ok(())
}

#[test]
fn test_validate_file() -> anyhow::Result<()> {
let (md_dir, md_path) = create_temp_directory(MARKDOWN_CONTENT, "test.md")?;
let (html_dir, html_path) = create_temp_directory(INVALID_CONTENT, "test.html")?;

assert!(validate_file(&md_path));
assert!(!validate_file(&html_path));

md_dir.close()?;
html_dir.close()?;

Ok(())
}

#[test]
fn test_build_catalog_nonexistent_file() {
let result = build_catalog("nonexistent.po");
assert!(result.is_err());
}

#[test]
fn test_find_output_path_not_a_directory() -> anyhow::Result<()> {
let tmp_dir = tempfile::tempdir()?;
let file_path = tmp_dir.path().join("wef.txt");
File::create(&file_path)?;

let result = build_output_path(Some(&file_path.to_str().unwrap().to_string()));
assert!(result.is_err());

tmp_dir.close()?;

Ok(())
}

#[test]
fn test_find_output_path_default() -> anyhow::Result<()> {
let default_output_path = build_output_path(None)?;
let dir_output_path = build_output_path(Some("wef_dir"))?;

assert_eq!(default_output_path, PathBuf::from("translated_md_files"));
assert_eq!(dir_output_path, PathBuf::from("wef_dir"));
assert!(default_output_path.is_dir() && dir_output_path.is_dir());

fs::remove_dir_all("translated_md_files")?;
fs::remove_dir_all("wef_dir")?;

Ok(())
}

#[test]
fn test_find_output_path_given_existing_dir() -> anyhow::Result<()> {
let tmp_dir = tempfile::tempdir()?;
let output_path = build_output_path(Some(tmp_dir.path().to_str().unwrap()))?;
assert_eq!(output_path, tmp_dir.path());
assert!(output_path.is_dir());

tmp_dir.close()?;

Ok(())
}

#[test]
fn test_find_output_path_given_invalid() -> anyhow::Result<()> {
let tmp_dir = tempfile::tempdir()?;
let tmp_file = tmp_dir.path().join("temp.md");
File::create(&tmp_file)?;

let file_result = build_output_path(Some(tmp_file.to_str().unwrap()));
let rogue_result = build_output_path(Some("\0"));

assert!(file_result.is_err() && rogue_result.is_err());

tmp_dir.close()?;

Ok(())
}
}
Loading