Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: unify data directory cli #1469

Merged
merged 1 commit into from
Sep 22, 2024
Merged

Conversation

morph-dev
Copy link
Collaborator

@morph-dev morph-dev commented Sep 20, 2024

What was wrong?

Currently, trin-execution binary doesn't support setting custom directory and trin supports it only via env variable. Also the code initializes directory is split.

How was it fixed?

Added data-dir command line flag to both trin and trin-execution, and extracted common logic into trin-utils crate.

  • if neither data-dir nor ephemeral flags are set, we use OS's default local data directory (same as now)
  • if only data-dir is set, we use that folder instead
  • if only ephemeral is set, we use OS's temporary directory, and create trin/trin-execution subfolder inside (same as now)
  • if both flags are set, it behaves the same as with ephemeral, but it uses provided directory instead of OS's temp directory

In the case of trin, I left an option for env variable (TRIN_DATA_PATH) to be used. It behaves the same as --data-dir was set. If both are set, we will throw error.

Other smaller refactorings:

  • convert to use &Path instead of PathBuf in several existing functions
  • created create_temp_test_dir function that unifies how temporary directories are created for test purposes
    • they are more use cases that I didn't catch in this PR, as this wasn't the primary purpose of this PR

To-Do

@morph-dev morph-dev added cleanup Tidy up existing code or remove dead code refactoring Code refactoring labels Sep 20, 2024
@morph-dev morph-dev self-assigned this Sep 20, 2024
@morph-dev morph-dev marked this pull request as ready for review September 20, 2024 14:57
Copy link
Member

@KolbyML KolbyML left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@@ -124,5 +124,6 @@ async fn test_we_can_generate_content_key_values_up_to_x() -> Result<()> {
trin_execution.database.storage_cache.clear();
println!("Block {block_number} finished. Total content key/value pairs: {content_pairs}");
}
temp_directory.close()?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's good practice to explicitly close this, but just so I understand, all the explicity closing you're doing isn't necessary correct? Since once temp_directory goes out of scope, the directory will be deleted

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but...

It's important to ensure that handles (like File and ReadDir) to files inside the directory are dropped before the TempDir goes out of scope. The TempDir destructor will silently ignore any errors in deleting the directory; to instead handle errors call TempDir::close.

So calling close() helps us identify if we're doing anything to stop the directory deletion from working.

Also, I wonder if the way we exit trin will let the destructor work. I haven't tinkered in the exit code for a while, so it would be nice to double-check that the new ephemeral directory disappears, at least after a clean trin exit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I wonder if the way we exit trin will let the destructor work. I haven't tinkered in the exit code for a while, so it would be nice to double-check that the new ephemeral directory disappears, at least after a clean trin exit.

Currently (and after this pr), when running trin or trin-execution with "ephemeral" flag, the folder won't be deleted at exit. The way it works (at least in Linux) is that new directory will be created under /tmp/trin/ (/tmp/trin-execution), that OS will delete during next boot.

I actually like this behavior as it allows me to inspect database manually, or even restart the same node (e.g. using --data-dir), but still count on that data being deleted by the OS on next restart.

The way this works is that you can call TempDir::into_path that will destroy TempDir without deleting the directory. This is used both in trin and trin-execution. But also in couple of tests, where I think it shouldn't (it's easier to have utility function that will initialize the object you are testing, e.g. OverlayService, without having to keep track of TempDir).

I guess it's good practice to explicitly close this, but just so I understand, all the explicity closing you're doing isn't necessary correct? Since once temp_directory goes out of scope, the directory will be deleted

What @carver said, plus doing it explicitly means that you can't accidental call into_path on it (e.g. doing refactoring) that will result in not deleting it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I remember this debate about ephemeral, now. Yes, I see the value in being able to inspect. I also am slightly annoyed that running tests spams my /tmp, especially when running something like cargo watch -x 'test ...' which can run quite a few tests. (and I reboot my laptop ... irregularly 😆 ) But if I'm forced to choose, the inspection currently wins out, on balance.

I guess we could find some way to alter this behavior for just tests, but I'm not sure what that solution would look like, and if you don't see a quick way to do that, then I don't want to hold up the PR for it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, tests should delete their TempFile once they are done. But they need to be refactored to behave like that. I plan to look into this next week, during my 🦩 week.


And for completeness, as I believe I didn't explain this in details in the previous post.
If you have TempDir and you need PathBuf, you can do it in two ways:

  • temp_dir.into_path() - which will destroy TempDir and return PathBuf that you can use
    • this will not delete the directory at all
  • temp_dir.path().to_path_buf() - this returns &Path that is converted to PathBuf
    • this will delete temporary directory when temp_dir goes out of scope (or temp_dir.close() is called
    • in order for this to work correctly in test, you have to keep temp_dir as variable in the body of the test itself, otherwise directory might be deleted before test finishes (e.g. if temp_dir is used only in utility function that initializes everything)

false => None,
true => Some(setup_temp_dir()?),
};
let data_dir = setup_data_dir(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick not sure if it's confusing that we accept an env var in trin but not here... Maybe it's worth adding a check just to see if the env var is set, and displaying a warning that it's not being used? Or maybe not...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trin is using TRIN_DATA_PATH. Should we do a check for that one (which doesn't make much sense), or for TRIN_EXECUTION_DATA_PATH (which doesn't exist and is never mentioned)? Or should I come up with new, shared, name?

I'm more in favor of completely removing support for env vars, but @carver made an argument that he finds it useful, so I'm leaving it. And if we are leaving that one, I'm not against adding another one (e.g. TRIN_EXECUTION_DATA_PATH), and have the same behavior. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to add an environment variable to trin-execution I think it should have a different variable name then for trin

shared, name

I would not be a fan of ^. Because you might be running both on the same system.

In any case Trin will likely be used as a library in both

  • portal bridge
  • Trin Execution

One day (not fully related, just sharing what I think)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm also inclined more towards different name, if we want to add one.

Copy link
Collaborator

@carver carver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morph-dev morph-dev merged commit 286adfb into ethereum:master Sep 22, 2024
9 checks passed
@morph-dev morph-dev deleted the data_dir_cli branch September 22, 2024 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup Tidy up existing code or remove dead code refactoring Code refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants