Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Implement file cache #268

Merged

Conversation

zachfeldman
Copy link

Building on #191, this attempts to finish up the implementation of a file cache. Also adds associated tests for cli class and new tests for cache class.

@zachfeldman zachfeldman marked this pull request as ready for review August 15, 2022 00:16
@zachfeldman zachfeldman mentioned this pull request Aug 15, 2022
3 tasks
Copy link

@ChrisBr ChrisBr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me 👍

@file_loader = file_loader
@hits = []
@new_results = []
puts "Cache mode is on"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how this is handled in the rest of the library but maybe we should pass in an io or logger so we could silence these outputs?

Something like

def initialize(config, file_loader = nil, logger = STDOUT)
  # ...
  @logger = logger
  @logger.puts "Cache mode is on"
end

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a nice addition, the rest of the library though uses puts to output to $stdout though. I did try the code you recommended, but I'm not really sure if that allows an easy way to silence output (at least I couldn't Google it, and couldn't make it happen running the command without output unless I redirect to > /dev/null). Do you have a good example of this pattern in another lib? I could also go for a full blown logger instance with debug levels for most of the stuff in this file but then I feel like I'd want to apply that to all puts statements in the project.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the rest of the library uses puts this is fine.

To silence it in tests you can pass in a StringIO like

io = StringIO.new

Foo.new(io: io)

io.string

mode = File.stat(file).mode

digester.update(
"#{file}#{mode}#{config.to_hash}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should put the library version into the cache key too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Added a commit to do that.

cache.clear
success!("cache directory cleared")
else
failure!("cache directory already doesn't exist, skipped deletion.")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence doesn't sound like correct english?

Suggested change
failure!("cache directory already doesn't exist, skipped deletion.")
failure!("cache directory doesn't exist, skipped deletion.")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah not sure how that ended up there, probably some copy pasta - fixed in the latest update.

Comment on lines 147 to 156
def prune_cache?
@options[:prune_cache]
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can also be a bit more smart here? Something what Rubocop does with max files in cache and then we prune the oldest when it's getting higher.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely could but, naively, you can just run erblint <files> --with-cache --prune-cache every time and the code as written will remove outdated cache files only. Maybe we can leave more smart cache pruning for a future PR?

@files = @options[:stdin] || dupped_args

load_config

@cache = Cache.new(@config, file_loader) if with_cache? || clear_cache?
if clear_cache?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if clear_cache?
if clear_cache?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made my own commit to fix these newline suggestions!


def run_on_file(runner, filename)
file_content = read_content(filename)
if with_cache? && !autocorrect?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if with_cache? && !autocorrect?
if with_cache? && !autocorrect?

else
file_content = run_with_corrections(runner, filename, file_content)
end
log_offense_stats(runner, filename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log_offense_stats(runner, filename)
log_offense_stats(runner, filename)

end

def run_using_cache(runner, filename, file_content)
if cache.include?(filename) && !autocorrect?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is never called with autocorrect? so we don't need to check again.

Suggested change
if cache.include?(filename) && !autocorrect?
if cache.include?(filename)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I removed it in the latest update.

mode = File.stat(file).mode

digester.update(
"#{file}#{mode}#{config.to_hash}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should.

@zachfeldman
Copy link
Author

Thanks for your comments @ChrisBr and @rafaelfranca ! I'm hoping to address them one night this evening or this weekend since I'm working on this PR in my spare time.

@zachfeldman
Copy link
Author

@ChrisBr @rafaelfranca thanks for the comments, I think I've replied to all of them. Are either of you able to approve me to run the CI workflow against my branch?

@zachfeldman zachfeldman force-pushed the zfeldman/cbruckmayer/implement-file-cache branch from 7b7ff2b to bc2193e Compare August 20, 2022 02:11
@zachfeldman zachfeldman changed the title Implement file cache [Feature] Implement file cache Aug 20, 2022
@zachfeldman
Copy link
Author

@rafaelfranca I think the tests should be fixed, I believe they were failing on CI because the checksum was slightly different there vs locally. I mocked out some of the methods it uses to generate the checksum instead and everything is passing for me locally still.

@zachfeldman zachfeldman force-pushed the zfeldman/cbruckmayer/implement-file-cache branch 2 times, most recently from f49bda2 to fe68af5 Compare August 20, 2022 15:33
Copy link

@ChrisBr ChrisBr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👏

@zachfeldman
Copy link
Author

@rafaelfranca or @etiennebarrie are you able to run CI for me? I have some time today to fix things up if there are still failures 😃

@zachfeldman zachfeldman force-pushed the zfeldman/cbruckmayer/implement-file-cache branch from fe68af5 to 1a5db05 Compare August 30, 2022 00:49
@zachfeldman
Copy link
Author

Rebased against master, specs were passing before the rebase but just want to stay up to date.

@ChrisBr
Copy link

ChrisBr commented Aug 30, 2022

@rafaelfranca do you want to have another look and 🚢 ?

@zachfeldman
Copy link
Author

Hey good morning ya'll! @etiennebarrie are you able to give this a review as another maintainer that seemingly has write access? I know @rafaelfranca can get pretty busy with lots of projects under his belt, so just exploring if there's another way to move this forward in the meantime. Thanks for your consideration!

Copy link
Member

@etiennebarrie etiennebarrie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing the exceptions I reported last time.

We do have cops that error, and when you use the cache you don't get the error anymore, but I'm not sure we need to do something about it.

Tiny feedback about the name: I wish it was --cache rather than --with-cache. Also should we enable pruning by default? What's the reason for not pruning?

lib/erb_lint/cli.rb Outdated Show resolved Hide resolved
@zachfeldman zachfeldman requested review from eterry1388 and etiennebarrie and removed request for eterry1388 and etiennebarrie October 13, 2022 12:00
@zachfeldman
Copy link
Author

@etiennebarrie great! As mentioned before the cache is opt-in, so I'm not too concerned about that - subsequent pull requests can be made if people aren't happy with what information is stored/restored for their specific linter/use case, but I think this is far beyond a basic skeleton to add to for this feature at this point. I think I'm about ~25 hours of work in.

I changed the name to cache (this was a holdover from the first implementation https://github.com/Shopify/erb-lint/pull/191/files#diff-1156d85dde58090796995118a6420fe59feb9f48234b5f550f3b9ed2b2ea23a6R131). I also enabled pruning by default - I agree that it makes sense to do so, it just adds slightly more risk to those running the tool for the first time, but it'll probably cause even more problems later on to not have it enabled by default.

I also added a README. Do you have any other concerns I can address? Thanks for your time.

@etiennebarrie
Copy link
Member

I also enabled pruning by default

For comparison, I checked what Rubocop does, it prunes not on the files that didn't have a hit but on the total number of cache files, and removes the oldest ones. But it must make it possible to use caching while linting a single file without pruning all the other cached results (locally in dev typically).

it just adds slightly more risk to those running the tool for the first time

How so?

Do you have any other concerns I can address?

  • One thing I noticed is that we use the ability to tweak the Rubocop cache directory. It's not so much of a concern, more like a new feature that can be added in another PR.

  • I don't think we should display "Skipping deletion of new cache result" or "Cleaning deleted cached file with checksum". It just adds to the output without much value.


Other than that I tested it on our app and it works well.

@zachfeldman
Copy link
Author

zachfeldman commented Oct 17, 2022

Great thanks @etiennebarrie ! I haven't had time yet to get back to look at those suggestions like I thought I would this weekend (man, the autumn is really beautiful here in New York so it's hard to want to be on my laptop) but I might some night this week or next weekend.

I wasn't clear on if these last comments are enough for you to withhold your approval/merging of this to main, or if these are just nice to haves - I think we just need you specifically to approve this to merge it at this point, since we have 2 other approvals and you're a maintainer, right? Meanwhile maybe we can merge this so more people can test it and make some of your suggestions in follow up PRs since this one is already pretty huge (perhaps indicated by the fact that there's already been a PR merged to this PR :)) and has been open for over 2 months now? Unless there's anything else in it that makes you uncomfortable/worried about a merge.

@etiennebarrie
Copy link
Member

I wasn't clear on if these last comments are enough for you to withhold your approval

I'm just isn't clear about the risk that pruning adds?

it just adds slightly more risk to those running the tool for the first time

How so?

And I don't like the output of all files pruned

  • I don't think we should display "Skipping deletion of new cache result" or "Cleaning deleted cached file with checksum". It just adds to the output without much value.

The rest can be tackled later IMO.

@zachfeldman
Copy link
Author

zachfeldman commented Oct 18, 2022

Great @etiennebarrie , see e082855 to reduce the output of the pruning process. Ideally there'd be some "verbose" mode for erb-lint I could optionally hook that output into but again I'm just trying to get to a merge at this point.

Regarding the pruning - my point about risk was mostly a general one. All of this code is pretty new, anything we can do to reduce the risk of those using it is good IMHO. Sometimes it's better to just try to do less - ideally I'd want to put out the --cache and --prune options and let people experiment with them before running them both at the same time. But a few people have tried this branch and seemingly not reported any issues I'm not thinking of, and the code is well unit tested, so it's probably fine. As I keep pointing out all of this should be opt-in anyway and not affect anyone not using the cache.

Appreciate the time you've spent on the review so far, it's definitely made the branch better.

@zachfeldman
Copy link
Author

Hi @etiennebarrie , just following up, do you think it would be possible to get this approved and merged sometime this week if I've addressed your remaining concerns sufficiently? Glad to discuss anything else on your mind or try to recruit more people to test this branch if you'd like more real-world examples of it working for people!

@etiennebarrie
Copy link
Member

I think we're good! Thanks a lot. I think we'll want another PR to make the directory configurable and then we can ship a new version.

@etiennebarrie etiennebarrie merged commit 3f14d38 into Shopify:main Oct 26, 2022
@zachfeldman
Copy link
Author

You're welcome @etiennebarrie and again super thanks for your help picking this one up and reviewing it! Your suggestions were very helpful and really ensured a high-quality diff was merged. Also props to @ChrisBr and @flavorjones for the foundational work here, @rafaelfranca for your initial review, and @eterry1388 and @joshuapinter for testing out the code and even providing a patch along the way.

I'm not sure if I'll have time for the change mentioned to make the cache file directory configurable soon, but will comment here or on the issue about the cache if I'm able to get to it.

@ChrisBr
Copy link

ChrisBr commented Oct 26, 2022

The power of open source 😄 Good work @zachfeldman and I'm glad this made it into erb-lint finally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants