List of Invalid/Binary files handling not exhaustive #3

Open
agamvrinos opened this issue Jun 17, 2020 · 0 comments
Labels
enhancement New feature or request

Comments

@agamvrinos
Owner

When reading a codebase to create the initial Clone Index, the application skips binary and invalid (non-Unicode) files. It decides which files to skip from a list of file extensions. That list is not exhaustive, so some binary file extensions may be missing from it.

When an extension is missing from the list, the following sequence leads to a failure:

  1. The application cannot read the file. An exception is thrown, but the application continues and ignores the file.
  2. A later commit includes changes that affect that file. For instance, a .jpg file is renamed (assuming, for the sake of example, that .jpg were not in the list, although it actually is).
  3. The application tries to find the file in the index and fails, because the file was never processed when the codebase was read.
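
The three steps above can be reproduced in miniature. This is a hypothetical Python sketch, not the application's actual code: `SKIP_EXTENSIONS`, `build_index`, `apply_rename`, and `demo` are names invented for illustration, and the index is modelled as a plain dict.

```python
import os
import tempfile

# Hypothetical, deliberately incomplete skip list (".bin" is missing).
SKIP_EXTENSIONS = {".jpg", ".png", ".class"}


def build_index(root):
    """Map each readable file path (relative to root) to its text content."""
    index = {}
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.splitext(name)[1].lower() in SKIP_EXTENSIONS:
                continue  # known binary extension: skipped up front
            try:
                with open(path, encoding="utf-8") as handle:
                    content = handle.read()
            except UnicodeDecodeError:
                continue  # step 1: the unreadable file is silently ignored
            index[os.path.relpath(path, root)] = content
    return index


def apply_rename(index, old_path, new_path):
    """Step 3: a rename from a commit assumes old_path is in the index."""
    index[new_path] = index.pop(old_path)  # KeyError if never indexed


def demo():
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "Main.java"), "w", encoding="utf-8") as f:
            f.write("class Main {}")
        with open(os.path.join(root, "blob.bin"), "wb") as f:
            f.write(b"\xff\xfe\x00\x01")  # not valid UTF-8
        index = build_index(root)
    try:
        apply_rename(index, "blob.bin", "renamed.bin")  # step 2
    except KeyError:
        return sorted(index), "lookup failed"
    return sorted(index), "ok"
```

Calling `demo()` indexes only `Main.java`, and the rename of `blob.bin` then hits the missing index entry.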

Possible Solutions

  1. Before reading a file, check whether it is binary or invalid. Libraries exist that detect binary files, but they do so probabilistically, so they are still not a fully suitable fix.
  2. Update the application to consider only the file extensions that make up the majority of the codebase. For example, for a Java-based project, consider only .java files, or a combination of extensions if the project uses multiple languages.
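
Both options can be sketched roughly as follows, in Python for brevity. Everything here is an assumption made for illustration: `looks_binary`, `dominant_extensions`, and the `min_share` threshold are hypothetical names and values, not part of the application. The content check is itself only a heuristic, so it shares the limitation noted in option 1.

```python
import codecs
import os
from collections import Counter


def looks_binary(sample: bytes) -> bool:
    """Option 1: heuristic check on the leading bytes of a file.

    Can misclassify: for example, UTF-16 text contains NUL bytes.
    """
    if b"\x00" in sample:
        return True  # NUL bytes are rare in UTF-8 text
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the
        # end of the sample, but still rejects invalid byte sequences.
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False


def dominant_extensions(paths, min_share=0.2):
    """Option 2: extensions covering at least min_share of the files."""
    counts = Counter(os.path.splitext(p)[1].lower() for p in paths)
    counts.pop("", None)  # ignore files without an extension
    total = sum(counts.values())
    if total == 0:
        return set()
    return {ext for ext, n in counts.items() if n / total >= min_share}
```

With option 2, the index would be built only from files whose extension is in the returned set, so a commit touching any other file could be ignored consistently instead of failing on lookup.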
@agamvrinos agamvrinos added the enhancement New feature or request label Jun 17, 2020