Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

langchain[minor]: Multi-file loader #5584

Merged

Conversation

theogravity
Copy link
Contributor

Introduces the MultiFileLoader, which behaves like the DirectoryLoader, except the input is a list of file paths instead of a single directory path.

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label May 28, 2024
Copy link

vercel bot commented May 28, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
langchainjs-api-refs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jun 25, 2024 6:06pm
langchainjs-docs ✅ Ready (Inspect) Visit Preview Jun 25, 2024 6:06pm

@dosubot dosubot bot added the auto:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features label May 28, 2024
import { Document } from "@langchain/core/documents";
import { getEnv } from "@langchain/core/utils/env";
import { BaseDocumentLoader } from "../base.js";
import { type LoadersMapping, UnknownHandling } from "./directory.js";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if these imports should be refactored / moved to some common file instead of being imported from directory.js.

Open to suggestions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine

*
* ```
*/
export class MultiFileLoader extends BaseDocumentLoader {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation shares a lot in common with DirectoryLoader that I wonder if this is something that can be moved to a new base class so there is reuse.

* indicating that the modules failed to load.
* @returns A promise that resolves to an object containing the imported functions.
*/
static async imports(): Promise<{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have to do this anymore, you can just import them at the top as normal:

https://github.com/langchain-ai/langchainjs/blob/main/.github/contributing/INTEGRATIONS.md

Copy link
Contributor Author

@theogravity theogravity Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was pretty much copied from the directory loader. Should I also remove it from there as well?

I can update the config file accordingly as mentioned in the doc you posted.

@jacoblee93
Copy link
Collaborator

Thanks for the PR!

Is this materially different from calling the DirectoryLoader in a loop? If it doesn't support individual files, would it make more sense to add that functionality?

@jacoblee93 jacoblee93 added the question Further information is requested label May 31, 2024
@theogravity
Copy link
Contributor Author

theogravity commented May 31, 2024

It's a completely different use-case. I have files from different directories that I may pick and choose from vs entire directories, so looping with the DirectoryLoader wouldn't work as that is on a directory-basis.

I'm trying to build something for myself where I feed in source code files that maybe have imports from different paths so I don't need a full directory read.

@jacoblee93 jacoblee93 changed the title Multi-file loader langchain[minor]: Multi-file loader Jun 25, 2024
@jacoblee93 jacoblee93 added lgtm PRs that are ready to be merged as-is and removed question Further information is requested labels Jun 25, 2024
@jacoblee93
Copy link
Collaborator

Thanks! Sorry for delay

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features lgtm PRs that are ready to be merged as-is size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants