core[minor]: Add LangSmith doc loader (#6568)
* core[minor]: Add LangSmith doc loader

* tests and nits

* add test and docs

* rename
bracesproul committed Aug 21, 2024
1 parent af25a44 commit 465bbfb
Showing 6 changed files with 569 additions and 0 deletions.
@@ -0,0 +1,302 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"sidebar_label: LangSmith\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# LangSmithLoader\n",
"\n",
"This notebook provides a quick overview for getting started with the [LangSmithLoader](/docs/integrations/document_loaders/). For detailed documentation of all `LangSmithLoader` features and configurations, head to the [API reference](https://api.js.langchain.com/classes/_langchain_core.document_loaders_langsmith.LangSmithLoader.html).\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | [PY support](https://python.langchain.com/docs/integrations/document_loaders/langsmith) |\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [LangSmithLoader](https://api.js.langchain.com/classes/_langchain_core.document_loaders_langsmith.LangSmithLoader.html) | [@langchain/core](https://api.js.langchain.com/classes/_langchain_core.html) | ✅ | beta | ✅ |\n",
"\n",
"### Loader features\n",
"\n",
"| Source | Web Loader | Node Envs Only |\n",
"| :---: | :---: | :---: |\n",
"| LangSmithLoader | ✅ | ❌ |\n",
"\n",
"This guide shows how to load the examples of a LangSmith dataset as LangChain documents using the `LangSmithLoader`.\n",
"\n",
"## Setup\n",
"\n",
"To access the LangSmith document loader you'll need to install `@langchain/core`, create a [LangSmith](https://langsmith.com/) account and get an API key.\n",
"\n",
"### Credentials\n",
"\n",
"Sign up at https://langsmith.com and generate an API key. Once you've done this, set the `LANGSMITH_API_KEY` environment variable:\n",
"\n",
"```bash\n",
"export LANGSMITH_API_KEY=\"your-api-key\"\n",
"```\n",
"\n",
"### Installation\n",
"\n",
"The `LangSmithLoader` integration lives in the `@langchain/core` package:\n",
"\n",
"```{=mdx}\n",
"import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
"import Npm2Yarn from \"@theme/Npm2Yarn\";\n",
"\n",
"<IntegrationInstallTooltip></IntegrationInstallTooltip>\n",
"\n",
"<Npm2Yarn>\n",
" @langchain/core\n",
"</Npm2Yarn>\n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create example dataset\n",
"\n",
"For this example, we'll create a new dataset which we'll use in our document loader."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"import { Client as LangSmithClient } from 'langsmith';\n",
"import { faker } from \"@faker-js/faker\";\n",
"\n",
"const lsClient = new LangSmithClient();\n",
"\n",
"const datasetName = \"LangSmith Few Shot Datasets Notebook\";\n",
"\n",
"const exampleInputs = Array.from({ length: 10 }, (_, i) => ({\n",
" input: faker.lorem.paragraph(),\n",
"}));\n",
"const exampleOutputs = Array.from({ length: 10 }, (_, i) => ({\n",
" output: faker.lorem.sentence(),\n",
"}));\n",
"const exampleMetadata = Array.from({ length: 10 }, (_, i) => ({\n",
" companyCatchPhrase: faker.company.catchPhrase(),\n",
"}));\n",
"\n",
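"// Delete any earlier copy of the dataset so the notebook can be re-run.\n",
"// Assumption: deleteDataset throws when no dataset has this name (e.g. on the first run).\n",
"try {\n",
"  await lsClient.deleteDataset({ datasetName });\n",
"} catch (_) {\n",
"  // Nothing to delete yet.\n",
"}\n",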
"\n",
"const dataset = await lsClient.createDataset(datasetName);\n",
"\n",
"const examples = await lsClient.createExamples({\n",
" inputs: exampleInputs,\n",
" outputs: exampleOutputs,\n",
" metadata: exampleMetadata,\n",
" datasetId: dataset.id,\n",
"});"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"import { LangSmithLoader } from \"@langchain/core/document_loaders/langsmith\"\n",
"\n",
"const loader = new LangSmithLoader({\n",
" datasetName: \"LangSmith Few Shot Datasets Notebook\",\n",
" // Instead of a datasetName, you can alternatively provide a datasetId\n",
" // datasetId: dataset.id,\n",
" contentKey: \"input\",\n",
" limit: 5,\n",
" // formatContent: (content) => content,\n",
" // ... other options\n",
"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" pageContent: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.',\n",
" metadata: {\n",
" id: 'f1a04800-6f7a-4232-9743-fb5d9029bf1f',\n",
" created_at: '2024-08-20T17:01:38.984045+00:00',\n",
" modified_at: '2024-08-20T17:01:38.984045+00:00',\n",
" name: '#f1a0 @ LangSmith Few Shot Datasets Notebook',\n",
" dataset_id: '9ccd66e6-e506-478c-9095-3d9e27575a89',\n",
" source_run_id: null,\n",
" metadata: {\n",
" dataset_split: [Array],\n",
" companyCatchPhrase: 'Integrated solution-oriented secured line'\n",
" },\n",
" inputs: {\n",
" input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'\n",
" },\n",
" outputs: {\n",
" output: 'Excepturi adeptio spectaculum bis volaticus accusamus.'\n",
" }\n",
" }\n",
"}\n"
]
}
],
"source": [
"const docs = await loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" id: 'f1a04800-6f7a-4232-9743-fb5d9029bf1f',\n",
" created_at: '2024-08-20T17:01:38.984045+00:00',\n",
" modified_at: '2024-08-20T17:01:38.984045+00:00',\n",
" name: '#f1a0 @ LangSmith Few Shot Datasets Notebook',\n",
" dataset_id: '9ccd66e6-e506-478c-9095-3d9e27575a89',\n",
" source_run_id: null,\n",
" metadata: {\n",
" dataset_split: [ 'base' ],\n",
" companyCatchPhrase: 'Integrated solution-oriented secured line'\n",
" },\n",
" inputs: {\n",
" input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'\n",
" },\n",
" outputs: { output: 'Excepturi adeptio spectaculum bis volaticus accusamus.' }\n",
"}\n"
]
}
],
"source": [
"console.log(docs[0].metadata)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'\n",
"}\n"
]
}
],
"source": [
"console.log(docs[0].metadata.inputs)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{ output: 'Excepturi adeptio spectaculum bis volaticus accusamus.' }\n"
]
}
],
"source": [
"console.log(docs[0].metadata.outputs)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[\n",
" 'id',\n",
" 'created_at',\n",
" 'modified_at',\n",
" 'name',\n",
" 'dataset_id',\n",
" 'source_run_id',\n",
" 'metadata',\n",
" 'inputs',\n",
" 'outputs'\n",
"]\n"
]
}
],
"source": [
"console.log(Object.keys(docs[0].metadata))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `LangSmithLoader` features and configurations, head to the [API reference](https://api.js.langchain.com/classes/_langchain_core.document_loaders_langsmith.LangSmithLoader.html)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "TypeScript",
"language": "typescript",
"name": "tslab"
},
"language_info": {
"codemirror_mode": {
"mode": "typescript",
"name": "javascript",
"typescript": true
},
"file_extension": ".ts",
"mimetype": "text/typescript",
"name": "typescript",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
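The notebook output above suggests that each loaded document takes its `pageContent` from the example's `inputs` under `contentKey`, and carries the whole example as `metadata`. The following is a minimal sketch of that mapping, not the loader's actual implementation: the `DatasetExample` and `LoadedDocument` types, the `exampleToDocument` helper, and the default formatter are all hypothetical names introduced for illustration, and treating `contentKey` as a single top-level key of `inputs` is an assumption.

```typescript
// Hypothetical types for illustration; not the definitions used by @langchain/core.
type DatasetExample = {
  id: string;
  inputs: Record<string, unknown>;
  outputs?: Record<string, unknown>;
  metadata?: Record<string, unknown>;
};

type LoadedDocument = {
  pageContent: string;
  metadata: Record<string, unknown>;
};

// Assumed default formatter: strings pass through, anything else is JSON-encoded.
function defaultFormat(content: unknown): string {
  return typeof content === "string" ? content : JSON.stringify(content ?? null);
}

// Assumption: contentKey names one top-level key of `inputs`.
function exampleToDocument(
  example: DatasetExample,
  contentKey: string,
  formatContent: (content: unknown) => string = defaultFormat
): LoadedDocument {
  return {
    // The selected input value becomes the document text.
    pageContent: formatContent(example.inputs[contentKey]),
    // The full example (id, inputs, outputs, ...) is kept as metadata.
    metadata: { ...example },
  };
}

const sampleExample: DatasetExample = {
  id: "example-1",
  inputs: { input: "What is LangSmith?" },
  outputs: { output: "A platform for tracing and evaluating LLM apps." },
};

const sampleDoc = exampleToDocument(sampleExample, "input");
console.log(sampleDoc.pageContent);
console.log(Object.keys(sampleDoc.metadata));
```

Keeping the full example in `metadata` is what makes `docs[0].metadata.inputs` and `docs[0].metadata.outputs` available in the notebook cells, so the original input/output pair stays reachable from each document.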
4 changes: 4 additions & 0 deletions langchain-core/.gitignore
@@ -38,6 +38,10 @@ document_loaders/base.cjs
document_loaders/base.js
document_loaders/base.d.ts
document_loaders/base.d.cts
document_loaders/langsmith.cjs
document_loaders/langsmith.js
document_loaders/langsmith.d.ts
document_loaders/langsmith.d.cts
embeddings.cjs
embeddings.js
embeddings.d.ts
1 change: 1 addition & 0 deletions langchain-core/langchain.config.js
@@ -22,6 +22,7 @@ export const config = {
chat_history: "chat_history",
documents: "documents/index",
"document_loaders/base": "document_loaders/base",
"document_loaders/langsmith": "document_loaders/langsmith",
embeddings: "embeddings",
example_selectors: "example_selectors/index",
indexing: "indexing/index",
13 changes: 13 additions & 0 deletions langchain-core/package.json
@@ -186,6 +186,15 @@
"import": "./document_loaders/base.js",
"require": "./document_loaders/base.cjs"
},
"./document_loaders/langsmith": {
"types": {
"import": "./document_loaders/langsmith.d.ts",
"require": "./document_loaders/langsmith.d.cts",
"default": "./document_loaders/langsmith.d.ts"
},
"import": "./document_loaders/langsmith.js",
"require": "./document_loaders/langsmith.cjs"
},
"./embeddings": {
"types": {
"import": "./embeddings.d.ts",
@@ -653,6 +662,10 @@
"document_loaders/base.js",
"document_loaders/base.d.ts",
"document_loaders/base.d.cts",
"document_loaders/langsmith.cjs",
"document_loaders/langsmith.js",
"document_loaders/langsmith.d.ts",
"document_loaders/langsmith.d.cts",
"embeddings.cjs",
"embeddings.js",
"embeddings.d.ts",