[Security Solution] [Elastic AI Assistant] Fixes Knowledge Base not loading in cloud environments (#169039)

## Summary

Resolves an issue on cloud deployments where the Knowledge Base could be
set up, but the ES|QL entries would not be loaded.

Renames `knowledge_base/esql/docs` to
`knowledge_base/esql/documentation`, as `docs` is part of the Kibana
[build-time exclusion
strategy](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/copy_legacy_source_task.ts#L41).
Note that even though line `39` excludes `asciidoc`, testing showed it was
the `docs` entry on line `41` that was preventing the files from
being included in the Kibana build.
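
To illustrate why a directory named `docs` can vanish from a build while `documentation` survives, here is a minimal sketch of directory-name exclusion. The `EXCLUDED_DIR_NAMES` list and the `isExcludedFromBuild` helper are hypothetical; they do not reproduce the actual patterns or matching logic in `copy_legacy_source_task.ts`, and only assume that a directory-name entry excludes everything beneath a matching directory.

```typescript
// Hypothetical exclusion list; the real build task defines its own patterns.
const EXCLUDED_DIR_NAMES: readonly string[] = ['docs'];

// Returns true when any directory segment of `relativePath` matches an excluded name.
function isExcludedFromBuild(
  relativePath: string,
  excluded: readonly string[] = EXCLUDED_DIR_NAMES
): boolean {
  const segments = relativePath.split('/').filter((s) => s.length > 0);
  // Drop the final segment, which is the file name rather than a directory.
  const dirs = segments.slice(0, -1);
  return dirs.some((dir) => excluded.some((name) => name === dir));
}

const oldPath = 'server/knowledge_base/esql/docs/source_commands/from.asciidoc';
const newPath = 'server/knowledge_base/esql/documentation/source_commands/from.asciidoc';

console.log(isExcludedFromBuild(oldPath)); // true
console.log(isExcludedFromBuild(newPath)); // false
```

Under this assumption the match is on the whole segment, so `documentation` is not caught by a `docs` entry even though it starts with the same letters.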

Note: The actual changeset here is just a couple of files, updating the
`esql_loader` and corresponding tests. The majority of changes are from the
rename, so this should be a straightforward review.


To test that the assets are included in the build, you can run a `yarn
build` locally and verify the assets are included in the dist at
`build/kibana/node_modules/@kbn/elastic-assistant-plugin/server/knowledge_base/esql/documentation`,
or alternatively, just log in to this PR's `ci:cloud-deploy` instance
from the Kibana build details, and verify that the appropriate errors
(ELSER unavailable, rather than files missing) are logged when trying to load the `ES|QL
Knowledge Base Documents`.
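
The local check can also be scripted. The sketch below is not part of this PR; `countFilesWithExtension` is a hypothetical helper, and it assumes the script runs from the Kibana repository root after `yarn build` has finished. It walks the dist directory named above and counts the `.asciidoc` files it finds.

```typescript
import * as fs from 'fs';
import * as path from 'path';

// Recursively counts files with the given extension under `dir`.
// Returns 0 when the directory does not exist.
function countFilesWithExtension(dir: string, extension: string): number {
  if (!fs.existsSync(dir)) {
    return 0;
  }
  let count = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      count += countFilesWithExtension(fullPath, extension);
    } else if (entry.isFile() && path.extname(entry.name) === extension) {
      count += 1;
    }
  }
  return count;
}

// Path taken from the commit message, relative to the Kibana repo root.
const distDocsDir =
  'build/kibana/node_modules/@kbn/elastic-assistant-plugin/server/knowledge_base/esql/documentation';

const found = countFilesWithExtension(distDocsDir, '.asciidoc');
console.log(`${found} .asciidoc file(s) found under ${distDocsDir}`);
```

A count of zero would suggest the assets were excluded from the distribution or the build has not been run.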


> [!NOTE]
> Since the `ci:cloud-deploy` instances don't deploy with an ML node of
sufficient capacity, you can't actually deploy ELSER, but you can
download it, which is all that the initial ELSER check ensures. You
can therefore still test that loading the docs into the
`.kibana-elastic-ai-assistant-kb` index was attempted by checking the [cluster's
kibana
logs](https://kibana-pr-169039.kb.us-west2.gcp.elastic-cloud.com:9243/app/logs/stream?logFilter=(filters:!(),query:(language:kuery,query:'service.id:%2258121ceb066505e00f0913733b3e5ee9%22%20and%20%22language%20docs%22'),refreshInterval:(pause:!t,value:5000),timeRange:(from:now-15m,to:now))&logView=(logViewId:default,type:log-view-reference)&flyoutOptions=(flyoutId:'3L_7PosBZTjGpbeGx6t3',flyoutVisibility:hidden,surroundingLogsId:!n)&logPosition=(position:(tiebreaker:2852,time:1697599602455)))
and verifying the log line below:
>
> `[kibana.log][INFO] Loaded 0 ES|QL docs, language docs, and example queries into the Knowledge Base`
>
> The logs above it should detail the docs to be loaded, and show them failing
because ELSER is unavailable.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
spong and kibanamachine authored Oct 18, 2023
1 parent d6f7384 commit 716b1d3
Showing 134 changed files with 12 additions and 12 deletions.
@@ -8,15 +8,15 @@
import { Document } from 'langchain/document';

/**
- * Mock LangChain `Document`s from `knowledge_base/esql/docs`, loaded from a LangChain `DirectoryLoader`
+ * Mock LangChain `Document`s from `knowledge_base/esql/documentation`, loaded from a LangChain `DirectoryLoader`
*/
export const mockEsqlDocsFromDirectoryLoader: Document[] = [
{
pageContent:
'[[esql-agg-avg]]\n=== `AVG`\nThe average of a numeric field.\n\n[source.merge.styled,esql]\n----\ninclude::{esql-specs}/stats.csv-spec[tag=avg]\n----\n[%header.monospaced.styled,format=dsv,separator=|]\n|===\ninclude::{esql-specs}/stats.csv-spec[tag=avg-result]\n|===\n\nThe result is always a `double` not matter the input type.\n',
metadata: {
source:
- '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/aggregation_functions/avg.asciidoc',
+ '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/documentation/aggregation_functions/avg.asciidoc',
},
},
];
@@ -39,7 +39,7 @@ export const mockMsearchResponse: MsearchResponse = {
_source: {
metadata: {
source:
- '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/source_commands/from.asciidoc',
+ '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/documentation/source_commands/from.asciidoc',
},
vector: {
tokens: {
@@ -10,6 +10,8 @@ This directory contains assets for the Knowledge Base feature. The assets are us

The assets are stored in their original source format, so `.asciidoc` for documentation, and `.g4` and `.tokens` for the ANTLR language definitions. File names have been updated to be snake_case to satisfy Kibana linting rules.

NOTE: When adding knowledge base assets, please ensure that the source files and directories are not excluded as part of the Kibana build process, otherwise things will work fine locally, but will fail once a distribution has been built (i.e. cloud deployments). See `src/dev/build/tasks/copy_legacy_source_task.ts` for details on exclusion patterns.

### Future

Once asset format and chunking strategies are finalized, we may want to either move the assets to a shared package so they can be consumed by other plugins, or potentially ship the pre-packaged ELSER embeddings as part of a Fleet Integration. For now though, the assets will be included in their source format within the plugin, and can then be processed and embedded at runtime.
@@ -105,8 +105,7 @@ describe('loadESQL', () => {
await loadESQL(esStore, logger);

expect(logger.error).toHaveBeenCalledWith(
- 'Failed to load ES|QL docs, language docs, and example queries into the Knowledge Base',
- error
+ 'Failed to load ES|QL docs, language docs, and example queries into the Knowledge Base\nError: Failed to load documents'
);
});
});
@@ -23,7 +23,7 @@ import { ESQL_RESOURCE } from '../../../routes/knowledge_base/constants';
export const loadESQL = async (esStore: ElasticsearchStore, logger: Logger): Promise<boolean> => {
try {
const docsLoader = new DirectoryLoader(
- resolve(__dirname, '../../../knowledge_base/esql/docs'),
+ resolve(__dirname, '../../../knowledge_base/esql/documentation'),
{
'.asciidoc': (path) => new TextLoader(path),
},
@@ -76,8 +76,7 @@ export const loadESQL = async (esStore: ElasticsearchStore, logger: Logger): Pro
return response.length > 0;
} catch (e) {
logger.error(
- `Failed to load ES|QL docs, language docs, and example queries into the Knowledge Base`,
- e
+ `Failed to load ES|QL docs, language docs, and example queries into the Knowledge Base\n${e}`
);
return false;
}
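
The hunk above folds the caught error into the message string instead of passing it as a second argument. As a minimal sketch of what that produces, the hypothetical `formatLoadError` helper below (not part of the plugin) applies the same pattern: stringifying a standard `Error` yields its name and message, which is the text the updated `loadESQL` test expects.

```typescript
// Hypothetical helper mirroring the message-plus-error pattern from the diff.
function formatLoadError(prefix: string, e: unknown): string {
  // String(e) calls e.toString(); for a standard Error this is "<name>: <message>".
  return `${prefix}\n${String(e)}`;
}

const message = formatLoadError(
  'Failed to load ES|QL docs, language docs, and example queries into the Knowledge Base',
  new Error('Failed to load documents')
);

console.log(message);
```

Note that only the name and message survive this conversion; the stack trace is not part of the resulting string.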
@@ -225,7 +225,7 @@ describe('ElasticsearchStore', () => {
"[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n",
metadata: {
source:
- '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/source_commands/from.asciidoc',
+ '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/documentation/source_commands/from.asciidoc',
},
},
{
@@ -122,7 +122,7 @@ export class ElasticsearchStore extends VectorStore {
i.index?._id != null && i.index.error == null ? [i.index._id] : []
);
} catch (e) {
- this.logger.error('Error loading data into KB', e);
+ this.logger.error(`Error loading data into KB\n ${e}`);
return [];
}
};
@@ -29,7 +29,7 @@ describe('getFlattenedHits', () => {
"[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n",
metadata: {
source:
- '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/source_commands/from.asciidoc',
+ '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/documentation/source_commands/from.asciidoc',
},
},
];
@@ -98,7 +98,7 @@ export const indexEvaluations = async ({

return true;
} catch (e) {
- logger.error('Error indexing data into the evaluation index', e);
+ logger.error(`Error indexing data into the evaluation index\n${e}`);
return false;
}
};
