docs: document HTML sanitization
lucgagan committed Nov 12, 2023
1 parent bf45904 commit 868534e
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions src/sanitizeHtml.ts
@@ -1,9 +1,15 @@
 import sanitize from "sanitize-html";

 /**
- * The reason for sanitization is because we do not need all HTML tags to be present in the prompt.
- * For example, we do not need <style> or <script> tags to be present in the prompt.
+ * The reason for sanitization is because OpenAI does not need all of the HTML tags
+ * to know how to interpret the website, e.g. it will not make a difference to AI if
+ * we include or exclude <script> tags as they do not impact the already rendered DOM.
+ *
  * In my experience, reducing HTML only to basic tags produces faster and more reliable prompts.
+ *
+ * Note that the output of this function is designed to interpret only the HTML tags.
+ * For instructions that rely on visual cues (e.g. "click red button") we intend to
+ * combine HTML with screenshots in the future versions of this library.
  */
 export const sanitizeHtml = (subject: string) => {
   return sanitize(subject, {
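
The idea described in the doc comment, dropping elements such as <script> and <style> that add nothing to the prompt, can be sketched without any dependency. This is only an illustration: `stripNonContentTags` and `DROPPED_ELEMENTS` are hypothetical names, the regex approach is a stand-in and not a security-grade sanitizer, and it does not reproduce the options that the real `sanitizeHtml` passes to sanitize-html (those are not shown in this diff).

```typescript
// Hypothetical, dependency-free illustration of "reduce HTML before prompting".
// NOT the library's sanitizeHtml and NOT a substitute for sanitize-html.

// Assumption: the elements the doc comment names as irrelevant to the prompt.
const DROPPED_ELEMENTS = ["script", "style"];

const stripNonContentTags = (html: string): string => {
  let result = html;
  for (const tag of DROPPED_ELEMENTS) {
    // Match the opening tag, its (possibly multi-line) content, and the closing tag.
    const element = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}\\s*>`, "gi");
    result = result.replace(element, "");
  }
  return result;
};

const page =
  '<div><style>.a{color:red}</style><script>track()</script><button id="buy">Buy</button></div>';
const reduced = stripNonContentTags(page);

console.log(reduced); // <div><button id="buy">Buy</button></div>
console.log(`${page.length} -> ${reduced.length} characters`);
```

The remaining markup still identifies the button by tag and `id`, which is the kind of structural information the comment says the prompt relies on. An instruction like "click red button" would lose its cue here, since the styling was removed.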
