Talk to the City

Talk to the City is an application that:

ingests unstructured natural language, e.g.:
- citizen surveys / public deliberations
- newsgroups
- forums
- discussion archives
uses LLMs to extract and classify:
- atomic claims
- topics and subtopics
generates interactive reports

Demo

Heal Michigan

The Heal Michigan report is a video-based survey and an in-depth look into the challenges and daily lives of the Michigan community.

Taiwan same-sex marriage

The Taiwan same-sex marriage report is a very large survey of the Taiwanese population, covering their views on same-sex marriage in Taiwan.

Mina protocol

The Mina protocol report features the results of a user survey carried out by Mina Zero Knowledge Protocol among its users.

Repo link: https://github.com/AIObjectives/talk-to-the-city-reports

Computational Graph

On the technology front, tttc uses a dependency-graph-based data and computation model: nodes connected by directed edges. The nodes and edges form a pipeline in which some nodes provide data, whilst others provide computation steps. Computation involves a topological sort (since edges are directed) in which the output of each node is passed to the input of its downstream nodes. At each step, the node's "compute" function is invoked with the upstream input data, and so on until all nodes have been computed.

Computation has two modes: "run", used when the pipeline creator actively runs the pipeline, and "load", used when the resulting report page is loaded by a viewer.
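
The sort-then-compute loop described above can be sketched as follows. This is a minimal illustration using Kahn's algorithm, not the project's actual implementation: the `PipelineNode`, `PipelineEdge`, and `ComputeMode` types and the `topologicalSort` and `runPipeline` functions are hypothetical names, and the compute functions are assumed to be synchronous for brevity.

```typescript
// Hypothetical types for illustration; the real tttc node and edge shapes may differ.
type ComputeMode = "run" | "load";

interface PipelineNode {
  id: string;
  // Receives upstream outputs keyed by upstream node id and returns this node's output.
  compute: (inputs: Record<string, unknown>, mode: ComputeMode) => unknown;
}

interface PipelineEdge {
  source: string; // upstream node id
  target: string; // downstream node id
}

// Kahn's algorithm: repeatedly emit nodes that have no unprocessed upstream edges.
function topologicalSort(nodes: PipelineNode[], edges: PipelineEdge[]): PipelineNode[] {
  const inDegree: Record<string, number> = {};
  const downstream: Record<string, string[]> = {};
  const byId: Record<string, PipelineNode> = {};
  for (const node of nodes) {
    inDegree[node.id] = 0;
    downstream[node.id] = [];
    byId[node.id] = node;
  }
  for (const edge of edges) {
    inDegree[edge.target] += 1;
    downstream[edge.source].push(edge.target);
  }
  const ready = nodes.filter((n) => inDegree[n.id] === 0).map((n) => n.id);
  const sorted: PipelineNode[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    sorted.push(byId[id]);
    for (const next of downstream[id]) {
      inDegree[next] -= 1;
      if (inDegree[next] === 0) ready.push(next);
    }
  }
  if (sorted.length !== nodes.length) {
    throw new Error("Pipeline contains a cycle");
  }
  return sorted;
}

// Computes every node once, in dependency order, and returns outputs keyed by node id.
function runPipeline(
  nodes: PipelineNode[],
  edges: PipelineEdge[],
  mode: ComputeMode
): Record<string, unknown> {
  const outputs: Record<string, unknown> = {};
  for (const node of topologicalSort(nodes, edges)) {
    const inputs: Record<string, unknown> = {};
    for (const edge of edges) {
      if (edge.target === node.id) inputs[edge.source] = outputs[edge.source];
    }
    outputs[node.id] = node.compute(inputs, mode);
  }
  return outputs;
}

// Usage: two data nodes feed an adder node, declared out of order on purpose.
const exampleNodes: PipelineNode[] = [
  { id: "sum", compute: (inputs) => (inputs["a"] as number) + (inputs["b"] as number) },
  { id: "a", compute: () => 2 },
  { id: "b", compute: () => 3 },
];
const exampleEdges: PipelineEdge[] = [
  { source: "a", target: "sum" },
  { source: "b", target: "sum" },
];
const pipelineOutputs = runPipeline(exampleNodes, exampleEdges, "run");
```

Passing the mode into each compute call is one possible design: it would let a node do expensive work only on "run" and read cached results on "load".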

Reusability with the MVC pattern

The graph is also used for the UI. Pipelines have two rendering modes: graph and standard. The graph view uses Svelte Flow, whilst the standard view performs a topological sort and renders the nodes in a single column.

Nodes use the MVC pattern. The compute functions hold the Model and the Controller. The graph UI components hold the View.

Since the MC and V are decoupled, we can use different combinations of MC <-> V to yield many combinations of compute + UI entities whilst minimizing code and maximizing reusability.
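
To make the decoupling concrete, here is a minimal sketch of how compute logic and views could be paired through a lookup table. All names (`computeFns`, `views`, `nodeBindings`, `renderNode`) are hypothetical, and plain functions returning strings stand in for Svelte components; the project's own bindings live in src/lib/node_types.ts and may look quite different.

```typescript
// Hypothetical names throughout; this is not the project's actual node_types.ts.

// Model + Controller: pure compute logic with no knowledge of how it is displayed.
type ComputeFn = (input: unknown) => unknown;

const computeFns: Record<string, ComputeFn> = {
  limit_csv: (input) => (input as unknown[]).slice(0, 2),
  stringify: (input) => JSON.stringify(input),
};

// View: a rendering function standing in for a Svelte component.
type ViewFn = (output: unknown) => string;

const views: Record<string, ViewFn> = {
  grid: (output) => `<table>${(output as unknown[]).length} rows</table>`,
  text: (output) => `<pre>${String(output)}</pre>`,
};

// A binding pairs one compute function with one view; either side can be reused.
interface NodeBinding {
  compute: string;
  view: string;
}

const nodeBindings: Record<string, NodeBinding> = {
  limit_csv_v0: { compute: "limit_csv", view: "grid" },
  stringify_v0: { compute: "stringify", view: "text" },
};

// Looks up the binding, runs the compute step, then hands the output to the view.
function renderNode(bindingName: string, input: unknown): string {
  const binding = nodeBindings[bindingName];
  const output = computeFns[binding.compute](input);
  return views[binding.view](output);
}
```

In this sketch, adding a new node that reuses the `grid` view only requires a new compute function and one extra entry in `nodeBindings`.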

Documentation

Our AI Pipeline Engineering Guide #1 takes the reader step by step through the process of creating a report pipeline.

Our user docs provide a very high-level overview of the application for non-technical users.

Cloning

$ git clone https://github.com/AIObjectives/talk-to-the-city-reports

Firebase

The application can be hosted anywhere, although the persistence layer is currently coupled with Firestore and Google Cloud Storage.

Setting up a firebase instance

Since the app uses Firebase, you'll need a dev / staging firebase instance for local development, and for deployment. To do so, you have two options:

setting up your own instance.
using AOI's dev instance.

Deploying and maintaining Google Cloud Platform resources is fairly straightforward, although it requires the gcloud and gsutil CLI applications. Before getting started, make sure you have both correctly installed and authenticated.

https://cloud.google.com/sdk/docs/install

Setting up your own instance

To set up your own instance:

Head over to https://console.firebase.google.com/
Click "add project" and enter a project name
Disable google analytics
Click "create project" & continue
Under "Get started by adding Firebase to your app" click on the web </> icon
Add an app nickname (same as earlier)
Click "firebase hosting" if you intend to deploy the app
Click "register app"
Copy .env.example to .env in the turbo directory
Copy & paste the values of the variables.
Click next.
npm install -g firebase-tools
firebase login

Setting up authentication

In the project overview, click on "Authentication"
Click on "set up sign-in method"
Click 'Google'
Click 'enable'
Select a support email address
Click 'save'

Setting up firestore

In the project overview, in the left side panel, click on "build"
Click on "firestore database"
Click "Create Database"
Select your region / multi region
Click 'next'
Click 'Start in test mode'
Click 'enable'

N.B. Firestore rules are still being finalized. Please contact @lightningorb to find out more.

Setting up Google Cloud Storage

In the project overview, in the left side panel, click on "build"
Click on 'storage'
Click 'get started'
Click 'start in test mode'
Click next
Click done

Setting up CORS on GCS

Install and configure the gsutil application
Save the following in a temporary cors.json file

[
  {
    "origin": ["http://localhost:5173", "https://<optional_deployment_url>"],
    "method": ["GET", "HEAD", "DELETE"],
    "responseHeader": ["Content-Type"],
    "maxAgeSeconds": 3600
  }
]

Run the following:

gsutil cors set cors.json gs://<project-name>.appspot.com

Setting up the service account

Authenticated backend endpoints require the service account file:

in the console for the project, click on project settings (the cog icon)
click on "service accounts"
click on Manage service account permissions
look for the email address that matches the project id
- click actions
- click create key
save the json private key to turbo/src/lib/service-account-pk.json
add the environment variable to your shell: export GOOGLE_APPLICATION_CREDENTIALS="src/lib/service-account-pk.json"

Post fresh install steps

DB 'dataset' index

After launching the app for the first time, check your dev console, as it will contain a link for creating an index for datasets.

Templates

Talk to the City turbo uses pipeline templates, so end users do not have to construct their own graphs.

You can manage templates via http://localhost:5173/templates or https://tttc-turbo.web.app/templates.

Admin UID

The .env file contains a VITE_ADMIN variable that should be filled in with your user id, which can be acquired from the Firestore database.

Using AOI's dev instance

Contact @brittneygallagher or @lightningorb for credentials files
save the provided .env in turbo/
optional steps for deployment:
- save the provided service-account-pk.json in turbo/src/lib/
- npm install -g firebase-tools
- firebase login

Disclaimer: by using a shared dev instance, you acknowledge that the data is shared by nature, and therefore no privacy guarantees can be made for the data you choose to upload to the platform. For better privacy, consider setting up your own instance.

Deploying to firebase

Once you're done making your changes, you can deploy to firebase with:

$ firebase deploy

Multi-site deployments

Firebase makes it easy to deploy to multiple sites that use the same project resources.

To specify a different site:

modify .hosting.site in turbo/firebase.json
run firebase deploy --only hosting:<alt-site-name>

Running

Once you have set up a Firebase instance:

Node version tested: v18.0.0

$ cd talk-to-the-city-reports/turbo
$ npm install --legacy-peer-deps # or --force
$ npm run dev

Dev documentation

Adding new node types

To add pipeline computation nodes:

create the compute function in src/lib/compute/
look for a suitable UI component in src/components/
- In the vast majority of cases, you should be able to simply use an existing UI component. If a UI component does not suit your needs, then feel free to create a new one.
Bind the node's compute type with a component in src/lib/node_types.ts
add the node to src/lib/templates.ts
add node documentation to src/lib/docs
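
As a rough illustration of the first step, a compute node might look like the sketch below. The `UniqueNode` class, its `data` fields, and the `compute` signature are assumptions made for this example; consult the existing files in src/lib/compute/ for the real interface. The behaviour shown (returning undefined when no property is set, clearing `dirty` after compute, never mutating the input) mirrors what the test names for the existing nodes describe.

```typescript
// Hypothetical node shape for illustration; the real interface in src/lib/compute/ may differ.
interface UniqueNodeData {
  property: string; // which property to de-duplicate on
  dirty: boolean; // true when the node needs recomputing
  output: Record<string, unknown>[] | undefined;
}

class UniqueNode {
  data: UniqueNodeData;

  constructor(property: string) {
    this.data = { property, dirty: true, output: undefined };
  }

  // Keeps the first row seen for each distinct value of `property`.
  compute(input: Record<string, unknown>[]): Record<string, unknown>[] | undefined {
    if (!this.data.property) return undefined;
    const seen: unknown[] = [];
    const result: Record<string, unknown>[] = [];
    for (const row of input) {
      const value = row[this.data.property];
      if (seen.indexOf(value) === -1) {
        seen.push(value);
        result.push({ ...row }); // copy so the input rows are never mutated
      }
    }
    this.data.output = result;
    this.data.dirty = false;
    return result;
  }
}

// Usage: de-duplicate survey rows by city.
const surveyRows: Record<string, unknown>[] = [
  { city: "Taipei", n: 1 },
  { city: "Detroit", n: 2 },
  { city: "Taipei", n: 3 },
];
const uniqueNode = new UniqueNode("city");
const uniqueRows = uniqueNode.compute(surveyRows);
```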

Node UI component hierarchy

The primary UI components displayed to users are called "nodes" as they are part of a dependency graph.

The docs that appear when the user presses the ? mark are stored in:

src/lib/docs

Adding text inside nodes:

The UI nodes are stored in ./turbo/src/components/graph/nodes.

DGNode is the 'base' node that all nodes reuse. DefaultNode is an empty generic node, used when a node doesn't have a specialized UI. DefaultNode is the generic file upload, which CSVNode and JSON reuse.

PromptNode is used by the "Argument Extraction", "Cluster Extraction", and similar nodes: essentially all nodes that require prompts to interact with GPTs use the PromptNode.

Internationalization

Internationalization:

src/lib/i18n/en.json
src/lib/zh-TW.json

Since we use internationalization, UI strings use:

<script lang='ts'>
    import { _ as __ } from 'svelte-i18n';
</script>


<p>{$__('this_is_a_string')}</p>

The localized strings are then added to their respective src/lib/<lang>.json files.
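
Because every key has to be added to each locale file, a small parity check can help catch strings that were only added to one language. The sketch below is an illustration, not part of the project: the `missingKeys` helper and the inline locale objects are hypothetical stand-ins for the contents of the JSON files.

```typescript
// Hypothetical locale contents; the real JSON files hold the project's own keys.
const enStrings: Record<string, string> = {
  this_is_a_string: "This is a string",
  report_title: "Report",
};
const zhTWStrings: Record<string, string> = {
  this_is_a_string: "這是一個字串",
};

// Returns the keys present in the reference locale but absent from the other locale.
function missingKeys(
  reference: Record<string, string>,
  other: Record<string, string>
): string[] {
  return Object.keys(reference).filter((key) => !(key in other));
}

// Usage: list the keys that still need a zh-TW translation.
const untranslated = missingKeys(enStrings, zhTWStrings);
```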

Tests & TDD

The core functionalities of the nodes are tested. Thus it is strongly recommended to run the tests, and keep them running (vitest uses a daemon with file watch) while you make changes.

$ npm run test-ui

Testing the live website

brew install xorg-server
pip install chromedriver-autoinstaller selenium pyvirtualdisplay
DISPLAY=:99 python src/test/test_selenium.py

Test Results

Metric	Count
Total Test Suites	100
Passed Test Suites	100
Failed Test Suites	0
Pending Test Suites	0
Total Tests	202
Passed Tests	202
Failed Tests	0
Pending Tests	0
Todo Tests	0

`[1]` InfoPanelClaim.test.ts

Test	Status	Duration (ms)
testing vimeo claim	passed
testing yt claim	passed
testing yt link has si	passed
testing yt link has timestamp	passed
testing yt link has si and timestamp	passed
testing no video	passed
testing no claim throws error	passed

`[2]` add_csv_v0.test.ts

Test	Status	Duration (ms)
should concatenate multiple CSV inputs into a single output array	passed
should handle empty input arrays	passed
should handle a single input array	passed
should set dirty to false after compute	passed
should return an empty array if no inputs are provided	passed
should not mutate the input data	passed

`[3]` argument_extraction_v0.test.ts

Test	Status	Duration (ms)
extract the given arguments	passed
should not extract the arguments if no csv	passed
should not extract the arguments if no open_ai_key and no GCS	passed
should load from GCS if no open ai key	passed
should not extract the arguments if no prompt and no system prompt	passed
test GCS caching	passed

`[4]` argument_extraction_v1.test.ts

Test	Status	Duration (ms)
extract the given arguments	passed
extract the given arguments with missing rows in CSV	passed
should not extract the arguments if no csv	passed
should not extract the arguments if no open_ai_key and no GCS	passed
should load from GCS if no open ai key	passed
should not extract the arguments if no prompt and no system prompt	passed
test GCS caching	passed

`[5]` audio.test.ts

Test	Status	Duration (ms)
should return the cached output if not dirty and output exists	passed
should read audio from GCS and update size and mime_type if download is true	passed
should create an empty audio file if download is false	passed
should set dirty to false after compute	passed
should return undefined if gcs_path is not set	passed

`[6]` chat_v0.test.ts

Test	Status	Duration (ms)
compute should set output to messages and dirty to false	passed

`[7]` cluster_extraction_v0.test.ts

Test	Status	Duration (ms)
extract the cluster	passed
should not extract the cluster if no csv	passed
should not extract the cluster if no open_ai_key	passed
should not extract the cluster if no prompt and no system prompt	passed
test GCS caching	passed

`[8]` cluster_extraction_v1.test.ts

Test	Status	Duration (ms)
extract the cluster	passed
should not extract the cluster if no csv	passed
should not extract the cluster if no open_ai_key	passed
should not extract the cluster if no prompt and no system prompt	passed
test GCS caching	passed

`[9]` comment_expander_v0.test.ts

Test	Status	Duration (ms)
should concatenate comments until reaching 100 words, then start a new chunk	passed
should start a new chunk when the interview field changes	passed
should handle an empty input array	passed
should not lose the last comment if it does not exceed 100 words	passed
should correctly handle comments with exactly 100 words	passed

`[10]` count_tokens.test.ts

Test	Status	Duration (ms)
should correctly count tokens in input data	passed
should not count tokens if input data length matches and node is not dirty	passed
should count tokens if the input data is a string	passed

`[11]` csv.test.ts

Test	Status	Duration (ms)
should process CSV data correctly from GCS	passed
should handle empty CSV data from GCS	passed
should handle rows with uneven columns from GCS	passed

`[12]` dataset.test.ts

Test	Status	Duration (ms)
Find by compute type	passed
Simple pipeline run test	passed
Full pipeline run test	passed

`[13]` edit_csv.test.ts

Test	Status	Duration (ms)
generates new columns	passed
deletes columns	passed
renames columns	passed
returns undefined if input is undefined	passed
handles multiple operations	passed
does not modify input if no operations are specified	passed
does not crash if input is empty	passed

`[14]` filter_csv_v0.test.ts

Test	Status	Duration (ms)
should filter CSV data inclusively based on provided filters	passed
should filter CSV data exclusively based on provided filters	passed
should return all data if no filters are set	passed
should handle multiple filters correctly	passed
should set dirty to false after compute	passed
should not mutate the input data	passed

`[15]` gpt_embeddings_v0.test.ts

Test	Status	Duration (ms)
should compute embeddings for input data	passed
should not compute embeddings if no open_ai_key is provided	passed
should load embeddings from GCS if data length matches and save_to_gcs is true	passed
should handle no data input	passed

`[16]` gpt_v0.test.ts

Test	Status	Duration (ms)
general prompt	passed
json prompt	passed
json prompt with text	passed

`[17]` grid.test.ts

Test	Status	Duration (ms)
sets the output of the node to the input data	passed

`[18]` jq_v0.test.ts

Test	Status	Duration (ms)
should process data correctly with JQ filter	passed
should handle invalid JQ filter	passed

`[19]` jq_v1.test.ts

Test	Status	Duration (ms)
should process data correctly with JQ filter	passed
should handle invalid JQ filter	passed
should return an empty array when no matches found	passed
should process data correctly with a complex JQ filter	passed
should return undefined if the input is null or undefined	passed

`[20]` json.test.ts

Test	Status	Duration (ms)
should process JSON data correctly from GCS	passed
should handle invalid JSON data from GCS	passed
should update dirty state correctly	passed

`[21]` jsonata.test.ts

Test	Status	Duration (ms)
evaluates JSONata expressions	passed
returns undefined if no expression is provided	passed
catches errors when evaluating expressions	passed

`[22]` limit_csv.test.ts

Test	Status	Duration (ms)
should let all data pass through if number is left blank	passed
should limit the number of rows correctly, for an object	passed
should return all rows if limit is greater than number of rows	passed
should return an empty array if input is empty	passed
should not mutate the input node	passed

`[23]` markdown.test.ts

Test	Status	Duration (ms)
should set markdown data if input is a string	passed
should combine multiple string inputs with separation	passed
should wrap non-string inputs within code block	passed
should handle an empty input object	passed
should preserve the order of inputs when combining	passed
should stringify and wrap arrays in code blocks	passed
should throw an error if input data contains circular references	passed

`[24]` merge.test.ts

Test	Status	Duration (ms)
merges cluster_extraction and argument_extraction data	passed
does not merge if cluster_extraction data is missing	passed
does not merge if argument_extraction data is missing	passed
does not merge if cluster_extraction data has no topics	passed
sets node data output to the merged data and dirty to false after merge	passed

`[25]` merge_cluster_extraction.test.ts

Test	Status	Duration (ms)
merges cluster extraction data	passed
does not merge if cluster extractions are missing	passed
uses cached data if available and not dirty	passed
does not merge if no open_ai_key is provided	passed

`[26]` merge_cluster_extraction_v1.test.ts

Test	Status	Duration (ms)
should merge cluster extractions into a single output	passed
should handle empty input data	passed
should not process if no open_ai_key is provided	passed

`[27]` multi_cluster_extraction_v0.test.ts

Test	Status	Duration (ms)
should split CSV into chunks and process each chunk	passed
should handle empty CSV input	passed
should not process if no open_ai_key is provided	passed

`[28]` multi_gpt_v0.test.ts

Test	Status	Duration (ms)
should process multiple prompts	passed
should process multiple differing prompts	passed
should join outputs if join_output is true	passed
should not process if no open_ai_key is provided	passed

`[29]` open_ai_key.test.ts

Test	Status	Duration (ms)
should set the key in cookies if the UI key is valid	passed
if ui key is set but invalid use local key	passed
should set the node text to "Invalid key" if the UI key is not valid and there is no local key	passed
should not mutate the node if the UI key and local key are both valid	passed

`[30]` participant_filter.test.ts

Test	Status	Duration (ms)
filters participants based on the provided name	passed
removes subtopics with no claims after filtering	passed
removes topics with no subtopics after filtering	passed
returns undefined if input data does not contain topics	passed
does not filter claims if interview key is missing	passed

`[31]` pinecone_key_v0.test.ts

Test	Status	Duration (ms)
should set the key in cookies if the UI key is provided	passed
should use the local key from cookies if available	passed
should return an empty string if no key is provided or available in cookies	passed

`[32]` pinecone_v0.test.ts

Test	Status	Duration (ms)
should initialize Pinecone with the provided API key	passed
should create a new index if it does not exist and upsert embeddings	passed
should list Pinecone indexes	passed
should provide tools for querying Pinecone index	passed

`[33]` pyodide.test.ts

Test	Status	Duration (ms)
should execute python script and return outputData	passed
should be able to pass input to outputData	passed
test passing in complex data from jsonapi	passed

`[34]` python.integration.test.ts

Test	Status	Duration (ms)
should execute python script and return outputData	passed
should be able to pass input to outputData	passed
should be able to make get requests to jsonapi	passed

`[35]` python.test.ts

Test	Status	Duration (ms)
should execute python script and return output	passed
should handle fetch errors gracefully	passed
should handle invalid JSON response	passed
should handle non-string JSON response	passed
should update node data output with the response	passed

`[36]` register.test.ts

Test	Status	Duration (ms)
test node registeration	passed
Load all nodes	passed

`[37]` report.test.ts

Test	Status	Duration (ms)
should set the output of the node to the input data	passed
should handle empty input data	passed
should not mutate the input node	passed

`[38]` report_v1.test.ts

Test	Status	Duration (ms)
sets the output of the node to the input data	passed
handles translation	passed
uploads data to GCS on run	passed
reads data from GCS on load if gcs_path is set and input data is empty	passed
clears gcs_path if readFileFromGCS throws an error	passed
sets message if merge and csv data are present	passed
sets message to empty string if merge or csv data are missing	passed
does not mutate the input node	passed

`[39]` score_argument_relevance.test.ts

Test	Status	Duration (ms)
scores the relevance of arguments	passed
uses cached data if available and not dirty	passed
does not score if argument_extraction data is missing	passed
does not score if open_ai_key is missing	passed
does not score if prompts are missing	passed

`[40]` secret_v0.test.ts

Test	Status	Duration (ms)
should set the key in cookies if the UI key is provided	passed
should use the local key from cookies if available	passed
should return an empty string if no key is provided or available in cookies	passed

`[41]` simple_pipeline.test.ts

Test	Status	Duration (ms)
should process CSV data correctly from GCS	passed

`[42]` stringify.test.ts

Test	Status	Duration (ms)
should correctly stringify input data	passed
should return input if it cannot be stringified	passed
should handle different types of input	passed
should not mutate the input node	passed

`[43]` summarize_v0.test.ts

Test	Status	Duration (ms)
should generate summaries for topics and subtopics	passed
should load summaries from GCS if data length matches	passed

`[44]` test.test.ts

Test	Status	Duration (ms)
integer node	passed
adder node	passed
dataset run adder	passed
dataset run multi input multi output	passed

`[45]` text_to_csv_v0.test.ts

Test	Status	Duration (ms)
should convert a single text input to CSV format	passed
should convert multiple text inputs to CSV format	passed
should handle empty text input	passed
should split text into chunks if it exceeds the number of tokens	passed

`[46]` translate_v0.test.ts

Test	Status	Duration (ms)
translates the input data	passed
loads translations from GCS if data has not changed	passed
does not translate if required inputs are missing	passed

`[47]` unique_v0.test.ts

Test	Status	Duration (ms)
should return unique values based on the specified property	passed
should return an empty array if input is empty	passed
should return undefined if no property is specified	passed
should set dirty to false after compute	passed
should not mutate the input data	passed

`[48]` utils.test.ts

Test	Status	Duration (ms)
Test secondsToHHMMSS	passed
Test secondsToHHMMSS with string	passed
Test HHMMSSToSeconds	passed

`[49]` whisper.test.ts

Test	Status	Duration (ms)
should load from cache if data is not dirty and gcs_path is set	passed
should load from GCS if data is not dirty, gcs_path is set, and output is empty and audio size matches	passed
should transcribe audio and upload to GCS if data is dirty	passed
should return undefined and set message if open_ai_key is missing	passed
should convert transcription to internal format if response_format is custom	passed

`[50]` workerpool.test.ts

Test	Status	Duration (ms)
should execute function in workerpool	passed
should execute delayed function in workerpool	passed
