A high-performance data processing pipeline that combines AWS services with Azure (for cognitive search). Used in conjunction with openlawnz-parsers.


openlawnz/openlawnz-pipeline


OpenLaw NZ Data Pipeline

This is the OpenLaw NZ data ingest and parser pipeline. Given input in a common JSON file format, it:

  • Downloads PDF files to AWS S3 (ingester, filefetcher)
  • Extracts and parses the text content, sending files to Azure Cognitive Services for OCR when necessary (pdfconverter)
  • Saves the results as JSON to separate S3 buckets (pdfconverter)
  • Inserts the results into a PostgreSQL database (putInDB)
  • Triggers a CloudWatch rule that monitors whether an ingest is complete (ingesterWatcher)
  • On completion, stops the watcher and starts AWS Step Functions to carry out additional parsing that must be done sequentially (parseCaseCitations, parseCaseToCase)
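The hand-off at the end of this list (the watcher detects that an ingest is finished, then the sequential parsers run) can be sketched as a simple completion check. This is a hypothetical illustration in Python, not the pipeline's actual code: the `IngestStatus` counters, `ingest_complete` and `next_action` are assumptions, and only the stage names come from this README.

```python
from dataclasses import dataclass
from typing import List

# Stage names are taken from the README; the data model below is hypothetical.
SEQUENTIAL_STAGES = ["parseCaseCitations", "parseCaseToCase"]


@dataclass
class IngestStatus:
    """Hypothetical counters a watcher could compare on each scheduled run."""
    expected: int      # documents listed by the ingester
    stored: int = 0    # documents inserted by putInDB
    failed: int = 0    # documents that exhausted their retries


def ingest_complete(status: IngestStatus) -> bool:
    # Treat the ingest as finished once every expected document has
    # either been stored or permanently failed.
    return status.stored + status.failed >= status.expected


def next_action(status: IngestStatus) -> List[str]:
    """Return what the watcher would do on one scheduled check."""
    if not ingest_complete(status):
        return []  # keep waiting; the CloudWatch rule fires again later
    # Stop the watcher, then hand over to the sequential parsers in order.
    return ["stop ingesterWatcher", *(f"run {s}" for s in SEQUENTIAL_STAGES)]
```

Counting stored and failed documents is only one possible completion signal; a real watcher might instead check that the relevant SQS queues are empty.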

Structure

Pipeline Architecture

Each subdirectory is designed to be set up as a serverless function on AWS Lambda.

The Lambda functions must be linked together with S3 event notifications and SQS queues to provide batching, rate limiting, and retries.
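A minimal sketch of one link in that chain, assuming each stage is a Lambda function fed by an SQS queue that receives S3 event notifications. The handler reports only the failed messages back to SQS (a partial batch response) so that just those are retried. `make_handler` and the `process` callback are hypothetical names; batch size and concurrency limits would be configured on the event source mapping rather than in code.

```python
import json
from typing import Any, Callable, Dict, List, Tuple
from urllib.parse import unquote_plus


def s3_objects_from_sqs_message(body: str) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification delivered via SQS."""
    event = json.loads(body)
    pairs = []
    # S3 sends a one-off "s3:TestEvent" without "Records" when a notification
    # is first configured, so a missing key is treated as "nothing to do".
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (e.g. spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def make_handler(process: Callable[[str, str], None]):
    """Build a Lambda handler for an SQS event source.

    `process` is a hypothetical per-file worker, e.g. a pdfconverter step.
    """
    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
        failures = []
        for message in event["Records"]:
            try:
                for bucket, key in s3_objects_from_sqs_message(message["body"]):
                    process(bucket, key)
            except Exception:
                # Mark only this message as failed so SQS redelivers it; this
                # needs ReportBatchItemFailures enabled on the event source mapping.
                failures.append({"itemIdentifier": message["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

Messages that keep failing can then be routed to a dead-letter queue by the SQS redrive policy instead of blocking the rest of the batch.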

This code was taken directly from the AWS Cloud9 IDE and deliberately omits the template.yaml files needed to deploy it. If you're an AWS CloudFormation guru, please get in touch.

How to run

The pipeline is started by running /ingester, which downloads case law and other legal information to S3.

The ingester must be set up as a Lambda function with its environment variables configured. See the ingester README for further details.
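As a rough illustration of that setup, a Lambda function typically reads its settings from environment variables at start-up and fails fast if any are missing. The variable names below (`INGEST_BUCKET`, `INGEST_SOURCE_URL`, `INGEST_BATCH_SIZE`) and the `IngesterConfig` fields are hypothetical placeholders; the real names are listed in the ingester README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class IngesterConfig:
    # Hypothetical settings; see the ingester README for the real ones.
    bucket: str
    source_url: str
    batch_size: int


REQUIRED = ("INGEST_BUCKET", "INGEST_SOURCE_URL")  # hypothetical variable names


def load_config(env: Optional[Mapping[str, str]] = None) -> IngesterConfig:
    """Read configuration from environment variables, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return IngesterConfig(
        bucket=env["INGEST_BUCKET"],
        source_url=env["INGEST_SOURCE_URL"],
        # Optional setting with a default when unset.
        batch_size=int(env.get("INGEST_BATCH_SIZE", "100")),
    )
```

Accepting an explicit mapping keeps the loader testable without touching the real process environment.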
