timshine/BusinessCardParser

BusinessCardParser

This program is a business card reader with the following features:

  • Takes as input the text produced by Optical Character Recognition (OCR)
  • Extracts the name, phone number and email address from a business card
  • Provides sample test cases and sample main() output using the Input.txt file

How to Set Up

  • Install the Natural Language Toolkit (NLTK) by following Install NLTK Module
  • Run the program with the most recent version of Python 3

Programming Approach

  • Names are extracted from the document using the NLTK named entity chunker
    • This chunker assigns a label such as PERSON, ORGANIZATION or TIME to each chunk of information
    • A reference used for NLTK is NLTK Reference
    • NLTK can label job titles as PERSON chunks
      • To handle this, a running list of common job keywords is checked before a PERSON chunk is accepted as a name
      • This list should be extended to cover the job titles expected in practice; the code includes only a few examples
  • Phone numbers are extracted using regular expressions
    • A reference used for phone number regular expressions is Phone Number RegEx
    • The regular expression was then customized for this application
    • A good tool for building and debugging regular expressions is https://www.debuggex.com/
  • Email addresses are also extracted using regular expressions
    • A reference used for email address regular expressions is Email Address RegEx
    • This regular expression was then customized for this application
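
The approach above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the patterns `PHONE_RE` and `EMAIL_RE`, the `JOB_KEYWORDS` set, and the helper names `find_phone`, `find_email` and `pick_name` are all hypothetical, and skipping lines that mention "fax" is an added assumption. The NLTK chunking step itself is left out; `pick_name` simply assumes a list of strings that the chunker has already labeled PERSON.

```python
import re

# Hypothetical patterns for illustration; the customized expressions
# used in the repository may differ.
PHONE_RE = re.compile(r"(?:\+?\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical running list of job keywords; extend as needed.
JOB_KEYWORDS = {"engineer", "developer", "manager", "analyst", "director"}


def find_phone(document):
    """Return the digits of the first phone number found, or None."""
    for line in document.splitlines():
        # Assumption: a line mentioning "fax" is not the contact phone.
        if "fax" in line.lower():
            continue
        match = PHONE_RE.search(line)
        if match:
            # Keep digits only, dropping spaces, dots, dashes and parentheses.
            return re.sub(r"\D", "", match.group())
    return None


def find_email(document):
    """Return the first email address found, or None."""
    match = EMAIL_RE.search(document)
    return match.group() if match else None


def pick_name(person_chunks):
    """Return the first PERSON chunk that contains no job keyword."""
    for chunk in person_chunks:
        words = {word.lower() for word in chunk.split()}
        if not words & JOB_KEYWORDS:
            return chunk
    return None
```

For example, `pick_name(["Software Engineer", "Jane Doe"])` skips the first chunk because it contains a job keyword and returns the second.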

Interface Specification

  • The method get_contact_info(document) returns an instance of ContactInfo(name, email, phone)
  • The attributes can then be obtained using the respective get methods of the ContactInfo class
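
A container matching this description might look like the sketch below. Only the constructor signature `ContactInfo(name, email, phone)` comes from the text above; the getter names `get_name`, `get_email` and `get_phone` are assumptions made for illustration, and the sample values are invented.

```python
class ContactInfo:
    """Holds the three fields extracted from a business card."""

    def __init__(self, name, email, phone):
        self._name = name
        self._email = email
        self._phone = phone

    # Getter names are hypothetical; check the repository for the real ones.
    def get_name(self):
        return self._name

    def get_email(self):
        return self._email

    def get_phone(self):
        return self._phone


# Usage with made-up values standing in for get_contact_info(document) output.
info = ContactInfo("Jane Doe", "jane.doe@example.com", "4105551234")
print("Name:", info.get_name())
print("Phone:", info.get_phone())
print("Email:", info.get_email())
```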
