1. To collect the data, register an application for the Baidu Intelligent Cloud OCR recognition engine and obtain its API Key and Secret Key.
2. Write the access_token.py program to obtain an access_token value from the API Key and Secret Key.
3. Use the access_token to call the VAT invoice recognition API in Baidu's text recognition service, and write the ocr_invoice.py code to extract the invoice.
4. Use the access_token to call the Baidu General Text Recognition (High Precision Edition) API, and write the ocr_contract.py program to extract the contract. Baidu OCR has no API dedicated to contract recognition, so General Text Recognition is used instead. Unlike the invoice, the contract is a PDF file, so the request parameter must be changed to "pdf_file" in order to read the PDF document.
5. Use the access_token to call the Baidu General Text Recognition (High Precision Edition) API, and write the ocr_bills.py program to extract the bank statement.
6. Extract the standard fields from the bank statement txt file using NLP techniques.
7. Save the NLP extraction results to a txt file for the subsequent data preprocessing operations.
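The token and recognition calls in steps 2 to 5 can be sketched as follows. This is a minimal illustration, not the contents of access_token.py or the ocr_*.py programs: the function names are hypothetical, and the endpoint paths and the "image" parameter name are assumptions based on Baidu's public OCR documentation that should be checked against the current version. Only the "pdf_file" parameter is taken from the steps above.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumed endpoints; verify against the current Baidu OCR documentation.
TOKEN_URL = "https://aip.baidubce.com/oauth/2.0/token"
OCR_BASE = "https://aip.baidubce.com/rest/2.0/ocr/v1/"


def build_token_url(api_key, secret_key):
    """Build the URL that exchanges API Key and Secret Key for an access_token."""
    query = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    })
    return f"{TOKEN_URL}?{query}"


def build_ocr_request(endpoint, access_token, file_bytes, is_pdf=False):
    """Build a POST request for one OCR endpoint.

    Images are sent in the "image" field and PDF documents in "pdf_file",
    both as base64 text in a form-encoded body.
    """
    field = "pdf_file" if is_pdf else "image"
    encoded = base64.b64encode(file_bytes).decode("ascii")
    body = urllib.parse.urlencode({field: encoded}).encode("ascii")
    url = f"{OCR_BASE}{endpoint}?access_token={urllib.parse.quote(access_token)}"
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request_or_url):
    """Send a request (or plain URL) and decode the JSON response."""
    with urllib.request.urlopen(request_or_url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_access_token(api_key, secret_key):
    """Network call: return the access_token string from the token response."""
    return send(build_token_url(api_key, secret_key))["access_token"]
```

Under these assumptions, an invoice image would go to the "vat_invoice" endpoint with is_pdf=False, while the contract PDF would go to "accurate_basic" with is_pdf=True, for example send(build_ocr_request("accurate_basic", token, pdf_bytes, is_pdf=True)).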
1. Run the oce_accuracy.py program to re-extract, with high accuracy, the parts of the text that were blurry or recognized imprecisely.
Result:
2. Enter the final data identified from the NLP results and from oce_accuracy into the Excel sheet separately.
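The path from recognized text to spreadsheet rows can be sketched as follows. The steps above do not say which NLP technique or Excel library was used, so this example swaps in a plain regular expression for the field extraction and the standard csv module for the sheet (a CSV file opens directly in Excel). The line layout and the field names date, description, and amount are hypothetical.

```python
import csv
import io
import re

# Hypothetical layout of one recognized bank statement line:
# "<YYYY-MM-DD> <free-text description> <amount>"
LINE_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>-?\d+(?:\.\d{1,2})?)$"
)


def extract_records(text):
    """Return one dict per line that matches the assumed layout; skip the rest."""
    records = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


def records_to_csv(records, fieldnames=("date", "description", "amount")):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Lines that do not match the pattern are dropped silently here; in practice those are the candidates for a second, higher-accuracy recognition pass or for manual entry.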
1. Based on the Excel file, write the Creat_mysql.py program to create the database, create 7 tables, and store the data from Excel in the database.
2. Write a program that performs calculations on the stored data:
(a) Total revenue and total net profit of the firm for each year;
(b) Support for individual requirements.
Part of the SQL statement in the program is left open, so users can write their own SQL statements to implement the functions they need.
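The two calculation modes can be sketched as follows. The project stores its data in MySQL, but to keep the example self-contained this sketch uses Python's built-in sqlite3 module as a stand-in. The table name financials and the columns year, revenue, and net_profit are hypothetical, since the actual 7-table schema is not described here.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical 'financials' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE financials (year INTEGER, revenue REAL, net_profit REAL)"
    )
    conn.executemany(
        "INSERT INTO financials VALUES (?, ?, ?)",
        [(2020, 100.0, 10.0), (2020, 50.0, 5.0), (2021, 200.0, 30.0)],
    )
    return conn


def yearly_totals(conn):
    """(a) Total revenue and total net profit for each year."""
    query = """
        SELECT year, SUM(revenue), SUM(net_profit)
        FROM financials
        GROUP BY year
        ORDER BY year
    """
    return conn.execute(query).fetchall()


def run_custom_query(conn, sql, params=()):
    """(b) Run a user-written SQL statement; values go in as bound parameters."""
    return conn.execute(sql, params).fetchall()
```

A user-defined requirement then becomes a single call such as run_custom_query(conn, "SELECT COUNT(*) FROM financials WHERE year = ?", (2020,)). Passing values as parameters rather than formatting them into the SQL string avoids quoting mistakes, although the statement itself is still fully trusted user input.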