Captcha Deciphering

Objective

Practical implementation of:

Convolutional Neural Network (Deep Learning) via Keras with TensorFlow backend (default), and
Web Data Mining via Beautiful Soup*

*I mulled over Scrapy and Selenium, but settled on Beautiful Soup for its simplicity and ease of use.

Installation

The following packages need to be installed:

beautifulsoup4            4.8.1                    pypi_0    pypi
captcha                   0.3                      pypi_0    pypi
glob2                     0.7                        py_0  
keras                     2.2.4                         0  
opencv-python             4.1.1.26                 pypi_0    pypi
scikit-learn              0.21.3           py37hd81dba3_0

Procedure

Use the CAPTCHA library to generate and save 1,000 sample images under the generated_images sub-directory.
Use the OpenCV library to split each image into its individual alphanumeric characters by means of thresholding, morphological opening, and contour detection. Save each character image in a directory named after its corresponding character under the letters sub-directory.
Use the Keras library to train a model on the character samples collected above.

Result

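About 63% validation accuracy after 10 epochs (final val_acc 0.6319). Training accuracy reaches 99.8% over the same run while validation loss keeps rising, so the model overfits the training samples.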

Train on 2876 samples, validate on 959 samples
Epoch 1/10
2876/2876 [==============================] - 69s 24ms/step - loss: 2.2493 - acc: 0.4360 - val_loss: 1.6329 - val_acc: 0.6163
Epoch 2/10
2876/2876 [==============================] - 69s 24ms/step - loss: 1.0487 - acc: 0.7406 - val_loss: 1.7075 - val_acc: 0.5912
Epoch 3/10
2876/2876 [==============================] - 72s 25ms/step - loss: 0.4068 - acc: 0.8873 - val_loss: 2.0714 - val_acc: 0.6246
Epoch 4/10
2876/2876 [==============================] - 76s 27ms/step - loss: 0.1239 - acc: 0.9677 - val_loss: 2.4089 - val_acc: 0.6131
Epoch 5/10
2876/2876 [==============================] - 75s 26ms/step - loss: 0.0793 - acc: 0.9812 - val_loss: 2.5761 - val_acc: 0.6027
Epoch 6/10
2876/2876 [==============================] - 77s 27ms/step - loss: 0.0281 - acc: 0.9924 - val_loss: 2.6144 - val_acc: 0.6173
Epoch 7/10
2876/2876 [==============================] - 78s 27ms/step - loss: 0.0249 - acc: 0.9941 - val_loss: 2.6915 - val_acc: 0.6236
Epoch 8/10
2876/2876 [==============================] - 76s 26ms/step - loss: 0.0348 - acc: 0.9927 - val_loss: 2.6212 - val_acc: 0.6194
Epoch 9/10
2876/2876 [==============================] - 76s 27ms/step - loss: 0.0247 - acc: 0.9944 - val_loss: 2.6898 - val_acc: 0.6298
Epoch 10/10
2876/2876 [==============================] - 75s 26ms/step - loss: 0.0128 - acc: 0.9983 - val_loss: 2.8467 - val_acc: 0.6319
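To put numbers on the gap between the training and validation metrics in the log above, the per-epoch values can be compared directly. The snippet below is only a reading aid: the figures are copied from the log, and the variable names are made up for the example.

```python
# (epoch, acc, val_acc, val_loss), copied from the training log above
history = [
    (1, 0.4360, 0.6163, 1.6329),
    (2, 0.7406, 0.5912, 1.7075),
    (3, 0.8873, 0.6246, 2.0714),
    (4, 0.9677, 0.6131, 2.4089),
    (5, 0.9812, 0.6027, 2.5761),
    (6, 0.9924, 0.6173, 2.6144),
    (7, 0.9941, 0.6236, 2.6915),
    (8, 0.9927, 0.6194, 2.6212),
    (9, 0.9944, 0.6298, 2.6898),
    (10, 0.9983, 0.6319, 2.8467),
]

# Epoch with the lowest validation loss and epoch with the highest validation accuracy
best_loss_epoch = min(history, key=lambda row: row[3])[0]
best_acc_epoch = max(history, key=lambda row: row[2])[0]

# Difference between training and validation accuracy at the final epoch
final_gap = history[-1][1] - history[-1][2]

print("lowest val_loss at epoch", best_loss_epoch)
print("highest val_acc at epoch", best_acc_epoch)
print("final acc - val_acc = {:.4f}".format(final_gap))
```

Validation loss is lowest after the first epoch and validation accuracy stays within a few points of 0.62 from then on, so the later epochs mostly improve the fit to the training set.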

Reference

Practical Web Scraping for Data Science: Best Practices and Examples with Python (ISBN 978-1-4842-3582-9)
