OCR Model for Reading CAPTCHAs Using Keras

Reading CAPTCHAs automatically is a challenging task due to their distorted characters and noise. As an experienced Python Keras developer, I found building an OCR model tailored for CAPTCHAs both practical and insightful.

In this tutorial, I’ll show you how to build an end-to-end OCR model using Keras that can read CAPTCHAs accurately. You’ll get the full code for each step, making it easy to follow and implement.

What is OCR and Why Use Keras for CAPTCHAs?

Optical Character Recognition (OCR) converts images of text into machine-readable text. CAPTCHAs add complexity with distortions and noise.

Keras offers a flexible and powerful framework to build OCR models using convolutional and recurrent layers, ideal for sequence recognition like CAPTCHAs.

Method 1: Generate CAPTCHA Dataset with Python

Before building the model, we need a dataset of CAPTCHAs and labels.

Step 1: Install and Import Required Libraries

First, set up the essential libraries needed to create and process CAPTCHA images.

# Notebook cell syntax; from a terminal, run: pip install captcha
!pip install captcha
import random
import string

import numpy as np
from captcha.image import ImageCaptcha

Step 2: Generate CAPTCHA Images and Labels

Next, generate synthetic CAPTCHA images along with their corresponding text labels.

characters = string.ascii_uppercase + string.digits  # 36 symbols: A-Z and 0-9
captcha_length = 5  # Number of characters in each CAPTCHA

# Create the renderer once and reuse it for every sample
image_generator = ImageCaptcha(width=160, height=60)

def generate_captcha():
    """Return a random CAPTCHA as a (PIL image, text) pair."""
    captcha_text = ''.join(random.choices(characters, k=captcha_length))
    captcha_image = image_generator.generate_image(captcha_text)
    return captcha_image, captcha_text

def generate_dataset(num_samples):
    """Return grayscale images scaled to [0, 1] and their text labels."""
    X = []
    y = []
    for _ in range(num_samples):
        image, text = generate_captcha()
        image = image.convert('L')  # Convert to grayscale
        image = np.array(image, dtype=np.float32) / 255.0  # Scale pixels to [0, 1]
        X.append(image)
        y.append(text)
    return np.array(X), y

# Generate 1,000 samples as an example
X_data, y_text = generate_dataset(1000)
X_data = X_data[..., np.newaxis]  # Add channel dimension
print("Dataset shape:", X_data.shape)  # Expected: (1000, 60, 160, 1)

This method helps you quickly generate a labeled CAPTCHA dataset for training deep learning models.
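Before training, it is common to hold out part of the generated data for validation. The sketch below shows one way to do that with NumPy alone. The names `split_dataset`, `val_fraction`, and `seed` are hypothetical and not used elsewhere in this tutorial, and the placeholder arrays only stand in for `X_data` and `y_text`.

```python
import numpy as np

def split_dataset(X, y, val_fraction=0.1, seed=0):
    """Shuffle and split images and labels into training and validation parts.

    Hypothetical helper, not part of the tutorial's own code.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    y = np.asarray(y)  # allows indexing the labels with an index array
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Placeholder data with the same layout as X_data and y_text
X_demo = np.zeros((100, 60, 160, 1), dtype=np.float32)
y_demo = [f"AB{i:03d}" for i in range(100)]

X_train, y_train, X_val, y_val = split_dataset(X_demo, y_demo)
print(X_train.shape, X_val.shape)
```

Fixing the seed keeps the split reproducible between runs, which makes validation scores easier to compare.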

Method 2: Encode Labels for OCR Model

We need to convert text labels into a numerical format suitable for training.

char_to_num = {char: idx for idx, char in enumerate(characters)}
num_to_char = {idx: char for char, idx in char_to_num.items()}

def encode_labels(texts):
    max_len = captcha_length
    encoded = np.zeros((len(texts), max_len), dtype=int)
    for i, text in enumerate(texts):
        for j, ch in enumerate(text):
            encoded[i, j] = char_to_num[ch]
    return encoded

y_encoded = encode_labels(y_text)
print("Sample encoded label:", y_encoded[0])

This method converts each CAPTCHA text label into a fixed-length numeric sequence, ensuring the OCR model can learn and process character-level outputs during training.
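A quick way to check the encoding is to map the integers back to text and confirm the round trip. The sketch below rebuilds the same two dictionaries so it runs on its own; `decode_labels` is a hypothetical helper added here for illustration.

```python
import string
import numpy as np

characters = string.ascii_uppercase + string.digits  # same alphabet as above
char_to_num = {char: idx for idx, char in enumerate(characters)}
num_to_char = {idx: char for char, idx in char_to_num.items()}

def decode_labels(encoded):
    """Hypothetical inverse of encode_labels: integer rows back to strings."""
    return [''.join(num_to_char[int(idx)] for idx in row) for row in encoded]

# Letters map to 0-25 and digits to 26-35, so "A1B2C" becomes 0, 27, 1, 28, 2
sample = np.array([[char_to_num[ch] for ch in "A1B2C"]])
print(sample[0].tolist())     # [0, 27, 1, 28, 2]
print(decode_labels(sample))  # ['A1B2C']
```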

Method 3: Build the OCR Model Architecture in Keras

We will build a CNN + RNN architecture with CTC loss for sequence recognition.

Step 1: Define the Model

Here, we construct a CNN-RNN hybrid model with CTC output for handling sequential CAPTCHA text recognition.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_ocr_model(input_shape, num_classes, max_length):
    inputs = layers.Input(shape=input_shape, name='input_image')

    # CNN layers for feature extraction
    x = layers.Conv2D(32, (3,3), activation='relu', padding='same')(inputs)
    x = layers.MaxPooling2D((2,2))(x)
    x = layers.Conv2D(64, (3,3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2,2))(x)
    x = layers.Conv2D(128, (3,3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2,2))(x)

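    # Prepare for RNN: make width the time axis, since the text runs left to right.
    # The feature map here is (height, width, channels) = (7, 20, 128).
    x = layers.Permute((2, 1, 3))(x)  # -> (width, height, channels)
    shape = x.shape  # (batch, 20, 7, 128)
    x = layers.Reshape((shape[1], shape[2] * shape[3]))(x)  # -> (20, 896)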

    # Bidirectional LSTM layers
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)

    # Output layer
    x = layers.Dense(num_classes + 1, activation='softmax')(x)  # +1 for CTC blank label

    model = models.Model(inputs, x)
    return model

input_shape = (60, 160, 1)
num_classes = len(characters)
max_length = captcha_length

ocr_model = build_ocr_model(input_shape, num_classes, max_length)
ocr_model.summary()

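This method sets up the core OCR architecture: convolutional layers extract visual features from the CAPTCHA image, bidirectional LSTM layers model them as a sequence, and a softmax layer outputs per-step character probabilities (plus the CTC blank).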
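It helps to work out the feature-map size by hand, because the axis used as the sequence axis for the LSTM layers determines how many time steps CTC has available. The sketch below does that arithmetic, assuming the defaults used above: `padding='same'` convolutions keep the size, and each `MaxPooling2D((2,2))` halves it, rounding down. `pooled_size` is a hypothetical helper.

```python
def pooled_size(size, num_pools, pool=2):
    """Size of one spatial axis after repeated 2x2 max pooling (floor division)."""
    for _ in range(num_pools):
        size = size // pool
    return size

height, width, channels = 60, 160, 128  # input size and last Conv2D filter count
feat_h = pooled_size(height, 3)  # 60 -> 30 -> 15 -> 7
feat_w = pooled_size(width, 3)   # 160 -> 80 -> 40 -> 20
print(feat_h, feat_w, channels)                   # 7 20 128
print("features per column:", feat_h * channels)  # 896
```

The width axis gives 20 positions and the height axis only 7. Since CAPTCHA text runs left to right and each label has 5 characters, the width axis is the natural choice for the time dimension.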

Method 4: Define CTC Loss and Training Pipeline

CTC loss is essential for sequence labeling when the alignment between input and output is unknown.

def ctc_lambda_func(args):
    # Compute the per-sample CTC loss from predictions, labels, and their lengths
    y_pred, labels, input_length, label_length = args
    return tf.keras.backend.ctc_batch_cost(labels, y_pred, input_length, label_length)

labels = layers.Input(name='ground_truth_labels', shape=[max_length], dtype='int32')
input_length = layers.Input(name='input_length', shape=[1], dtype='int32')
label_length = layers.Input(name='label_length', shape=[1], dtype='int32')

loss_out = layers.Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([ocr_model.output, labels, input_length, label_length])

training_model = models.Model(inputs=[ocr_model.input, labels, input_length, label_length], outputs=loss_out)

# The 'ctc' layer already outputs the loss, so the Keras loss function just passes it through
training_model.compile(optimizer='adam', loss={'ctc': lambda y_true, y_pred: y_pred})

This method sets up the CTC loss and training wrapper, enabling the OCR model to learn character sequences even without explicit alignment between image pixels and text labels.
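One practical constraint follows from how CTC aligns sequences: the model must emit at least as many time steps as the label has characters, plus one extra step for every pair of identical neighbouring characters, because a blank has to separate them. The sketch below computes that minimum; `min_ctc_steps` is a hypothetical helper for illustration.

```python
def min_ctc_steps(label):
    """Fewest time steps for which CTC can emit `label`.

    One step per character, plus one blank between each pair of equal neighbours.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

for text in ["A1B2C", "AAB12", "AAAAA"]:
    print(text, min_ctc_steps(text))
```

For 5-character CAPTCHAs the worst case is 9 steps, so the value passed as `input_length` should be at least that and must not exceed the number of time steps the model actually outputs.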

Method 5: Prepare Data Generator for Training

We create a generator to feed images and labels in batches.

def data_generator(X, y, batch_size=32):
    while True:
        for i in range(0, len(X), batch_size):
            X_batch = X[i:i+batch_size]
            y_batch = y[i:i+batch_size]
            # CTC needs each label's length and the number of time steps the model emits
            label_length = np.full((len(y_batch), 1), max_length, dtype=np.int32)
            input_length = np.full((len(y_batch), 1), ocr_model.output_shape[1], dtype=np.int32)

            inputs = {
                'input_image': X_batch,
                'ground_truth_labels': y_batch,
                'input_length': input_length,
                'label_length': label_length
            }
            outputs = {'ctc': np.zeros(len(y_batch))}
            yield inputs, outputs

# Prepare encoded labels for training
y_train_encoded = y_encoded

batch_size = 32
train_gen = data_generator(X_data, y_train_encoded, batch_size)

This method builds a batch-wise data generator that prepares images, encoded labels, and required CTC inputs, making the OCR model ready for efficient streaming training.
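Training usually benefits from seeing the samples in a different order each epoch. The sketch below is a hypothetical variant of the generator that reshuffles on every pass. The names `shuffled_batches`, `time_steps`, `label_len`, and `seed` are assumptions, `time_steps` must equal the model's output sequence length, and the value 20 in the demo is only a placeholder.

```python
import numpy as np

def shuffled_batches(X, y, time_steps, label_len, batch_size=32, seed=0):
    """Yield (inputs, outputs) dicts keyed by layer name, reshuffling each pass.

    Hypothetical variant of data_generator, not the tutorial's own code.
    """
    rng = np.random.default_rng(seed)
    while True:
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            n = len(idx)
            inputs = {
                'input_image': X[idx],
                'ground_truth_labels': y[idx],
                'input_length': np.full((n, 1), time_steps, dtype=np.int32),
                'label_length': np.full((n, 1), label_len, dtype=np.int32),
            }
            # Dummy targets: the loss is computed inside the model
            yield inputs, {'ctc': np.zeros(n, dtype=np.float32)}

# Placeholder arrays standing in for X_data and y_encoded
X_demo = np.zeros((70, 60, 160, 1), dtype=np.float32)
y_demo = np.zeros((70, 5), dtype=np.int32)
gen = shuffled_batches(X_demo, y_demo, time_steps=20, label_len=5)
batch_inputs, batch_outputs = next(gen)
print({name: arr.shape for name, arr in batch_inputs.items()})
```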

Method 6: Train the OCR Model

Now we train the model using the generator.

steps_per_epoch = len(X_data) // batch_size

training_model.fit(train_gen,
                   steps_per_epoch=steps_per_epoch,
                   epochs=10)
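The training model only outputs a loss, so reading a CAPTCHA means running `ocr_model` itself and decoding its per-step probabilities. The sketch below shows greedy CTC decoding in plain NumPy: take the most likely class at each step, merge consecutive repeats, and drop blanks. It assumes the blank is the last class, which is the convention `ctc_batch_cost` uses. `greedy_ctc_decode` is a hypothetical helper, and the one-hot matrix is a stand-in for real model output.

```python
import string
import numpy as np

characters = string.ascii_uppercase + string.digits
blank_index = len(characters)  # assumption: blank is the last class

def greedy_ctc_decode(probs):
    """Decode one (time_steps, num_classes + 1) probability matrix to text."""
    best = np.argmax(probs, axis=-1)  # most likely class at each time step
    decoded = []
    previous = blank_index
    for idx in best:
        # Keep a class only when it starts a new run and is not the blank
        if idx != previous and idx != blank_index:
            decoded.append(characters[idx])
        previous = idx
    return ''.join(decoded)

# Fake output: A, A, blank, A, 1, 1, blank
path = [0, 0, blank_index, 0, 27, 27, blank_index]
fake_probs = np.eye(len(characters) + 1)[path]
print(greedy_ctc_decode(fake_probs))  # AA1
```

With a trained model, the same function could be applied to a single prediction, for example `greedy_ctc_decode(ocr_model.predict(X_data[:1])[0])`.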

Building an OCR model for reading CAPTCHAs using Keras is a rewarding experience that combines image processing with sequence modeling. By following the steps of generating a dataset, encoding labels, designing a CNN-RNN architecture, and training with CTC loss, you get a working baseline for recognizing CAPTCHA text. The 1,000 samples and 10 epochs used here are for demonstration; larger datasets and longer training typically improve accuracy.

This approach leverages Keras’s flexibility and ease of use, making it accessible even if you’re new to OCR or deep learning. With further tuning and real-world data, you can improve accuracy and adapt the model for various OCR challenges.

Bijay Kumar

Bijay Kumar is an experienced Python and AI professional who enjoys helping developers learn modern technologies through practical tutorials and examples. His expertise includes Python development, Machine Learning, Artificial Intelligence, automation, and data analysis using libraries like Pandas, NumPy, TensorFlow, Matplotlib, SciPy, and Scikit-Learn. At PythonGuides.com, he shares in-depth guides designed for both beginners and experienced developers.
