Reading CAPTCHAs automatically is a challenging task due to their distorted characters and noise. As an experienced Python Keras developer, I found building an OCR model tailored for CAPTCHAs both practical and insightful.
In this tutorial, I’ll show you how to build an end-to-end OCR model using Keras that can read CAPTCHAs accurately. You’ll get the full code for each step, making it easy to follow and implement.
What is OCR and Why Use Keras for CAPTCHAs?
Optical Character Recognition (OCR) converts images of text into machine-readable text. CAPTCHAs add complexity with distortions and noise.
Keras offers a flexible and powerful framework to build OCR models using convolutional and recurrent layers, ideal for sequence recognition like CAPTCHAs.
Method 1: Generate CAPTCHA Dataset with Python
Before building the model, we need a dataset of CAPTCHAs and labels.
Step 1: Install and Import Required Libraries
First, set up the essential libraries needed to create and process CAPTCHA images.
!pip install captcha
import numpy as np
from captcha.image import ImageCaptcha
import string
import random
from tensorflow.keras.utils import to_categorical
Step 2: Generate CAPTCHA Images and Labels
Next, generate synthetic CAPTCHA images along with their corresponding text labels.
characters = string.ascii_uppercase + string.digits
captcha_length = 5 # Length of each captcha
def generate_captcha():
    captcha_text = ''.join(random.choices(characters, k=captcha_length))
    image = ImageCaptcha(width=160, height=60)
    captcha_image = image.generate_image(captcha_text)
    return captcha_image, captcha_text

def generate_dataset(num_samples):
    X = []
    y = []
    for _ in range(num_samples):
        image, text = generate_captcha()
        image = image.convert('L')  # Convert to grayscale
        image = np.array(image) / 255.0  # Normalize
        X.append(image)
        y.append(text)
    return np.array(X), y

# Generate 1000 samples for example
X_data, y_text = generate_dataset(1000)
X_data = X_data[..., np.newaxis]  # Add channel dimension
print("Dataset shape:", X_data.shape)  # (1000, 60, 160, 1)

This method helps you quickly generate a labeled CAPTCHA dataset for training deep learning models.
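Before training, it is usually worth holding out part of the generated data so you can check how the model does on CAPTCHAs it has not seen. The following is a minimal sketch of a shuffled index split; `split_indices`, `val_fraction`, and `seed` are hypothetical names that are not part of the original code.

```python
import random

def split_indices(num_samples, val_fraction=0.1, seed=42):
    """Return (train_idx, val_idx) as two disjoint lists of shuffled indices."""
    indices = list(range(num_samples))
    # Use a local RNG so the global random state used for CAPTCHA text is untouched
    random.Random(seed).shuffle(indices)
    num_val = int(num_samples * val_fraction)
    return indices[num_val:], indices[:num_val]

train_idx, val_idx = split_indices(1000, val_fraction=0.1)
print(len(train_idx), len(val_idx))
```

With the arrays generated above, the split could then be applied as `X_data[train_idx]` for the images and `[y_text[i] for i in train_idx]` for the labels.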
Method 2: Encode Labels for OCR Model
We need to convert text labels into a numerical format suitable for training.
char_to_num = {char: idx for idx, char in enumerate(characters)}
num_to_char = {idx: char for char, idx in char_to_num.items()}
def encode_labels(texts):
    max_len = captcha_length
    encoded = np.zeros((len(texts), max_len), dtype=int)
    for i, text in enumerate(texts):
        for j, ch in enumerate(text):
            encoded[i, j] = char_to_num[ch]
    return encoded

y_encoded = encode_labels(y_text)
print("Sample encoded label:", y_encoded[0])

This method converts each CAPTCHA text label into a fixed-length numeric sequence, ensuring the OCR model can learn and process character-level outputs during training.
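To confirm the encoding is reversible, it helps to map a numeric label back to text with `num_to_char`. The sketch below rebuilds the same two dictionaries so it runs on its own; `decode_label` is a hypothetical helper name, and the sample text "AB3Z9" is only an illustration.

```python
import string

characters = string.ascii_uppercase + string.digits
char_to_num = {char: idx for idx, char in enumerate(characters)}
num_to_char = {idx: char for char, idx in char_to_num.items()}

def decode_label(encoded_row):
    """Map a sequence of integer indices back to the CAPTCHA text."""
    return ''.join(num_to_char[int(idx)] for idx in encoded_row)

encoded = [char_to_num[ch] for ch in "AB3Z9"]
print(encoded)                # [0, 1, 29, 25, 35]
print(decode_label(encoded))  # AB3Z9
```

Letters occupy indices 0 to 25 and digits 26 to 35, so a round trip through both dictionaries should always return the original string.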
Method 3: Build the OCR Model Architecture in Keras
We will build a CNN + RNN architecture with CTC loss for sequence recognition.
Step 1: Define the Model
Here, we construct a CNN-RNN hybrid model with CTC output for handling sequential CAPTCHA text recognition.
import tensorflow as tf
from tensorflow.keras import layers, models
def build_ocr_model(input_shape, num_classes, max_length):
    inputs = layers.Input(shape=input_shape, name='input_image')
    # CNN layers for feature extraction
    x = layers.Conv2D(32, (3,3), activation='relu', padding='same')(inputs)
    x = layers.MaxPooling2D((2,2))(x)
    x = layers.Conv2D(64, (3,3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2,2))(x)
    x = layers.Conv2D(128, (3,3), activation='relu', padding='same')(x)
    # Pool only along height here so the width stays at input_width // 4,
    # which is the input_length passed to the CTC loss during training
    x = layers.MaxPooling2D((2,1))(x)
    # Prepare for RNN: make width the time axis -> (width, height * channels)
    x = layers.Permute((2, 1, 3))(x)
    shape = x.shape
    x = layers.Reshape((shape[1], shape[2] * shape[3]))(x)
    # Bidirectional LSTM layers
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
    # Output layer
    x = layers.Dense(num_classes + 1, activation='softmax')(x)  # +1 for CTC blank label
    model = models.Model(inputs, x)
    return model

input_shape = (60, 160, 1)
num_classes = len(characters)
max_length = captcha_length
ocr_model = build_ocr_model(input_shape, num_classes, max_length)
ocr_model.summary()

This method sets up the core OCR architecture that converts CAPTCHA images into character sequences using CNN features and LSTM-based decoding.
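The number of time steps the recurrent layers see depends on how much the pooling layers shrink the image width, and CTC needs at least as many time steps as there are characters in the label. The sketch below works through that arithmetic for a 60x160 input; `feature_map_size` is a hypothetical helper, the two pooling configurations are illustrative, and it assumes 'same'-padded convolutions with the width used as the time axis.

```python
def feature_map_size(height, width, pool_sizes):
    """Spatial size after 'same'-padded convolutions, each followed by max pooling.

    'same' convolutions keep the size; each pooling layer floor-divides it.
    """
    for pool_h, pool_w in pool_sizes:
        height //= pool_h
        width //= pool_w
    return height, width

captcha_length = 5
configs = [
    [(2, 2), (2, 2), (2, 2)],  # three full poolings
    [(2, 2), (2, 2), (2, 1)],  # last pooling along height only
]
for pools in configs:
    h, w = feature_map_size(60, 160, pools)
    # With width as the time axis, w is the number of time steps
    status = "ok" if w >= captcha_length else "too short"
    print(pools, "-> height", h, "time steps", w, status)
```

Whatever configuration you choose, the `input_length` value fed to the CTC loss has to equal the time-step count computed this way.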
Method 4: Define CTC Loss and Training Pipeline
CTC loss is essential for sequence labeling when the alignment between input and output is unknown.
def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    return tf.keras.backend.ctc_batch_cost(labels, y_pred, input_length, label_length)

labels = layers.Input(name='ground_truth_labels', shape=[max_length], dtype='int32')
input_length = layers.Input(name='input_length', shape=[1], dtype='int32')
label_length = layers.Input(name='label_length', shape=[1], dtype='int32')
loss_out = layers.Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([ocr_model.output, labels, input_length, label_length])
training_model = models.Model(inputs=[ocr_model.input, labels, input_length, label_length], outputs=loss_out)
# The Lambda layer already outputs the loss, so the compiled loss just passes it through
training_model.compile(optimizer='adam', loss={'ctc': lambda y_true, y_pred: y_pred})

This method sets up the CTC loss and training wrapper, enabling the OCR model to learn character sequences even without explicit alignment between image pixels and text labels.
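At prediction time, the per-time-step softmax output still has to be turned into text. A common way to do this is greedy CTC decoding: take the most likely class at each time step, merge consecutive repeats, then drop the blanks. The following is a minimal pure-Python sketch, not the article's own inference code; `ctc_greedy_decode` and `one_hot` are hypothetical helpers, and it assumes the blank is the last class, matching the `num_classes + 1` output layer.

```python
import string

characters = string.ascii_uppercase + string.digits
blank_index = len(characters)  # assumption: the CTC blank is the last class

def ctc_greedy_decode(step_probs):
    """Decode a list of per-time-step probability lists (length len(characters) + 1 each)."""
    # Most likely class at every time step
    best_path = [max(range(len(p)), key=p.__getitem__) for p in step_probs]
    decoded = []
    previous = None
    for idx in best_path:
        # Keep a class only when it differs from the previous step and is not the blank
        if idx != previous and idx != blank_index:
            decoded.append(characters[idx])
        previous = idx
    return ''.join(decoded)

def one_hot(idx, size=len(characters) + 1):
    """Build a fake probability row that puts all mass on one class."""
    row = [0.0] * size
    row[idx] = 1.0
    return row

# Best path "A A _ A B _ _ 7": repeats merge, a blank separates a true double letter
path = [0, 0, blank_index, 0, 1, blank_index, blank_index, 33]
print(ctc_greedy_decode([one_hot(i) for i in path]))  # AAB7
```

With a trained model, each row of one prediction from `ocr_model.predict(...)` would play the role of one entry in `step_probs`.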
Method 5: Prepare Data Generator for Training
We create a generator to feed images and labels in batches.
def data_generator(X, y, batch_size=32):
    while True:
        for i in range(0, len(X), batch_size):
            X_batch = X[i:i+batch_size]
            y_batch = y[i:i+batch_size]
            # CTC expects integer lengths, one per sample
            label_length = np.full((len(y_batch), 1), max_length, dtype=np.int32)
            input_length = np.full((len(y_batch), 1), X_batch.shape[2] // 4, dtype=np.int32)  # after pooling layers
            inputs = {
                'input_image': X_batch,
                'ground_truth_labels': y_batch,
                'input_length': input_length,
                'label_length': label_length
            }
            # Dummy targets: the real loss is computed inside the 'ctc' layer
            outputs = {'ctc': np.zeros(len(y_batch))}
            yield inputs, outputs

# Prepare encoded labels for training
y_train_encoded = y_encoded
batch_size = 32
train_gen = data_generator(X_data, y_train_encoded, batch_size)
This method builds a batch-wise data generator that prepares images, encoded labels, and the length inputs required by CTC, so the OCR model can be trained on streamed batches.
Method 6: Train the OCR Model
Now we train the model using the generator.
steps_per_epoch = len(X_data) // batch_size
training_model.fit(train_gen,
                   steps_per_epoch=steps_per_epoch,
                   epochs=10)
Building an OCR model for reading CAPTCHAs using Keras is a rewarding experience that combines image processing with sequence modeling. By following the steps of generating a dataset, encoding labels, designing a CNN-RNN architecture, and training with CTC loss, you can create a robust model capable of recognizing complex CAPTCHA patterns.
This approach leverages Keras’s flexibility and ease of use, making it accessible even if you’re new to OCR or deep learning. With further tuning and real-world data, you can improve accuracy and adapt the model for various OCR challenges.
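To track whether tuning actually improves accuracy, you can compare decoded predictions against the true CAPTCHA texts. The sketch below reports two numbers: the share of CAPTCHAs read entirely correctly and the share of individual characters that match position by position. `captcha_accuracy` is a hypothetical helper, and the sample strings are made up for illustration.

```python
def captcha_accuracy(predictions, targets):
    """Return (exact_match_rate, character_rate) for two equal-length lists of strings."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    # Whole-CAPTCHA accuracy: every character must be right
    exact = sum(p == t for p, t in zip(predictions, targets))
    # Position-wise character accuracy; missing characters count as errors
    char_hits = sum(pc == tc
                    for p, t in zip(predictions, targets)
                    for pc, tc in zip(p, t))
    char_total = sum(len(t) for t in targets)
    return exact / len(targets), char_hits / char_total

preds = ["AB3Z9", "QW4RT", "XX1Y"]
truth = ["AB3Z9", "QW4RX", "XX1YZ"]
exact_rate, char_rate = captcha_accuracy(preds, truth)
print(exact_rate, char_rate)
```

The exact-match rate is the stricter of the two, since a single wrong character makes the whole CAPTCHA count as a miss.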
Other Python Keras articles you may also like:
- Keypoint Detection with Transfer Learning in Keras
- Object Detection Using Vision Transformers in Keras
- Monocular Depth Estimation Using Keras

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, and scikit-learn, working for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and elsewhere.