Have you ever found yourself staring at thousands of customer support tickets or legal documents, wondering how to categorize them automatically?
In my four years as a Keras developer, I’ve realized that assigning just one category to a text is rarely enough for complex, real-world data.
Most documents belong to multiple topics at once, such as a product review mentioning both “Price” and “Durability.”
In this tutorial, I will show you how to handle large-scale multi-label text classification with Keras in Python, drawing on my professional experience.
Set Up Your Environment for Multi-Label Classification
Before we dive into the architecture, we need to ensure our environment is ready to handle large datasets efficiently.
I always recommend using a virtual environment to manage your Python Keras dependencies and avoid version conflicts.
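For example, here is a minimal setup on a Unix-like system (the environment name is arbitrary):

```shell
# Create and activate an isolated environment
python3 -m venv keras-env
source keras-env/bin/activate   # On Windows: keras-env\Scripts\activate

# Keep pip current inside the environment
pip install --upgrade pip
```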
# Install the necessary libraries
pip install tensorflow pandas scikit-learn numpy

Prepare a Real-World USA News Dataset
For this example, imagine we are categorizing news articles from major US outlets into labels like ‘Politics’, ‘Finance’, and ‘Technology’.
I prefer using Pandas for the initial data cleaning because it handles large CSV files seamlessly before feeding them into Keras.
import pandas as pd
import numpy as np
# Sample data representing US News headlines and multiple tags
data = {
'text': [
"Wall Street stocks rally as Federal Reserve hints at interest rate pause.",
"New healthcare bill passed in Washington to lower prescription costs.",
"Silicon Valley startup unveils AI chip to compete with industry giants.",
"NASA's latest Mars rover sends back high-resolution images of the crater."
],
'tags': [
['Finance', 'Economy'],
['Politics', 'Healthcare'],
['Technology', 'Finance'],
['Science', 'Technology']
]
}
df = pd.DataFrame(data)
print(df.head())

Encode Labels with MultiLabelBinarizer
In multi-label tasks, we cannot use simple integer encoding; we need a binary matrix where each column represents a label.
I use the MultiLabelBinarizer from Scikit-Learn because it is the most reliable way to transform nested lists into a format Keras understands.
from sklearn.preprocessing import MultiLabelBinarizer
# Initialize the binarizer
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(df['tags'])
# View the classes and the transformed labels
print(f"Classes: {mlb.classes_}")
print(f"Encoded Labels:\n{y}")

Text Tokenization for Python Keras Models
To process text, we must convert words into numerical sequences that the neural network can interpret.
I use the Keras Tokenizer class to create a vocabulary and pad sequences so that every input has the same length.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Parameters
max_words = 10000
max_len = 100
tokenizer = Tokenizer(num_words=max_words, lower=True)
tokenizer.fit_on_texts(df['text'])
sequences = tokenizer.texts_to_sequences(df['text'])
X = pad_sequences(sequences, maxlen=max_len)

Method 1: Build a Deep Neural Network with Python Keras
This method involves creating a standard Feed-Forward network, which is excellent for speed and smaller text snippets.
I use a Dense architecture with a sigmoid activation in the final layer to allow for multiple labels to be predicted simultaneously.
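To see why sigmoid (and not softmax) is the right choice here, compare how the two activations treat the same raw scores: sigmoid gives each label an independent probability, so several labels can be active at once, while softmax forces the probabilities to compete and sum to 1 (toy numbers):

```python
import numpy as np

logits = np.array([2.0, 1.5, -1.0])

# Sigmoid: each label is scored independently; several can exceed 0.5
sigmoid = 1 / (1 + np.exp(-logits))
print(np.round(sigmoid, 2))  # two labels above 0.5

# Softmax: probabilities compete and sum to 1, so only one label tends to "win"
softmax = np.exp(logits) / np.exp(logits).sum()
print(np.round(softmax, 2), softmax.sum())
```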
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Embedding, GlobalMaxPool1D
def build_dnn_model():
model = Sequential([
Embedding(max_words, 128),  # input_length is no longer needed in recent Keras versions
GlobalMaxPool1D(),
Dense(64, activation='relu'),
Dropout(0.2),
Dense(len(mlb.classes_), activation='sigmoid') # Sigmoid is key for multi-label
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
return model
model_dnn = build_dnn_model()
model_dnn.summary()

Method 2: Utilize LSTM for Sequential Text Classification
When the context and order of words in a sentence matter, I always switch to Long Short-Term Memory (LSTM) layers.
This Python Keras approach captures the relationship between words, making it much more accurate for long US legal or medical documents.
from tensorflow.keras.layers import LSTM, SpatialDropout1D
def build_lstm_model():
model = Sequential([
Embedding(max_words, 128),  # input_length is no longer needed in recent Keras versions
SpatialDropout1D(0.2),
LSTM(64, return_sequences=False),
Dense(len(mlb.classes_), activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
return model
model_lstm = build_lstm_model()

Train the Python Keras Model on Large Datasets
Training on large-scale data requires monitoring the loss closely to prevent overfitting.
I usually implement an EarlyStopping callback to halt training once the validation loss stops improving, saving time and compute power.
from tensorflow.keras.callbacks import EarlyStopping
# Training the model
history = model_dnn.fit(
X, y,
epochs=20,
batch_size=32,
validation_split=0.2,
callbacks=[EarlyStopping(monitor='val_loss', patience=3)]
)

Handle Imbalanced Labels in Python Keras
In large-scale datasets, some labels (like ‘General News’) appear much more frequently than others (like ‘Niche Science’).
I use class weights or custom loss functions to ensure the model doesn’t ignore the minority labels during the training process.
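As a sketch of the custom-loss idea, here is the arithmetic of a per-label weighted binary cross-entropy in NumPy; the label frequencies and weights below are illustrative, and in Keras you would express the same computation with TensorFlow ops and pass the function to compile(loss=...):

```python
import numpy as np

# Illustrative per-label frequencies: rare labels get larger positive weights
label_freq = np.array([0.50, 0.25, 0.05])  # fraction of samples carrying each label
pos_weight = 1.0 / label_freq              # the rarest label is weighted 20x

def weighted_bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy where missed positives on rare labels cost more."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    loss = -(pos_weight * y_true * np.log(y_pred)
             + (1 - y_true) * np.log(1 - y_pred))
    return loss.mean()

# Missing a rare positive (label 2) hurts far more than missing a common one (label 0)
common_miss = weighted_bce(np.array([[1, 0, 0]]), np.array([[0.1, 0.1, 0.1]]))
rare_miss = weighted_bce(np.array([[0, 0, 1]]), np.array([[0.1, 0.1, 0.1]]))
print(common_miss < rare_miss)  # True
```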
# Example of calculating basic weights (simplified)
from sklearn.utils.class_weight import compute_sample_weight
weights = compute_sample_weight('balanced', y)
# You would then pass this to the fit method
# model.fit(X, y, sample_weight=weights)

Evaluate Model Performance with Python Keras
Accuracy can be misleading in multi-label classification because predicting all zeros might still result in high accuracy.
I prefer using the F1-score or Precision-Recall curves to get a true sense of how the model performs across all labels.
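To make the accuracy pitfall concrete, here is a toy sparse label matrix where a model that predicts no labels at all still scores high element-wise accuracy, while micro-averaged F1 drops to zero:

```python
import numpy as np
from sklearn.metrics import f1_score

# 10 samples, 5 labels, only 3 positive entries out of 50 (sparse, as in real data)
y_true = np.zeros((10, 5), dtype=int)
y_true[0, 1] = y_true[3, 2] = y_true[7, 4] = 1

# A useless model that predicts "no labels" for every sample
y_pred = np.zeros_like(y_true)

elementwise_acc = (y_true == y_pred).mean()
print(f"Element-wise accuracy: {elementwise_acc:.2f}")  # 0.94, looks great

micro_f1 = f1_score(y_true, y_pred, average='micro', zero_division=0)
print(f"Micro F1: {micro_f1:.2f}")  # 0.00, reveals the model is useless
```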
from sklearn.metrics import classification_report
# Getting predictions
predictions = model_dnn.predict(X)
# Thresholding at 0.5
predictions_binary = (predictions > 0.5).astype(int)
print(classification_report(y, predictions_binary, target_names=mlb.classes_))

Save and Load Your Python Keras Model
Once you are satisfied with the results, you need to save the model for production use in your US-based applications.
I save to a single file that stores both the architecture and the weights, using either the legacy HDF5 (.h5) format shown below or the newer native .keras format.
# Save the model
model_dnn.save('multi_label_news_model.h5')
# Load the model back
from tensorflow.keras.models import load_model
new_model = load_model('multi_label_news_model.h5')

Deploy Predictions on New Data
To use the model in a real scenario, you must preprocess the new text exactly like the training data.
I always create a helper function to wrap the tokenizer and the model prediction for a cleaner production workflow.
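One production detail worth noting: the fitted tokenizer and MultiLabelBinarizer must be shipped alongside the model, or new text cannot be encoded the same way it was during training. A minimal sketch using pickle (the filename is illustrative; the same pattern works for the Keras Tokenizer):

```python
import pickle
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
mlb.fit([['Finance', 'Economy'], ['Politics']])

# Persist the fitted binarizer next to the saved model
with open('mlb.pkl', 'wb') as f:
    pickle.dump(mlb, f)

# At serving time, load it back so label columns keep the same order
with open('mlb.pkl', 'rb') as f:
    mlb_loaded = pickle.load(f)
print(list(mlb_loaded.classes_))  # ['Economy', 'Finance', 'Politics']
```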
def predict_tags(text):
seq = tokenizer.texts_to_sequences([text])
padded = pad_sequences(seq, maxlen=max_len)
pred = model_dnn.predict(padded)
# Get labels where probability is > 0.5
tags = [mlb.classes_[i] for i, prob in enumerate(pred[0]) if prob > 0.5]
return tags
print(predict_tags("The economy is seeing a shift due to new tech innovations."))

In this tutorial, I showed you how to build and train a large-scale multi-label text classification model using Python Keras.
I have used these exact methods to categorize high-volume data in various industries with great success.
It is often a matter of trial and error to find the right balance between model complexity and training speed.
You may read:
- Supervised Contrastive Learning in Python Keras
- Object Detection with YOLOv8 and KerasCV in Keras
- Active Learning for Text Classification with Python Keras
- Text Classification Using FNet in Python with Keras

I am Bijay Kumar, a Microsoft MVP in SharePoint. Besides SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries, such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, and more, for various clients in the United States, Canada, the United Kingdom, Australia, and New Zealand. Check out my profile.