TensorFlow Fully Connected Layer

Recently, I was working on a deep learning project that involved analyzing customer data, where I needed to implement neural networks. One of the fundamental building blocks I used was the fully connected layer in TensorFlow.

If you’re building neural networks with TensorFlow, you’ll inevitably work with fully connected layers (also called dense layers). These layers are the workhorses of deep learning models.

In this tutorial, I’ll share how to implement fully connected layers in TensorFlow, explain their inner workings, and provide practical examples you can use in your projects.

Fully Connected Layer in TensorFlow

A fully connected layer, implemented as tf.keras.layers.Dense in TensorFlow, is a neural network layer where each neuron is connected to every neuron in the previous layer.

These layers learn complex patterns by applying weights to input features, adding a bias term, and then applying an activation function.

In practical terms, they transform input data into more meaningful representations that help solve your specific task, whether it’s classification, regression, or something else.
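
To make the computation concrete, here's a minimal NumPy sketch of what a dense layer does under the hood (the weights, bias, and input are made-up illustrative values):

```python
import numpy as np

# A dense layer computes: output = activation(inputs @ W + b)
x = np.array([[1.0, 2.0, 3.0]])        # one sample with 3 features
W = np.array([[0.1, 0.4],
              [0.2, 0.5],
              [0.3, 0.6]])             # weights: (3 inputs, 2 neurons)
b = np.array([0.01, 0.02])             # one bias per neuron

z = x @ W + b                          # linear transformation
output = np.maximum(z, 0)              # ReLU activation
print(output)                          # → [[1.41 3.22]]
```

Every neuron sees every input feature, which is exactly what "fully connected" means.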

Create a Basic Fully Connected Layer

The simplest way to create a fully connected layer in TensorFlow is to use the Dense layer from Keras:

import tensorflow as tf

# Create a basic fully connected layer with 64 neurons
dense_layer = tf.keras.layers.Dense(units=64, activation='relu')

The two most important parameters here are:

  • units: The number of neurons in the layer
  • activation: The activation function (ReLU is commonly used)
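
If it helps to see the layer in action, you can call it on a batch of dummy data. The weight matrix is created lazily on the first call, shaped to match the input's last dimension:

```python
import tensorflow as tf

dense_layer = tf.keras.layers.Dense(units=64, activation='relu')

# Weights don't exist yet; they are built on the first call
batch = tf.random.normal((32, 10))     # 32 samples, 10 features each
output = dense_layer(batch)

print(output.shape)                    # (32, 64) - one value per neuron
print(dense_layer.kernel.shape)        # (10, 64) weight matrix
print(dense_layer.bias.shape)          # (64,) bias vector
```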

Build a Complete Neural Network with Fully Connected Layers

Let’s build a complete neural network for classifying customer data:

import tensorflow as tf
from tensorflow import keras

# Number of input features
features = 10  # change this to match your data

# Create a sequential model
model = keras.Sequential([
    keras.Input(shape=(features,)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')  # for binary classification
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Print model summary
model.summary() 

In this example, I’ve created a three-layer network that can be used for tasks such as predicting customer churn or purchase likelihood.

Understand Fully Connected Layer Parameters

When working with fully connected layers, you’ll encounter several important parameters:

Units (Neurons)

The units parameter defines how many neurons are in the layer:

# Layer with 256 neurons
dense_layer = tf.keras.layers.Dense(units=256, activation='relu')

More neurons can capture more complex patterns but require more computation and may lead to overfitting.
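
You can verify the cost yourself: each neuron holds one weight per input plus a bias, so a layer has input_dim * units + units trainable parameters. A quick check, assuming 10 input features as an example:

```python
import tensorflow as tf

# Each neuron has one weight per input plus a bias,
# so parameters = input_dim * units + units
layer = tf.keras.layers.Dense(units=256, activation='relu')
layer.build(input_shape=(None, 10))    # 10 input features (example value)

print(layer.count_params())            # 10 * 256 + 256 = 2816
```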

Activation Functions

The activation function introduces non-linearity, allowing your network to learn complex patterns:

# Using different activation functions
relu_layer = tf.keras.layers.Dense(64, activation='relu')
sigmoid_layer = tf.keras.layers.Dense(64, activation='sigmoid')
tanh_layer = tf.keras.layers.Dense(64, activation='tanh')

ReLU (Rectified Linear Unit) is often preferred for hidden layers due to its computational efficiency and ability to mitigate the vanishing gradient problem.
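
To see how these functions differ, you can apply each one to the same sample values:

```python
import tensorflow as tf

x = tf.constant([-2.0, 0.0, 2.0])

print(tf.nn.relu(x).numpy())      # [0. 0. 2.] - negatives clipped to zero
print(tf.nn.sigmoid(x).numpy())   # squashed into (0, 1); sigmoid(0) = 0.5
print(tf.nn.tanh(x).numpy())      # squashed into (-1, 1); tanh(0) = 0.0
```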

Weight Initialization

How weights are initialized can significantly impact training:

# Using different initializers
dense_layer = tf.keras.layers.Dense(
    64, 
    activation='relu',
    kernel_initializer='he_normal'  # Good for ReLU
)

For ReLU activations, the 'he_normal' or 'he_uniform' initializers work well.
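
As a quick sanity check, you can build a layer with 'he_normal' and inspect its kernel. He initialization draws weights with a standard deviation of roughly sqrt(2 / fan_in); the 100 input features below are just an example value:

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(64, activation='relu',
                              kernel_initializer='he_normal')
layer.build(input_shape=(None, 100))   # 100 input features (example value)

# He-normal weights have std ≈ sqrt(2 / fan_in) = sqrt(2 / 100) ≈ 0.14
weights = layer.kernel.numpy()
print(weights.shape)                   # (100, 64)
print(weights.std())                   # roughly 0.12-0.14 (truncated normal)
```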

Implement a Practical Example: Customer Churn Prediction

Let’s implement a practical example for predicting customer churn:

import tensorflow as tf
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Create dummy customer data (1000 samples, 10 features)
np.random.seed(42)
X_data = np.random.rand(1000, 10)  # 1000 customers with 10 features
y_data = np.random.randint(0, 2, size=(1000,))  # Binary churn labels (0 or 1)

# Normalize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_data)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_data, test_size=0.2, random_state=42
)

# Build the model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy', tf.keras.metrics.AUC()]
)

# Train the model
history = model.fit(
    X_train, y_train,
    validation_split=0.2,
    epochs=50,
    batch_size=32,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=5,
            restore_best_weights=True
        )
    ]
)

# Evaluate the model
test_results = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_results[1]:.4f}")
print(f"Test AUC: {test_results[2]:.4f}")

# Make predictions
predictions = model.predict(X_test)
print("Sample predictions:", predictions[:5].flatten())

This model uses multiple dense layers with dropout regularization to predict customer churn, a common business problem.

Advanced Techniques for Fully Connected Layers

Implementing Batch Normalization

Batch normalization helps stabilize and accelerate training:

model = tf.keras.Sequential([
    tf.keras.Input(shape=(features,)),
    tf.keras.layers.Dense(64),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation('relu'),

    tf.keras.layers.Dense(32),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation('relu'),

    tf.keras.layers.Dense(1, activation='sigmoid')
])

Note how I’ve separated the dense layer, batch normalization, and activation into distinct steps.
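
One detail worth knowing: batch normalization behaves differently during training (it normalizes with the current batch's statistics) and inference (it uses accumulated moving averages). A small sketch illustrating the difference on a freshly created layer:

```python
import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = tf.constant(np.random.rand(8, 4).astype('float32') * 10.0)

# Training mode: each feature is normalized with batch statistics,
# so the output mean is close to zero
train_out = bn(x, training=True)
print(float(tf.reduce_mean(train_out)))

# Inference mode: the (still near-default) moving averages are used,
# so the output stays close to the raw input scale
infer_out = bn(x, training=False)
print(float(tf.reduce_mean(infer_out)))
```

This is why passing the right `training` flag (or simply using `model.fit` / `model.predict`, which handle it for you) matters.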

Using Regularization

To prevent overfitting, apply regularization:

# L2 regularization (weight decay)
dense_layer = tf.keras.layers.Dense(
    64, 
    activation='relu',
    kernel_regularizer=tf.keras.regularizers.l2(0.001)
)

L2 regularization penalizes large weights, encouraging the network to learn simpler patterns.
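
Keras adds this penalty to the training loss automatically; you can inspect it through the layer's `losses` attribute. A small sketch:

```python
import tensorflow as tf

dense_layer = tf.keras.layers.Dense(
    64,
    activation='relu',
    kernel_regularizer=tf.keras.regularizers.l2(0.001)
)

# Build the layer by calling it once on dummy input
_ = dense_layer(tf.ones((1, 10)))

# The L2 penalty is 0.001 * sum(weights**2)
penalty = dense_layer.losses[0]
manual = 0.001 * tf.reduce_sum(tf.square(dense_layer.kernel))
print(float(penalty), float(manual))   # the two values match
```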

Creating a Custom Layer

Sometimes, you might need a custom fully connected layer:

class CustomDense(tf.keras.layers.Layer):
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer='glorot_uniform',
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,),
            initializer='zeros',
            trainable=True,
        )

    def call(self, inputs):
        output = tf.matmul(inputs, self.w) + self.b
        if self.activation is not None:
            output = self.activation(output)
        return output

# Use the custom layer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(features,)),
    CustomDense(64, activation='relu'),
    CustomDense(32, activation='relu'),
    CustomDense(1, activation='sigmoid')
])

This custom implementation gives you more control over the layer’s behavior.

Common Issues and Solutions

Below are common issues you may run into when working with fully connected layers, along with their solutions.

Vanishing Gradients

If you’re experiencing vanishing gradients:

# Use ReLU or Leaky ReLU activations
model = tf.keras.Sequential([
    tf.keras.Input(shape=(features,)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Or with Leaky ReLU
model = tf.keras.Sequential([
    tf.keras.Input(shape=(features,)),
    tf.keras.layers.Dense(128),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Dense(64),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

Overfitting

If your model is overfitting:

model = tf.keras.Sequential([
    tf.keras.Input(shape=(features,)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.3),  # Add dropout
    tf.keras.layers.Dense(64, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(0.001)),  # Add L2
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

Combining dropout and L2 regularization can help combat overfitting.
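
Keep in mind that dropout is only active during training; at inference time it passes inputs through unchanged. A quick sketch:

```python
import numpy as np
import tensorflow as tf

dropout = tf.keras.layers.Dropout(0.3)
x = tf.ones((1, 10))

# Training mode: ~30% of units are zeroed, the rest scaled up by 1/0.7
print(dropout(x, training=True).numpy())

# Inference mode: dropout is a no-op, the input passes through unchanged
print(dropout(x, training=False).numpy())   # all ones
```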

Visualize Fully Connected Layer Outputs

It’s often helpful to visualize what your fully connected layers are learning:

import matplotlib.pyplot as plt

# Create a model that outputs the activations of every layer
layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.Model(inputs=model.input, outputs=layer_outputs)

# Get activations for a single sample
activations = activation_model.predict(X_test[0:1])

# Plot the 64 activations of the first dense layer as an 8x8 grid
plt.matshow(activations[0].reshape(8, 8), cmap='viridis')
plt.title('Activations of First Dense Layer')
plt.colorbar()
plt.show()

This visualization can help you understand what patterns your network is detecting.

I hope you found this article helpful for implementing fully connected layers in TensorFlow. These versatile layers form the backbone of many neural network architectures, and mastering them will significantly enhance your deep learning projects.
