Recently, I worked on a deep learning project that involved analyzing customer data and required implementing neural networks. One of the fundamental building blocks I used was the fully connected layer in TensorFlow.
If you’re building neural networks with TensorFlow, you’ll inevitably work with fully connected layers (also called dense layers). These layers are the workhorses of deep learning models.
In this tutorial, I’ll share how to implement fully connected layers in TensorFlow, explain their inner workings, and provide practical examples you can use in your projects.
Fully Connected Layer in TensorFlow
A fully connected layer, implemented as tf.keras.layers.Dense in TensorFlow, is a neural network layer where each neuron is connected to every neuron in the previous layer.
These layers learn complex patterns by applying weights to input features, adding a bias term, and then applying an activation function.
In practical terms, they transform input data into more meaningful representations that help solve your specific task, whether it’s classification, regression, or something else.
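To make this concrete, here is a minimal sketch (with small, made-up numbers) of the computation a single dense layer performs: multiply the inputs by a weight matrix, add a bias vector, and apply the activation.
import tensorflow as tf
# One sample with 3 input features (made-up numbers)
x = tf.constant([[1.0, 2.0, 3.0]])
# Weights connecting 3 inputs to 2 neurons, plus one bias per neuron
W = tf.constant([[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]])
b = tf.constant([0.05, -0.1])
# output = activation(x @ W + b)
output = tf.nn.relu(tf.matmul(x, W) + b)
print(output.numpy())  # -> [[0.  0.9]] (approximately); the first neuron's pre-activation was negative
Every dense layer in this article performs exactly this computation, just with learned weights instead of hand-picked ones.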
Read Tensorflow Convert String to Int
Create a Basic Fully Connected Layer
The simplest way to create a fully connected layer in TensorFlow is to use the Dense layer from Keras:
import tensorflow as tf
# Create a basic fully connected layer with 64 neurons
dense_layer = tf.keras.layers.Dense(units=64, activation='relu')
The two most important parameters here are:
- units: The number of neurons in the layer
- activation: The activation function (ReLU is commonly used)
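As a quick sanity check of these parameters, here is a small sketch (assuming inputs with 10 features) that calls the layer on a batch and inspects the shapes it produces:
import tensorflow as tf
dense_layer = tf.keras.layers.Dense(units=64, activation='relu')
# A batch of 32 samples with 10 features each; the layer builds its weights on first call
x = tf.random.normal((32, 10))
y = dense_layer(x)
print(y.shape)                   # (32, 64): one output per neuron
print(dense_layer.kernel.shape)  # (10, 64): weight matrix, input_dim x units
print(dense_layer.bias.shape)    # (64,): one bias per neuron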
Build a Complete Neural Network with Fully Connected Layers
Let’s build a complete neural network for classifying customer data:
import tensorflow as tf
from tensorflow import keras
# Number of input features
features = 10 # change this to match your data
# Create a sequential model
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(features,)),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')  # for binary classification
])
# Compile the model
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
# Print model summary
model.summary()
In this example, I’ve created a three-layer network that can be used for tasks such as predicting customer churn or purchase likelihood.
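You can verify the parameter counts that model.summary() reports by hand: a dense layer has (inputs x units) weights plus one bias per unit. For the model above with 10 input features:
# Parameters per dense layer: (inputs x units) weights + units biases
layer1_params = 10 * 128 + 128  # 1,408
layer2_params = 128 * 64 + 64   # 8,256
layer3_params = 64 * 1 + 1      # 65
print(layer1_params + layer2_params + layer3_params)  # 9,729 trainable parameters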
Check out TensorFlow Variable
Understand Fully Connected Layer Parameters
When working with fully connected layers, you’ll encounter several important parameters:
Units (Neurons)
The units parameter defines how many neurons are in the layer:
# Layer with 256 neurons
dense_layer = tf.keras.layers.Dense(units=256, activation='relu')
More neurons can capture more complex patterns but require more computation and may lead to overfitting.
Activation Functions
The activation function introduces non-linearity, allowing your network to learn complex patterns:
# Using different activation functions
relu_layer = tf.keras.layers.Dense(64, activation='relu')
sigmoid_layer = tf.keras.layers.Dense(64, activation='sigmoid')
tanh_layer = tf.keras.layers.Dense(64, activation='tanh')
ReLU (Rectified Linear Unit) is often preferred for hidden layers due to its computational efficiency and ability to mitigate the vanishing gradient problem.
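To see the difference, here is a minimal sketch that applies each activation to the same sample values; note how ReLU zeroes out negatives while sigmoid and tanh squash values into fixed ranges:
import tensorflow as tf
x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])
print(tf.keras.activations.relu(x).numpy())     # [0.  0.  0.  0.5 2. ] - negatives become 0
print(tf.keras.activations.sigmoid(x).numpy())  # values squashed into (0, 1)
print(tf.keras.activations.tanh(x).numpy())     # values squashed into (-1, 1)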
Weight Initialization
How weights are initialized can significantly impact training:
# Using different initializers
dense_layer = tf.keras.layers.Dense(
    64,
    activation='relu',
    kernel_initializer='he_normal'  # Good for ReLU
)
For ReLU activations, 'he_normal' or 'he_uniform' initializers work well.
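If you need reproducible initial weights, you can also pass the initializer as an object with a seed; a brief sketch (the seed value here is arbitrary):
import tensorflow as tf
dense_layer = tf.keras.layers.Dense(
    64,
    activation='relu',
    kernel_initializer=tf.keras.initializers.HeNormal(seed=42)  # seeded for reproducibility
)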
Read Tensor in TensorFlow
Implement a Practical Example: Customer Churn Prediction
Let’s implement a practical example for predicting customer churn:
import tensorflow as tf
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Create dummy customer data (1000 samples, 10 features)
np.random.seed(42)
X_data = np.random.rand(1000, 10) # 1000 customers with 10 features
y_data = np.random.randint(0, 2, size=(1000,)) # Binary churn labels (0 or 1)
# Normalize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_data)
# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_data, test_size=0.2, random_state=42
)
# Build the model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
# Compile the model
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy', tf.keras.metrics.AUC()]
)
# Train the model
history = model.fit(
    X_train, y_train,
    validation_split=0.2,
    epochs=50,
    batch_size=32,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=5,
            restore_best_weights=True
        )
    ]
)
# Evaluate the model
test_results = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_results[1]:.4f}")
print(f"Test AUC: {test_results[2]:.4f}")
# Make predictions
predictions = model.predict(X_test)
print("Sample predictions:", predictions[:5].flatten())I executed the above example code and added the screenshot below.

This model uses multiple dense layers with dropout regularization to predict customer churn, a common business problem.
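Beyond the test metrics, it is worth checking whether the model overfit during training. A minimal sketch that plots the training and validation loss from the history object returned by model.fit:
import matplotlib.pyplot as plt
# Compare training and validation loss across epochs
plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('Binary cross-entropy loss')
plt.legend()
plt.show()
If the validation loss starts climbing while the training loss keeps falling, the model is overfitting; the EarlyStopping callback above is already watching for this.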
Check out Compile Neural Network in Tensorflow
Advanced Techniques for Fully Connected Layers
Implementing Batch Normalization
Batch normalization helps stabilize and accelerate training:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(features,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(32),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
Note how I’ve separated the dense layer, batch normalization, and activation into distinct steps.
Using Regularization
To prevent overfitting, apply regularization:
# L2 regularization (weight decay)
dense_layer = tf.keras.layers.Dense(
    64,
    activation='relu',
    kernel_regularizer=tf.keras.regularizers.l2(0.001)
)
L2 regularization penalizes large weights, encouraging the network to learn simpler patterns.
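Keras also provides L1 and combined L1/L2 regularizers through the same API; a brief sketch (the coefficients here are illustrative, not tuned):
import tensorflow as tf
# L1 regularization pushes weights toward exact zeros (sparsity)
l1_layer = tf.keras.layers.Dense(
    64,
    activation='relu',
    kernel_regularizer=tf.keras.regularizers.l1(0.001)
)
# Combined L1 + L2 regularization
l1_l2_layer = tf.keras.layers.Dense(
    64,
    activation='relu',
    kernel_regularizer=tf.keras.regularizers.l1_l2(l1=0.001, l2=0.001)
)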
Creating a Custom Layer
Sometimes, you might need a custom fully connected layer:
class CustomDense(tf.keras.layers.Layer):
    def __init__(self, units, activation=None, **kwargs):
        # Forward kwargs (such as input_shape) to the base Layer
        super(CustomDense, self).__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)
    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer='glorot_uniform',
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,),
            initializer='zeros',
            trainable=True,
        )
    def call(self, inputs):
        output = tf.matmul(inputs, self.w) + self.b
        if self.activation is not None:
            output = self.activation(output)
        return output
# Use the custom layer
model = tf.keras.Sequential([
    CustomDense(64, activation='relu', input_shape=(features,)),
    CustomDense(32, activation='relu'),
    CustomDense(1, activation='sigmoid')
])
This custom implementation gives you more control over the layer’s behavior.
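One practical refinement: if you want to save and reload a model containing a custom layer, Keras needs a get_config method to re-create it. A brief sketch of how that could look (SerializableDense is a hypothetical name for this extension):
import tensorflow as tf
class SerializableDense(CustomDense):
    # Hypothetical extension of CustomDense that adds the config
    # Keras needs to serialize the layer when saving the model
    def get_config(self):
        config = super().get_config()
        config.update({
            'units': self.units,
            'activation': tf.keras.activations.serialize(self.activation),
        })
        return config
On loading, Keras passes this config back to __init__, which is why storing units and the serialized activation is enough.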
Read Build an Artificial Neural Network in Tensorflow
Common Issues and Solutions
Now, I will explain some common issues that come up when training networks with fully connected layers, along with their solutions.
Vanishing Gradients
If you’re experiencing vanishing gradients:
# Use ReLU or Leaky ReLU activations
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(features,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
# Or with Leaky ReLU
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, input_shape=(features,)),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Dense(64),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
Overfitting
If your model is overfitting:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(features,)),
    tf.keras.layers.Dropout(0.3),  # Add dropout
    tf.keras.layers.Dense(64, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(0.001)),  # Add L2
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
Combining dropout and L2 regularization can help combat overfitting.
Check out Basic TensorFlow Constructs: Tensors and Operations
Visualize Fully Connected Layer Outputs
It’s often helpful to visualize what your fully connected layers are learning:
import matplotlib.pyplot as plt
# Create a model that outputs the activations of every layer
# (this assumes the churn model from earlier, whose first dense layer has 64 neurons)
layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.Model(inputs=model.input, outputs=layer_outputs)
# Get activations for a sample input
activations = activation_model.predict(X_test[0:1])
# Plot the activations of the first dense layer (64 neurons = 8 x 8 grid)
plt.figure(figsize=(10, 5))
plt.imshow(activations[0].reshape(8, 8), cmap='viridis')  # imshow draws into the current figure, so figsize is respected
plt.title('Activations of First Dense Layer')
plt.colorbar()
plt.show()
This visualization can help you understand what patterns your network is detecting.
I hope you found this article helpful for implementing fully connected layers in TensorFlow. These versatile layers form the backbone of many neural network architectures, and mastering them will significantly enhance your deep learning projects.
Other TensorFlow articles you may also like:
- Tensorflow Gradient Descent in Neural Network
- Tensorflow Activation Functions
- Use TensorFlow’s get_shape Function

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time, I have gained expertise in various Python libraries, such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, etc., while working for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc. Check out my profile.