# Upgraded to use TensorFlow 2.x and the Keras API. Provides multiple visualizations of the weights.
# Ready to copy, paste, and run in Colab using TensorFlow 2.x.
"""
Corrected and upgraded by Martial Terran, from
https://github.com/spiderPan/Google-Machine-Learning-Crash-Course/blob/master/multi-class_classfication_of_handwritten_digits.py
The architecture of this model is not a CNN (Convolutional Neural Network).
It is a Dense Neural Network (DNN), also commonly known as a Multilayer Perceptron (MLP).
Let's break down why and look at the specific architecture.
Why it's a DNN and Not a CNN
The defining characteristic of a CNN is its use of convolutional layers (Conv2D). These layers are specifically designed to work with grid-like data, such as images. They use filters (or kernels) to slide across the input image, detecting spatial patterns like edges, textures, and shapes.
This model does not use any convolutional layers. Instead, its core components are Dense layers (tf.keras.layers.Dense).
DNN Approach: The 28x28 pixel image is flattened into a single vector of 784 numbers. The Dense layers treat these numbers as a simple list, with no built-in notion that pixel #29 is directly below pixel #1. The network can still learn patterns from the pixel values themselves, but it starts with no knowledge of which pixels are neighbors.
CNN Approach: A CNN would take the input as a 2D grid (e.g., shape=(28, 28, 1)) and use Conv2D layers to analyze neighboring pixels, preserving the spatial structure of the image.
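
As a minimal illustration of the flattening described above (plain Python, not part of the training script): row-major flattening maps the pixel at (row, col) to index row * 28 + col, so vertically adjacent pixels end up 28 positions apart in the vector. The helper names flat_index and grid_position are hypothetical.

```python
WIDTH = 28  # MNIST images are 28x28 pixels

def flat_index(row, col, width=WIDTH):
    # 0-based position of pixel (row, col) in the flattened, row-major vector
    return row * width + col

def grid_position(index, width=WIDTH):
    # Inverse mapping: recover (row, col) from a 0-based flat index
    return divmod(index, width)

# Pixel #1 and pixel #29 (1-based) are flat indices 0 and 28 (0-based).
first = grid_position(0)
below = grid_position(28)
print(first, below)  # (0, 0) (1, 0)
```

A Dense layer only ever sees the flat index; the (row, col) view is exactly what a Conv2D layer gets for free.
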
The Specific Architecture of this Model
You can see the exact architecture from the code or by printing the model's summary (model.summary()).
Based on the code with hidden_units = [100, 100], the architecture is as follows:
Layer #  Layer Type  Output Shape  Description
1        Input       (None, 784)   A flat vector of 784 pixel values (28x28).
2        Dense       (None, 100)   First fully-connected hidden layer. Every one of its 100 neurons is connected to all 784 input pixels.
3        Dense       (None, 100)   Second fully-connected hidden layer. Every one of its 100 neurons is connected to all 100 neurons before it.
4        Dropout     (None, 100)   Regularization layer. Randomly sets 20% of neuron activations to zero during training to prevent overfitting.
5        Dense       (None, 10)    The final output layer. It has 10 neurons, one for each class (digits 0-9).
-        Softmax     (None, 10)    The activation function on the output layer that converts the outputs into a probability distribution.
(Note: "None" in the output shape refers to the batch size, which can vary.)
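
The Dropout and Softmax rows can be illustrated with a small plain-Python sketch. This is not how Keras implements these layers internally; the function names are hypothetical, and the sketch assumes the usual "inverted dropout" convention, in which surviving activations are scaled by 1 / (1 - rate) during training so their expected sum is unchanged.

```python
import math
import random

def dropout(activations, rate=0.2, rng=random):
    # Training-time dropout: zero each value with probability `rate`
    # and scale the survivors by 1 / (1 - rate).
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(sum(probs))  # approximately 1.0

dropped = dropout([1.0] * 10, rate=0.2, rng=random.Random(0))
print(dropped)     # each entry is either 0.0 or 1.25
```

At inference time dropout is a no-op, which is why it contributes no parameters to the model.
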
In summary:
It's a DNN/MLP: It uses stacked Dense (fully-connected) layers.
It's not a CNN: It lacks Conv2D and MaxPooling2D layers, and it flattens the image data, discarding the crucial 2D spatial information that CNNs are built to exploit.
Model Summary:
/usr/local/lib/python3.11/dist-packages/keras/src/layers/core/input_layer.py:27: UserWarning: Argument `input_shape` is deprecated. Use `shape` instead.
warnings.warn(
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense_3 (Dense)                 │ (None, 100)            │        78,500 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_4 (Dense)                 │ (None, 100)            │        10,100 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 100)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_5 (Dense)                 │ (None, 10)             │         1,010 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 89,610 (350.04 KB)
Trainable params: 89,610 (350.04 KB)
Non-trainable params: 0 (0.00 B)
Final accuracy (on validation data): 0.96
Evaluating on test data...
Accuracy on test data: 0.96
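
The Param # column in the summary can be checked by hand: a Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases. The short sketch below (plain Python; dense_param_count is a hypothetical helper, and the 4 bytes per parameter assumes float32 weights) reproduces the reported totals.

```python
def dense_param_count(n_in, n_out):
    # One weight per (input, unit) pair, plus one bias per unit
    return n_in * n_out + n_out

layer_sizes = [784, 100, 100, 10]  # input, two hidden layers, output
counts = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)
size_kb = total * 4 / 1024  # assuming 4-byte float32 parameters

print(counts)             # [78500, 10100, 1010]
print(total)              # 89610
print(round(size_kb, 2))  # 350.04
```
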
"""
import glob
import math
import os
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from IPython.display import display
from matplotlib import pyplot as plt
from sklearn import metrics
# Set pandas display options
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format
def parse_labels_and_features(dataset):
    """Parses a dataset into features and labels.

    Args:
        dataset: A Pandas DataFrame with the first column being the label
            and the remaining columns as pixel data.

    Returns:
        A tuple of (labels, features), where both are Pandas Series/DataFrame.
    """
    labels = dataset[0]
    # The remaining 784 columns are the pixel values.
    features = dataset.loc[:, 1:784]
    # Normalize the feature values to be in the range [0, 1].
    features = features / 255
    return labels, features
def create_and_train_nn_model(
        learning_rate,
        epochs,
        batch_size,
        hidden_units,
        training_examples,
        training_targets,
        validation_examples,
        validation_targets):
    """
    Creates, trains, and evaluates a Deep Neural Network model using tf.keras.

    Args:
        learning_rate: The learning rate for the optimizer.
        epochs: The number of times to iterate through the training data.
        batch_size: The number of examples to use in each training step.
        hidden_units: A list of integers, where each integer is the number of nodes
            in a hidden layer.
        training_examples: DataFrame of training features.
        training_targets: Series of training labels.
        validation_examples: DataFrame of validation features.
        validation_targets: Series of validation labels.

    Returns:
        The trained tf.keras.Model object and the training history.
    """
    # 1. Define the model architecture
    model = tf.keras.models.Sequential()
    # Input layer (no feature columns needed for dense input)
    model.add(tf.keras.layers.InputLayer(input_shape=(784,)))
    # Add hidden layers
    for units in hidden_units:
        model.add(tf.keras.layers.Dense(units, activation='relu'))
    # Add a dropout layer for regularization to prevent overfitting
    model.add(tf.keras.layers.Dropout(0.2))
    # Output layer with 10 units for 10 classes (0-9) and softmax activation
    model.add(tf.keras.layers.Dense(10, activation='softmax'))

    # 2. Compile the model
    model.compile(
        optimizer=tf.keras.optimizers.Adagrad(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=['accuracy']
    )
    # Print a summary of the model
    print("Model Summary:")
    model.summary()
    print("\nTraining Model...")

    # 3. Train the model
    history = model.fit(
        x=training_examples.values,
        y=training_targets.values,
        batch_size=batch_size,
        epochs=epochs,
        shuffle=True,
        validation_data=(validation_examples.values, validation_targets.values),
        # Suppress verbose logs, show one line per epoch
        verbose=2
    )
    print("Model training finished.")

    # 4. Plot the results
    training_loss = history.history["loss"]
    validation_loss = history.history["val_loss"]
    epochs_range = range(1, epochs + 1)
    plt.figure(figsize=(10, 5))
    plt.ylabel("Loss (Sparse Categorical Crossentropy)")
    plt.xlabel("Epochs")
    plt.title("Loss vs. Epochs")
    plt.plot(epochs_range, training_loss, label="Training")
    plt.plot(epochs_range, validation_loss, label="Validation")
    plt.legend()
    plt.show()

    # 5. Show a confusion matrix
    # Get predictions for the validation set
    validation_probabilities = model.predict(validation_examples.values)
    validation_predictions = np.argmax(validation_probabilities, axis=1)
    cm = metrics.confusion_matrix(validation_targets, validation_predictions)
    # Normalize each row so the entries are fractions of the true class
    cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
    plt.figure(figsize=(8, 8))
    ax = sns.heatmap(cm_normalized, cmap="bone_r", annot=True, fmt=".2f")
    ax.set_aspect(1)
    plt.title("Confusion Matrix")
    plt.ylabel("True Label")
    plt.xlabel("Predicted Label")
    plt.show()

    # Print final validation accuracy from the last epoch
    final_validation_accuracy = history.history["val_accuracy"][-1]
    print(f"Final accuracy (on validation data): {final_validation_accuracy:.2f}")
    return model, history
# --- Main Execution ---
# Load the datasets
mnist_dataframe = pd.read_csv('sample_data/mnist_train_small.csv', sep=",", header=None)
mnist_test_dataframe = pd.read_csv('sample_data/mnist_test.csv', sep=',', header=None)
# Select the first 10,000 training rows, then shuffle them
mnist_dataframe = mnist_dataframe.head(10000)
mnist_dataframe = mnist_dataframe.reindex(np.random.permutation(mnist_dataframe.index))
display(mnist_dataframe.head())
# Parse features and labels
training_targets, training_examples = parse_labels_and_features(mnist_dataframe[:7500])
validation_targets, validation_examples = parse_labels_and_features(mnist_dataframe[7500:10000])
testing_targets, testing_examples = parse_labels_and_features(mnist_test_dataframe)
display(training_examples.describe())
display(validation_examples.describe())
# Show a random example from the training set
rand_example_idx = np.random.choice(training_examples.index)
_, ax = plt.subplots()
ax.matshow(training_examples.loc[rand_example_idx].values.reshape(28, 28))
ax.set_title(f"Label: {training_targets.loc[rand_example_idx]}")
ax.grid(False)
plt.show()
# Define hyperparameters
# The original script used `steps=1000`, `batch_size=30`, `periods=10`.
# With 7500 training examples, one epoch is 7500/30 = 250 steps.
# To get a similar amount of training (1000 steps), we need 1000/250 = 4 epochs.
# EPOCHS is set to 25 below (25 * 250 = 6250 steps), considerably more training than the original 1000 steps.
LEARNING_RATE = 0.05
EPOCHS = 25
BATCH_SIZE = 30
HIDDEN_UNITS = [100, 100]
# Train the model
trained_model, history = create_and_train_nn_model(
    learning_rate=LEARNING_RATE,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    hidden_units=HIDDEN_UNITS,
    training_examples=training_examples,
    training_targets=training_targets,
    validation_examples=validation_examples,
    validation_targets=validation_targets
)
# Evaluate the model on the test data
print("\nEvaluating on test data...")
loss, accuracy = trained_model.evaluate(testing_examples.values, testing_targets.values, verbose=0)
print(f"Accuracy on test data: {accuracy:.2f}")
# Visualize the weights of the first hidden layer
print("\nVisualizing weights of the first hidden layer...")
# The first Dense layer is at index 0 in the model.layers list
# get_weights() returns a list [kernel, bias], we need the kernel [0]
weights0 = trained_model.layers[0].get_weights()[0]
print('Weights 0 shape:', weights0.shape)
num_nodes = weights0.shape[1]
num_rows = int(math.ceil(num_nodes / 10.0))
fig, axes = plt.subplots(num_rows, 10, figsize=(20, 2 * num_rows))
for coef, ax in zip(weights0.T, axes.ravel()):
    # Weights are reshaped from (784,) to (28, 28) for visualization.
    ax.matshow(coef.reshape(28, 28), cmap=plt.cm.pink)
    ax.set_xticks(())
    ax.set_yticks(())
plt.suptitle("First Hidden Layer Weights", fontsize=20)
plt.show()