
What CNNs Actually See When They Look at Your Clothes

2020-03-18 · 2 min read

machine-learning · deep-learning · python

Originally published on Medium / The Startup


I wanted to understand what happens inside a CNN — not at the textbook level, but at the "I can see the filters activating" level. Fashion-MNIST was the perfect playground: 10 classes of clothing, 28x28 grayscale images, just complex enough to be interesting.

The setup

60,000 training images. 10,000 test images. Perfectly balanced classes. The data was almost too clean — which became the first lesson.

Making it harder on purpose

Real-world images aren't perfectly centered and aligned. So I tripled the dataset: every image kept its original, plus one rotated copy and one translated copy. If the model can recognize a rotated sneaker, it'll generalize better.

import cv2

def rotate_img(img, rot_deg):
    # Rotate about the image centre by rot_deg degrees, keeping scale = 1.
    rows, cols = img.shape[0], img.shape[1]
    M = cv2.getRotationMatrix2D((cols / 2, rows / 2), rot_deg, 1)
    return cv2.warpAffine(img, M, (cols, rows))

180,000 images. Shuffled. Now we're talking.
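Getting from 60,000 to 180,000 comes down to concatenating the three copies and shuffling images and labels with a single permutation so they stay aligned. A sketch assuming NumPy arrays (`augment_dataset` is a hypothetical helper, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_dataset(X, y, X_rot, X_trans):
    # Stack originals with rotated and translated copies; labels
    # repeat once per copy, then one shared permutation shuffles both.
    X_all = np.concatenate([X, X_rot, X_trans], axis=0)
    y_all = np.concatenate([y, y, y], axis=0)
    perm = rng.permutation(len(X_all))
    return X_all[perm], y_all[perm]
```

Shuffling matters here: without it, training batches would see long runs of only-rotated or only-translated images.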

The architecture

Three conv layers, each followed by max pooling. Nothing fancy — 45,770 parameters total. The point wasn't to build the biggest model. It was to understand what each layer does.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

cnn_model = Sequential()
cnn_model.add(Conv2D(128, (5,5), activation='relu', input_shape=(28,28,1)))
cnn_model.add(MaxPooling2D(pool_size=(3,3), strides=(3,3)))
cnn_model.add(Conv2D(64, (2,2), activation='relu'))
cnn_model.add(MaxPooling2D(pool_size=(2,2)))
cnn_model.add(Conv2D(32, (2,2), activation='relu'))
cnn_model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
cnn_model.add(Flatten())
cnn_model.add(Dense(32, activation='relu'))
cnn_model.add(Dense(10, activation='softmax'))

What I actually learned

91.1% accuracy. Sounds good on paper. But the interesting part was the failure mode — the model was over-predicting sneakers by 60%. It had developed a bias.

This taught me something I keep coming back to in my work at AWS: a single aggregate metric can hide serious problems. You have to look at per-class performance. You have to understand where your system fails, not just how often.
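A per-class check like the one that caught the sneaker bias needs nothing fancier than a confusion matrix. A plain-NumPy sketch (hypothetical helper, not the post's actual code):

```python
import numpy as np

def per_class_report(y_true, y_pred, n_classes=10):
    # Confusion matrix: rows = true class, cols = predicted class.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    support = cm.sum(axis=1)    # examples per true class
    predicted = cm.sum(axis=0)  # how often each class is predicted
    recall = np.diag(cm) / np.maximum(support, 1)
    # Ratio > 1 means the class is predicted more often than it occurs,
    # i.e. the model is over-predicting it.
    over_prediction = predicted / np.maximum(support, 1)
    return cm, recall, over_prediction
```

On a balanced test set like Fashion-MNIST's, an over-prediction ratio of 1.6 for sneakers would be exactly the kind of bias a single accuracy number hides.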

What I'd do differently today

Dropout from the start. Learning rate scheduling. And honestly, I'd spend more time on the failure analysis than the architecture. The model was fine — understanding its blind spots was the real work.