Building a Neural Network Classifier from Scratch

Abstract

This project documents my first hands-on journey into machine learning. I built a binary image classifier from scratch using logistic regression and NumPy—no machine learning libraries, no magic boxes. Just math, code, and curiosity. The goal: to identify whether an image contains a cat.

By tackling this with a neural network mindset, I practiced essential concepts like vectorization, forward and backward propagation, loss functions, and gradient descent. These fundamentals are crucial as I move toward applying AI to wildfire modeling, fire investigation, and predictive fire protection research.

Key Concepts and Mathematical Intuition (Deep Dive)

1. Vectorization

What it is: Performing operations on entire matrices or vectors at once.

Why: It's way faster than for-loops and scales efficiently with large datasets.

How: We use NumPy's np.dot to apply the weights to all training examples at once, instead of looping over them one by one (see the sketch below).
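
As a rough illustration (the 12288-by-209 sizes are made-up stand-ins for 209 flattened 64x64x3 images, not the actual dataset), here is the loop version next to the vectorized one:

import numpy as np

w = np.random.randn(12288, 1)    # one weight per pixel (illustrative size)
X = np.random.randn(12288, 209)  # each column is one example (illustrative size)

# Loop version: one dot product per example (slow in pure Python)
z_loop = np.zeros((1, X.shape[1]))
for i in range(X.shape[1]):
    z_loop[0, i] = np.dot(w[:, 0], X[:, i])

# Vectorized version: all examples in a single matrix multiplication
z_vec = np.dot(w.T, X)

print(np.allclose(z_loop, z_vec))  # True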

2. Sigmoid Activation

What it is: A function that maps any real-valued input into the range between 0 and 1.

σ(z) = 1 / (1 + e^(-z))

Why: It transforms raw model outputs into probabilities.
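
A quick sanity check of that formula in NumPy (the input values are illustrative): very negative scores land near 0, very positive scores land near 1, and 0 lands exactly at 0.5.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # approx [0.00005, 0.5, 0.99995]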

3. Forward Propagation

z = np.dot(w.T, X) + b
A = sigmoid(z)

In Layman’s Terms: Imagine you're trying to decide if a picture has a cat. Each pixel gives you a clue. The model (like your brain) weighs all those clues, adds them up, and decides: "Hmm, looks about 85% like a cat!"

How It Works: The model takes input features (pixel data) and applies learned weights (what the model thinks is important) and a bias (adjustable offset). It adds them together to get a score z. Then it uses the sigmoid function to squash that score into a probability A.

Why It’s Important: This is the step where your model makes a prediction using what it has learned so far.

Analogy: Like scanning a checklist: "Does this have ears? Fur? Whiskers?" You tally up the results and output a confidence score.

Memory Tip: Forward propagation = “Is this a cat? Let me check...”
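
Here is a toy-sized sketch of those two lines in action, with made-up weights and a 2-feature, 3-example input (the numbers are arbitrary, chosen only to show the shapes):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w = np.array([[0.5], [-0.25]])       # learned weights, one per feature
b = 0.1                              # bias
X = np.array([[1.0, 2.0, -1.0],      # feature 1 for 3 examples
              [0.5, -1.0, 3.0]])     # feature 2 for 3 examples

z = np.dot(w.T, X) + b               # raw scores, shape (1, 3)
A = sigmoid(z)                       # probabilities between 0 and 1
print(A)                             # approx [[0.62, 0.79, 0.24]]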

4. Cross-Entropy Loss

cost = -1/m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

In Layman’s Terms: Imagine you're grading how close your guess was. If you said "90% cat" but it wasn't a cat, that's a big miss. Cross-entropy measures that gap between what you said and what was true.

How It Works: The closer your prediction is to the actual label, the smaller the cost. Confident mistakes are penalized heavily, because the log term blows up as the predicted probability moves toward the wrong extreme.

Analogy: Like guessing the spiciness of a pepper from 1 to 10 — the further off you are, the higher your penalty.
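
A small illustrative example of that cost line (the labels and probabilities are made up): a confident correct guess costs almost nothing, while a confident wrong guess makes the cost explode.

import numpy as np

Y = np.array([[1, 0]])           # true labels: cat, not-cat
A_good = np.array([[0.9, 0.1]])  # confident and correct
A_bad  = np.array([[0.1, 0.9]])  # confident and wrong

def cost(A, Y):
    m = Y.shape[1]
    return -1/m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

print(cost(A_good, Y))  # approx 0.105
print(cost(A_bad, Y))   # approx 2.303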

5. Backward Propagation

dw = 1/m * np.dot(X, (A - Y).T)
db = 1/m * np.sum(A - Y)

In Layman’s Terms: Now that you know how far off you were, how can you improve? Backpropagation is like tracing back your mistake to figure out which part of your thought process (aka weights) was off.

How It Works: We take the derivative (slope) of the loss with respect to each parameter, which shows how sensitive the error is to each weight or bias. For this model the chain rule collapses neatly: the error signal is simply A - Y, which is why that term appears in both gradient formulas above.

Analogy: Like getting feedback from a coach: "Your kick was off because your foot placement was wrong."
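
If you want to convince yourself those two formulas are right, a quick numerical gradient check works: nudge one weight a tiny bit, re-measure the cost, and compare that slope to what the formula says. This sketch uses toy data and re-defines the propagate helper from the full listing below:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -1/m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = 1/m * np.dot(X, (A - Y).T)
    db = 1/m * np.sum(A - Y)
    return {"dw": dw, "db": db}, cost

w = np.array([[0.3], [-0.2]])                      # toy weights
b = 0.05                                           # toy bias
X = np.array([[1.0, -2.0, 0.5], [0.7, 1.5, -1.0]])  # 2 features, 3 examples
Y = np.array([[1, 0, 1]])                          # toy labels

grads, _ = propagate(w, b, X, Y)

# Numerical estimate of d(cost)/d(w[0]): nudge the weight and re-measure the cost
eps = 1e-6
w_plus = w.copy();  w_plus[0, 0] += eps
w_minus = w.copy(); w_minus[0, 0] -= eps
num_dw0 = (propagate(w_plus, b, X, Y)[1] - propagate(w_minus, b, X, Y)[1]) / (2 * eps)

print(grads["dw"][0, 0], num_dw0)  # the two values should match closely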

6. Gradient Descent

w = w - learning_rate * dw
b = b - learning_rate * db

In Layman’s Terms: Imagine you’re on a foggy hill and want to get to the bottom. You can’t see far, but you can feel the slope — and take steps downhill.

How It Works: Using the gradients from backprop, you subtract a little from each weight to reduce the error. Over time, you reach a low-cost state.

Analogy: Learning is like hiking: small steps in the right direction add up to big progress.
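
To see the "steps downhill" idea in isolation, here is a minimal sketch on a toy one-dimensional problem (a simple quadratic, not the real classifier):

# Minimize f(w) = (w - 3)^2 with gradient descent; its gradient is 2 * (w - 3)
w = 0.0
learning_rate = 0.1
for i in range(25):
    dw = 2 * (w - 3)
    w = w - learning_rate * dw
print(w)  # close to 3, the minimum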

Conclusion and Reflections

This project has been a foundational journey into the world of machine learning. By building a logistic regression model entirely from scratch, I learned how core concepts like vectorization, forward and backward propagation, gradient descent, and loss functions work not just in theory, but in actual code.

Using Python and NumPy allowed me to harness the power of efficient numerical computation and matrix operations—two critical pillars of modern machine learning frameworks.

Most importantly, it fuels my future plans to apply AI to wildfire prediction in the wildland-urban interface (WUI), fire investigation, and smoke modeling in real-world fire protection engineering projects.

Logistic Regression Classifier Code

Full Python + NumPy implementation:
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def initialize_with_zeros(dim):
    w = np.zeros((dim, 1))
    b = 0.0
    return w, b

def propagate(w, b, X, Y):
    m = X.shape[1]
    # Forward propagation: raw scores -> probabilities, then the cross-entropy cost
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -1/m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    # Backward propagation: gradients of the cost with respect to w and b
    dw = 1/m * np.dot(X, (A - Y).T)
    db = 1/m * np.sum(A - Y)
    grads = {"dw": dw, "db": db}
    return grads, cost

def optimize(w, b, X, Y, num_iterations=2000, learning_rate=0.005, print_cost=False):
    costs = []
    for i in range(num_iterations):
        # Compute the cost and gradients for the current parameters
        grads, cost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        # Gradient descent update: step downhill on the cost surface
        w = w - learning_rate * dw
        b = b - learning_rate * db
        # Record the cost every 100 iterations to track convergence
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print(f"Cost after iteration {i}: {cost:.6f}")
    return {"w": w, "b": b}, grads, costs

def predict(w, b, X):
    # Label an example as 1 ("cat") when its predicted probability exceeds 0.5
    A = sigmoid(np.dot(w.T, X) + b)
    return (A > 0.5).astype(int)

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.005, print_cost=False):
    w, b = initialize_with_zeros(X_train.shape[0])
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    w = params["w"]
    b = params["b"]
    Y_prediction_train = predict(w, b, X_train)
    Y_prediction_test = predict(w, b, X_test)
    print(f"Train accuracy: {100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100:.2f}%")
    print(f"Test accuracy: {100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100:.2f}%")
    return {
        "costs": costs,
        "Y_prediction_train": Y_prediction_train,
        "Y_prediction_test": Y_prediction_test,
        "w": w,
        "b": b,
        "learning_rate": learning_rate,
        "num_iterations": num_iterations
    }
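
As a usage sketch, here is how the model function might be called. The random arrays below are placeholders standing in for real flattened cat images; the 12288-feature, 209/50-example shapes are assumptions for illustration, not the actual dataset.

import numpy as np

# Placeholder data: 12288 features (64x64x3 pixels flattened), 209 train / 50 test examples
X_train = np.random.rand(12288, 209)
Y_train = np.random.randint(0, 2, (1, 209))
X_test = np.random.rand(12288, 50)
Y_test = np.random.randint(0, 2, (1, 50))

results = model(X_train, Y_train, X_test, Y_test,
                num_iterations=2000, learning_rate=0.005, print_cost=True)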