Welcome to catvsdog.live, the first-of-its-kind blueprint for applying machine learning to the crypto meme sphere! Our system uses deep convolutional neural networks (CNNs) to continuously monitor the Pump.fun and Moonshot token streams and classify the cats and dogs that appear in them. Pretrained on over 50,000 images, our AI works tirelessly to categorize each new token that emerges from these platforms.
Engineered with the assistance of ChatGPT and Claude, our platform employs an ensemble of ResNet models (ResNet18, ResNet34, ResNet50) working in parallel for greater classification precision. This is the foundation of a comprehensive ecosystem for real-time meme token analytics.
Our ensemble of ResNet models works together to label each token as cat, dog, or other with enhanced accuracy and reduced bias.
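Under the hood, an ensemble vote can be as simple as averaging each model's predicted class probabilities. Here is a minimal sketch (the model filenames are illustrative, matching the .pkl exports produced by the training script below):

from fastai.vision.all import load_learner, PILImage
import torch

# Load the three exported learners (hypothetical paths)
learners = [load_learner(f'{name}_model.pkl') for name in ('resnet18', 'resnet34', 'resnet50')]

def ensemble_predict(image_path):
    # Average the three models' softmax probabilities and return the top class
    img = PILImage.create(image_path)
    probs = torch.stack([learn.predict(img)[2] for learn in learners]).mean(dim=0)
    return learners[0].dls.vocab[probs.argmax().item()], probs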
Track token sentiment with real-time analysis, historical comparisons, and insights into community mood.
Test your skills navigating random market events and social media drama in our desktop minigame. Reach $1B market cap or survive 100 turns to win!
Get an unlimited supply of token names with AI assist mode that mixes in trending words for that random-but-relevant vibe.
Track which buzzwords are dominating the space with our real-time tally, complete with optional TTS audio announcements.
Watch newly minted tokens drift by with instant classification in our constantly updating token stream.
"""
================================================================================
GUIDE: How to Use This Script
================================================================================
1. Python Installation:
- Ensure you have Python 3.9 or newer installed; recent versions offer the best compatibility with current PyTorch releases.
2. CUDA Installation (GPU Support):
- If you plan to train on a GPU, install CUDA Toolkit (version compatible with your PyTorch installation).
- For GPU usage, confirm that PyTorch was installed with CUDA support (e.g., pip install torch --index-url https://download.pytorch.org/whl/cu118 for a CUDA 11.8 build).
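- You can verify GPU visibility from Python before training:
      import torch
      print(torch.cuda.is_available())  # True if PyTorch can see a CUDA GPU
      print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU only')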
3. Required Packages:
- fastai >= 2.7 (pip install fastai)
- torch >= 2.0 (pip install torch)
- torchvision >= 0.15 (pip install torchvision)
- scikit-learn (pip install scikit-learn)
- seaborn (pip install seaborn)
- matplotlib (pip install matplotlib)
- numpy (pip install numpy)
4. Directory and File Requirements:
- root_dir (default: '', i.e., the current working directory): Must contain one subdirectory per class (e.g., cat, dog, other) with the images inside.
- class_weights_file (default: 'classweights.txt'): Must be a text file listing class names and their counts, one pair per line, in the format:
class_name:count
This file is used to calculate weights for each class to handle class imbalance.
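- Example file contents (hypothetical counts):
      cat:1200
      dog:900
      other:300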
5. Usage:
- Update the variables `root_dir`, `save_dir`, and `class_weights_file` as needed.
- Run this script from the command line or any Python IDE:
python train_all.py
- The script trains multiple ResNet models (resnet50, resnet34, resnet18), saves confusion matrices, and exports the trained models.
6. Outputs:
- Checkpoints and final model export (.pkl) files are stored in `save_dir` (default: '', i.e., the current working directory).
- Confusion matrix images for each model are also stored in `save_dir`.
- Console output will report class distributions, training process, and final metrics.
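7. Using an Exported Model:
- A minimal inference sketch (the .pkl filename is illustrative):
      from fastai.vision.all import load_learner, PILImage
      learn = load_learner('resnet50_model.pkl')
      pred, idx, probs = learn.predict(PILImage.create('token_image.png'))
      print(pred, probs[idx])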
================================================================================
Sources and References (Harvard style):
- fastai (2023) fastai documentation. Available at: https://docs.fast.ai/ (Accessed: 6 April 2025).
- PyTorch (2023) PyTorch documentation. Available at: https://pytorch.org/docs/stable/index.html (Accessed: 6 April 2025).
Source Critique:
The fastai and PyTorch official documentation are reliable sources maintained by the core development teams.
They provide up-to-date information and best practices for building deep learning models.
================================================================================
"""
import os
import random
from fastai.vision.all import *
from torchvision.models import resnet50, resnet34, resnet18
from torchvision.models import ResNet50_Weights, ResNet34_Weights, ResNet18_Weights
import torch
from fastai.callback.tracker import EarlyStoppingCallback, SaveModelCallback
import numpy as np
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
from fastai.vision.core import PILImage
import warnings
import traceback
from pathlib import Path
# --------------------------------------------------------------------------------
# Suppress specific warnings to keep output clean
# --------------------------------------------------------------------------------
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", message=".*does not have many workers.*")
# --------------------------------------------------------------------------------
# Function: set_seed
# Description: Sets random seeds for Python, NumPy, and PyTorch to ensure
# reproducible training results. Also configures CUDA for
# deterministic behavior.
# --------------------------------------------------------------------------------
def set_seed(seed=42):
"""
Sets random seeds for all relevant libraries and ensures deterministic CUDA.
Args:
seed (int): The seed to use for RNGs.
"""
random.seed(seed) # Python random
np.random.seed(seed) # NumPy
torch.manual_seed(seed) # PyTorch CPU
torch.cuda.manual_seed(seed) # PyTorch current GPU
torch.cuda.manual_seed_all(seed) # PyTorch all GPUs if multi-GPU
    torch.backends.cudnn.deterministic = True   # Reproducible cuDNN kernels (may reduce speed)
    torch.backends.cudnn.benchmark = False      # Disable the non-deterministic autotuner
# Call the set_seed function to initialize everything at the start
set_seed(42)
# --------------------------------------------------------------------------------
# Function: calculate_class_weights
# Description: Reads class and count data from a file, computes weights to handle
# class imbalance, and returns a weight tensor aligned with the
# FastAI DataLoaders' vocabulary (dls.vocab).
# --------------------------------------------------------------------------------
def calculate_class_weights(class_weights_file, dls):
"""
Reads class counts from a file and calculates class weights.
Args:
class_weights_file (str): Path to the class weights file.
dls (DataLoaders): FastAI DataLoaders object.
Returns:
torch.Tensor: Tensor of class weights aligned with dls.vocab.
"""
counts = {}
# Read the class:count pairs line by line
with open(class_weights_file, 'r') as f:
for line in f:
parts = line.strip().split(':')
if len(parts) != 2:
print(f"Skipping invalid line in class_weights_file: {line}")
continue
class_name, count = parts
# Convert count to integer, default to zero if invalid
try:
counts[class_name] = int(count)
except ValueError:
print(f"Invalid count for class '{class_name}': {count}")
counts[class_name] = 0
# Sum up the counts for all classes
total = sum(counts.values())
if total == 0:
# If no valid counts, assign uniform weights
print("Total count of classes is zero. Assigning uniform weights.")
weights = {cls: 1.0 for cls in dls.vocab}
else:
# The weight for each class is (total / individual class count),
# providing heavier weights to classes with fewer samples
weights = {
cls: total / count if count > 0 else 1.0
for cls, count in counts.items()
}
# Make sure every class in the vocabulary has at least some weight
for cls in dls.vocab:
if cls not in weights:
weights[cls] = 1.0
# Convert to a list in the correct order matching dls.vocab
weight_list = [weights.get(cls_name, 1.0) for cls_name in dls.vocab]
# Return the tensor on the same device as the DataLoaders
return torch.tensor(weight_list).float().to(dls.device)
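# Worked example (hypothetical counts): with cat:1200, dog:900, other:300 the total
# is 2400, giving weights cat=2.0, dog~2.67, other=8.0 -- the rarest class is
# weighted most heavily in the loss.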
# --------------------------------------------------------------------------------
# Function: save_confusion_matrix
# Description: Generates a confusion matrix for the validation set and saves
# it as an image file. The color heatmap helps visualize class
# performance.
# --------------------------------------------------------------------------------
def save_confusion_matrix(learn, dls, save_path='confusion_matrix.png'):
"""
Computes and saves the confusion matrix as an image.
Args:
learn (Learner): Trained FastAI Learner object.
dls (DataLoaders): FastAI DataLoaders object.
save_path (str): Path to save the confusion matrix image.
"""
# Get predictions and targets from the validation DataLoader
preds, targets = learn.get_preds(dl=dls.valid)
preds_labels = preds.argmax(dim=1)
# Build the confusion matrix
cm = confusion_matrix(targets, preds_labels)
# Plot the confusion matrix with seaborn
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', xticklabels=dls.vocab, yticklabels=dls.vocab, cmap='Blues')
plt.xlabel('Predicted Label')
plt.ylabel('Actual Label')
plt.title('Confusion Matrix')
plt.savefig(save_path)
plt.close()
# --------------------------------------------------------------------------------
# Define key paths for data and outputs
# --------------------------------------------------------------------------------
root_dir = '' # Path where images are stored, each class in its subdirectory
save_dir = '' # Directory to save models, confusion matrices, etc.
class_weights_file = 'classweights.txt' # File with class:count pairs
# Ensure the save_dir exists or create it
os.makedirs(save_dir, exist_ok=True)
# --------------------------------------------------------------------------------
# Verify that the dataset path exists and check for subdirectories (classes)
# --------------------------------------------------------------------------------
path = Path(root_dir)
if not path.exists():
raise FileNotFoundError(f"Root directory does not exist: {root_dir}")
# Collect all class folder names (each subdirectory under root_dir)
classes = [cls.name for cls in path.iterdir() if cls.is_dir()]
if not classes:
raise ValueError(f"No class subdirectories found in: {root_dir}")
print(f"Classes found: {classes}")
# For each class, check the number of images and warn if any class is empty
for cls in classes:
num_images = len(list((path / cls).glob('*.*')))
print(f"Class '{cls}' has {num_images} images.")
if num_images == 0:
print(f"Warning: Class '{cls}' has no images.")
# --------------------------------------------------------------------------------
# Define ImageNet normalization statistics
# --------------------------------------------------------------------------------
imagenet_stats = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # fastai also exports these as `imagenet_stats`; redefined here for explicitness
# --------------------------------------------------------------------------------
# Create the FastAI DataLoaders object
# - Splits data into training (80%) and validation (20%).
# - Applies data augmentation and normalization.
# --------------------------------------------------------------------------------
try:
dls = ImageDataLoaders.from_folder(
root_dir,
valid_pct=0.20, # 20% validation split
item_tfms=[Resize(460)], # Initial resize for all images
batch_tfms=[
# Standard augmentations
*aug_transforms(
size=224,
                min_scale=0.8,    # random resized crop keeps at least 80% of the image area
flip_vert=False, # vertical flips disabled
max_rotate=10, # rotation up to 10 degrees
max_zoom=1.1, # zoom in up to 10%
max_lighting=0.2, # lighting changes up to 20%
max_warp=0.2, # warping transformations up to 20%
p_affine=0.75, # apply affine transforms 75% of the time
p_lighting=0.75 # apply lighting changes 75% of the time
),
Normalize.from_stats(*imagenet_stats) # ImageNet normalization
],
bs=32, # Batch size
num_workers=0 # Worker processes (tune based on your CPU/GPU)
)
except Exception as e:
print(f"Failed to create DataLoaders: {e}")
traceback.print_exc()
exit(1)
# Print basic info about the created DataLoaders
print(f"Number of training batches: {len(dls.train)}")
print(f"Number of validation batches: {len(dls.valid)}")
print(f"Classes in DataLoaders: {dls.vocab}")
if len(dls.train) == 0:
print("Warning: Training DataLoader is empty.")
if len(dls.valid) == 0:
print("Warning: Validation DataLoader is empty.")
# --------------------------------------------------------------------------------
# Validate the existence of the class_weights_file and compare it against dls.vocab
# --------------------------------------------------------------------------------
if not os.path.isfile(class_weights_file):
raise FileNotFoundError(f"Class weights file not found: {class_weights_file}")
with open(class_weights_file, 'r') as f:
file_classes = [line.strip().split(':')[0] for line in f if ':' in line]
print(f"Classes in class_weights_file: {file_classes}")
# Check for discrepancies between what's in the DataLoaders and the file
missing_in_file = set(dls.vocab) - set(file_classes)
missing_in_dls = set(file_classes) - set(dls.vocab)
if missing_in_file:
print(f"Classes missing in class_weights_file: {missing_in_file}")
if missing_in_dls:
print(f"Classes missing in DataLoaders: {missing_in_dls}")
# --------------------------------------------------------------------------------
# Calculate the class weights using the function defined earlier
# --------------------------------------------------------------------------------
try:
class_weights = calculate_class_weights(class_weights_file, dls)
print(f"Class weights: {class_weights}")
except Exception as e:
print(f"Failed to calculate class weights: {e}")
traceback.print_exc()
exit(1)
# --------------------------------------------------------------------------------
# Import CrossEntropyLossFlat from fastai.losses; it accepts per-class weights and
# optional label smoothing (disabled below via label_smoothing=0)
# --------------------------------------------------------------------------------
from fastai.losses import CrossEntropyLossFlat
# --------------------------------------------------------------------------------
# Define the list of models to train, including:
# - Model name (ResNet variant)
# - Model function (resnet50, resnet34, resnet18)
# - Weights (pretrained)
# - Learning rate (lr_max)
# - Cut for the model body
# - Dropout probability
# - Weight decay
# --------------------------------------------------------------------------------
models_info = [
{
"name": "resnet50",
"model_fn": resnet50,
"weights": ResNet50_Weights.DEFAULT,
"lr_max": 0.0001,
"cut": -2,
"dropout_p": 0.3,
"wd": 0.05
},
{
"name": "resnet34",
"model_fn": resnet34,
"weights": ResNet34_Weights.DEFAULT,
"lr_max": 0.0001,
"cut": -2,
"dropout_p": 0.1,
"wd": 0.025
},
{
"name": "resnet18",
"model_fn": resnet18,
"weights": ResNet18_Weights.DEFAULT,
"lr_max": 0.0001,
"cut": -2,
"dropout_p": 0.15,
"wd": 0.01
}
]
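# --------------------------------------------------------------------------------
# Parameter-group splitter for staged unfreezing
# Note: with Learner's default splitter, every parameter lands in a single
# optimizer group, so the freeze()/freeze_to() calls below would silently have
# no effect. This helper (our addition; any grouping into several groups would
# work) splits the (body, head) model into four groups -- early body, the last
# two body blocks, and the custom head -- so the staged schedule behaves as
# described.
# --------------------------------------------------------------------------------
def staged_splitter(model):
    """
    Splits an nn.Sequential(body, head) model into four parameter groups:
    [early body, second-to-last body block, last body block, head].
    """
    body, head = model[0], model[1]
    return [params(body[:-2]), params(body[-2]), params(body[-1]), params(head)]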
# --------------------------------------------------------------------------------
# Training Loop: For each model, build and train using a staged unfreezing approach
# --------------------------------------------------------------------------------
for model_info in models_info:
model_name = model_info["name"]
print(f"\nStarting training for model: {model_name}")
try:
# Load the pretrained backbone
pretrained_model = model_info["model_fn"](weights=model_info["weights"])
# Create the body of the model
cut = model_info["cut"]
body = create_body(pretrained_model, cut=cut) # create_body from fastai
# num_features_model determines the output feature size of the body
nf = num_features_model(body)
# Create a custom head to place on top of the model body
dropout_p = model_info["dropout_p"]
n_out = dls.c # Number of classes as determined by dls
head = nn.Sequential(
nn.AdaptiveAvgPool2d((1, 1)), # Pool features to 1x1
nn.Flatten(), # Flatten for fully-connected layers
nn.BatchNorm1d(nf), # Normalize for stable training
nn.Dropout(p=dropout_p),
nn.Linear(nf, 512),
nn.ReLU(),
nn.BatchNorm1d(512),
nn.Dropout(p=dropout_p),
nn.Linear(512, 256),
nn.ReLU(),
nn.BatchNorm1d(256),
nn.Dropout(p=dropout_p),
nn.Linear(256, n_out)
)
# Combine body + head into one sequential model
model = nn.Sequential(body, head)
# Create a FastAI Learner object
        learn = Learner(
            dls,
            model,
            loss_func=CrossEntropyLossFlat(weight=class_weights, label_smoothing=0),
            metrics=[
                accuracy,
                Precision(average='macro'),
                Recall(average='macro'),
                F1Score(average='macro')
            ],
            splitter=staged_splitter,  # Parameter groups so freeze()/freeze_to() take effect
            wd=model_info["wd"],       # Weight decay for regularization
            path=save_dir              # Directory to save logs and models
        )
# Define callbacks for model saving and early stopping
        callbacks = [
            SaveModelCallback(
                monitor='valid_loss',              # Metric to monitor
                fname=f'best_model_{model_name}',  # Filename for best model
                comp=np.less,                      # Lower valid_loss counts as improvement
                reset_on_fit=False                 # Keep the best value across all staged fit calls
            ),
            EarlyStoppingCallback(
                monitor='valid_loss',  # Use validation loss for early stopping
                patience=4             # Stop if no improvement for 4 epochs
            )
        ]
# Training configurations
lr_max = model_info["lr_max"]
# ----------------------------------------------------------
# 1) Train only the newly added head
# ----------------------------------------------------------
        learn.freeze()  # Freeze all parameter groups except the last one (the head)
print(f"Training the head for {model_name} with lr_max={lr_max}")
learn.fit_one_cycle(
20, # Number of epochs
lr_max=lr_max, # Maximum learning rate
cbs=callbacks # Callbacks
)
# ----------------------------------------------------------
        # 2) Unfreeze the last two parameter groups (head + last body block)
        # ----------------------------------------------------------
        learn.freeze_to(-2)
        print(f"Fine-tuning the last two parameter groups for {model_name}")
learn.fit_one_cycle(
20,
            slice(lr_max/5, lr_max),  # Discriminative LRs: lr_max/5 for the earliest group up to lr_max for the head
cbs=callbacks
)
# ----------------------------------------------------------
        # 3) Unfreeze the last three parameter groups (head + last two body blocks)
        # ----------------------------------------------------------
        learn.freeze_to(-3)
        print(f"Fine-tuning the last three parameter groups for {model_name}")
learn.fit_one_cycle(
20,
slice(lr_max/100, lr_max/10),
cbs=callbacks
)
# ----------------------------------------------------------
# 4) Unfreeze all layers and train
# ----------------------------------------------------------
learn.unfreeze()
print(f"Fine-tuning all layers for {model_name}")
learn.fit_one_cycle(
20,
slice(lr_max/1000, lr_max/10),
cbs=callbacks
)
# ----------------------------------------------------------
# Load the best model saved by SaveModelCallback
# ----------------------------------------------------------
try:
learn.load(f'best_model_{model_name}')
print(f"Loaded best model for {model_name}")
except FileNotFoundError:
print(f"Best model for {model_name} not found. Skipping load step.")
# ----------------------------------------------------------
# Validate the best model and print metrics
# ----------------------------------------------------------
val_metrics = learn.validate()
print(f'Best Model Validation Metrics for {model_name}:')
print(f' - Loss: {val_metrics[0]:.4f}')
print(f' - Accuracy: {val_metrics[1] * 100:.2f}%')
print(f' - Precision: {val_metrics[2]:.4f}')
print(f' - Recall: {val_metrics[3]:.4f}')
print(f' - F1 Score: {val_metrics[4]:.4f}')
# ----------------------------------------------------------
# Save confusion matrix for the validation set
# ----------------------------------------------------------
confusion_matrix_path = os.path.join(save_dir, f'{model_name}_confusion_matrix.png')
try:
save_confusion_matrix(learn, dls, save_path=confusion_matrix_path)
print(f"Confusion matrix saved at: {confusion_matrix_path}")
except Exception as e:
print(f"Failed to save confusion matrix for {model_name}: {e}")
traceback.print_exc()
# ----------------------------------------------------------
# Export the trained model in a .pkl format
# ----------------------------------------------------------
        model_save_filename = f'{model_name}_model.pkl'
        try:
            # learn.export resolves paths relative to learn.path (save_dir),
            # so pass only the filename to avoid doubling the directory
            learn.export(model_save_filename)
            print(f"Model exported to: {os.path.join(save_dir, model_save_filename)}")
except Exception as e:
print(f"Failed to export model for {model_name}: {e}")
traceback.print_exc()
except RuntimeError as e:
# Check specifically for CUDA out of memory errors
if 'out of memory' in str(e):
print(f"Out of memory error with model {model_name}: {e}")
print("Consider reducing the batch size or image size.")
else:
print(f"Runtime error with model {model_name}: {e}")
traceback.print_exc()
except Exception as e:
print(f"An error occurred while training model {model_name}: {e}")
traceback.print_exc()