Candy Type Classifier using Computer Vision
A convolutional neural network that identifies and classifies different types of candy using image recognition and deep learning.
Overview
This project applies convolutional neural networks to the task of candy classification, combining computer vision with practical food recognition applications. The system can identify different candy types from smartphone photos, making it useful for dietary tracking, allergen detection, or inventory management.
The challenge lies in handling visual variation caused by lighting conditions, camera angles, packaging, and candy shapes. The solution leverages transfer learning and extensive data augmentation to achieve robust classification performance.
Software Architecture
The dataset consists of 2,000+ images across 10 candy categories, collected through web scraping and manual photography. Images were standardized to 224x224 pixels with normalization applied. Data augmentation techniques (rotation, flipping, brightness adjustment, zoom) expanded the effective training set to 10,000+ images.
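The augmentation transforms listed above (flipping, rotation, brightness adjustment) and the normalization step can be sketched in a few lines. This is a minimal NumPy illustration, not the project's actual pipeline (which uses OpenCV); the jitter ranges and helper names are assumptions.

```python
import numpy as np

def normalize(image_uint8: np.ndarray) -> np.ndarray:
    """Scale 0-255 pixel values into [0, 1] before training or inference."""
    return image_uint8.astype(np.float32) / 255.0

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a random combination of horizontal flip, 90-degree rotation,
    and brightness jitter (ranges here are illustrative assumptions)."""
    if rng.random() < 0.5:
        image = np.fliplr(image)                    # horizontal flip
    image = np.rot90(image, k=int(rng.integers(0, 4)))  # 0/90/180/270 rotation
    factor = rng.uniform(0.8, 1.2)                  # brightness jitter
    return np.clip(image * factor, 0.0, 1.0)

# Example: a dummy 224x224 RGB image run through the pipeline
rng = np.random.default_rng(0)
img = normalize(rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8))
aug = augment(img, rng)
```

Applying several such random transforms to each source image is how a 2,000-image dataset expands to an effective 10,000+ training examples.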
The model architecture uses transfer learning with MobileNetV2 as the base, pre-trained on ImageNet. The final layers were replaced with custom dense layers (256 neurons) with dropout regularization to prevent overfitting. The model uses categorical cross-entropy loss with Adam optimizer.
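The architecture described above can be sketched with the Keras API. The base network, 256-unit dense head, loss, and optimizer follow the text; the dropout rate and the use of global average pooling between base and head are assumptions.

```python
import tensorflow as tf

NUM_CLASSES = 10  # candy categories

def build_model(weights="imagenet"):
    """MobileNetV2 base (frozen) with a custom classification head."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=weights
    )
    base.trainable = False  # keep pretrained ImageNet features fixed
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.5),  # rate is an assumption, not from the text
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Freezing the base means only the new head's weights are learned at first, which is what makes training feasible on a 2,000-image dataset.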
Training employed early stopping and learning-rate reduction callbacks to optimize convergence. The model was trained for up to 50 epochs with a batch size of 32, using GPU acceleration for efficiency.
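The early-stopping behavior can be illustrated framework-free: training halts once validation loss has stopped improving for a set number of epochs. The patience value and loss sequence below are illustrative, not taken from the project (which uses the framework's built-in callback; the same idea applies to learning-rate reduction).

```python
class EarlyStopper:
    """Stop training when the monitored validation loss stops improving."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss  # improvement: reset the counter
            self.wait = 0
        else:
            self.wait += 1        # no improvement this epoch
        return self.wait >= self.patience

# Example: validation loss plateaus after epoch 2, so training stops early
stopper = EarlyStopper(patience=2)
losses = [1.0, 0.8, 0.7, 0.7, 0.71, 0.72]
stopped_at = next(i for i, loss in enumerate(losses) if stopper.should_stop(loss))
# stopped_at == 4: two consecutive epochs without improvement on 0.7
```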
The inference pipeline preprocesses input images through the same normalization pipeline, performs prediction, and outputs the top-3 most likely candy types with confidence scores. OpenCV handles image preprocessing and augmentation.
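The top-3 output step reduces to sorting the softmax probabilities. A minimal NumPy sketch, with hypothetical class labels and a made-up probability vector standing in for a real model prediction:

```python
import numpy as np

def top_k(probs: np.ndarray, labels, k: int = 3):
    """Return the k most likely (label, confidence) pairs from a softmax output."""
    idx = np.argsort(probs)[::-1][:k]  # indices of the k largest probabilities
    return [(labels[i], float(probs[i])) for i in idx]

# Hypothetical labels and a softmax vector over the 10 candy categories
labels = [f"candy_{i}" for i in range(10)]
probs = np.array([0.02, 0.05, 0.40, 0.01, 0.25, 0.10, 0.07, 0.04, 0.03, 0.03])
result = top_k(probs, labels)
# → [("candy_2", 0.40), ("candy_4", 0.25), ("candy_5", 0.10)]
```

Reusing the exact training-time normalization at inference is important: a model trained on [0, 1] inputs will mis-predict if fed raw 0-255 pixel values.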
Results & Achievements
The trained model achieved 92% accuracy on the validation set and 89% accuracy on real-world test images taken with various smartphones. Per-class accuracy ranged from 85% (for visually similar candies) to 97% (for distinctive shapes/colors).
Confusion-matrix analysis revealed that errors occurred primarily between candy types with similar packaging colors (e.g., red wrappers). The model generalized well to candy orientations and lighting conditions not seen during training.
Processing time averages 150ms per image on CPU, making it suitable for real-time mobile applications. The model size (23MB) is optimized for deployment on resource-constrained devices. Future improvements include expanding to 30+ candy types and implementing fine-grained classification for flavors within brands.