ViT
Overview
A from-scratch implementation of the Vision Transformer (ViT).
Key Features
- Transformer Architecture
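The ViT encoder is a stack of identical Transformer blocks. Below is a minimal PyTorch sketch of one pre-norm block, written from the equations in the ViT paper rather than taken from this repository. The class name `EncoderBlock` and the ViT-Base defaults (768-dim embeddings, 12 heads, 3072-dim MLP) are assumptions for illustration. For brevity it uses PyTorch's built-in `nn.MultiheadAttention`; a fully from-scratch version would implement the attention computation itself.

```python
import torch
from torch import nn


class EncoderBlock(nn.Module):
    """Hypothetical pre-norm Transformer encoder block in the style of the ViT paper:
    z' = MSA(LN(z)) + z, then z = MLP(LN(z')) + z'."""

    def __init__(self, embed_dim: int = 768, num_heads: int = 12,
                 mlp_dim: int = 3072, dropout: float = 0.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        # batch_first=True so inputs are (batch, tokens, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads,
                                          dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        # Two-layer MLP with a GELU non-linearity
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, mlp_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(mlp_dim, embed_dim),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Multi-head self-attention sub-layer with a residual connection
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # MLP sub-layer with a residual connection
        x = x + self.mlp(self.norm2(x))
        return x


block = EncoderBlock()
tokens = torch.randn(2, 197, 768)  # 196 patch tokens + 1 class token
print(block(tokens).shape)  # torch.Size([2, 197, 768])
```

The block preserves the token shape, which is what allows several of them to be stacked into the full encoder.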
Technical Details
- Framework: PyTorch
- Dataset: Custom (subset of Food-101)
- Category: Computer Vision
Implementation Details
Implemented the ViT architecture from scratch in PyTorch and trained it on a subset of the Food-101 dataset.
Reference paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
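The "16x16 words" in the paper title are non-overlapping 16x16-pixel image patches, each projected to an embedding and treated as a token. A learnable class token is prepended and learnable position embeddings are added. The sketch below is one common way to write this in PyTorch, not necessarily how this repository does it: a `Conv2d` with `kernel_size == stride == patch_size` is equivalent to flattening each patch and applying a shared linear projection. The class name `PatchEmbedding` and the 224x224 input size are assumptions.

```python
import torch
from torch import nn


class PatchEmbedding(nn.Module):
    """Hypothetical module: image -> patch tokens + class token + position embeddings."""

    def __init__(self, img_size: int = 224, patch_size: int = 16,
                 in_channels: int = 3, embed_dim: int = 768):
        super().__init__()
        assert img_size % patch_size == 0, "image size must be divisible by patch size"
        self.num_patches = (img_size // patch_size) ** 2
        # Conv with kernel_size == stride == patch_size: one linear projection per patch
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))
        nn.init.trunc_normal_(self.cls_token, std=0.02)
        nn.init.trunc_normal_(self.pos_embed, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, D, H/P, W/P) -> (B, N, D)
        x = self.proj(x).flatten(2).transpose(1, 2)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)  # (B, N + 1, D)
        return x + self.pos_embed


embed = PatchEmbedding()
images = torch.randn(2, 3, 224, 224)
print(embed(images).shape)  # torch.Size([2, 197, 768])
```

With a 224x224 input and 16x16 patches the image becomes a 14x14 grid, i.e. 196 patch tokens plus the class token.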
Dataset Information
- Dataset (Train): Subset of Food-101 (3 classes, 255 images total)
- Dataset (Test): Subset of Food-101 (3 classes, 75 images total)
Frameworks
PyTorch
Results
Training loss: 1.20
Test loss: 1.52
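To put these numbers in context, assume the reported losses are mean cross-entropy values with natural logarithms (the default of PyTorch's `nn.CrossEntropyLoss`). A model that assigns equal probability to all 3 classes then scores ln 3 ≈ 1.10, regardless of the labels. The sketch below computes that baseline; the labels are arbitrary placeholders.

```python
import math

import torch
import torch.nn.functional as F

NUM_CLASSES = 3  # from the dataset description

# Equal logits -> uniform probabilities -> cross-entropy of -ln(1/K) = ln(K)
logits = torch.zeros(4, NUM_CLASSES)
labels = torch.tensor([0, 1, 2, 0])  # placeholder labels; the result does not depend on them
chance_loss = F.cross_entropy(logits, labels).item()

print(round(chance_loss, 4))            # 1.0986
print(round(math.log(NUM_CLASSES), 4))  # 1.0986

for name, reported in [("train", 1.20), ("test", 1.52)]:
    print(name, reported, "above baseline" if reported > chance_loss else "below baseline")
```

Under that assumption, both the training loss (1.20) and the test loss (1.52) are above the uniform-prediction baseline.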
Authors
Source Code
GitHub Repository: ViT
View the complete implementation, training scripts, and documentation on GitHub.