ViT


Category: Computer Vision
Framework: PyTorch
Dataset: Custom
Created: June 20, 2024

Overview

A from-scratch implementation of the Vision Transformer (ViT).

Key Features

  • Transformer encoder architecture applied to sequences of image patches
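As a rough illustration of the Transformer architecture listed above, here is a minimal sketch of a single ViT-style encoder block in PyTorch. It follows the pre-norm layout described in the ViT paper (LayerNorm before multi-head self-attention and before the MLP, each sub-layer wrapped in a residual connection). The class name `EncoderBlock` and the ViT-Base default sizes (768-dim embeddings, 12 heads, 3072-dim MLP) are illustrative assumptions, not taken from the repository.

```python
import torch
from torch import nn


class EncoderBlock(nn.Module):
    """Hypothetical pre-norm Transformer encoder block (ViT style):
    x = x + MSA(LN(x)); x = x + MLP(LN(x))."""

    def __init__(self, embed_dim: int = 768, num_heads: int = 12,
                 mlp_dim: int = 3072, dropout: float = 0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        # batch_first=True -> inputs are shaped (batch, tokens, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads,
                                          dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        # Two-layer MLP with GELU, as used in the ViT encoder
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, mlp_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(mlp_dim, embed_dim),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention sub-layer (queries, keys, values all from LN(x)) + residual
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # MLP sub-layer + residual
        x = x + self.mlp(self.norm2(x))
        return x


if __name__ == "__main__":
    block = EncoderBlock()
    tokens = torch.randn(2, 197, 768)  # 196 patch tokens + 1 class token
    print(block(tokens).shape)  # torch.Size([2, 197, 768])
```

The block preserves the token shape, which is what lets several of them be stacked into a full encoder.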

Technical Details

  • Framework: PyTorch
  • Dataset: Custom
  • Category: Computer Vision

Implementation Details

Implemented a ViT architecture from scratch in PyTorch and trained it on a subset of the Food-101 dataset.

Based on the paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Dosovitskiy et al.)
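The central idea in the paper's title is to treat an image as a sequence of 16x16 patches ("words"), prepend a learnable class token, add position embeddings, and run the sequence through a Transformer encoder. The sketch below shows that pipeline under stated assumptions: ViT-Base/16 hyperparameters, 224x224 inputs, and a 3-class head to match the dataset subset. `PatchEmbedding` and `ViT` are hypothetical names, and for brevity it uses PyTorch's built-in `nn.TransformerEncoderLayer` rather than the repository's own from-scratch blocks.

```python
import torch
from torch import nn


class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patch_size x patch_size patches and
    project each to embed_dim. A Conv2d with kernel_size == stride == patch_size
    is equivalent to flattening each patch and applying a shared linear layer."""

    def __init__(self, in_channels=3, patch_size=16, embed_dim=768):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (B, N, D) with N = HW / P^2


class ViT(nn.Module):
    """Hypothetical minimal ViT classifier (assumed ViT-Base/16 defaults)."""

    def __init__(self, img_size=224, patch_size=16, in_channels=3,
                 num_classes=3, embed_dim=768, depth=12, num_heads=12,
                 mlp_dim=3072, dropout=0.1):
        super().__init__()
        self.patch_embed = PatchEmbedding(in_channels, patch_size, embed_dim)
        num_patches = (img_size // patch_size) ** 2
        # Learnable [class] token and one learnable position embedding per token
        self.cls_token = nn.Parameter(torch.randn(1, 1, embed_dim) * 0.02)
        self.pos_embed = nn.Parameter(
            torch.randn(1, num_patches + 1, embed_dim) * 0.02)
        self.dropout = nn.Dropout(dropout)
        layer = nn.TransformerEncoderLayer(
            embed_dim, num_heads, dim_feedforward=mlp_dim, dropout=dropout,
            activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(
            layer, depth, norm=nn.LayerNorm(embed_dim),
            enable_nested_tensor=False)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        tokens = self.patch_embed(x)                           # (B, N, D)
        cls = self.cls_token.expand(tokens.shape[0], -1, -1)   # (B, 1, D)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(self.dropout(tokens))
        return self.head(tokens[:, 0])  # classify from the [class] token


if __name__ == "__main__":
    model = ViT()
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 3])
```

With 224x224 inputs and 16x16 patches, each image becomes (224/16)^2 = 196 patch tokens; with the class token prepended, the encoder sees 197 tokens.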

Dataset Information

  • Train: subset of Food-101 (3 classes, 255 images total)
  • Test: subset of Food-101 (3 classes, 75 images total)

Frameworks

PyTorch

Results

Training loss: 1.20
Test loss: 1.52
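One reference point for reading these numbers, assuming they are mean cross-entropy losses (the page does not state the loss function): a 3-class classifier that assigns equal probability to every class scores ln 3 ≈ 1.099. The snippet below checks this with `torch.nn.functional.cross_entropy`; the batch of targets is arbitrary, since with uniform predictions every target gives the same loss.

```python
import math

import torch
import torch.nn.functional as F

num_classes = 3
# Equal logits -> softmax assigns probability 1/3 to every class
uniform_logits = torch.zeros(4, num_classes)
targets = torch.tensor([0, 1, 2, 0])  # arbitrary labels
baseline = F.cross_entropy(uniform_logits, targets).item()

print(f"uniform-guess cross-entropy: {baseline:.4f}")  # 1.0986
print(f"ln(3) = {math.log(num_classes):.4f}")          # 1.0986
```

A mean cross-entropy above ln 3 means the geometric mean of the probability the model assigns to the correct class is below 1/3.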

Authors

Source Code

๐Ÿ“ GitHub Repository: ViT

View the complete implementation, training scripts, and documentation on GitHub.