Running AI Models Locally in Your Browser
Local Mode lets you run AI models entirely in your browser using WebGPU. Your conversations never leave your device - true privacy with zero server dependency.
Prerequisites
- A modern browser with WebGPU support (Chrome 113+, Edge 113+, Opera 99+)
- A GPU with sufficient VRAM for your chosen model
- Enough storage space for model downloads (500MB - 16GB depending on model)
How It Works
Local Mode uses WebLLM to run machine learning models directly in your browser via WebGPU acceleration. Models are downloaded once and cached in your browser’s IndexedDB for instant loading on future visits.
Benefits:
- Complete privacy - conversations never leave your device
- Works offline after initial model download
- No API costs or rate limits
- No account required
Limitations:
- Requires a capable GPU
- Model quality varies (smaller models = less capable)
- No image input support for most models (text only; Phi 3.5 Vision is an experimental exception)
- Initial download can take several minutes
Getting Started
Step 1: Open Local Mode Setup
Click the Settings icon (gear) in the bottom left to open the sidebar, then click Local Mode in the Tool Belt section.
Step 2: Check WebGPU Support
The modal will automatically check if your browser and GPU support WebGPU. You’ll see either:
- Green checkmark: WebGPU supported, showing your GPU name
- Red X: WebGPU not supported - try updating your browser or enabling GPU acceleration
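The check the modal performs can be approximated with the standard WebGPU API. This is a hedged sketch, not the app's actual code; it accepts a navigator-like object so the logic can run outside a browser (in a real page you would pass the global `navigator`).

```javascript
// Sketch of a WebGPU support check, similar to what the setup modal does.
// Takes a navigator-like object so the logic is testable outside a browser.
async function checkWebGPU(nav) {
  if (!nav.gpu) {
    // navigator.gpu is missing in browsers without WebGPU enabled
    return { supported: false, reason: "navigator.gpu is unavailable" };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    // The API exists but no usable GPU adapter was found
    return { supported: false, reason: "no suitable GPU adapter found" };
  }
  // Newer browsers expose adapter.info with vendor/architecture strings,
  // which is where a GPU name for the green checkmark could come from.
  return { supported: true, adapter };
}
```

In a page you would call `checkWebGPU(navigator)` and render the green checkmark or red X from the result.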
Step 3: Choose a Model
Select a model from the dropdown based on your GPU’s VRAM:
| VRAM Available | Recommended Models |
|---|---|
| 2GB or less | SmolLM2 135M/360M, Qwen 2.5 0.5B |
| 4GB | TinyLlama 1.1B, Qwen3 1.7B, Phi 3 Mini |
| 6GB | Llama 3.2 3B, Qwen 2.5 3B |
| 8GB+ | Llama 3.1 8B, Mistral 7B, Qwen3 8B |
| 12GB+ | CodeLlama 13B, Gemma 3 12B |
| 16GB+ | Gemma 2 27B |
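If you want to pick programmatically, the table above maps directly to a tier check. This helper is purely illustrative (it is not part of the app), and simply encodes the recommendations as written:

```javascript
// Hypothetical helper mirroring the table above: recommend models by VRAM in GB.
function recommendModels(vramGB) {
  if (vramGB >= 16) return ["Gemma 2 27B"];
  if (vramGB >= 12) return ["CodeLlama 13B", "Gemma 3 12B"];
  if (vramGB >= 8) return ["Llama 3.1 8B", "Mistral 7B", "Qwen3 8B"];
  if (vramGB >= 6) return ["Llama 3.2 3B", "Qwen 2.5 3B"];
  if (vramGB >= 4) return ["TinyLlama 1.1B", "Qwen3 1.7B", "Phi 3 Mini"];
  // 2GB or less: smallest models only
  return ["SmolLM2 135M/360M", "Qwen 2.5 0.5B"];
}
```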
Step 4: Download the Model
Click Download & Enable. The progress bar will show:
- Downloading model weights from HuggingFace
- Compiling shaders for your GPU
- Loading into memory
The first download takes 1-10 minutes depending on model size and internet speed. Subsequent loads are much faster because the model is cached locally.
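Under the hood, the three progress phases map onto WebLLM's init progress callback. `CreateMLCEngine` and the report shape (`progress`, `text`) follow the documented @mlc-ai/web-llm API; the formatting helper and `downloadAndEnable` wrapper are illustrative assumptions, not the app's actual code:

```javascript
// Illustrative: format a WebLLM init progress report for a progress bar.
// report.progress is a fraction in [0, 1]; report.text describes the phase
// (downloading weights, compiling shaders, loading into memory).
function formatProgress(report) {
  return `${Math.round(report.progress * 100)}% - ${report.text}`;
}

// Hypothetical wrapper around the download step (browser-only).
async function downloadAndEnable(modelId) {
  const webllm = await import("@mlc-ai/web-llm"); // runs in the browser
  return webllm.CreateMLCEngine(modelId, {
    initProgressCallback: (report) => console.log(formatProgress(report)),
  });
}
```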
Step 5: Start Chatting
Once downloaded, a Cloud/Local toggle appears next to the Model header in the sidebar. Click Local to switch to local mode, then select your model from the dropdown.
The toggle remembers your preference between sessions.
Switching Between Local and Cloud
After downloading at least one local model, you’ll see a toggle switch in the sidebar:
- Cloud (default) - Uses cloud models via API (requires account for paid models)
- Local - Uses your downloaded models (runs entirely in browser)
Click the toggle to switch modes. The appropriate model dropdown appears based on your selection.
When in Local mode:
- Your selected local model is shown in the input placeholder
- Responses are labeled with the model name + “(Local)”
- All processing happens on your device
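Once a local engine is loaded, chatting goes through WebLLM's OpenAI-compatible chat API (`engine.chat.completions.create`, per the @mlc-ai/web-llm docs). This is a minimal sketch; the message-building helper and function names are our own, not the app's:

```javascript
// Illustrative: append the new user turn to the conversation history.
function buildMessages(history, userText) {
  return [...history, { role: "user", content: userText }];
}

// Ask a loaded local engine for a reply. All inference happens on-device;
// nothing in this call touches a server.
async function askLocal(engine, history, userText) {
  const reply = await engine.chat.completions.create({
    messages: buildMessages(history, userText),
  });
  return reply.choices[0].message.content;
}
```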
Available Model Categories
General Purpose
- Llama 3.x - Meta’s open models, great all-rounders
- Qwen 2.5/3 - Strong multilingual support
- Mistral 7B - Efficient and capable
- Gemma 2 - Google’s open models
Coding
- Qwen 2.5 Coder - Optimized for code generation
- CodeLlama - Meta’s code-specialized Llama
Math & Reasoning
- Qwen 2.5 Math - Mathematical problem solving
- WizardMath - Step-by-step math reasoning
- DeepSeek R1 - Chain-of-thought reasoning models
Special Purpose
- Phi 3.5 Vision - Can analyze images (experimental)
- Gorilla OpenFunctions - Function/tool calling
Using Custom Models
Any model from the MLC-AI HuggingFace collection works with Local Mode.
To use a custom model:
- Select “Custom Model ID…” from the dropdown
- Enter the model ID exactly as it appears on HuggingFace, for example:
  Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC
- Click Download & Enable
Model IDs follow the format: ModelName-Size-Variant-Quantization-MLC
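A quick sanity check against that naming convention can catch typos before a multi-gigabyte download starts. This parser is a hypothetical illustration (the app may validate differently); it assumes the quantization token looks like `q4f16_1` and that the ID ends in `-MLC`:

```javascript
// Illustrative validator for IDs like Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC.
// Splits off the quantization token and the -MLC suffix; everything before
// them is treated as the model name (name, size, and variant combined).
function parseModelId(id) {
  const m = /^(.+)-(q\d+f\d+(?:_\d+)?)-MLC$/.exec(id);
  if (!m) return null; // does not follow the expected format
  return { name: m[1], quantization: m[2] };
}
```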
Troubleshooting
“WebGPU not supported”
- Update your browser to the latest version
- Enable hardware acceleration in browser settings
- Try Chrome or Edge if using another browser
“Insufficient GPU memory”
- Try a smaller model
- Close other GPU-intensive applications
- Check if your GPU meets minimum requirements
Model loads slowly
- First load downloads from HuggingFace (1-10 min)
- Subsequent loads use cached data (much faster)
- Ensure stable internet for initial download
Poor response quality
- Smaller models have limited capabilities
- Try a larger model if your GPU supports it
- Local models work best for straightforward tasks
Privacy Note
When using Local Mode:
- Model weights are downloaded from HuggingFace CDN
- All inference runs locally in your browser
- No conversation data is sent to any server
- Models are cached in browser storage (IndexedDB)
To fully clear local model data, clear your browser’s site data for Obscurify.ai.
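If you prefer to clear the cache from a script instead of the browser UI, the stored databases can be removed via IndexedDB. This is a hedged sketch: `indexedDB.databases()` is available in Chromium-based browsers, the `"webllm"` name filter is an assumption (inspect your browser's storage panel for the actual database names), and the function takes the idb implementation as a parameter so the logic is testable:

```javascript
// Sketch: delete cached model databases. Pass the global indexedDB in a page.
// The "webllm" name filter is illustrative - check your storage panel for
// the real database names before deleting.
async function clearModelCache(idb) {
  const dbs = await idb.databases(); // Chromium-based browsers only
  const deleted = [];
  for (const { name } of dbs) {
    if (name && name.includes("webllm")) {
      idb.deleteDatabase(name);
      deleted.push(name);
    }
  }
  return deleted;
}
```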