
Running AI Models Locally in Your Browser

Local Mode lets you run AI models entirely in your browser using WebGPU. Your conversations never leave your device - true privacy, with no server involved beyond the one-time model download.

Prerequisites

  • A modern browser with WebGPU support (Chrome 113+, Edge 113+, Opera 99+)
  • A GPU with sufficient VRAM for your chosen model
  • Enough storage space for model downloads (500MB - 16GB depending on model)

How It Works

Local Mode uses WebLLM to run machine learning models directly in your browser via WebGPU acceleration. Models are downloaded once and cached in your browser’s IndexedDB for instant loading on future visits.

Benefits:

  • Complete privacy - conversations never leave your device
  • Works offline after initial model download
  • No API costs or rate limits
  • No account required

Limitations:

  • Requires a capable GPU
  • Model quality varies (smaller models = less capable)
  • No image input support for most models (Phi 3.5 Vision is an experimental exception)
  • Initial download can take several minutes

Getting Started

Step 1: Open Local Mode Setup

Click the Settings icon (gear) in the bottom left to open the sidebar, then click Local Mode in the Tool Belt section.

Step 2: Check WebGPU Support

The modal will automatically check if your browser and GPU support WebGPU. You’ll see either:

  • Green checkmark: WebGPU supported, showing your GPU name
  • Red X: WebGPU not supported - try updating your browser or enabling GPU acceleration
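The check itself boils down to probing the standard WebGPU API. Here is a minimal sketch of that kind of check (not Obscurify.ai's actual code); taking the navigator-like object as a parameter keeps it testable, and in a real page you would call checkWebGPU(navigator):

```javascript
// Sketch of a WebGPU capability check using the standard navigator.gpu API.
async function checkWebGPU(gpuNavigator) {
  if (!gpuNavigator || !("gpu" in gpuNavigator)) {
    return { supported: false, reason: "WebGPU API not available" };
  }
  // requestAdapter() resolves to null when no suitable GPU is found.
  const adapter = await gpuNavigator.gpu.requestAdapter();
  if (!adapter) {
    return { supported: false, reason: "no suitable GPU adapter" };
  }
  return { supported: true, adapter };
}
```

If this returns unsupported in Chrome or Edge 113+, hardware acceleration is usually disabled in the browser's settings.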

Step 3: Choose a Model

Select a model from the dropdown based on your GPU’s VRAM:

VRAM Available   Recommended Models
2GB or less      SmolLM2 135M/360M, Qwen 2.5 0.5B
4GB              TinyLlama 1.1B, Qwen3 1.7B, Phi 3 Mini
6GB              Llama 3.2 3B, Qwen 2.5 3B
8GB+             Llama 3.1 8B, Mistral 7B, Qwen3 8B
12GB+            CodeLlama 13B, Gemma 3 12B
16GB+            Gemma 2 27B
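If you want to pick programmatically, the table above is easy to encode as a lookup. This helper is purely illustrative (it is not part of Obscurify.ai); the tiers and model names come straight from the table:

```javascript
// Given available VRAM in GB, return the recommended model tier from the
// table above. Tiers are checked from largest to smallest.
function recommendModels(vramGB) {
  const tiers = [
    [16, ["Gemma 2 27B"]],
    [12, ["CodeLlama 13B", "Gemma 3 12B"]],
    [8, ["Llama 3.1 8B", "Mistral 7B", "Qwen3 8B"]],
    [6, ["Llama 3.2 3B", "Qwen 2.5 3B"]],
    [4, ["TinyLlama 1.1B", "Qwen3 1.7B", "Phi 3 Mini"]],
  ];
  for (const [minVram, models] of tiers) {
    if (vramGB >= minVram) return models;
  }
  return ["SmolLM2 135M/360M", "Qwen 2.5 0.5B"]; // 2GB or less
}
```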

Step 4: Download the Model

Click Download & Enable. The progress bar will show:

  1. Downloading model weights from HuggingFace
  2. Compiling shaders for your GPU
  3. Loading into memory

First download takes 1-10 minutes depending on model size and internet speed. Subsequent loads are much faster (cached locally).
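Under the hood, these three phases correspond to a WebLLM engine initialization. The sketch below shows roughly what that looks like with the @mlc-ai/web-llm library (check its documentation for the exact API surface; the progress-formatting helper is our own illustration, not Obscurify.ai code):

```javascript
// Pure helper: turn a WebLLM init-progress report ({ progress: 0..1, text })
// into a display string like the modal's progress bar.
function formatProgress(report) {
  const pct = Math.round((report.progress || 0) * 100);
  return `${report.text || "Loading"} (${pct}%)`;
}

// Sketch of loading a model with WebLLM. CreateMLCEngine downloads the
// weights, compiles shaders, and loads the model into GPU memory, reporting
// progress through initProgressCallback. The dynamic import keeps this file
// loadable outside a browser.
async function loadLocalModel(modelId) {
  const { CreateMLCEngine } = await import("@mlc-ai/web-llm");
  return CreateMLCEngine(modelId, {
    initProgressCallback: (report) => console.log(formatProgress(report)),
  });
}
```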

Step 5: Start Chatting

Once downloaded, a Cloud/Local toggle appears next to the Model header in the sidebar. Click Local to switch to local mode, then select your model from the dropdown.

The toggle remembers your preference between sessions.
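Once a local engine is loaded, WebLLM exposes an OpenAI-style chat API on it. A minimal sketch of sending one message, assuming an engine object like the one produced in Step 4:

```javascript
// Send a single prompt to a loaded local engine and return the reply text.
// The engine is assumed to expose WebLLM's OpenAI-style
// chat.completions.create interface.
async function askLocal(engine, prompt) {
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
  });
  return reply.choices[0].message.content;
}
```

All of this runs on your GPU; nothing in the messages array is transmitted anywhere.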

Switching Between Local and Cloud

After downloading at least one local model, you’ll see a toggle switch in the sidebar:

  • Cloud (default) - Uses cloud models via API (requires account for paid models)
  • Local - Uses your downloaded models (runs entirely in browser)

Click the toggle to switch modes. The appropriate model dropdown appears based on your selection.

When in Local mode:

  • Your selected local model is shown in the input placeholder
  • Responses are labeled with the model name + “(Local)”
  • All processing happens on your device

Available Model Categories

General Purpose

  • Llama 3.x - Meta’s open models, great all-rounders
  • Qwen 2.5/3 - Strong multilingual support
  • Mistral 7B - Efficient and capable
  • Gemma 2 - Google’s open models

Coding

  • Qwen 2.5 Coder - Optimized for code generation
  • CodeLlama - Meta’s code-specialized Llama

Math & Reasoning

  • Qwen 2.5 Math - Mathematical problem solving
  • WizardMath - Step-by-step math reasoning
  • DeepSeek R1 - Chain-of-thought reasoning models

Special Purpose

  • Phi 3.5 Vision - Can analyze images (experimental)
  • Gorilla OpenFunctions - Function/tool calling

Using Custom Models

Any model from the MLC-AI HuggingFace collection works with Local Mode.

To use a custom model:

  1. Select “Custom Model ID…” from the dropdown
  2. Enter the model ID exactly as it appears on HuggingFace
    • Example: Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC
  3. Click Download & Enable

Model IDs follow the format: ModelName-Size-Variant-Quantization-MLC
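A quick way to catch typos before downloading is to sanity-check the ID shape. This is an illustrative heuristic, not the validation Obscurify.ai actually performs:

```javascript
// Loose sanity check for MLC model IDs of the form
// ModelName-Size-Variant-Quantization-MLC, e.g.
// "Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC".
function looksLikeMlcModelId(id) {
  if (!id.endsWith("-MLC")) return false;
  // Expect a quantization tag like q4f16_1 just before the -MLC suffix.
  return /-q\d+f\d+(_\d+)?-MLC$/.test(id);
}
```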

Troubleshooting

“WebGPU not supported”

  • Update your browser to the latest version
  • Enable hardware acceleration in browser settings
  • Try Chrome or Edge if using another browser

“Insufficient GPU memory”

  • Try a smaller model
  • Close other GPU-intensive applications
  • Check if your GPU meets minimum requirements

Model loads slowly

  • First load downloads from HuggingFace (1-10 min)
  • Subsequent loads use cached data (much faster)
  • Ensure stable internet for initial download

Poor response quality

  • Smaller models have limited capabilities
  • Try a larger model if your GPU supports it
  • Local models work best for straightforward tasks

Privacy Note

When using Local Mode:

  • Model weights are downloaded from HuggingFace CDN
  • All inference runs locally in your browser
  • No conversation data is sent to any server
  • Models are cached in browser storage (IndexedDB)

To fully clear local model data, clear your browser’s site data for Obscurify.ai.