Supported Models

TensorSharp loads models in GGUF format and auto-detects the architecture from the file's general.architecture metadata. Pick a quantization that fits your hardware (Q4_K_M for low memory, Q8_0 for higher quality).

Supported architectures

ArchitectureGGUF arch keysExample modelsMultimodalThinkingToolsMTP spec
Gemma 4gemma4gemma-4-E4B, 31B, 26B-A4B (MoE)Image, Video, AudioYesYesYes (separate draft)
Gemma 3gemma3gemma-3-4bImageNoNo
Qwen 3qwen3Qwen3-4BText onlyYesYes
Qwen 3.5 / 3.6qwen35, qwen35moe, qwen3nextQwen3.5-9B, Qwen3.5/3.6-35B-A3B (MoE)ImageYesYesYes on 3.6 (embedded NextN)
GPT OSSgptoss, gpt-ossgpt-oss-20b (MoE)Text onlyYes (always)Yes
Nemotron-Hnemotron_h, nemotron_h_moeNemotron-H-8B, 47B, Nemotron 3 Nano OmniImage (Omni)YesYes
Mistral 3mistral3Mistral-Small-3.1-24B-InstructImageNoNo
DiffusionGemmadiffusion-gemma, diffusion_gemmadiffusion-gemma text-diffusion GGUFsText onlyNoNo

Detailed per-model architecture cards (forward graph, components, parameters, and how TensorSharp optimizes prefill/decode) live under docs/models/ in the repository.

Model downloads (GGUF)

ArchitectureModelDownload
Gemma 4gemma-4-E4B-itggml-org/gemma-4-E4B-it-GGUF
Gemma 4gemma-4-31B-itggml-org/gemma-4-31B-it-GGUF
Gemma 4gemma-4-26B-A4B-it (MoE)ggml-org/gemma-4-26B-A4B-it-GGUF
Gemma 3gemma-3-4b-itgoogle/gemma-3-4b-it-qat-q4_0-gguf
Qwen 3Qwen3-4BQwen/Qwen3-4B-GGUF
Qwen 3.5 / 3.6Qwen3.5-9Bunsloth/Qwen3.5-9B-GGUF
Qwen 3.5 / 3.6Qwen3.5-35B-A3B (MoE)ggml-org/Qwen3.5-35B-A3B-GGUF
GPT OSSgpt-oss-20b (MoE)ggml-org/gpt-oss-20b-GGUF
Nemotron-HNemotron-H-8B-Reasoning-128Kbartowski/nvidia_Nemotron-H-8B-…
Nemotron-HNemotron-H-47B-Reasoning-128Kbartowski/nvidia_Nemotron-H-47B-…
Mistral 3Mistral-Small-3.1-24B-Instructbartowski/Mistral-Small-3.1-24B-…
🧩

Multimodal models need a projector (mmproj) file. The Gemma 4 / Gemma 3 / Qwen 3.5 / Mistral 3 projectors ship in or alongside the repos above; place the projector next to the model with a recognized name for auto-loading, or pass it explicitly with --mmproj.

Multimodal support

FamilyInputsNotes
Gemma 4Image · Video · AudioImages PNG/JPEG/HEIC; Video MP4 (1 fps via OpenCV); Audio WAV 16 kHz mono / MP3 / OGG. Projector: gemma-4-mmproj-F16.gguf.
Gemma 3ImagePNG / JPEG / HEIC. Projector: mmproj-gemma3-4b-f16.gguf.
Qwen 3.5 / 3.6ImageDynamic-resolution vision encoder. Projector: Qwen3.5-mmproj-F16.gguf.
Mistral 3ImagePixtral vision encoder. Projector: mistral3-mmproj.gguf.
Nemotron-H (Omni)ImageRADIO / v2_vl ViT encoder. Pass the matching --mmproj; image tokens expand at <image> placeholders.

Send images/audio/video via the CLI (--image, --video, --audio), the Web UI uploads, or the HTTP API (base64 images array for Ollama, image_url data URI for OpenAI).

Thinking / reasoning mode

Thinking-capable models (Qwen 3, Qwen 3.5/3.6, Gemma 4, GPT OSS, Nemotron-H) produce structured chain-of-thought before the final answer. The thinking content is separated from the visible response so the client can show or hide it.

Enable it via --think (CLI), "think": true (Ollama API / Web UI), or the thinking toggle in the browser. Responses expose the reasoning separately — e.g. message.thinking in the Ollama chat response.

Tool calling / function calling

Models can invoke user-defined tools and participate in multi-turn tool-call conversations. Define tools as JSON and pass them via --tools (CLI) or the tools parameter (API). Each architecture uses its own wire format, but the output parser extracts calls into structured tool_calls regardless:

See Tool calling over HTTP for a complete request/response example and the continuation loop.