Supported Models
TensorSharp loads models in GGUF format and auto-detects the architecture from the file's general.architecture metadata. Pick a quantization that fits your hardware (Q4_K_M for low memory, Q8_0 for higher quality).
Supported architectures
| Architecture | GGUF arch keys | Example models | Multimodal | Thinking | Tools | MTP spec |
|---|---|---|---|---|---|---|
| Gemma 4 | gemma4 | gemma-4-E4B, 31B, 26B-A4B (MoE) | Image, Video, Audio | Yes | Yes | Yes (separate draft) |
| Gemma 3 | gemma3 | gemma-3-4b | Image | No | No | — |
| Qwen 3 | qwen3 | Qwen3-4B | Text only | Yes | Yes | — |
| Qwen 3.5 / 3.6 | qwen35, qwen35moe, qwen3next | Qwen3.5-9B, Qwen3.5/3.6-35B-A3B (MoE) | Image | Yes | Yes | Yes on 3.6 (embedded NextN) |
| GPT OSS | gptoss, gpt-oss | gpt-oss-20b (MoE) | Text only | Yes (always) | Yes | — |
| Nemotron-H | nemotron_h, nemotron_h_moe | Nemotron-H-8B, 47B, Nemotron 3 Nano Omni | Image (Omni) | Yes | Yes | — |
| Mistral 3 | mistral3 | Mistral-Small-3.1-24B-Instruct | Image | No | No | — |
| DiffusionGemma | diffusion-gemma, diffusion_gemma | diffusion-gemma text-diffusion GGUFs | Text only | No | No | — |
Detailed per-model architecture cards (forward graph, components, parameters, and how TensorSharp optimizes prefill/decode) live under docs/models/ in the repository.
Model downloads (GGUF)
| Architecture | Model | Download |
|---|---|---|
| Gemma 4 | gemma-4-E4B-it | ggml-org/gemma-4-E4B-it-GGUF |
| Gemma 4 | gemma-4-31B-it | ggml-org/gemma-4-31B-it-GGUF |
| Gemma 4 | gemma-4-26B-A4B-it (MoE) | ggml-org/gemma-4-26B-A4B-it-GGUF |
| Gemma 3 | gemma-3-4b-it | google/gemma-3-4b-it-qat-q4_0-gguf |
| Qwen 3 | Qwen3-4B | Qwen/Qwen3-4B-GGUF |
| Qwen 3.5 / 3.6 | Qwen3.5-9B | unsloth/Qwen3.5-9B-GGUF |
| Qwen 3.5 / 3.6 | Qwen3.5-35B-A3B (MoE) | ggml-org/Qwen3.5-35B-A3B-GGUF |
| GPT OSS | gpt-oss-20b (MoE) | ggml-org/gpt-oss-20b-GGUF |
| Nemotron-H | Nemotron-H-8B-Reasoning-128K | bartowski/nvidia_Nemotron-H-8B-… |
| Nemotron-H | Nemotron-H-47B-Reasoning-128K | bartowski/nvidia_Nemotron-H-47B-… |
| Mistral 3 | Mistral-Small-3.1-24B-Instruct | bartowski/Mistral-Small-3.1-24B-… |
Multimodal models need a projector (mmproj) file. The Gemma 4 / Gemma 3 / Qwen 3.5 / Mistral 3 projectors ship in or alongside the repos above; place the projector next to the model with a recognized name for auto-loading, or pass it explicitly with --mmproj.
Multimodal support
| Family | Inputs | Notes |
|---|---|---|
| Gemma 4 | Image · Video · Audio | Images PNG/JPEG/HEIC; Video MP4 (1 fps via OpenCV); Audio WAV 16 kHz mono / MP3 / OGG. Projector: gemma-4-mmproj-F16.gguf. |
| Gemma 3 | Image | PNG / JPEG / HEIC. Projector: mmproj-gemma3-4b-f16.gguf. |
| Qwen 3.5 / 3.6 | Image | Dynamic-resolution vision encoder. Projector: Qwen3.5-mmproj-F16.gguf. |
| Mistral 3 | Image | Pixtral vision encoder. Projector: mistral3-mmproj.gguf. |
| Nemotron-H (Omni) | Image | RADIO / v2_vl ViT encoder. Pass the matching --mmproj; image tokens expand at <image> placeholders. |
Send images/audio/video via the CLI (--image, --video, --audio), the Web UI uploads, or the HTTP API (base64 images array for Ollama, image_url data URI for OpenAI).
Thinking / reasoning mode
Thinking-capable models (Qwen 3, Qwen 3.5/3.6, Gemma 4, GPT OSS, Nemotron-H) produce structured chain-of-thought before the final answer. The thinking content is separated from the visible response so the client can show or hide it.
- Qwen 3 / Qwen 3.5/3.6 / Nemotron-H —
<think>…</think>tags. - Gemma 4 —
<|channel>thought …<channel|>tags. - GPT OSS — Harmony format:
<|channel|>analysisfor thinking,<|channel|>finalfor the answer.
Enable it via --think (CLI), "think": true (Ollama API / Web UI), or the thinking toggle in the browser. Responses expose the reasoning separately — e.g. message.thinking in the Ollama chat response.
Tool calling / function calling
Models can invoke user-defined tools and participate in multi-turn tool-call conversations. Define tools as JSON and pass them via --tools (CLI) or the tools parameter (API). Each architecture uses its own wire format, but the output parser extracts calls into structured tool_calls regardless:
- Qwen 3 / Qwen 3.5/3.6 / Nemotron-H —
<tool_call>{"name": …, "arguments": {…}}</tool_call> - Gemma 4 —
<|tool_call>call:function_name{args}<tool_call|> - GPT OSS (Harmony) — tools declared as a TypeScript namespace; calls emitted on the commentary channel.
See Tool calling over HTTP for a complete request/response example and the continuation loop.