Features

A complete catalog of what TensorSharp does today. Each item links to the page where you can use it.

Highlights

🧠

Multi-architecture

Gemma 4 / 3, Qwen 3 / 3.5 / 3.6, GPT OSS, Nemotron-H, Mistral 3, DiffusionGemma.

🖼️

Multimodal

Image, video, and audio input for Gemma 4; image input for several other models.

💭

Thinking mode

Structured chain-of-thought, separated from the visible answer.

🛠️

Tool calling

Multi-turn function calling across all three API styles.

📦

Native quantized compute

Matmul runs directly on Q4_K_M, Q8_0, MXFP4, IQ2_XXS, and other quantized formats, without dequantizing to FP32.

🔀

Continuous batching

vLLM-style paged KV cache with cross-request prefix sharing.

Speculative decoding

MTP / NextN draft heads speed up single-request decoding without changing the output.

🔌

Ollama & OpenAI APIs

Drop-in endpoints for existing tooling, plus a browser chat UI.

Models & modalities

Generation & control

Performance & scale

Interfaces & integration