⚡ C# · .NET 10 · GGUF · GPU-accelerated

TensorSharp

A native .NET LLM inference engine for GGUF models — with a command-line tool, a browser chat server, and Ollama- & OpenAI-compatible APIs for programmatic access.

Get started → What is TensorSharp? GitHub ↗

Everything runs on your own hardware: your laptop, workstation, or server. No data leaves the machine, there are no per-token fees, and the same engine powers a quick command-line test, a shared internal chatbot, and a production REST endpoint. This wiki is the complete guide — pick a starting point below or use / to search.

Explore the wiki

🚀

Quick start in ~30 seconds

After installing the .NET 10 SDK, you are four commands away from a streaming reply (model download aside).

Clone & build

The native GGML library compiles automatically on the first build.

git clone https://github.com/zhongkaifu/TensorSharp.git
cd TensorSharp
dotnet build TensorSharp.slnx -c Release

Download a model

A small, well-tested starting point is Gemma-4-E4B (Q8_0) from Hugging Face. More in Model downloads.

Run it

Pick the --backend for your hardware.

echo "Explain mixture-of-experts in one sentence." > prompt.txt

# macOS (Apple Silicon)
./TensorSharp.Cli --model gemma-4-E4B-it-Q8_0.gguf --input prompt.txt --backend ggml_metal

# Windows / Linux + NVIDIA
./TensorSharp.Cli --model gemma-4-E4B-it-Q8_0.gguf --input prompt.txt --backend ggml_cuda

Prefer a UI + API?

Start the server and open the browser chat — it also serves the compatibility endpoints.
```
./TensorSharp.Server --model gemma-4-E4B-it-Q8_0.gguf --backend ggml_metal
# open http://localhost:5000
```

Why TensorSharp?

🔒

Private by default

Inference happens on your hardware. Prompts, documents, and images never leave the machine.

💸

No per-token bill

Run as much as your hardware allows — predictable cost, no metered API.

🔁

Drop-in compatible

Speaks the Ollama and OpenAI wire formats, so existing tools and SDKs just work.

🖥️

Runs anywhere

NVIDIA (CUDA), Apple Silicon (Metal/MLX), or pure CPU — with automatic fallbacks.

🧠

Modern model support

Gemma, Qwen, GPT-OSS, Nemotron-H, Mistral, plus vision, audio, reasoning & tools.

⚙️

Built in .NET

A native C# engine you can embed in your apps, not just a black-box binary.

Who is this for?

TensorSharp serves a wide range of visitors. Here is the fastest path for each.

Beginners & students

Start with the Glossary & FAQ, then Getting Started.

Developers

Jump to the HTTP API, C# Library, and API Reference.

Senior / principal engineers

Read Advanced Features — paged KV, continuous batching, speculative decoding.

Managers, CTOs & CEOs

See the business value and capability matrix.

Sales & marketing

Use the feature catalog and benchmarks for positioning.

Researchers & professors

Explore model architectures and the cross-engine matrix.

Welcome to the TensorSharp wiki. Next: Overview & Architecture →

TensorSharp

Explore the wiki

Getting Started

Command Line

Server & Web UI

HTTP API

C# Library

API Reference

Models

Glossary & FAQ