In this blog post, Practical ways to run PyTorch in .NET with TorchSharp and more, we will walk through reliable ways to use PyTorch from .NET, when to choose each approach, and how the pieces work under the hood.

At a high level, you have three good options: write and run models directly in .NET with TorchSharp; train in Python and deploy in .NET via ONNX Runtime; or keep Python for inference behind a service boundary and call it from .NET. Each route can be production-grade with the right packaging and testing.

What’s happening under the hood

PyTorch is a tensor library and autograd system with a rich operator set, a module system (torch.nn), and multiple backends (CPU, CUDA). The C++ core of PyTorch is called LibTorch and exposes high-performance kernels and the JIT runtime.

TorchSharp is a .NET binding over LibTorch. It provides C# and F# APIs for tensors, autograd, and nn modules, and calls into the same native kernels that Python PyTorch uses. That means it’s fast, supports CPU or GPU, and deploys as a regular .NET application with no Python dependency, only the native LibTorch libraries.

ONNX is an open model format. You can export many PyTorch models to ONNX in Python, then load and run them in .NET with Microsoft’s ONNX Runtime. This is excellent for inference, especially when you want a minimal runtime without shipping the whole PyTorch stack.

Finally, a service boundary (REST/gRPC) lets you keep Python in production for inference while .NET owns the app and business logic. This is often the quickest bridge when you have existing Python models or teams.

Three production patterns to choose from

1) TorchSharp for end-to-end .NET

  • Pros: Single tech stack, full control, no Python in prod, great performance.
  • Cons: API surface isn’t identical to Python; you’ll port training code to C#.
  • Best for: Teams committed to .NET who want training and inference in one runtime.

2) PyTorch to ONNX to .NET

  • Pros: Keep training in Python; lightweight, fast inference with ONNX Runtime.
  • Cons: Some models/operators don’t export cleanly; no training, inference only.
  • Best for: Inference at scale, simpler deployment, minimal native dependencies.

3) Python service with a .NET client

  • Pros: Reuse existing Python code/libs as-is; easy iteration.
  • Cons: Two runtimes to operate; network hop; latency considerations.
  • Best for: Fast time-to-value when models frequently change or are Python-heavy.

Getting started with TorchSharp in .NET

Install packages

Add the TorchSharp NuGet package to your .NET project. For GPU acceleration, add the matching CUDA-enabled native runtime for your OS/CUDA version as documented by TorchSharp. For CPU-only scenarios, use the CPU runtime (often brought in by default).
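
With the dotnet CLI that typically looks like one of the lines below. The package names are the ones TorchSharp currently publishes; double-check the TorchSharp documentation for the combination that matches your OS and CUDA version.

dotnet add package TorchSharp-cpu          # CPU-only: managed API plus the CPU LibTorch runtime
dotnet add package TorchSharp-cuda-linux   # GPU on Linux; use TorchSharp-cuda-windows on Windows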

Note: TorchSharp ships native LibTorch binaries per OS/arch. Align your package choice with your deployment target, and prefer 64-bit builds.

A minimal TorchSharp example

The sample below trains a tiny binary classifier in C#. It shows device selection (CPU/GPU), model definition, a training loop, and inference.
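
Here is a minimal sketch of that flow. It assumes a recent TorchSharp release; exact namespaces and helpers (for example NewDisposeScope) can shift slightly between versions, so treat it as a starting point rather than a drop-in implementation.

using System;
using TorchSharp;
using static TorchSharp.torch;
using static TorchSharp.torch.nn;
using static TorchSharp.torch.nn.functional;

// Pick the GPU when a CUDA-enabled LibTorch runtime is present, otherwise the CPU.
var device = cuda.is_available() ? CUDA : CPU;

// Synthetic data: 200 points in 2D, labeled by which side of a fixed line they fall on.
var x = randn(200, 2).to(device);
var wTrue = tensor(new float[] { 1.5f, -2.0f }).reshape(2, 1).to(device);
var y = (x.matmul(wTrue) > 0).to_type(ScalarType.Float32);

// A tiny MLP: 2 -> 16 -> 1 (one logit per sample).
var model = Sequential(
    ("lin1", Linear(2, 16)),
    ("relu1", ReLU()),
    ("lin2", Linear(16, 1)));
model.to(device);

var optimizer = optim.Adam(model.parameters(), 0.01); // learning rate 0.01

// Training loop: forward, loss, backward, step.
for (int epoch = 0; epoch < 200; epoch++)
{
    using var scope = NewDisposeScope();   // frees intermediate tensors created this iteration

    var logits = model.forward(x);
    var loss = binary_cross_entropy_with_logits(logits, y);

    optimizer.zero_grad();
    loss.backward();
    optimizer.step();

    if (epoch % 50 == 0)
        Console.WriteLine($"epoch {epoch}  loss {loss.item<float>():F4}");
}

// Inference: eval mode, no gradient tracking.
model.eval();
using (no_grad())
{
    var probe = tensor(new float[] { 0.5f, -1.0f }).reshape(1, 2).to(device);
    var p = model.forward(probe).sigmoid().item<float>();
    Console.WriteLine($"P(class = 1) = {p:F3}");
}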

What to notice:

  • Same concepts as Python PyTorch: tensors, modules, optimizers, autograd.
  • Device placement mirrors PyTorch. If CUDA is available, GPU is used.
  • Dispose intermediate tensors in tight loops to keep memory steady.

Train in Python, run in .NET with ONNX Runtime

Export to ONNX in Python
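
The export itself is a few lines of Python. The model below is a placeholder, and the file name model.onnx, the input/output names, and the opset are choices you should adapt to your own model and ONNX Runtime version.

import torch
import torch.nn as nn

# Placeholder model; substitute your trained model here.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
model.eval()

# A dummy input fixes the shapes and dtypes recorded in the exported graph.
dummy = torch.randn(1, 4)

torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # allow variable batch size
)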

Run the ONNX model in .NET
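
On the .NET side, ONNX Runtime loads that file and runs it on plain tensors. The sketch below assumes the Microsoft.ML.OnnxRuntime package and the input/output names chosen in the export above.

using System;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Create the session once and reuse it; Run calls are thread-safe.
using var session = new InferenceSession("model.onnx");

// Shape, dtype, and layout must match what the Python export used (here: 1 x 4, float32).
var input = new DenseTensor<float>(new float[] { 0.1f, 0.2f, 0.3f, 0.4f }, new[] { 1, 4 });
var inputs = new[] { NamedOnnxValue.CreateFromTensor("input", input) };

using var results = session.Run(inputs);
var output = results.First().AsEnumerable<float>().ToArray();

Console.WriteLine(string.Join(", ", output));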

Tip: For GPU inference with ONNX Runtime, use the appropriate GPU-enabled package and ensure CUDA/cuDNN drivers are present on the host image.

Keep Python, call it from .NET

When you already have stable Python inference code, wrap it behind a small HTTP or gRPC service. FastAPI makes this easy.

Minimal FastAPI wrapper (Python)
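
Here is a sketch of such a wrapper. It assumes a TorchScript model saved as model.pt and a simple feature-vector request; adapt the schema and the loading code to your own model.

from typing import List

import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model once at startup; "model.pt" is a placeholder for your artifact.
model = torch.jit.load("model.pt")
model.eval()

class PredictRequest(BaseModel):
    features: List[float]

@app.post("/predict")
def predict(req: PredictRequest):
    x = torch.tensor(req.features).unsqueeze(0)   # shape: (1, n_features)
    with torch.no_grad():
        y = model(x)
    return {"prediction": y.squeeze(0).tolist()}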

.NET client call
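
A matching client might look like the following. The base address and the request/response records are assumptions that mirror the FastAPI sketch above; System.Net.Http.Json serializes them with camelCase names by default, which lines up with the Python field names.

using System;
using System.Net.Http;
using System.Net.Http.Json;

// Point the base address at wherever the FastAPI service is running.
var http = new HttpClient { BaseAddress = new Uri("http://localhost:8000") };

var request = new PredictRequest(new[] { 0.1f, 0.2f, 0.3f, 0.4f });

var response = await http.PostAsJsonAsync("/predict", request);
response.EnsureSuccessStatusCode();

var result = await response.Content.ReadFromJsonAsync<PredictResponse>();
Console.WriteLine(string.Join(", ", result!.Prediction));

// Request/response shapes mirror the FastAPI models above.
public record PredictRequest(float[] Features);
public record PredictResponse(float[] Prediction);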

Keep requests small, batch where possible, and consider gRPC for low-latency, high-throughput scenarios.

How to choose

  • If you want a single runtime and full control, pick TorchSharp.
  • If you want Python for training and slim, fast inference in .NET, use ONNX Runtime.
  • If you want to move fast with existing Python code, expose a service.

Performance and deployment tips

TorchSharp

  • Package native LibTorch with your app. Choose the CPU or CUDA runtime matching your OS/arch.
  • For containers, start from an image that includes the right CUDA drivers if using GPU.
  • Use batches and disable gradients for inference (no_grad()). Warm up the model before the first request.
  • Dispose temporary tensors in loops to avoid memory growth.
  • Align TorchSharp version with its documented LibTorch version to avoid ABI mismatches.

ONNX Runtime

  • Validate the ONNX export as part of CI. Mismatched opsets cause runtime errors.
  • Use dynamic axes in export if your batch sizes vary.
  • Normalize inputs in .NET the same way as in Python. Shape and layout (NCHW vs NHWC) must match.
  • Consider the Execution Providers you need (CPU vs CUDA vs DirectML) based on your hardware; see the sketch after this list.
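
As an example of the last point, enabling the CUDA Execution Provider takes only a couple of lines. The sketch assumes the Microsoft.ML.OnnxRuntime.Gpu package and a host with CUDA/cuDNN installed.

using Microsoft.ML.OnnxRuntime;

// GPU inference: operators the CUDA provider cannot handle fall back to the CPU provider.
using var options = new SessionOptions();
options.AppendExecutionProvider_CUDA(0);   // device id 0

using var session = new InferenceSession("model.onnx", options);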

Python service

  • Set concurrency with uvicorn/gunicorn workers tuned to your hardware.
  • Batch requests server-side to maximize GPU utilization.
  • Version your model artifact and expose a health/metadata endpoint.
  • Secure the service (auth, TLS) and rate-limit externally.

Common pitfalls to avoid

  • Data type mismatches: INT64 vs INT32 indices, float32 vs float64 tensors.
  • Silent shape errors: log and assert shapes at boundaries; add unit tests.
  • Export gaps: some custom ops or control flow don’t export well to ONNX. Use TorchScript or service boundary if needed.
  • Driver/ABI mismatches: keep CUDA and LibTorch versions aligned across build and deploy.

Wrapping up

You have solid, well-supported paths to run PyTorch with .NET. TorchSharp gives you native training and inference in a single stack. ONNX Runtime delivers lightweight, fast inference for many exported models. A Python service is a pragmatic bridge when you need full PyTorch flexibility right now.

Pick the pattern that fits your team and deployment constraints, automate validation in CI, and standardize packaging for your target environments. With that foundation, bringing ML to your .NET applications becomes straightforward and maintainable.

