Codedock
Enterprise Integration · 7 min read · Written by Tomáš Mikeš

Processing GB/s of network traffic in .NET: a zero-drop architecture

For Netigo we process IPFIX/NetFlow flows at GB/s volumes. The key is buffering, backpressure and horizontal scaling — no sync processing. What has to be right so the system doesn't start shedding data under load.

High-performance · .NET · Network · Backpressure

A sensor sends 50 000 NetFlow flows per second at peak. The typical beginner system: synchronous parsing, synchronous DB inserts. Result: the server can't keep up, UDP packets drop, and 30% of the data vanishes without a trace. Nobody notices unless someone has a reason to check.

For Netigo we built a pipeline that handles peaks of 2 GB/s of network traffic with zero drops, even under sustained growth. The architecture rests on a handful of load-bearing principles, and all of them have to hold at once.

Principle 1: UDP ingestion off the hot path

NetFlow/IPFIX is UDP. UDP has no retry; if you don't pick it up fast enough, it's gone. First thing — the UDP socket listener must do nothing other than grab bytes and push them into an in-memory queue.

In .NET:

while (true) {
  var result = await udpClient.ReceiveAsync();
  await channel.Writer.WriteAsync(new RawPacket {
    Bytes = result.Buffer,
    ReceivedAt = DateTime.UtcNow,
    SourceEndPoint = result.RemoteEndPoint
  });
}

No parsing. No validation. No DB call. Just move bytes into a System.Threading.Channels bounded channel. Per-packet time: ~15 μs. That means one thread handles ~60 000 packets/s.

If you need more: a second UDP listener on another port + load balancer in front. Horizontal ingestion scaling.

Principle 2: Bounded channel = implicit backpressure

When the processing pipeline slows (DB slow, network blip), the in-memory queue grows. An unbounded queue = out of memory in minutes. The right solution: a bounded channel with a limit.

var options = new BoundedChannelOptions(capacity: 100_000) {
  FullMode = BoundedChannelFullMode.Wait
};
var channel = Channel.CreateBounded<RawPacket>(options);

When the queue is full, the writer waits. While it waits, the receive loop stalls, the kernel socket buffer fills, and the OS starts dropping UDP packets. But those are detected drops: they show up in socket statistics, so you increment a metric, alert, and can scale up. You don't get the packets back, but you see exactly when it happens.
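A variant worth sketching (not necessarily how Netigo does it): keep the receive loop non-blocking and make drops explicit at the application level with TryWrite and a counter, so the drop metric lives in your code instead of kernel socket stats.

```csharp
// Sketch: non-blocking enqueue with an explicit drop counter.
// droppedPackets would be exported by whatever metrics library you use.
long droppedPackets = 0;

while (true) {
  var result = await udpClient.ReceiveAsync();
  var packet = new RawPacket {
    Bytes = result.Buffer,
    ReceivedAt = DateTime.UtcNow,
    SourceEndPoint = result.RemoteEndPoint
  };
  if (!channel.Writer.TryWrite(packet)) {
    // Queue full: drop deliberately, but count it so dashboards light up.
    Interlocked.Increment(ref droppedPackets);
  }
}
```

The trade-off versus `FullMode.Wait`: you shed load at the application boundary instead of the kernel, which keeps the receive loop hot but gives up the implicit backpressure.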

Alternative: kernel-level buffers (setsockopt SO_RCVBUF). Raising socket buffer to 32 MB buys several seconds of headroom before UDP drops at OS level.
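In .NET the same knob is exposed as `Socket.ReceiveBufferSize`. A minimal sketch (port 2055 is just the conventional NetFlow port; adjust to your deployment):

```csharp
// Ask the kernel for a 32 MB receive buffer before binding.
// On Linux the effective value is capped by net.core.rmem_max,
// so read ReceiveBufferSize back afterwards to verify what you actually got.
var udpClient = new UdpClient();
udpClient.Client.ReceiveBufferSize = 32 * 1024 * 1024;
udpClient.Client.Bind(new IPEndPoint(IPAddress.Any, 2055));
```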

Principle 3: Parser in dedicated threads, not thread pool

The parser reads from the channel, decodes the IPFIX template, extracts fields. CPU-intensive work. If it runs on the default thread pool, it competes with HTTP API, logging and everything else.

Dedicated thread pool for the parser. Thread count = CPU core count. In .NET:

var parserTasks = Enumerable.Range(0, Environment.ProcessorCount)
  .Select(_ => Task.Factory.StartNew(
    () => ParseLoop(channel.Reader),
    TaskCreationOptions.LongRunning))
  .ToArray();

Each task pulls from channel.Reader, parses a packet, produces 1-30 parsed flows (an IPFIX packet contains multiple flow records) and pushes them into a second channel for the DB batch writer.
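A minimal sketch of what such a ParseLoop could look like — `ParseIpfix` and `parsedChannel` are placeholders for the real template-aware decoder and the second channel, not names from the actual codebase:

```csharp
// Sketch of the parse stage: drain raw packets, decode, fan out flow records.
async Task ParseLoop(ChannelReader<RawPacket> reader)
{
    await foreach (var packet in reader.ReadAllAsync())
    {
        // ParseIpfix stands in for the real IPFIX decoder;
        // one packet typically yields 1-30 flow records.
        foreach (var flow in ParseIpfix(packet.Bytes))
            await parsedChannel.Writer.WriteAsync(flow);
    }
}
```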

Principle 4: DB writes in batches, not one by one

Inserting a single row into TimescaleDB = ~1 ms (network + parse + write). 500 000 rows/s as single inserts is impossible — 500 000 ms = 500 seconds of work per second of data.

Solution: COPY FROM (Postgres bulk insert). Batch 5 000-10 000 rows into one COPY statement. Latency ~30 ms per batch. Throughput ~200 000 rows/s per writer, and writers scale horizontally.

In .NET via NpgsqlBinaryImporter:

using var writer = conn.BeginBinaryImport(
  "COPY flows (ts, src_ip, dst_ip, bytes, ...) FROM STDIN BINARY");
foreach (var flow in batch) {
  writer.StartRow();
  writer.Write(flow.Timestamp, NpgsqlDbType.TimestampTz);
  writer.Write(flow.SrcIp, NpgsqlDbType.Inet);
  // ...
}
await writer.CompleteAsync();

Versus individual INSERTs: 100× higher throughput.
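The batch itself has to accumulate somewhere. A sketch of a flush-on-size-or-timeout loop feeding the COPY writer — `WriteBatchWithCopy` is a hypothetical wrapper around the NpgsqlBinaryImporter snippet above, and the 5 000-row / 100 ms thresholds are illustrative:

```csharp
// Sketch: accumulate parsed flows, flush at 5 000 rows or 100 ms, whichever comes first.
var batch = new List<ParsedFlow>(capacity: 5_000);

while (await parsedReader.WaitToReadAsync())
{
    using var timeout = new CancellationTokenSource(TimeSpan.FromMilliseconds(100));
    try
    {
        while (batch.Count < 5_000)
            batch.Add(await parsedReader.ReadAsync(timeout.Token));
    }
    catch (OperationCanceledException)
    {
        // Timeout hit: flush a partial batch so latency stays bounded.
    }

    if (batch.Count > 0)
    {
        await WriteBatchWithCopy(batch); // the COPY ... FROM STDIN BINARY block
        batch.Clear();
    }
}
```

The timeout matters as much as the size cap: without it, a quiet sensor would leave rows sitting in the batch indefinitely.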

Principle 5: Per-stage metrics, not just end-to-end

When the system slows, you need to know where. Our metrics per stage:

  • UDP receive: packets/s, bytes/s, drop count (socket stats)
  • Raw channel: current depth, enqueue wait time (when full)
  • Parser: packets parsed/s, parse errors/s, parse time p99
  • Parsed channel: current depth, enqueue wait time
  • DB writer: batches/s, rows/s, COPY time p99, DB pool utilisation
  • End-to-end latency: UDP receive to DB commit (p99)
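Channel depth is the cheapest of these to export: bounded channels support `Reader.Count`. A sketch with hypothetical gauge objects standing in for whatever your metrics library provides:

```csharp
// Sketch: sample queue depths once a second for the metrics exporter.
// rawChannel/parsedChannel are the two bounded channels from earlier stages;
// the gauge objects are placeholders, not a specific metrics API.
while (!shutdown.IsCancellationRequested)
{
    rawDepthGauge.Set(rawChannel.Reader.Count);       // bounded channels can count
    parsedDepthGauge.Set(parsedChannel.Reader.Count);
    await Task.Delay(TimeSpan.FromSeconds(1), shutdown);
}
```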

When end-to-end p99 latency grows but UDP drop count stays 0 — backpressure is working; some stage is slow. Look at per-stage metrics. Parser p99 deteriorated? That's where. DB writer batches/s down? The DB is slow.

Principle 6: Horizontal scaling, not vertical

When you hit single-node limits (~1M packets/s on our HW), don't scale RAM/CPU. Scale out.

Each sensor sends flows to one of four ingestion nodes (stateless, load-balanced UDP via consistent hashing on sensor IP). Each node has the full pipeline (parser + writer) and writes to a shared TimescaleDB.

4 nodes × 500k packets/s = 2M packets/s capacity. If DB becomes the bottleneck, we'd add multi-node TimescaleDB or shard by sensor ID.

Netigo outcome after 12 months

  • Peak throughput: 850 000 flows/s (1.7 GB/s network data)
  • Average: 350 000 flows/s
  • UDP drop rate: < 0.001% (fewer than 1 in 100 000)
  • End-to-end p99 latency: 2.3 s (UDP → DB committed)
  • Single-node TimescaleDB, 4 ingestion nodes
  • Downtime in a year: 0

The takeaway

Processing large volumes of network data is largely about not trying to do too much in one place. Each pipeline stage has its own queue size, its own threads, its own metrics. When a stage slows, backpressure isolates it from the rest.

A sync end-to-end pipeline (UDP receive → parse → insert) caps out around 20-50 MB/s. Beyond that you have to split. The principle is language-agnostic — Java, Go, Rust all look the same. .NET has good tools (Channels, NpgsqlBinaryImporter) but isn't dramatically better or worse than other mainstream environments.

Working on something similar?

Book a 30-minute technical call. No sales process — direct architectural feedback.
