Codedock
Architecture & Consulting · 7 min read · Written by Tomáš Mikeš

IPFIX/NetFlow parsing in .NET: binary protocols in production

Binary protocols in .NET at high throughput. For Netigo we parse millions of IPFIX packets daily. How to minimise allocations, what Span<T>/Memory<T> buys you, and where the line sits between "elegant" and "fast".

.NET · Binary parsing · Performance · Span

An IPFIX packet arrives as a UDP datagram with binary content: a 16-byte header, then template records, then data records, each parsed according to a template defined in an earlier packet. A naive approach with BinaryReader and byte[] allocations handles 30 000 packets/s. We need 500 000/s.

The difference is how you handle memory. Here is a recap of the patterns we use and why they matter.

Problem 1: Per-packet allocations

Naive code:

var reader = new BinaryReader(new MemoryStream(packet));
var version = reader.ReadUInt16();
var length = reader.ReadUInt16();
// ... further fields
var flows = new List<Flow>();
while (reader.BaseStream.Position < length) {
  var flow = new Flow { ... };
  flows.Add(flow);
}
return flows;

Per-packet allocations: a MemoryStream, a BinaryReader, a List, and 1-30 Flow objects. Every allocation in .NET carries non-zero GC overhead, and at 500k packets/s that adds up to millions of allocations per second, so the GC burns most of your CPU.

Goal: zero-alloc parsing on the hot path. Everything stack-allocated or pooled.

Solution: Span<T> instead of byte[]

Span<byte> is a view over memory — no new allocation. The packet arrives as byte[] from the UDP socket; everything else operates over the Span:

public static void ParsePacket(
  ReadOnlySpan<byte> packet,
  ref IpfixHeader header,
  Span<Flow> flowsBuffer,
  out int flowCount
) {
  header.Version = BinaryPrimitives.ReadUInt16BigEndian(packet[0..2]);
  header.Length = BinaryPrimitives.ReadUInt16BigEndian(packet[2..4]);
  // ... further fields

  int position = 16;
  flowCount = 0;
  while (position < header.Length && flowCount < flowsBuffer.Length) {
    ParseFlow(packet.Slice(position), ref flowsBuffer[flowCount]);
    position += flowSize;  // record length, known from the template matched for this set
    flowCount++;
  }
}

No new allocation. BinaryPrimitives (namespace System.Buffers.Binary) reads primitive types from Span<byte> with explicit endianness. IPFIX is big-endian, x86 byte order is little-endian — conversion is required.
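A minimal, self-contained illustration of the endianness point (the header bytes are made up for the example; 0x000A is the real IPFIX version number):

```csharp
using System;
using System.Buffers.Binary;

// First four header bytes: version 0x000A, length 0x0190, both big-endian per IPFIX.
ReadOnlySpan<byte> bytes = stackalloc byte[] { 0x00, 0x0A, 0x01, 0x90 };

ushort version = BinaryPrimitives.ReadUInt16BigEndian(bytes[0..2]);  // 10
ushort length  = BinaryPrimitives.ReadUInt16BigEndian(bytes[2..4]);  // 400

// BitConverter uses native byte order and silently gives the wrong answer
// on little-endian hardware: 0x0A00 = 2560.
ushort wrong = BitConverter.ToUInt16(bytes[0..2]);

Console.WriteLine($"{version} {length} {wrong}");
```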

Solution: ArrayPool for DB writer buffers

The parser produces flows that feed the DB writer in batches. It needs a Flow[] array of ~5000 items. Without pooling:

var batch = new Flow[5000];

That is an allocation of 5000 × sizeof(Flow) = ~640 KB every batch, every 3 seconds, and anything over 85 KB lands on the large object heap. With pooling:

var batch = ArrayPool<Flow>.Shared.Rent(5000);
try {
  // parse, fill batch
  await WriteBatchAsync(batch, flowCount);
} finally {
  ArrayPool<Flow>.Shared.Return(batch, clearArray: true);
}

ArrayPool keeps a set of reusable arrays: Rent hands out an existing array from the pool (allocating only when the pool has none), Return gives it back, and clearArray: true wipes stale flow data before reuse. Allocations drop to zero after the first few seconds of operation.
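One detail worth knowing: Rent may hand back a larger array than requested (the shared pool buckets sizes, typically by powers of two), which is why the batch code passes flowCount to the writer rather than relying on batch.Length. A quick sketch:

```csharp
using System;
using System.Buffers;

var pool = ArrayPool<byte>.Shared;

byte[] batch = pool.Rent(5000);
bool bigEnough = batch.Length >= 5000;    // guaranteed; often the 8192 bucket
Console.WriteLine(batch.Length);

pool.Return(batch, clearArray: true);     // wipe before reuse: stale data is a bug risk
```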

Problem 2: IPFIX template state

The IPFIX protocol ships template records that define the shape of subsequent data records. A template arrives once, then a flood of data records says “I'm template ID 256” and you have to know what that means.

We keep a per-sensor template cache. ConcurrentDictionary? No: its synchronisation overhead is wasted at 500k lookups/s when writes are this rare. Instead, a per-sensor ImmutableDictionary with an atomic reference swap:

private volatile ImmutableDictionary<ushort, Template> _templates =
  ImmutableDictionary<ushort, Template>.Empty;

public void AddTemplate(ushort id, Template tmpl) {
  // SetItem, not Add: sensors re-send templates periodically, and Add
  // throws on an existing key. The reference assignment itself is atomic.
  _templates = _templates.SetItem(id, tmpl);
}

public Template? GetTemplate(ushort id) {
  return _templates.GetValueOrDefault(id);  // lock-free read
}

Template updates are rare (once per 10 min per sensor). Lookups are common (every flow packet). Lock-free read with an occasional-write pattern is the optimum.
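Why the swap is safe: a reader captures the reference once and then sees a consistent snapshot, no matter what writers do afterwards. A minimal demonstration (Template replaced by string for brevity):

```csharp
using System;
using System.Collections.Immutable;

var cache = ImmutableDictionary<ushort, string>.Empty
    .SetItem(256, "template-v1");

var snapshot = cache;                       // what a reader would capture mid-parse

cache = cache.SetItem(256, "template-v2");  // "write": build a new dict, swap the reference

Console.WriteLine(snapshot[256]);           // template-v1: the snapshot is untouched
Console.WriteLine(cache[256]);              // template-v2: new readers see the update
```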

Problem 3: String allocations for IP addresses

An IP address in IPFIX = 4 bytes (IPv4) or 16 bytes (IPv6). In code you often want a string for logging or DB write. Naively:

var ipString = new IPAddress(bytes).ToString();  // allocation

Allocates an IPAddress object plus a string. Solution: feed the DB writer through Npgsql's binary protocol directly with 4 bytes (no string round-trip):

writer.Write(new IPAddress(bytes), NpgsqlDbType.Inet);
// alternatively, for super-hot paths:
writer.WriteRaw(bytes, NpgsqlDbType.Inet);

The Postgres INET type is 7 or 19 bytes on disk; the binary protocol pushes the raw address bytes without any string formatting.
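When a textual form is genuinely needed, e.g. on the logging path, the string allocation can still be avoided by formatting into a caller-supplied buffer. A sketch; FormatIPv4 is a hypothetical helper, not part of the production code:

```csharp
using System;

ReadOnlySpan<byte> addr = stackalloc byte[] { 192, 168, 1, 10 };
Span<char> buf = stackalloc char[15];       // max length of "255.255.255.255"
int len = FormatIPv4(addr, buf);

// Materialised here only to show the result; a logger accepting
// ReadOnlySpan<char> would avoid even this allocation.
string text = new string(buf[..len]);
Console.WriteLine(text);                    // 192.168.1.10

// Writes the dotted-quad form of a 4-byte IPv4 address, returns chars written.
// byte.TryFormat renders the digits without allocating.
static int FormatIPv4(ReadOnlySpan<byte> ip, Span<char> dest)
{
    int pos = 0;
    for (int i = 0; i < 4; i++)
    {
        if (i > 0) dest[pos++] = '.';
        ip[i].TryFormat(dest.Slice(pos), out int written);
        pos += written;
    }
    return pos;
}
```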

Problem 4: Error handling on the hot path

Throwing exceptions in .NET is EXPENSIVE — stack walks, object allocations, logging hooks. In a hot path parsing 500k packets/s, 1000 exceptions/s (bad templates, malformed data) = 10-20% extra CPU.

Pattern: error codes instead of exceptions:

public enum ParseResult { Success, UnknownTemplate, Truncated, BadLength }

public static ParseResult ParseFlow(
  ReadOnlySpan<byte> data,
  ref Flow flow
) {
  if (data.Length < 16) return ParseResult.Truncated;
  // ... parse
  return ParseResult.Success;
}

Exceptions only for genuinely exceptional cases (unexpected parser state, which should tear the program down). Business errors = error codes.
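On the caller side the pattern looks like this: branch on the result code and bump per-error counters, with no exception machinery on the hot path. A compressed, self-contained sketch (the Flow fields and counter names are illustrative):

```csharp
using System;
using System.Buffers.Binary;

long parsed = 0, truncated = 0;
var flow = new Flow();

Span<byte> good = stackalloc byte[16];      // a full-size record
if (FlowParser.ParseFlow(good, ref flow) == ParseResult.Success) parsed++;

Span<byte> bad = stackalloc byte[3];        // malformed: shorter than a record
if (FlowParser.ParseFlow(bad, ref flow) == ParseResult.Truncated) truncated++;

Console.WriteLine($"parsed={parsed} truncated={truncated}");

public enum ParseResult { Success, UnknownTemplate, Truncated, BadLength }

public struct Flow { public uint SrcAddr; public uint DstAddr; }

public static class FlowParser
{
    public static ParseResult ParseFlow(ReadOnlySpan<byte> data, ref Flow flow)
    {
        if (data.Length < 16) return ParseResult.Truncated;  // error code, no throw
        flow.SrcAddr = BinaryPrimitives.ReadUInt32BigEndian(data[0..4]);
        flow.DstAddr = BinaryPrimitives.ReadUInt32BigEndian(data[4..8]);
        return ParseResult.Success;
    }
}
```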

Performance numbers

Measured on Netigo production HW (32-core Xeon, 64 GB RAM):

  • Naive BinaryReader + List: 28 000 packets/s (single thread)
  • Span<T> + stack buffers: 180 000 packets/s (single thread) — 6× speedup
  • + ArrayPool + ImmutableDictionary templates: 260 000 packets/s (single thread)
  • Multi-threaded (16 cores, dedicated thread pool): 2 100 000 packets/s — ~8× scaling (the rest is GC and I/O)

GC allocation rate (ETW trace): ~50 MB/s in the naive version, ~3 MB/s in the final. 15× less GC pressure.

The takeaway

For high-throughput binary parsing in .NET there is a set of patterns every application processing > 100k packets/s must use. The key points:

  • Span<T> + BinaryPrimitives instead of BinaryReader/byte[]
  • ArrayPool for repeatedly used buffers
  • Immutable collections + atomic swap for shared state with rare-write pattern
  • Error codes instead of exceptions on the hot path
  • Binary protocol on the DB writer, no string round-trips

This is territory where .NET genuinely shines. The platform has a reputation, especially since Rust's rise, as “enterprise slow.” The reality is that, written right, it lands within 10-20% of Rust or C++ performance with a far better developer experience.

© 2026 Codedock