No description
  • Rust 86.5%
  • Nix 13.5%
Find a file
Christian Lengert 33796324c2 Add TimeoutStopSec to systemd service for graceful shutdown
Set TimeoutStopSec=60 to allow async flush operations to complete
before systemd kills the process. Previously, the service was terminated
before the consumer could flush buffered data.

Fixes: No data written during redeployments despite graceful shutdown handler.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-06-27 11:27:41 +02:00
src Add graceful shutdown on SIGTERM with auto-flush 2026-06-27 10:50:23 +02:00
.gitignore Track Cargo.lock for reproducible builds and nix integration 2026-06-27 10:31:12 +02:00
Cargo.lock Track Cargo.lock for reproducible builds and nix integration 2026-06-27 10:31:12 +02:00
Cargo.toml Fix nix flake devShell and package dependencies 2026-06-27 10:29:57 +02:00
flake.lock Update flake.nix devShell with proper dependency scoping 2026-06-27 10:27:08 +02:00
flake.nix Fix nix flake devShell and package dependencies 2026-06-27 10:29:57 +02:00
module.nix Add TimeoutStopSec to systemd service for graceful shutdown 2026-06-27 11:27:41 +02:00
NIXOS.md Add Nix flake and NixOS module 2026-06-27 10:23:55 +02:00
package.nix Add curl to build dependencies for rdkafka compilation 2026-06-27 10:37:25 +02:00
README.md Initial commit: Kafka exporter standalone service 2026-06-27 10:22:29 +02:00
rust-toolchain.toml Initial commit: Kafka exporter standalone service 2026-06-27 10:22:29 +02:00

Kafka Exporter

A high-performance Kafka message exporter that captures Kafka topic messages to Parquet files with Hive partitioning.

Features

  • Multi-topic support: consume from multiple Kafka topics simultaneously
  • Buffered writes: in-memory buffering with configurable size and age limits
  • Atomic operations: safe file operations with temp file + rename pattern
  • Hive partitioning: organized output by year/month/day
  • Compression: Snappy-compressed Parquet files
  • Metrics: Prometheus metrics for monitoring
  • Replay support: HTTP API to reset consumer offsets and replay topics

Quick Start

docker build -t kafka-exporter .
docker run \
  -e KAFKA_BOOTSTRAP_SERVERS=kafka:9092 \
  -e TOPICS=sensor-readings,other-topic \
  -e EXPORT_DIR=/data/exports \
  -v data:/data \
  kafka-exporter

Configuration

Environment variables:

  • KAFKA_BOOTSTRAP_SERVERS - Kafka bootstrap servers (default: localhost:9092)
  • TOPICS - Comma-separated list of topic names to consume (required)
  • EXPORT_DIR - Root export directory (default: /mnt/safepool/data/feeder-kafka-exporter)
  • WORKDIR - Working directory for temp files (default: {EXPORT_DIR}/.tmp)
  • MAX_FILE_SIZE_MB - Max buffer size before flush (default: 256)
  • FLUSH_INTERVAL_SECS - Max age before flush (default: 3600)
  • HOST - HTTP server bind address (default: 127.0.0.1)
  • PORT - HTTP server port (default: 8090)

Output Format

Messages are written to Hive-partitioned Parquet files:

{EXPORT_DIR}/{topic}/year=YYYY/month=MM/day=DD/{uuid}.parquet

Parquet schema:

  • partition (int32) - Kafka partition number
  • offset (int64) - Kafka offset
  • key (binary, nullable) - Message key
  • kafka_timestamp (timestamp[ms], nullable) - Broker timestamp
  • received_at (timestamp[ms]) - Consumer receive time
  • payload (large_binary) - Raw message payload

Querying

With DuckDB:

SELECT * FROM read_parquet('./exports/sensor-readings/**/*.parquet', hive_partitioning=true)

HTTP API

  • GET /healthz - Liveness probe
  • GET /status - Status of all topics
  • POST /replay/{topic} - Reset offsets and replay topic

Building

cargo build --release