
Delivery Guarantees

Pipeline pair: Source and Sink

The file-chunk source and sink connectors must operate as a pipeline pair. The source connector populates message headers with file metadata that the sink connector requires to merge chunks. Chunk merging depends on chunk ordering, checksums, file creation time and partitions, so it is only possible with the file-chunk sink connector.

Exactly Once delivery

Source

A file-chunk pipeline provides exactly-once delivery through a combination of an at-least-once delivery guarantee from the source connector and duplicate handling by the sink connector. The Kafka Connect worker (or the source connector) must be configured so that its producers retry without limit:

acks = ALL
max.in.flight.requests.per.connection = 1
retries = 2147483647
retry.backoff.ms = 500
warning

The Kafka Connect worker producer properties are shared by all connectors running on that worker, and those connectors may offer different delivery guarantees. A Kafka Connect server hosting a file-chunk source connector must be configured for unlimited retry. This may mean hosting the file-chunk source connector on a separate worker, shared only with other connectors that are also configured for unlimited retry.
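As a rough illustration of the settings above, the following Python sketch checks a worker properties file for the unlimited-retry values. It assumes the producer settings carry the `producer.` prefix that Kafka Connect uses for worker-level producer overrides. The function names `parse_properties` and `unlimited_retry_problems` are hypothetical helpers, not part of the connector or of Kafka, and the check compares values case-insensitively as a convenience.

```python
INT_MAX = 2147483647  # the retries value shown in the settings above

# Required values, keyed by plain producer property name.
REQUIRED = {
    "acks": {"all", "-1"},
    "max.in.flight.requests.per.connection": {"1"},
    "retries": {str(INT_MAX)},
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def unlimited_retry_problems(props, prefix="producer."):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for name, allowed in REQUIRED.items():
        value = props.get(prefix + name)
        if value is None:
            problems.append(f"{prefix}{name} is not set")
        elif value.lower() not in allowed:
            problems.append(
                f"{prefix}{name} = {value}, expected one of {sorted(allowed)}"
            )
    return problems
```

A check like this could run before deploying a file-chunk source connector to a shared worker, so that a worker tuned for a weaker delivery guarantee is caught early.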

Sink

Duplicate file chunks (caused either by a source-connector resend or by a manual replay) are replayed by the sink connector. If the prior file chunk is still on the filesystem, it is simply overwritten. If the prior file chunk has already been merged and deleted, it is recreated, and the sink connector attempts to consume all chunks, merge the file and overwrite the previously merged file.
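The duplicate handling described above can be sketched as an idempotent write followed by a merge that overwrites its target. This is a minimal illustration and not the connector's actual implementation: the chunk naming scheme, the zero-based chunk index, the MD5 checksum and the function names are all assumptions made for the example.

```python
import hashlib
import os
from pathlib import Path


def chunk_path(staging_dir, file_name, index):
    """Deterministic path, so a replayed chunk maps to the same file."""
    return Path(staging_dir) / f"{file_name}.chunk{index:06d}"


def store_chunk(staging_dir, file_name, index, payload):
    """Write one chunk, overwriting any earlier copy of the same chunk."""
    path = chunk_path(staging_dir, file_name, index)
    path.write_bytes(payload)
    return path


def merge_if_complete(staging_dir, target_dir, file_name, total_chunks, expected_md5):
    """Merge chunks in index order once all are present; else return None."""
    paths = [chunk_path(staging_dir, file_name, i) for i in range(total_chunks)]
    if not all(p.exists() for p in paths):
        return None  # still waiting for chunks, including replayed ones
    data = b"".join(p.read_bytes() for p in paths)
    if hashlib.md5(data).hexdigest() != expected_md5:
        raise ValueError(f"checksum mismatch for {file_name}")
    target = Path(target_dir) / file_name
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, target)  # replaces any previously merged file
    for p in paths:
        p.unlink()  # chunks are removed after a successful merge
    return target
```

Because every step either overwrites or recreates the same paths, replaying some or all chunks of a file ends in the same merged file as the first delivery.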

note

If an upstream file is recreated and queued a second time, it has the same name but a different file create time. When merging chunks, the sink connector verifies that all chunks share the same file create time, to avoid merging chunks from different files that have the same name. When file chunks from two different file generations are detected, the old generation is discarded and only the new generation is merged.
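The generation check described above can be sketched as a filter that keeps only the chunks with the most recent create time for each file name. The `Chunk` class and its field names are hypothetical stand-ins for the metadata carried in the message headers; the real header names are not given here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    file_name: str
    create_time_ms: int  # create time of the upstream file (assumed field)
    index: int           # position of the chunk within the file (assumed field)
    payload: bytes


def newest_generation(chunks):
    """Keep only the chunks of the most recent create time per file name."""
    by_file = defaultdict(list)
    for chunk in chunks:
        by_file[chunk.file_name].append(chunk)
    kept = []
    for file_chunks in by_file.values():
        latest = max(c.create_time_ms for c in file_chunks)
        kept.extend(c for c in file_chunks if c.create_time_ms == latest)
    # Order by file name, then chunk index, ready for merging.
    return sorted(kept, key=lambda c: (c.file_name, c.index))
```

For example, if chunks of `a` arrive with create times 100 and 200, only the chunks stamped 200 survive the filter and are passed on for merging.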