Sink Connector

files.dir

The directory to which processed files are written. This directory must exist and be writable by the user running Kafka Connect. The connector automatically creates subdirectories named builds, chunks, locked and merged. Completed files are placed in the merged subdirectory; the other three subdirectories are managed internally by the connector. If the source and sink connectors are running on the same machine (for example, during testing), ensure that the files.dir property is not set to the same directory for both connectors.

  • Importance: HIGH
  • Type: STRING
warning

The Sink connector does not delete merged files from the files.dir. Removal of files from this directory must be done by the downstream process that consumes the merged files.
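Because cleanup is delegated downstream, the consuming application typically drains the merged directory itself after handling each file. A minimal sketch in Java, assuming the downstream process has read/write access to the merged subdirectory (the class and method names here are illustrative, not part of the connector):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class MergedFileConsumer {
    /**
     * Process every completed file found in <files.dir>/merged,
     * then delete it, since the sink connector never removes them.
     * Returns the number of files handled.
     */
    static int drain(Path mergedDir) throws IOException {
        int processed = 0;
        try (Stream<Path> files = Files.list(mergedDir)) {
            for (Path f : (Iterable<Path>) files::iterator) {
                // ... hand the file to the downstream application here ...
                Files.delete(f); // removal is the downstream process's responsibility
                processed++;
            }
        }
        return processed;
    }
}
```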

topics

The topic from which to consume data produced by a paired File Chunk Source Connector. Multiple consumers (sink connectors) configured to consume from the same topic are supported. The topic can have one or more partitions, and should be created manually before starting the source or sink connectors. A schema registry is not required, as all file-chunk events are serialized as a bytestream.

  • Importance: HIGH
  • Type: STRING
  • Default Value: none
halt.on.error

Stop all tasks if an error is encountered while merging files.

  • Importance: HIGH
  • Type: BOOLEAN
  • Default: true
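Putting the properties above together, a minimal sink configuration submitted to the Connect REST API might look like the following (a sketch only: the connector class name shown is illustrative, so check your installed distribution for the exact value):

```json
{
  "name": "file-chunk-sink",
  "config": {
    "connector.class": "FileChunkSinkConnector",
    "tasks.max": "1",
    "topics": "file-chunk-events",
    "files.dir": "/var/kafka-connect/file-chunks",
    "halt.on.error": "true",
    "key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter"
  }
}
```

The bytestream converters match the serialization noted under the topics property; no schema registry settings are needed.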
file.maximum.size.bytes.for.md5

Release 2.9: The upper size limit (in bytes) below which an MD5 check is performed to verify that the downloaded file is identical to the uploaded file. Input files below this size are verified with MD5 checks by the uploader (source connector) and the downloader (sink connector). Input files equal to or larger than this size are verified by checking that the downloaded file is exactly the same size as the uploaded file. This behaviour is configurable because MD5 checks are computationally costly and create memory pressure on the JVM, causing OutOfMemoryError for files exceeding 2GB in size. The default value is 1073741824 (1GB), meaning no MD5 check is performed for files exceeding 1GB in size; this is the same as the uploader default value. The maximum value is 1932735283 (1.8GB), which is marginally below the 2GB memory size above which failures are more likely. This property can be configured at the uploader, the downloader, or, ideally, both.

If it is configured at the uploader only, the downloader respects the uploader's configuration and behaves the same way. If it is configured at the downloader but not the uploader, the downloader checks the file size and skips the MD5 check for files exceeding its configured limit. This prevents a misconfigured uploader (configured with more memory) from breaking the downloader by attempting to perform MD5 checks on files that are too large for the downloader's JVM memory to handle.

  • Importance: HIGH
  • Type: STRING
  • Default Value: 1073741824
  • Maximum Value: 1932735283
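The size-gated verification described above can be sketched as follows. This is a simplified illustration of the rule, not the connector's actual code; the class and method names are assumptions:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class ChunkVerifier {
    // Mirrors the file.maximum.size.bytes.for.md5 default (1GB).
    static final long MD5_SIZE_LIMIT = 1_073_741_824L;

    /**
     * Files at or above the limit are verified by size only;
     * smaller files additionally get an MD5 comparison.
     */
    static boolean verify(long expectedSize, String expectedMd5Hex, Path merged)
            throws IOException, NoSuchAlgorithmException {
        long actualSize = Files.size(merged);
        if (actualSize != expectedSize) {
            return false; // size mismatch always fails, regardless of file size
        }
        if (expectedSize >= MD5_SIZE_LIMIT) {
            return true; // skip the costly MD5 check for large files
        }
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(Files.readAllBytes(merged));
        return HexFormat.of().formatHex(digest).equals(expectedMd5Hex);
    }
}
```

Note that the size comparison is always performed; only the MD5 step is skipped above the threshold, which is why a downloader with a lower limit than its uploader still verifies safely.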
topic.partitions

(Deprecated in release 2.6): the topic.partitions value used by the uploader is now passed to the downloader in the message header.

binary.chunk.size.bytes

(Deprecated in release 2.6): the binary.chunk.size.bytes value used by the uploader is now passed to the downloader in the message header.