File Integrity Checks

The file-chunk source connector generates an MD5 checksum for each uploaded file, and the file-chunk sink connector generates an MD5 checksum for each merged file. The sink connector compares the two checksums: if they match, the file is renamed to filename__FINISHED; if they do not match, it is renamed to filename__ERROR.
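The comparison-and-rename step can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the connector's actual code: the function names md5_of_file and finalize_merged_file are hypothetical, and the sketch assumes the source checksum is available as a hex string.

```python
import hashlib
import os


def md5_of_file(path, block_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def finalize_merged_file(merged_path, source_md5):
    """Rename the merged file to <name>__FINISHED or <name>__ERROR.

    Hypothetical helper: only the two suffixes come from the documentation;
    the name and signature are assumptions made for this sketch.
    """
    matches = md5_of_file(merged_path) == source_md5.lower()
    final_path = merged_path + ("__FINISHED" if matches else "__ERROR")
    os.rename(merged_path, final_path)
    return final_path
```

Reading the file in blocks keeps memory use flat for large merged files, which is the usual reason to avoid hashing the whole file in one call.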

Various integrity checks are performed during pipeline streaming:

  • File chunks are numbered from 1 to n, where n = (filesize / binary.chunk.size.bytes), rounded up to the next whole number. Each chunk is exactly binary.chunk.size.bytes bytes, except for the final chunk, which is usually smaller.
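  • As each chunk is merged, the sink connector verifies that the size of the (partially) merged target file equals (binary.chunk.size.bytes * chunk-number). The final chunk is the exception, because it is usually smaller than binary.chunk.size.bytes.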
  • After the final chunk is merged, the MD5 checksum of the target file is compared with the MD5 checksum previously generated for the source file, which is carried in the header of each chunk.
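The chunk arithmetic behind these checks can be worked through with a short Python sketch. The function names chunk_count and expected_size_after are hypothetical, and the sketch assumes that the chunk count is rounded up and that the final chunk brings the merged file to exactly the source file size; the connector's real implementation may differ.

```python
def chunk_count(file_size, chunk_size):
    """Number of chunks needed for file_size bytes (division rounded up)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return (file_size + chunk_size - 1) // chunk_size


def expected_size_after(chunk_number, file_size, chunk_size):
    """Expected size of the partially merged file after a given chunk.

    Chunks are numbered from 1. Every chunk except the last adds exactly
    chunk_size bytes; the last one brings the total to file_size.
    """
    total = chunk_count(file_size, chunk_size)
    if not 1 <= chunk_number <= total:
        raise ValueError("chunk_number out of range")
    return min(chunk_number * chunk_size, file_size)


# Illustrative values: a 10,000,000-byte file with 4 MiB chunks.
file_size = 10_000_000
chunk_size = 4 * 1024 * 1024
for number in range(1, chunk_count(file_size, chunk_size) + 1):
    print(number, expected_size_after(number, file_size, chunk_size))
```

With these illustrative values the file splits into three chunks, and the expected merged sizes are 4194304, 8388608, and 10000000 bytes.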