Files get re-ingested during S3 ingestion to append-type datasets

Fix Available

During consecutive S3 ingestion runs for append-type datasets, files that already exist in the dataset are erroneously re-ingested.

Affected Versions: 2.3, 2.2, 2.1, 2.0

Fix Version: 2.4

Root cause(s)

The system used the ETags of S3 files to determine whether a file already existed in the dataset. However, for larger files the ETag is not the MD5 hash of the file contents, so the same file could produce a different ETag. The comparison then failed, and the file was ingested again as a duplicate.
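
To illustrate why an ETag comparison can fail, here is a minimal sketch. It assumes the commonly observed scheme in which the ETag of a multipart upload is derived from the MD5 digests of the individual parts, followed by a part-count suffix. This scheme and the function names below are illustrative assumptions, not taken from Amorphic's code; the sketch only shows that identical bytes can yield different ETag-like values.

```python
import hashlib


def plain_md5_etag(data: bytes) -> str:
    """ETag-like value for a single-part upload: the MD5 hex digest of the content."""
    return hashlib.md5(data).hexdigest()


def multipart_style_etag(data: bytes, part_size: int) -> str:
    """ETag-like value for a multipart upload (assumed scheme, for illustration only).

    MD5 is computed per part, the raw digests are concatenated and hashed
    again, and the number of parts is appended after a dash.
    """
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    combined = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(combined).hexdigest()}-{len(parts)}"


# The same content, "uploaded" three different ways.
payload = b"x" * (3 * 1024)
single = plain_md5_etag(payload)
multi_a = multipart_style_etag(payload, part_size=1024)  # 3 parts
multi_b = multipart_style_etag(payload, part_size=2048)  # 2 parts

# An equality check on these values would treat identical files as different.
print(single == multi_a, multi_a == multi_b)
```

Under this assumption, an existence check that relies only on ETag equality would miss a match whenever the stored value and the newly computed value come from different upload paths or part sizes.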

Impact

Previously ingested files are not recognized as already present and are ingested again. The resulting duplicate files compromise data integrity and reduce overall system efficiency.

Mitigation

Fix available

A fix is available in Amorphic v2.4. Upgrade to v2.4 or later to resolve this issue.

Timeline

  • 2023-09-11: Bug reported/identified (CLOUD-3937)
  • 2023-09-11: Bug triaged
  • 2023-10-05: Bug fixed
  • 2023-10-06: Testing completed and fix is available