A serious issue recently surfaced in OpenZFS 2.2.0, which ships in operating systems such as FreeBSD 14. The release, published about a month earlier, introduced a feature called “block cloning”, intended to make file copies faster. However, a critical bug, found by a Gentoo user and reported as bug #15526, led to the immediate release of OpenZFS 2.2.1, which temporarily disables the feature.
This incident raises concerns about the reliability of OpenZFS, which is known for its robustness and data integrity. Additionally, for systems like FreeBSD 14, which include this version of OpenZFS, caution is advised. BSD expert Colin Percival pointed out on Twitter that the “block cloning” feature is disabled by default in FreeBSD 14 and warned against enabling it to avoid data loss.
The bug manifests itself as file corruption – copied files show a combination of zeros and blocks of data that appear to be encoded in Base64. Worse, file system health checks do not detect the problem, so the corruption goes unnoticed unless the copies are compared against the originals.
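Because the symptom is stretches of zeros where data should be, a scan for long zero runs can at most shortlist files for a closer look. The sketch below is only such a heuristic: the function name `zero_runs` and the 4096-byte threshold are illustrative assumptions, and a hit proves nothing on its own, since many file formats legitimately contain zeros.

```python
def zero_runs(data, min_length=4096):
    """Return (offset, length) pairs for runs of zero bytes of at least min_length.

    Heuristic only: a long run of zeros is a reason to compare the file
    against a known-good copy, not evidence of corruption by itself.
    """
    runs = []
    start = None  # offset where the current zero run began, if any
    for index, byte in enumerate(data):
        if byte == 0:
            if start is None:
                start = index
        else:
            if start is not None and index - start >= min_length:
                runs.append((start, index - start))
            start = None
    # A run that extends to the end of the data is closed here.
    if start is not None and len(data) - start >= min_length:
        runs.append((start, len(data) - start))
    return runs


if __name__ == "__main__":
    sample = b"ab" + b"\x00" * 5 + b"c"
    print(zero_runs(sample, min_length=4))  # [(2, 5)]
```

Any file flagged this way still has to be checked against the original or a backup before drawing conclusions.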
The bug appears to be tied to specific and rare circumstances. Bronek Kozicki explained on GitHub that it can occur when part of a file is read at the precise moment that part is still being written asynchronously. The reader then receives corrupt data, a problem not easily detected without comparing checksums.
The developers have identified an issue related to ZFS dnodes and their handling, which may be an older, hidden bug. The bug became more noticeable with the introduction of the new block cloning feature, especially in multi-core systems.
For Linux users, it appears that the bug also depends on the version of coreutils in use, specifically version 9.x or later. It's not yet clear whether Ubuntu 23.10 has enabled this feature by default in its recently reinstated but still experimental ZFS support.
OpenZFS 2.2.2 is expected to arrive soon to fix these issues. In the meantime, the community is encouraged to maintain a cautious approach and consider backups to other file systems to ensure data safety.
We've already mentioned BSD expert Colin Percival (@cperciva), but anyone who has already installed this point-zero release should heed the warning he posted on X (formerly Twitter) on November 23, 2023 (https://t.co/Embps7Te4e):
“FreeBSD 14's ZFS code supports "block cloning". This is turned off by default. DO NOT ENABLE THIS FEATURE UNLESS YOU WANT TO LOSE DATA.”
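For readers who want to confirm that block cloning is off, the following is a minimal sketch for Linux. It assumes that the OpenZFS kernel module exposes the setting as `zfs_bclone_enabled` under `/sys/module/zfs/parameters/`; that path is not given in this article and should be checked against the documentation for the installed release. On FreeBSD the corresponding setting is believed to be the `vfs.zfs.bclone_enabled` sysctl, which this sketch does not query. The function name `block_cloning_state` is hypothetical.

```python
from pathlib import Path

# Assumed location of the OpenZFS module parameter on Linux (not from the article).
LINUX_BCLONE_PARAM = Path("/sys/module/zfs/parameters/zfs_bclone_enabled")


def block_cloning_state(param_path=LINUX_BCLONE_PARAM):
    """Return 'enabled', 'disabled', or 'unknown' based on the tunable's value."""
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        # Module not loaded, an older release without the tunable, or another platform.
        return "unknown"
    if value == "1":
        return "enabled"
    if value == "0":
        return "disabled"
    return "unknown"


if __name__ == "__main__":
    print("block cloning:", block_cloning_state())
```

A result of "unknown" means only that the sketch could not read the assumed path, not that the feature is off.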
At the time of writing, it is unclear exactly what causes the bug. It appears to require an extremely specific (and therefore unlikely) combination of circumstances, meaning it almost never occurs, as Bronek Kozicki explains on GitHub:
You need to understand the mechanism that causes corruption. It may have been present for a decade and only caused problems in very specific scenarios, which do not normally occur.
Unless your backup mechanism matches the conditions described below, you are very unlikely to have been affected by it.
- A file is being written (typically asynchronously – meaning the writing is not complete at the time the writing process “thinks” it is)
- At the same time, while ZFS is still writing the data, the modified part of the file is being read. “Same time” means hitting a very specific window of time, measured in microseconds (millionths of a second).
- If it is read at this very specific time, the reader will see zeros where the data written is actually something else
- If the reader then stores the incorrectly read zeros somewhere else, that's where the data gets corrupted.
One of the bug hunters wrote a script, reproducer.sh, that exercises ZFS volumes and checks whether files get corrupted. Part of the difficulty is that no program can tell whether a file has been corrupted just by inspecting its contents: it is perfectly normal for some types of files to contain long stretches of zeros. The only way to be sure is to compare checksums before and after copy operations, so worried users who don't have backups on other types of file systems can't easily tell. OpenZFS's built-in tool for checking the validity of storage pools cannot detect the problem either.
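The checksum comparison described above can be sketched as follows. This is a minimal illustration, not the reproducer.sh script mentioned in the text: the helper names `sha256_of` and `copy_and_verify` are hypothetical, and Python's `shutil.copyfile` may not use the same copy path as the cp command, so the sketch shows only the verification step.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy source to destination and report whether the checksums match."""
    expected = sha256_of(source)       # checksum before the copy
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)    # checksum after the copy
    return expected == actual


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / "original.bin"
        dst = Path(workdir) / "copy.bin"
        # Zeros in the payload are legitimate here; only the digests decide.
        src.write_bytes(b"\x00" * 4096 + b"payload" * 512)
        print("copy intact:", copy_and_verify(src, dst))
```

Comparing digests rather than scanning for zeros avoids false alarms on files that legitimately contain long zero runs.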
A possible fix has been proposed, and the investigation appears to have uncovered a different, pre-existing underlying bug, which may have been present as early as 2013. The bug affects ZFS dnodes and the logic the code uses to check whether a dnode is “dirty”, which governs whether it should be flushed, that is, whether any changes should be synced to disk.
It is possible that this single cause was deeply hidden, and therefore very unlikely to be triggered. Unfortunately, the new, faster copy feature meant that a bug which previously corrupted data perhaps once in tens of millions of file copies suddenly became much more likely to strike, especially on machines with many processor cores all in use simultaneously.
For Linux users, an additional condition seems to be a recent version of the coreutils package, version 9.x or later. This is the package that provides the cp command. So far, we haven't been able to verify whether Ubuntu 23.10 has the block cloning feature enabled by default in its recently reinstated (but still experimental) support for installing on ZFS, but at least one comment on the original bug is from someone who reproduced it on Ubuntu.
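To see whether a Linux system meets the coreutils condition, the version can be read from the first line of `cp --version`. The sketch below assumes the usual GNU format, `cp (GNU coreutils) X.Y`; the helper names are hypothetical, and a non-GNU cp is simply reported as not matching.

```python
import re
import subprocess


def parse_coreutils_version(version_output):
    """Extract (major, minor) from the first line of `cp --version` output."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"\(GNU coreutils\)\s+(\d+)\.(\d+)", first_line)
    if match is None:
        return None  # not GNU cp, or an unexpected output format
    return int(match.group(1)), int(match.group(2))


def is_coreutils_9_or_later(version_output):
    """Return True if the reported coreutils version is 9.0 or newer."""
    version = parse_coreutils_version(version_output)
    return version is not None and version >= (9, 0)


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["cp", "--version"], capture_output=True, text=True, check=False
        )
        output = result.stdout
    except FileNotFoundError:
        output = ""
    print("coreutils 9.x or later:", is_coreutils_9_or_later(output))
```

A True result means only that this one condition is met; whether block cloning is active on the pool is a separate question.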
It seems very likely that OpenZFS 2.2.1, which simply disables block cloning, will be quickly followed by a 2.2.2 release that fixes the underlying handling of dnodes.