From 9e729390298c25adbd1cbf19948a5e4a882d6066 Mon Sep 17 00:00:00 2001
From: Aaron Gao
Date: Thu, 6 Apr 2017 18:08:53 -0700
Subject: [PATCH] only FALLOC_FL_PUNCH_HOLE when ftruncate is buggy

Summary:
In RocksDB, we sometimes preallocate the estimated space for a file to have better perf with fallocate (if supported). Usually it is a little bit bigger than the real resulting file size. At this time, we have to let the Filesystem reclaim the space not used.
Ideally, calling ftruncate to truncate the file to its real size should be enough. HOWEVER, it isn't on tmpfs, which we witness in our case, with some buggy kernel version. ftruncate a file with preallocated space doesn't change number of the blocks used by the file, which means the space not used by the file is not returned to the filesystems. So in this case we need fallocate with FALLOC_FL_PUNCH_HOLE to explicitly reclaim the used blocks. It is a hack to cope with the kernel bug and usually we should not need it.
Closes https://github.com/facebook/rocksdb/pull/2102

Differential Revision: D4848934

Pulled By: lightmark

fbshipit-source-id: f1b40b5
---
 env/io_posix.cc | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/env/io_posix.cc b/env/io_posix.cc
index 776b99feb..da02951b5 100644
--- a/env/io_posix.cc
+++ b/env/io_posix.cc
@@ -768,10 +768,22 @@ Status PosixWritableFile::Close() {
     // of the file, but it does. Simple strace report will show that.
     // While we work with Travis-CI team to figure out if this is a
     // quirk of Docker/AUFS, we will comment this out.
-    IOSTATS_TIMER_GUARD(allocate_nanos);
-    if (allow_fallocate_) {
-      fallocate(fd_, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, filesize_,
-                block_size * last_allocated_block - filesize_);
+    struct stat file_stats;
+    fstat(fd_, &file_stats);
+    // After ftruncate, we check whether ftruncate has the correct behavior.
+    // If not, we should hack it with FALLOC_FL_PUNCH_HOLE
+    if ((file_stats.st_size + file_stats.st_blksize - 1) /
+            file_stats.st_blksize !=
+        file_stats.st_blocks / (file_stats.st_blksize / 512)) {
+      fprintf(stderr,
+              "Your kernel is buggy (<= 4.0.x) and does not free preallocated"
+              "blocks on truncate. Hacking around it, but you should upgrade!"
+              "\n");
+      IOSTATS_TIMER_GUARD(allocate_nanos);
+      if (allow_fallocate_) {
+        fallocate(fd_, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, filesize_,
+                  block_size * last_allocated_block - filesize_);
+      }
     }
 #endif
   }
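
The block-count comparison in the hunk above can be read in isolation as the minimal sketch below. The names `BlocksMismatchSize` and `FdHasUnreclaimedBlocks` are hypothetical and not part of RocksDB. The sketch keeps the patch's assumptions that `st_blocks` counts 512-byte units and that `st_blksize` is a multiple of 512. Unlike the patch, it checks the return value of `fstat` and guards against a zero divisor. Because the comparison is an exact inequality, it would also report a mismatch for a sparse file that holds fewer blocks than its size implies.

```cpp
#include <sys/stat.h>
#include <sys/types.h>

#include <cstdint>

// Hypothetical helper (not part of RocksDB). Returns true when the number of
// allocated blocks differs from the number implied by the file size.
// Assumes st_blocks is in 512-byte units and blksize is a multiple of 512.
bool BlocksMismatchSize(std::int64_t size, std::int64_t blksize,
                        std::int64_t blocks) {
  if (blksize < 512) {
    return false;  // cannot evaluate the check; avoids division by zero
  }
  // Blocks needed to hold `size` bytes, rounded up.
  const std::int64_t expected = (size + blksize - 1) / blksize;
  // Blocks actually allocated, converted from 512-byte units.
  const std::int64_t actual = blocks / (blksize / 512);
  return expected != actual;
}

// Hypothetical wrapper: applies the check to an open file descriptor.
// Returns false when fstat fails, so the caller skips the workaround.
bool FdHasUnreclaimedBlocks(int fd) {
  struct stat st;
  if (fstat(fd, &st) != 0) {
    return false;
  }
  return BlocksMismatchSize(st.st_size, st.st_blksize, st.st_blocks);
}
```

A caller could run `FdHasUnreclaimedBlocks(fd_)` right after `ftruncate` and attempt the `FALLOC_FL_PUNCH_HOLE` call only when it returns true.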