[ Friday, 23 May 2008, optimizationkit ]
There is a myth that “Linux filesystems don’t need to be defragmented.” While that may be true in general, the myth can be dispelled with a simple script that creates a number of directories and repeatedly creates and deletes files in each of them (in, say, two passes). So, does your filesystem need defragmentation?
(!!!WARNING!!! In this article we use the fsck.ext3 program to check the degree of filesystem fragmentation. Before using this program yourself, get familiar with its documentation: careless use of fsck on a mounted filesystem can cause data loss. !!!WARNING!!!)
Let’s run an experiment. We’ll start by creating a filesystem image:
dd if=/dev/zero of=filesystem.img bs=256M count=1
Next, we create a new filesystem on the image. I suggest ext2 or ext3, because for these it’s easy to get information about the degree of fragmentation.
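Here is one way to create the filesystem, written as a small shell helper. It assumes that e2fsprogs’ mkfs.ext3 is installed, usually in /sbin or /usr/sbin. The name make_test_image and its SIZE_MB parameter are made up for this sketch. Because the helper also recreates the image with dd, it can replace the dd step as well.

```sh
#!/bin/sh
# make_test_image: hypothetical helper, not part of the original article.
# Creates IMAGE as a zero-filled file of SIZE_MB megabytes (default 256)
# and formats it as ext3 using e2fsprogs' mkfs.ext3.
make_test_image() {
    local img size_mb
    img=${1:?usage: make_test_image IMAGE [SIZE_MB]}
    size_mb=${2:-256}
    # Same result as "dd ... bs=256M count=1" for the default size,
    # but written in 1 MiB blocks to avoid a huge dd buffer
    dd if=/dev/zero of="$img" bs=1048576 count="$size_mb" 2>/dev/null || return 1
    # -F: format even though IMAGE is a regular file, not a block device
    # -q: quiet, suppress progress output
    # The extra PATH entries cover systems where /sbin is not in a user's PATH
    PATH="$PATH:/sbin:/usr/sbin" mkfs.ext3 -F -q "$img"
}
```

For example, make_test_image filesystem.img 256 produces the same 256 MB image as the dd command above, already formatted as ext3.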
Now we can check the fragmentation degree of the newly created filesystem:
/sbin/fsck.ext3 -nfv filesystem.img
e2fsck 1.40.4 (31-Dec-2007)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
11 inodes used (0.02%)
1 non-contiguous inode (9.1%)
number of inodes with blocks ind/dind/tind: 0/0/0
18561 blocks used (7.08%)
0 bad blocks
0 large files
0 regular files
2 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
2 files
The fragmentation degree is reported in this line:
1 non-contiguous inode (9.1%)
This is what a “fresh” filesystem looks like. It’s very easy to simulate a frequently used one, on which files are often deleted and new ones saved, by running the frag.sh script.
All we need to do is mount the test filesystem image:
mount -o loop filesystem.img /mnt/loop0/
and then pass the script the directory in which the filesystem image is mounted:
The script prints many messages like these:
Creating file /mnt/loop0//8/file-128
28444+0 records read
28444+0 records saved
85332 bytes (85 kB) copied, 0,173525 s, 492 kB/s
step = 6
Deleting file /mnt/loop0//8/file-0
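The frag.sh script itself is not reproduced in this article. Below is a hypothetical stand-in for readers who want to experiment without it. It is based only on the description at the beginning of the article: it creates directories and, over two passes, creates and deletes files in each of them. The function name, its parameters and the 1 to 128 KiB size range are assumptions. Its messages only imitate the ones shown above.

```sh
#!/bin/sh
# frag_sketch: hypothetical stand-in for frag.sh, NOT the original script.
# Usage: frag_sketch DIR [NDIRS] [NFILES] [PASSES]
# In each of NDIRS subdirectories of DIR it creates NFILES files of random
# size (1-128 KiB, an assumed range) per pass and, after every second file,
# deletes a randomly chosen existing file, leaving holes for later files.

# A random number in 0..65535, read from /dev/urandom
rand16() { od -An -N2 -tu2 /dev/urandom | tr -d ' \n'; }

frag_sketch() {
    local root ndirs nfiles passes pass d i f kib
    root=${1:?usage: frag_sketch DIR [NDIRS] [NFILES] [PASSES]}
    ndirs=${2:-10} nfiles=${3:-200} passes=${4:-2}
    pass=1
    while [ "$pass" -le "$passes" ]; do
        d=0
        while [ "$d" -lt "$ndirs" ]; do
            mkdir -p "$root/$d" || return 1
            i=0
            while [ "$i" -lt "$nfiles" ]; do
                f="$root/$d/file-$pass-$i"
                kib=$(( $(rand16) % 128 + 1 ))
                echo "Creating file $f ($kib KiB)"
                dd if=/dev/zero of="$f" bs=1024 count="$kib" 2>/dev/null || return 1
                if [ $((i % 2)) -eq 1 ]; then
                    # Pick one of the files currently in this directory at random
                    set -- "$root/$d"/file-*
                    shift $(( $(rand16) % $# ))
                    echo "Deleting file $1"
                    rm -f -- "$1"
                fi
                i=$((i + 1))
            done
            d=$((d + 1))
        done
        pass=$((pass + 1))
    done
}
```

With the image mounted as above, frag_sketch /mnt/loop0 10 200 2 would create 10 directories and churn 400 files through each of them.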
Next, unmount the test filesystem:
umount /mnt/loop0/
and check the fragmentation degree again:
/sbin/fsck.ext3 -nfv filesystem.img
Because the script creates files of random sizes and deletes randomly chosen files, your results may differ significantly from those shown below:
e2fsck 1.40.4 (31-Dec-2007)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
895 inodes used (1.37%)
315 non-contiguous inodes (35.2%)
number of inodes with blocks ind/dind/tind: 735/0/0
77202 blocks used (29.45%)
0 bad blocks
0 large files
875 regular files
11 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
886 files
As you can see, 35.2% of the inodes are non-contiguous, so the files in our image are very heavily fragmented.
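The percentage that e2fsck prints is the number of non-contiguous inodes divided by the number of inodes in use. The figures from both runs above confirm this:

```latex
\[
\frac{315}{895} \approx 0.352 = 35.2\,\%, \qquad \frac{1}{11} \approx 0.091 = 9.1\,\%
\]
```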
To have a point of reference for further experiments, I suggest making a backup of this image:
cp filesystem.img filesystem-backup.img
The ok_defrag program that we’re going to use for defragmenting can be downloaded from here.
For the program to work, the python-dialog package is required, so install it using your package manager of choice.
ok_defrag works on the same principles as Con Kolivas’ defrag. Unlike professional programs such as Diskeeper or Windows Defrag, however, it does not tamper with the filesystem’s data structures.
The program is to be executed as follows:
ok_defrag -l log.txt -d /mnt/loop0/
Both the -l and -d flags are mandatory. The log file given with -l lets us recover data if something goes wrong, e.g. if the power supply fails during the job. The -d flag specifies the directory containing the files we want to defragment.
First, ok_defrag builds a list of the directories inside the given directory, sorted by size from largest to smallest. Then, for each directory on that list, it builds a list of the files in it, also sorted by size from largest to smallest. This ordering should give the best defragmentation results.
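The ordering described above can be sketched in a few lines of shell. This is a hypothetical helper, not ok_defrag’s own code. It makes two assumptions that the description leaves open: a directory’s “size” is the total size of the regular files directly inside it, and file names contain no tabs or newlines. It also relies on GNU find’s -maxdepth and -printf.

```sh
#!/bin/sh
# defrag_order: hypothetical helper that prints the files under DIR in the
# order described above: directories largest first, then the files inside
# each directory largest first. Requires GNU find.
defrag_order() {
    local root tab dir
    root=${1:?usage: defrag_order DIR}
    tab=$(printf '\t')
    find "$root" -type d |
        while IFS= read -r dir; do
            # "<total bytes of files directly in dir><TAB><dir>"
            printf '%s\t%s\n' "$(find "$dir" -maxdepth 1 -type f -printf '%s\n' |
                awk '{ s += $1 } END { printf "%.0f\n", s }')" "$dir"
        done |
        sort -t "$tab" -k1,1nr |   # largest directory first
        cut -f2- |
        while IFS= read -r dir; do
            # files in this directory, largest first
            find "$dir" -maxdepth 1 -type f -printf '%s\t%p\n' |
                sort -t "$tab" -k1,1nr |
                cut -f2-
        done
}
```

Running defrag_order /mnt/loop0 on the test image would list every file in the order this scheme would process them.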
Each file is copied to /tmp/ok_defrag_tmp, and that copy is then moved back to the original location. Repeating this simple procedure a few times for each file (3 times by default) should give the desired result. Let’s see how it works out.
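Here is a minimal sketch of that copy-and-move-back loop. rewrite_file is a hypothetical name, and its TMP argument defaults to /tmp/ok_defrag_tmp as in the text. If /tmp is on a different filesystem, mv has to copy the data back, so the file is written out anew. If both are on the same filesystem, mv only renames the file, and the fresh allocation happens during cp. This bare version performs none of the safety checks ok_defrag adds.

```sh
#!/bin/sh
# rewrite_file: hypothetical sketch of the basic rewrite loop, without
# ok_defrag's checksum and modification-time safety checks.
# Usage: rewrite_file FILE [PASSES] [TMP]
rewrite_file() {
    local file passes tmp i
    file=${1:?usage: rewrite_file FILE [PASSES] [TMP]}
    passes=${2:-3} tmp=${3:-/tmp/ok_defrag_tmp}
    i=0
    while [ "$i" -lt "$passes" ]; do
        # Copy, keeping mode, ownership (when run as root) and timestamps
        cp -p -- "$file" "$tmp" || return 1
        # Move the copy back over the original
        mv -f -- "$tmp" "$file" || return 1
        i=$((i + 1))
    done
}
```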
We have to mount our test filesystem again:
sudo mount -o loop filesystem.img /mnt/loop0/
Then we run ok_defrag:
ok_defrag -l log.txt -d /mnt/loop0/
Now we have to wait a while (!!!WARNING!!! The defragmentation process must not be interrupted !!!WARNING!!!). When the process finishes, we can unmount the filesystem:
sudo umount /mnt/loop0/
and check the final result:
/sbin/fsck.ext3 -nfv filesystem.img
e2fsck 1.40.4 (31-Dec-2007)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
895 inodes used (1.37%)
84 non-contiguous inodes (9.4%)
number of inodes with blocks ind/dind/tind: 735/0/0
77202 blocks used (29.45%)
0 bad blocks
0 large files
875 regular files
11 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
886 files
ok_defrag reduced the fragmentation level from 35.2% to 9.4%. Running it again should reduce the level even further. Instead of running ok_defrag several times, you can add the -f flag to its command line, and the defragmentation process will be repeated the specified number of times.
Unlike Con Kolivas’ defrag, ok_defrag takes a paranoid approach to data safety. Each defragmentation operation is divided into the following stages:
- recording the time of the file’s last modification,
- recording the file’s checksum,
- copying the file to /tmp/ok_defrag_tmp,
- computing the checksum of the /tmp/ok_defrag_tmp file,
- comparing both checksums: if they differ, there is a problem with your hardware or your system, and defragmentation of this file is aborted,
- checking the original file’s modification time again: if the file was modified after the copy was made, defragmentation of this file is aborted,
- if all of the above stages complete successfully, the content of /tmp/ok_defrag_tmp is moved to the original file,
- computing the file’s checksum once more: if it differs from the original, something went seriously wrong, and the program terminates.
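The eight stages above can be sketched as a shell function. This illustration is written from the list and is not ok_defrag’s actual code. The name safe_rewrite and its return codes are made up, and the function assumes GNU stat and md5sum. Its log lines are modelled on the ok_defrag log excerpt quoted later in this article. It also shares the weakness discussed next: nothing stops another program from writing to the file between stages 6 and 7.

```sh
#!/bin/sh
# safe_rewrite: hypothetical sketch of the eight checked stages,
# NOT ok_defrag's actual code. Needs GNU stat (-c) and md5sum.
# Usage: safe_rewrite FILE [TMP] [LOG]
# Returns 0 on success, 1 if the file was skipped (stages 5-6),
# 2 if the move or the final checksum failed (the caller should stop).
safe_rewrite() {
    local file tmp log mtime sum tmpsum after
    file=${1:?usage: safe_rewrite FILE [TMP] [LOG]}
    tmp=${2:-/tmp/ok_defrag_tmp} log=${3:-/dev/null}

    mtime=$(stat -c %y -- "$file") || return 1             # 1. modification time
    echo "Checking $file md5sum" >> "$log"
    sum=$(md5sum < "$file" | cut -d ' ' -f 1)              # 2. checksum
    echo "md5sum = $sum" >> "$log"
    echo "Copying $file to $tmp" >> "$log"
    cp -p -- "$file" "$tmp" || return 1                    # 3. copy
    echo "Done." >> "$log"
    echo "Checking $tmp md5sum" >> "$log"
    tmpsum=$(md5sum < "$tmp" | cut -d ' ' -f 1)            # 4. checksum of the copy
    echo "md5sum = $tmpsum" >> "$log"
    if [ "$tmpsum" != "$sum" ]; then                       # 5. compare checksums
        rm -f -- "$tmp"; return 1
    fi
    if [ "$(stat -c %y -- "$file")" != "$mtime" ]; then    # 6. modified meanwhile?
        rm -f -- "$tmp"; return 1
    fi
    echo "Moving $tmp to $file" >> "$log"
    mv -f -- "$tmp" "$file" || return 2                    # 7. move the copy back
    echo "Done." >> "$log"
    echo "Checking $file md5sum" >> "$log"
    after=$(md5sum < "$file" | cut -d ' ' -f 1)            # 8. final checksum
    echo "md5sum = $after" >> "$log"
    [ "$after" = "$sum" ] || return 2
}
```

A caller would move on to the next file when the return code is 1 and stop altogether when it is 2, which mirrors the last stage of the list.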
An attentive user will notice that at the seventh stage, while the content of /tmp/ok_defrag_tmp is being moved to the original file, /tmp/ok_defrag_tmp is not locked against writes by other programs. This is a defect that (I hope) can be eliminated, because it interferes with defragmenting directories in which data is still being written, e.g. logs.
If something goes wrong during defragmentation, e.g. an unexpected power failure, you need to run fsck on the filesystem. Journaling filesystems should cope with this, because ok_defrag does no magic: it only copies and moves files.
If a file does get damaged, check the log. It records every operation performed and the file checksums from all stages, for example:
Checking /mnt/loop0/8/file-91 md5sum
md5sum = b8b828bda1b12173f8f8b0b87d8cd872
Copying /mnt/loop0/8/file-91 to /tmp/ok_defrag_tmp
Done.
Checking /tmp/ok_defrag_tmp md5sum
md5sum = b8b828bda1b12173f8f8b0b87d8cd872
Moving /tmp/ok_defrag_tmp to /mnt/loop0/8/file-91
Done.
Checking /mnt/loop0/8/file-91 md5sum
md5sum = b8b828bda1b12173f8f8b0b87d8cd872
If, for example, the log stops at “Moving /tmp/ok_defrag_tmp to /foo/bar/bas”, the file was not fully moved. Its copy is still in /tmp/ok_defrag_tmp, so you just need to put that copy back in place of the damaged file.
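That recovery step can be partly scripted. The sketch below is hypothetical. It assumes that the log holds one message per line, exactly as in the excerpt above, and that file names do not contain “ to ”. It only restores the file if the temporary copy still matches the last checksum recorded before the move.

```sh
#!/bin/sh
# recover_from_log: hypothetical helper. If LOG ends with an unfinished
# "Moving SRC to DST" line, it verifies SRC against the last logged
# "md5sum = ..." value and, if they match, copies SRC over DST.
recover_from_log() {
    local log last src dst expected actual
    log=${1:?usage: recover_from_log LOG}
    last=$(tail -n 1 -- "$log") || return 1
    case $last in
        "Moving "*" to "*) ;;
        *) echo "Log does not end with an unfinished move; nothing to do."; return 0 ;;
    esac
    src=${last#"Moving "}
    src=${src%%" to "*}
    dst=${last#"Moving $src to "}
    [ -f "$src" ] || { echo "Temporary copy $src is missing" >&2; return 1; }
    # The last checksum logged before the move belongs to the verified copy
    expected=$(grep '^md5sum = ' "$log" | tail -n 1)
    expected=${expected#"md5sum = "}
    actual=$(md5sum < "$src" | cut -d ' ' -f 1)
    if [ "$actual" != "$expected" ]; then
        echo "Checksum of $src does not match the log; not restoring." >&2
        return 1
    fi
    cp -p -- "$src" "$dst" && echo "Restored $dst from $src"
}
```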
Some systems delete the files in /tmp at boot. It’s best to turn this off for the duration of defragmentation, since after a power failure the copy in /tmp/ok_defrag_tmp may be the only intact version of the file that was being moved.
The best way to check the effects of defragmentation is seekwatcher. It requires blktrace, a kernel built with the CONFIG_BLK_DEV_IO_TRACE and CONFIG_DEBUG_FS options, and python-matplotlib with its dependencies.
A simple test:
seekwatcher -t find.trace -o find.png -p 'sync; echo 1 > /proc/sys/vm/drop_caches; for file in `find /mnt/loop0/ -type f`; do cat $file > /dev/null; done ' -d /dev/
shows the difference in file reading speed between the fragmented filesystem (filesystem-backup.img) and the defragmented one (filesystem.img).
For directories to which the system may write data, I suggest running the defragmentation after booting straight into /bin/sh, by passing init=/bin/sh on the kernel command line. This should ensure trouble-free operation, with no worries about data loss.
When is defragmentation unnecessary? Generally speaking, when fsck reports fewer than 10% non-contiguous inodes.
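That rule of thumb is easy to check automatically. The helper below is a hypothetical sketch. It reads fsck.ext3 -nfv output on standard input and looks for the “non-contiguous inode(s)” line shown earlier. Newer e2fsprogs versions word this statistic differently (as separate non-contiguous files and directories lines), so the pattern would need adjusting there.

```sh
#!/bin/sh
# needs_defrag: hypothetical helper. Reads e2fsck -v output on stdin and
# compares the non-contiguous inode percentage with THRESHOLD (default 10).
# Exit status: 0 = at or above threshold, 1 = below threshold,
#              2 = no "non-contiguous inode(s)" line found.
needs_defrag() {
    local threshold
    threshold=${1:-10}
    awk -v t="$threshold" '
        /non-contiguous inode/ && match($0, /\([0-9.]+%\)/) {
            pct = substr($0, RSTART + 1, RLENGTH - 3)   # "(35.2%)" -> "35.2"
            found = 1
        }
        END {
            if (!found) exit 2
            print pct "% non-contiguous inodes"
            if (pct + 0 >= t + 0) exit 0
            exit 1
        }'
}
```

For example: /sbin/fsck.ext3 -nfv filesystem.img 2>&1 | needs_defrag 10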
How long does defragmentation take? It depends on the disk speed and on the number and size of the files. Each file is copied and moved several times, so with thousands of files the operation can take a very long time.
One more reminder: check how much free space is available on /tmp. A lack of free space won’t damage any files, since the checksums are verified throughout, but it’s still worth checking. Also remember that the disk being defragmented should always have a reasonable amount of free space. If it always does, you may indeed never need to defragment it. Still, it’s good to know that on Linux you may occasionally need to.
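Every file passes through the single temporary copy in /tmp/ok_defrag_tmp, so /tmp needs at least as much free space as the largest file being defragmented. Below is a hypothetical pre-flight check. It assumes GNU find (for -printf) and a df that supports the POSIX -P output format.

```sh
#!/bin/sh
# tmp_has_room: hypothetical pre-flight check. Succeeds if the filesystem
# holding TMPDIR (default /tmp) has room for the largest file under DIR.
tmp_has_room() {
    local dir tmp largest avail_kb
    dir=${1:?usage: tmp_has_room DIR [TMPDIR]}
    tmp=${2:-/tmp}
    # Largest regular file under DIR, in bytes
    largest=$(find "$dir" -type f -printf '%s\n' | sort -n | tail -n 1)
    largest=${largest:-0}
    # Available space in 1 KiB blocks: 4th column of df's POSIX output
    avail_kb=$(df -P -k -- "$tmp" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] || return 1
    if [ "$largest" -le "$((avail_kb * 1024))" ]; then
        echo "OK: largest file is $largest bytes, $((avail_kb * 1024)) bytes free in $tmp"
        return 0
    fi
    echo "Not enough space in $tmp for a $largest-byte file" >&2
    return 1
}
```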
This is a translation of the article Defragmentacja linuksowych systemów plików. Translated by Adam Dziuba. Proof-read by michuk.