Back to 32bit Linux?

At the beginning of this year I built myself a new PC with an Intel E6750 processor, a Gigabyte P35DS3R motherboard, a Samsung HD501LJ 500GB disk (370GB with EXT3 for Linux), an Nvidia 8600GT graphics card and 4GB of memory. I decided to install a 64bit Gentoo Linux, currently with kernel 2.6.24-r3. On first impression the system seemed quite quick, and software compiled very fast. The first sign that something was not quite right came during playback of MPEG-2 files with VLC, where the sound dropped out frequently. I didn’t experience these drops with my old AMD XP 2800+.

It became really annoying during my Audacity sessions. Loading a two-hour MP3 started out pretty fast, I guess until the buffer cache was filled, and then the I/O activity basically ground to a halt. Internet radio streams continued playing, but even switching desktops didn’t produce any reaction until the MP3 was nearly loaded into Audacity.

Apparently I’m not alone in my experience. There is an extended thread in the Gentoo forums. From this thread I collected some hints and configured the system with:

echo 2 > /proc/sys/vm/dirty_ratio
echo 1 > /proc/sys/vm/dirty_background_ratio
echo deadline > /sys/block/sda/queue/scheduler
echo 1 > /sys/block/sda/device/queue_depth

I even reduced the main memory by starting the kernel with the mem=2G argument, but all these changes had practically no effect.
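Before overwriting tunables like the ones above, it can help to note their current values so they can be put back later. The following is a minimal sketch, not something I ran for this post: the helper name show_tunables is made up for illustration, and it assumes the disk is sda, as in the commands above.

```shell
#!/bin/sh
# show_tunables is a hypothetical helper: it prints the current value of
# each file given as an argument, or a note if the file cannot be read.
show_tunables() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "$f")"
        else
            printf '%s = (not readable)\n' "$f"
        fi
    done
}

# The four settings changed above (assumes the disk is sda).
show_tunables \
    /proc/sys/vm/dirty_ratio \
    /proc/sys/vm/dirty_background_ratio \
    /sys/block/sda/queue/scheduler \
    /sys/block/sda/device/queue_depth
```

The scheduler file lists all available schedulers and marks the active one with square brackets, so the output also shows which alternatives the kernel offers.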

Gentoo with 64bit kernel

I then made a little test (not very scientific) and took a vmstat 5 log while loading a given MP3 file into Audacity with the above config. It took Audacity 8 minutes and 24 seconds to read the file. After the load, Audacity’s data directory was filled with 3.34 GB of audio data. This is the text output. I loaded the data into OpenOffice and produced the diagram to the right. At the beginning of the load the block out (bo) rate is very high (in the 20,000 range), then after a while it drops to 2,000-4,000 blocks. Once it drops, the system spends over 90% of its time in wa (I/O wait).
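For a quick summary without a spreadsheet, a vmstat log like the one described above can also be condensed with awk. This is only a sketch under assumptions: the function name summarize_vmstat and the file name vmstat.log are made up, and the 90% threshold is simply the figure mentioned above. The script finds the bo and wa columns from the header line instead of hard-coding their positions.

```shell
#!/bin/sh
# summarize_vmstat (hypothetical helper): average the bo and wa columns
# of a vmstat log and count the samples with wa at or above 90.
summarize_vmstat() {
    awk '
        # Header line with the column names: remember where bo and wa are.
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) {
                if ($i == "bo") bo = i
                if ($i == "wa") wa = i
            }
            next
        }
        # Data lines start with a number; the "procs ---" banner does not.
        bo && $1 ~ /^[0-9]+$/ {
            n++
            sum_bo += $bo
            sum_wa += $wa
            if ($wa >= 90) high++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "samples=%d avg_bo=%.0f avg_wa=%.1f high_wa=%d\n",
                   n, sum_bo / n, sum_wa / n, high
        }
    ' "$@"
}

# Example (vmstat.log is a placeholder name for a saved "vmstat 5" log):
# summarize_vmstat vmstat.log
```

Note that the first sample vmstat prints is an average since boot, so it may be worth deleting that line from the log before summarizing.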

Gentoo 64bit with 4GB main memory
Gentoo 64bit with CFQ scheduler

Update 2008-03-24: In my original post I compared Gentoo 64bit running in 2GB of memory against Knoppix running in 4GB. I repeated my little test with Audacity a couple more times. At the far left is a diagram of the test under Gentoo 64bit with 4GB of memory. The result is not really different from the run with 2GB of main memory, maybe a couple of seconds faster. In the end the Audacity progress bar reported 8:08 for loading the MP3 file. (vmstat text output)

The next diagram belongs to a test where I changed the I/O scheduler to CFQ and set queue_depth to 31. (Ok ok, first rule of benchmarking: never change more than one parameter per test run. Mea culpa.) This improved the situation insofar as Audacity took only 6:50 to load the MP3 file. However, the system still spends most of its time in I/O wait. (vmstat text output)

Gentoo 64bit with XFS-fs in an image
Now it gets interesting. For the next two runs I created a 6GB image file with dd and then created an XFS file system inside the image. This file system was then mounted onto the directory into which Audacity loads its audio data. This test showed the performance I would expect from this particular hardware setup. It took Audacity only 1:22 to complete the load. The bo rate stayed in the high 80,000 range for nearly the whole run. (vmstat text output)
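The image setup described above can be sketched roughly as follows. This is not the exact command sequence I used. The helper names run and make_image_fs and the example paths are placeholders, and 6144 MiB is an assumed reading of the image size. With DRY_RUN=1 the commands are only printed, which is a safe way to check them before running anything as root.

```shell
#!/bin/sh
# run (hypothetical helper): print a command, and execute it only
# when DRY_RUN is not set to 1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# make_image_fs IMAGE SIZE_MB MOUNTPOINT (hypothetical helper)
# Creates a zero-filled image file, puts an XFS file system into it and
# loop-mounts it on MOUNTPOINT. Stops at the first step that fails.
make_image_fs() {
    img=$1
    size_mb=$2
    mnt=$3
    run dd if=/dev/zero of="$img" bs=1M count="$size_mb" &&
    run mkfs.xfs "$img" &&
    run mount -o loop "$img" "$mnt"
}

# Example (needs root; both paths are placeholders):
# make_image_fs /var/tmp/audacity.img 6144 /mnt/audacity-data
```

The mount point would be whatever directory Audacity is configured to use for its data.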

Gentoo 64bit with EXT3-fs in an image
I then repeated the test one last time, but instead of XFS I reformatted the image file with an EXT3 file system. Again the test completed pretty fast, but not as fast as with XFS. It took Audacity 1:46 to load the MP3 file. However, if you compare the diagram to the XFS diagram, you’ll see that quite a bit more I/O wait is involved. (vmstat text output)

I’m wondering whether XFS is simply the better file system for this kind of data-intensive application, or whether the still somewhat higher I/O wait for EXT3 in the image is an indication of the same problem that causes the very high I/O wait when the test is run on the EXT3 root file system. Can a switch to XFS be the solution for my problem, or do I really need to change to 32bit Linux? End of Update

With Knoppix 5.3 and 32bit kernel

For comparison I booted the Knoppix 5.3 DVD, which uses 2.6.24 as well, but in 32bit. I mounted the hard disk, configured Audacity to have its data directory on it and repeated the test. Here is the text output of vmstat 5 from that run. Here the system behaves just as it should. The block out (bo) count is in the 30,000 to 35,000 range. The whole operation completed in 1 minute and 46 seconds. Unfortunately I forgot to reduce the memory to 2GB to make this measurement a little more comparable with the previous one. Anyway, I don’t think that this would explain the difference between 1:46 and 8:24.
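To put a number on that gap, the two load times can be converted to seconds and divided. A small sketch, with the helper names to_seconds and slowdown made up for illustration:

```shell
#!/bin/sh
# to_seconds (hypothetical helper): convert an m:ss time to seconds.
# awk is used so that a value like "08" is read as decimal 8.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# slowdown (hypothetical helper): how many times longer the first
# time is than the second.
slowdown() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

slowdown 8:24 1:46   # prints 4.8
```

So the 64bit Gentoo run (504 seconds) took nearly five times as long as the 32bit Knoppix run (106 seconds).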

As a last resort I copied the kernel config file from the Knoppix DVD to the Gentoo system and compiled a kernel with this config, changing only the parameters needed to make it a 64bit kernel and leaving the rest the same. I restarted the Audacity run, but halfway through the test I saw that it was just as slow as with the original Gentoo kernel. So this really appears to be a 32bit vs. 64bit issue.

So, in the end I guess I’m going to move back to a 32bit kernel.