I’m still struggling to get a handle on the performance problems I’m seeing on my Core 2 Duo box. Somehow I can’t convince myself to go back to 32-bit Linux. I’m still hoping that the Linux community will produce a fix in the not-too-distant future. Personally, I still think the problem is hidden somewhere in the file system layer.
While I can reliably reproduce the problem by loading a pretty big MP3 file into Audacity, I haven’t had any luck producing the observed behavior any other way. For instance, I loaded that MP3 file into Audacity and then, from the contents of Audacity’s data directory, generated a shell script that recreates an equivalent file tree (same directory names, file names and file sizes) by executing a sequence of mkdir and dd commands. No luck, however: the script ran at top speed and showed no performance problems whatsoever. As another approach, I created a big tar archive from Audacity’s data directory and then untarred it. Again, the archive was unpacked at top speed and didn’t run into the extended periods of high I/O-wait percentages. Apparently it takes the particular load profile that Audacity produces to trigger the problem.
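The script generator itself isn’t shown here, so the following is only a minimal Python sketch of the idea, under assumptions: the name tree_to_script, the file name mirror_tree.py and the exact dd invocation (one block of bs=<file size> read from /dev/zero) are hypothetical, and the recreated files contain zero bytes rather than Audacity’s audio data.

```python
import os
import shlex
import sys


def tree_to_script(src_root: str, dst_root: str) -> str:
    """Return a /bin/sh script that recreates the directory names, file names
    and file sizes of src_root below dst_root (file contents are zero bytes)."""
    lines = ["#!/bin/sh", "set -e"]
    for dirpath, dirnames, filenames in os.walk(src_root):
        dirnames.sort()  # os.walk honours in-place edits: deterministic order
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        lines.append(f"mkdir -p {shlex.quote(target_dir)}")
        for name in sorted(filenames):
            size = os.path.getsize(os.path.join(dirpath, name))
            target = shlex.quote(os.path.join(target_dir, name))
            if size == 0:
                # dd rejects bs=0, so create empty files with a redirect instead
                lines.append(f": > {target}")
            else:
                # a single block of exactly `size` bytes copied from /dev/zero
                lines.append(
                    f"dd if=/dev/zero of={target} bs={size} count=1 2>/dev/null"
                )
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python3 mirror_tree.py SRC_DIR DST_DIR > recreate.sh
    sys.stdout.write(tree_to_script(sys.argv[1], sys.argv[2]))
```

Note that a script like this only reproduces the final shape of the tree, not the order, timing or interleaving of Audacity’s own writes, which fits the observation that the particular load profile seems to matter.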
Since 2.6.25 was released not too long ago, I was eager to redo my tests with this version. But first I did another run under 2.6.24, just to be sure. The problem is indeed still very visible, but the pattern has changed slightly. In the past, Audacity would start loading at top speed for a certain amount of time, then slow down to a crawl for about half of the total loading time, and finally speed up to the maximum again at the end. This time, Audacity crawled along for the first half and ran at top speed for the rest of the time (as can be seen in the leftmost diagram below). The only difference I can see between this run and the earlier ones is that in the meantime the file system has been filled with a couple of 3-4 GB MPG video files.
I then redid the tests with the vanilla 2.6.25 kernel and the Gentoo-patched 2.6.25 kernel. In both kernels I enabled latencytop support (the CONFIG_LATENCYTOP kernel option).
Here are the diagrams I produced from the four runs. The leftmost is for 2.6.24, the next for vanilla 2.6.25, then Gentoo 2.6.25, and the last for a second vanilla 2.6.25 run.

The runtimes as observed from the Audacity progress bar were (in order): 3m36s, 4m35s, 2m49s, 3m36s. In general, the loading time is definitely much shorter than in the past, when I observed times in the 7-8 minute range. The only difference I can think of between now and then is how full the file system is.
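To put “much shorter” into numbers, here is a small back-of-the-envelope calculation in plain Python (the run labels are just descriptive names). The four runs average 219 s (about 3m39s), and even the slowest run (275 s) stays well below the 420-480 s of the earlier 7-8 minutes.

```python
# Compare the four load times read off Audacity's progress bar with the
# 7-8 minute range observed in earlier tests.
RUNTIMES = {
    "2.6.24": "3m36s",
    "vanilla 2.6.25 (1st run)": "4m35s",
    "Gentoo 2.6.25": "2m49s",
    "vanilla 2.6.25 (2nd run)": "3m36s",
}
EARLIER_RANGE_S = (7 * 60, 8 * 60)  # 420-480 s


def to_seconds(duration: str) -> int:
    """Convert a duration written like '3m36s' into seconds."""
    minutes, _, secs_part = duration.rstrip("s").partition("m")
    return int(minutes) * 60 + int(secs_part)


seconds = {run: to_seconds(d) for run, d in RUNTIMES.items()}
mean_s = sum(seconds.values()) / len(seconds)

for run, s in seconds.items():
    low, high = s / EARLIER_RANGE_S[1], s / EARLIER_RANGE_S[0]
    print(f"{run:26s} {s:4d} s = {low:.0%}-{high:.0%} of the earlier load time")
print(f"mean: {mean_s:.0f} s")
```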
Since latencytop support was enabled, here are some snapshots from the latencytop command, taken while the loading had slowed down to a crawl.
Cause                                                   Maximum      Average
Writing back inodes                                 1142.8 msec     8.4 msec
Creating block layer request                         358.3 msec   167.7 msec
Writing to file                                       76.1 msec    76.1 msec
Reading EXT3 block bitmaps                            44.6 msec    36.0 msec
do_select core_sys_select sys_select system_call_a     5.0 msec     1.8 msec
Application requested delay                            5.0 msec     2.1 msec
Cause                                                   Maximum      Average
Reading EXT3 block bitmaps                          1348.7 msec  1348.7 msec
Writing back inodes                                  476.7 msec    45.0 msec
Creating block layer request                         452.5 msec   290.3 msec
do_select core_sys_select sys_select system_call_a     5.0 msec     1.6 msec
Application requested delay                            4.9 msec     1.8 msec
Waiting for event (poll)                               4.9 msec     0.9 msec
Cause                                                   Maximum      Average
Reading EXT3 block bitmaps                          1080.2 msec   501.2 msec
EXT3 Creating a file                                 657.6 msec    75.0 msec
Creating block layer request                         564.8 msec   209.5 msec
Writing back inodes                                  147.4 msec    17.4 msec
do_select core_sys_select sys_select system_call_a     5.0 msec     1.3 msec
Cause                                                   Maximum      Average
EXT3 Creating a file                                 902.6 msec   351.8 msec
Writing a page to disk                               370.2 msec    99.4 msec
EXT3: Waiting for journal access                      61.2 msec    61.2 msec
Truncating file                                       25.9 msec    25.9 msec
do_select core_sys_select sys_select system_call_a     5.0 msec     1.6 msec
At least these snapshots seem to indicate that a pretty large amount of time is spent in the file system layer. The snapshots above come from the first vanilla 2.6.25 run.
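To make the “file system layer” reading a little more systematic, the rows of a latencytop 0.3 snapshot can be parsed and tagged. The following Python sketch assumes the plain “<cause> <maximum> msec <average> msec” row layout of the 0.3 output; the keyword list FS_HINTS, which decides what counts as file system or block layer related, is a hypothetical heuristic, as are the names ROW_03, parse_rows and is_fs_related.

```python
import re

# One row of latencytop 0.3 output: "<cause> <maximum> msec <average> msec".
ROW_03 = re.compile(
    r"^(?P<cause>.*?)\s*(?P<max>\d+(?:\.\d+)?) msec\s+(?P<avg>\d+(?:\.\d+)?) msec$"
)

# Hypothetical heuristic: substrings that mark a cause as FS / block layer.
FS_HINTS = ("ext3", "inode", "block", "page to disk", "buffer", "file",
            "journal", "directory")


def parse_rows(snapshot: str) -> list[tuple[str, float, float]]:
    """Return (cause, maximum_ms, average_ms) for every data row."""
    rows = []
    for line in snapshot.splitlines():
        m = ROW_03.match(line.strip())
        if m:  # the "Cause Maximum Average" header line does not match
            rows.append((m["cause"], float(m["max"]), float(m["avg"])))
    return rows


def is_fs_related(cause: str) -> bool:
    lowered = cause.lower()
    return any(hint in lowered for hint in FS_HINTS)


if __name__ == "__main__":
    # First snapshot from the vanilla 2.6.25 run.
    snapshot = """\
Cause Maximum Average
Writing back inodes 1142.8 msec 8.4 msec
Creating block layer request 358.3 msec 167.7 msec
Writing to file 76.1 msec 76.1 msec
Reading EXT3 block bitmaps 44.6 msec 36.0 msec
do_select core_sys_select sys_select system_call_a 5.0 msec 1.8 msec
Application requested delay 5.0 msec 2.1 msec
"""
    # Print rows sorted by their worst-case latency, tagged by the heuristic.
    for cause, worst, avg in sorted(parse_rows(snapshot), key=lambda r: -r[1]):
        tag = "FS/block" if is_fs_related(cause) else "other"
        print(f"{tag:8s} {worst:8.1f} ms max {avg:7.1f} ms avg  {cause}")
```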
Cause                                                   Maximum      Average
Reading EXT3 block bitmaps                          1113.8 msec   172.2 msec
EXT3 Creating a file                                 547.8 msec    24.7 msec
Writing a page to disk                               535.6 msec    53.2 msec
Truncating file                                      509.0 msec   509.0 msec
EXT3: Waiting for journal access                     348.3 msec   348.3 msec
Writing buffer to disk (synchronous)                 126.6 msec   126.6 msec
Reading EXT3 indirect blocks                         108.3 msec    50.8 msec
Creating directory                                    38.3 msec    32.0 msec
This snapshot from the Gentoo 2.6.25 run seems to point in the same direction.
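With the following snapshots the situation is not quite as clear, since this output comes from latencytop 0.4, while the others were from 0.3. Apparently the output format changed: the Average column has been replaced by a Percentage column, and most causes are now shown as truncated kernel call chains rather than descriptive names.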
Cause                                                   Maximum  Percentage
sync_page sync_page_killable __lock_page_killable    973.7 msec      16.3 %
sync_buffer __wait_on_buffer bh_submit_read read_b   120.3 msec       0.5 %
Scheduler: waiting for cpu block_write_begin ext3_    26.0 msec       5.5 %
hrtimer_nanosleep sys_nanosleep system_call_after_     5.0 msec      15.5 %
futex_wait do_futex sys_futex system_call_after_sw     5.0 msec       2.9 %
do_select core_sys_select sys_select system_call_a     5.0 msec      54.9 %
do_sys_poll sys_poll system_call_after_swapgs          5.0 msec       4.2 %
blk_execute_rq scsi_execute scsi_execute_req sr_te     1.9 msec       0.1 %
blk_execute_rq scsi_execute scsi_execute_req scsi_     1.8 msec       0.0 %
Cause                                                   Maximum  Percentage
sync_page sync_page_killable __lock_page_killable   1133.1 msec      17.9 %
Scheduler: waiting for cpu                            18.2 msec       5.0 %
futex_wait do_futex sys_futex system_call_after_sw     5.0 msec       3.0 %
do_select core_sys_select sys_select system_call_a     5.0 msec      58.7 %
do_sys_poll sys_poll system_call_after_swapgs          5.0 msec       5.1 %
hrtimer_nanosleep sys_nanosleep system_call_after_     5.0 msec      10.1 %
blk_execute_rq scsi_execute scsi_execute_req sr_te     2.2 msec       0.1 %
blk_execute_rq scsi_execute scsi_execute_req scsi_     1.8 msec       0.0 %
blk_execute_rq scsi_execute scsi_execute_req sr_te     1.3 msec       0.0 %
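Since the 0.4 output is harder to read, one rough way to make sense of it is to bucket each row by the kernel functions in its call chain and add up the Percentage column per bucket. In the Python sketch below, the keyword lists and the names ROW_04, classify and share_by_class are hypothetical, not part of latencytop. Treating select/poll/nanosleep/futex chains as “the process was just sleeping” and sync_page/sync_buffer/ext3_ chains as page-cache or ext3 I/O waits is an interpretation, not something latencytop reports. The regular expression also copes with rows where a truncated call chain runs straight into the Maximum value.

```python
import re

# One row of latencytop 0.4 output: "<call chain> <maximum> msec <percentage> %".
# The call chain is cut off at a fixed width and may touch the number
# (e.g. "...read_b120.3 msec"); the lazy group plus \s* handles that case.
ROW_04 = re.compile(
    r"^(?P<cause>.*?)\s*(?P<max>\d+(?:\.\d+)?) msec\s+(?P<pct>\d+(?:\.\d+)?)\s*%$"
)

# Hypothetical grouping by kernel function names appearing in the chain.
IO_HINTS = ("sync_page", "sync_buffer", "ext3_")
IDLE_HINTS = ("sys_select", "sys_poll", "sys_nanosleep", "sys_futex")


def classify(cause: str) -> str:
    """Return 'io', 'idle' or 'other' for one call chain."""
    if any(hint in cause for hint in IO_HINTS):
        return "io"
    if any(hint in cause for hint in IDLE_HINTS):
        return "idle"
    return "other"


def share_by_class(snapshot: str) -> dict[str, float]:
    """Sum the Percentage column per class; the header line is skipped."""
    totals = {"io": 0.0, "idle": 0.0, "other": 0.0}
    for line in snapshot.splitlines():
        m = ROW_04.match(line.strip())
        if m:
            totals[classify(m["cause"])] += float(m["pct"])
    return totals


if __name__ == "__main__":
    snapshot = """\
Cause Maximum Percentage
sync_page sync_page_killable __lock_page_killable 1133.1 msec 17.9%
Scheduler: waiting for cpu 18.2 msec 5.0 %
futex_wait do_futex sys_futex system_call_after_sw 5.0 msec 3.0 %
do_select core_sys_select sys_select system_call_a 5.0 msec 58.7 %
"""
    for cls, pct in share_by_class(snapshot).items():
        print(f"{cls:5s} {pct:5.1f} %")
```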
I’m wondering what file system layout other people who are experiencing the same problems have. Do they have one large root file system like me (300 GB)? Do people without the problem use separate file systems for the OS and user data? Could that be an option for me as well?
Currently I’m using an external eSATA drive for all my Audacity projects. That is enough of a workaround for the moment.