Performance Improvements

The problems

The Core 2.2 cache suffered from several performance problems, and essentially all of them were caused by disk activity.

A faster index

The index read and write logic has been rewritten. On desktop this means roughly a 2x improvement in reads and 10x-15x in writes. OpSafeFile is also no longer used by default, because it had a large impact on performance.

TWEAK_CACHE_FAST_INDEX and TWEAK_SAFE_FILE_INDEX control this feature.

Embedded files

It seems that many of the files usually present in the cache are very small. It is not uncommon for 50% of the files to be smaller than 2 KB, or for 20-25% to be smaller than 512 bytes. These files were previously handled very inefficiently, each one stored in its own file on disk.

The new logic keeps these files in memory and embeds them directly in the index. This saves a significant amount of disk space and improves performance. Reading and writing the index becomes slower, but the impact should be small. Memory consumption increases.
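
As a rough illustration of the idea, the following Python sketch keeps small bodies inside the index entries and only writes larger ones to separate files. The class name, the small_file_size and small_files_limit parameters, and the file naming are all assumptions made for this example; the sketch does not reproduce the actual cache code.

```python
import os
import tempfile


class SketchIndex:
    """Toy index: small bodies live in the index, larger ones in their own file."""

    def __init__(self, directory, small_file_size=512, small_files_limit=32 * 1024):
        self.directory = directory
        self.small_file_size = small_file_size      # assumed: max size of one embedded body
        self.small_files_limit = small_files_limit  # assumed: total RAM budget for embedded bodies
        self.embedded_bytes = 0
        self.next_id = 0
        self.entries = {}  # url -> ("embedded", bytes) or ("file", path)

    def remove(self, url):
        """Forget a URL, releasing its RAM or deleting its file."""
        kind, value = self.entries.pop(url, (None, None))
        if kind == "embedded":
            self.embedded_bytes -= len(value)
        elif kind == "file":
            os.remove(value)

    def store(self, url, body):
        self.remove(url)  # replace any previous copy
        fits = (len(body) <= self.small_file_size
                and self.embedded_bytes + len(body) <= self.small_files_limit)
        if fits:
            # Small body: keep it in the index, no extra file on disk.
            self.entries[url] = ("embedded", body)
            self.embedded_bytes += len(body)
        else:
            # Large body (or budget exhausted): one file per entry, as before.
            path = os.path.join(self.directory, "f%06d.tmp" % self.next_id)
            self.next_id += 1
            with open(path, "wb") as f:
                f.write(body)
            self.entries[url] = ("file", path)

    def load(self, url):
        kind, value = self.entries[url]
        if kind == "embedded":
            return value
        with open(value, "rb") as f:
            return f.read()


if __name__ == "__main__":
    index = SketchIndex(tempfile.mkdtemp())
    index.store("http://example.com/pixel.gif", b"\x00" * 43)
    index.store("http://example.com/page.html", b"x" * 4096)
    print(index.entries["http://example.com/pixel.gif"][0])
    print(index.entries["http://example.com/page.html"][0])
```

In this sketch the 43-byte body never touches the disk on its own, while the 4 KB body still gets a file. The trade-off is the one described above: fewer files and disk accesses, at the cost of a larger index and more memory.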

Even if the platform does not have much memory, it is worth experimenting with small values. For example, 15%-20% of the files could be smaller than 256 bytes while probably accounting for only 0.1%-0.2% of the cache size. So, assuming a 20 MB cache, you might save 15% of the disk accesses using just 32 KB of RAM.
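
The estimate above can be checked with a few lines of Python. The function name is made up for this example, and the percentages are the illustrative figures from the text, not measurements.

```python
def embedded_ram_bytes(cache_size_bytes, size_share):
    """RAM needed to embed files that make up `size_share` of the cache size."""
    return cache_size_bytes * size_share


CACHE_SIZE = 20 * 1024 * 1024  # the 20 MB cache from the text

low = embedded_ram_bytes(CACHE_SIZE, 0.001)   # files are 0.1% of the cache size
high = embedded_ram_bytes(CACHE_SIZE, 0.002)  # files are 0.2% of the cache size

print("RAM needed: %.1f KB to %.1f KB" % (low / 1024, high / 1024))
```

The result is roughly 20 KB to 41 KB, so a 32 KB budget falls inside the range. The calculation ignores any per-entry overhead in the index.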

TWEAK_CACHE_SMALL_FILES_SIZE and TWEAK_CACHE_SMALL_FILES_LIMIT control this feature.

Container Files

It is not uncommon for 80%-90% of the files to be smaller than 16 KB. Combining these files into a bigger file is a good way to reduce wasted space and disk activity.

The current implementation puts only files from the same site in a container, betting that they will be read in more or less the same order in which they were written. Several containers are kept in RAM at the same time, to increase the chance of reusing them.

One drawback of the current implementation is that when a URL is deleted, the whole container is deleted. As a result, more files than expected can be deleted when the disk cache size is checked and enforced.
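
The following Python sketch shows the grouping by site and the drawback just described. It is only a model: the class, its parameters (file_limit, entries_per_container) and the in-memory dictionaries are assumptions for illustration, and real containers are files on disk.

```python
from urllib.parse import urlsplit


class ContainerStore:
    """Toy model: small files from the same site share one container."""

    def __init__(self, file_limit=16 * 1024, entries_per_container=8):
        self.file_limit = file_limit                        # assumed: max size of a contained file
        self.entries_per_container = entries_per_container  # assumed: max files per container
        self.containers = []    # each one: {"site": str, "files": {url: bytes}}
        self.open_by_site = {}  # site -> container still accepting files

    def add(self, url, body):
        if len(body) > self.file_limit:
            return None  # too big for a container; it would get its own file
        site = urlsplit(url).hostname
        container = self.open_by_site.get(site)
        if container is None or len(container["files"]) >= self.entries_per_container:
            # No usable container for this site yet: start a new one.
            container = {"site": site, "files": {}}
            self.containers.append(container)
            self.open_by_site[site] = container
        container["files"][url] = body
        return container

    def delete(self, url):
        """Drop the whole container holding `url`; return every URL lost with it."""
        for container in self.containers:
            if url in container["files"]:
                self.containers.remove(container)
                if self.open_by_site.get(container["site"]) is container:
                    del self.open_by_site[container["site"]]
                return sorted(container["files"])
        return []


if __name__ == "__main__":
    store = ContainerStore()
    store.add("http://example.com/a.css", b"a")
    store.add("http://example.com/b.js", b"b")
    # Deleting one URL takes its container neighbours with it.
    print(store.delete("http://example.com/a.css"))
```

Deleting a single URL returns both URLs here, which is why enforcing the cache size can remove more files than expected.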

Containers should prove quite effective when the operating system lacks a real write-back cache.

In any case, this feature is useful for greatly reducing the number of files, and it takes advantage of the fact that on several platforms accessing a 16 KB file is more or less as fast as accessing a 2 KB one. It can also help when there is not much memory available for embedded files.

TWEAK_CACHE_CONTAINERS_ENTRIES, TWEAK_CACHE_CONTAINERS_CONTAINER_LIMIT, TWEAK_CACHE_CONTAINERS_FILE_LIMIT, TWEAK_CACHE_CONTAINERS_BUFFERS control this feature.

This feature proved to be somewhat more fragile than the others, so if you experience stability problems or inconsistent cache behavior when the size is enforced, try disabling it or using less aggressive settings. Any stability problem should of course also be reported to the module owner.

Multiple directories

Storing a large number of files in a single directory is bad for performance (and in some cases for stability). It also makes it very slow for the user to inspect what is in the cache.

The cache can now be split across several directories, called generations.

TWEAK_CACHE_MULTIPLE_FOLDERS controls this feature.
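
This document does not say how files are assigned to generations. Purely as an illustration of the general technique, the Python sketch below caps the number of files per directory and derives the directory from a running file number. The function name, the files_per_generation parameter and the directory and file naming are all invented for this example.

```python
import os


def generation_path(cache_root, file_number, files_per_generation=128):
    """Return a path that places at most `files_per_generation` files per directory."""
    generation = file_number // files_per_generation
    return os.path.join(cache_root,
                        "gen_%04d" % generation,
                        "file_%06d.tmp" % file_number)


if __name__ == "__main__":
    for number in (0, 127, 128, 1000):
        print(generation_path("cache", number))
```

With the assumed limit of 128, files 0 to 127 land in the first directory and file 128 starts the second one, so no single directory grows without bound.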

Faster synchronization

Containers and embedded files greatly reduce the number of files (oBench uses 80% fewer files with the standard desktop settings), but synchronization can now be sped up even further.

A file is now written to disk to detect whether a crash happened after some cache activity, so that synchronization is performed only if the index content may differ from what is on disk.

TWEAK_CACHE_SYNC controls this feature.
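
A minimal Python sketch of such a marker file follows. The class, the method names and the file name dirty.flag are hypothetical. The sketch also assumes the marker is removed once the index has been saved cleanly, which is one plausible reading of the description above rather than a confirmed detail.

```python
import os
import tempfile


class DirtyMarker:
    """Marker file that survives a crash and signals that a sync is needed."""

    def __init__(self, cache_dir, name="dirty.flag"):  # file name is an assumption
        self.path = os.path.join(cache_dir, name)
        self.written = False

    def needs_sync_at_startup(self):
        # The marker is still there only if the last session did not end cleanly.
        return os.path.exists(self.path)

    def on_cache_activity(self):
        # Write the marker once, before the first change of the session.
        if not self.written:
            with open(self.path, "w") as f:
                f.write("1")
            self.written = True

    def on_index_saved(self):
        # Index and disk agree again: the marker is no longer needed.
        if os.path.exists(self.path):
            os.remove(self.path)
        self.written = False


if __name__ == "__main__":
    cache_dir = tempfile.mkdtemp()
    session = DirtyMarker(cache_dir)
    session.on_cache_activity()
    # A crash here leaves the marker behind, so the next start must synchronize.
    print(DirtyMarker(cache_dir).needs_sync_at_startup())
```

After a clean shutdown the marker is gone, so the next start can skip the comparison between the index and the disk.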

Impact of OBML

When OBML is used, only a few big files are generated, so many of these improvements are negated.

Impact of Opera Turbo

With Opera Turbo, the number of files does not change, but they tend to be smaller. So embedded files and containers should prove even more effective.

Impact on the battery life

These improvements mean less CPU usage and less disk activity, so the power usage of the cache is expected to decrease.