This document describes the binary file formats introduced with Opera version 9 for various files used for indexing/searching.
The new format of these files is based on dividing files into blocks of same size. The blocks for one record might be chained if the size of one block isn't enough for the record. This way it's easy to perform transactions on the files.
The format is NOT backwards compatible with any previous Opera versions. Conversions to the new format are performed after an update from an old version.
Files consist of chained blocks of the same size. The first block contains headers. Structure of the first block is:
| 8B | pointer to the first free block or 0 if there isn't any free block |
| 4B | 3154051328, magical number used for identification of the file |
| 4B | size of the blocks |
| optional headers may follow |
The other blocks may either contain valid data or they may be empty. The last block is never empty. Their structure is:
| 8B | pointer to the next block or 0 if this is the last block in the chain |
| 4B/0B | size of the current record, this field is present only in the first block of the chain |
| data follow |
These files contain words and with pointers to to .bt files. The records in them are of fixed size, they always fit to one block. The record is an array of fields. The structure of the field is in table bellow.
| 4B/8B | pointer to the next record (one of DiskChild or MemoryChild is set) or offset (both flags DiskChild and MemoryChild are 0) in the current record | |||||||||||||||
| 2B | flags and parent of the current node;
value parent is 0 if the parent is in another branch;
bits for flags are:
|
|||||||||||||||
| 4B | word ID, valid if flag IsWord is set |
The first field of the array contains number of not free fields in its word ID part and the rest of it is unused.
There is an optional header at offset 16. Its size is 19*4B. It is used to save a status of a generator of pseudo-random numbers.
These files contain BTrees of file IDs containing the appropriate word. The word ID from ACT is a number of the first block in a chain of file IDs for the word. File IDs are 32 bit integers. The block structure contains 32 bit number of the rightmost child followed by pairs of file ID and pointer to its left child. The pointers to children are positions of the children in the file divided by size of the blocks.
There are several BTrees in the file. There is no indication of roots, but the roots are pointed from the .act.