Documentation for the logdoc module

This document describes the Logical Document (logdoc) module.

Contents

  1. General Description
  2. API Overview
  3. API Details (Doxygen)
  4. Memory Issues
  5. Design Issues
  6. Special topics

General Description

The logdoc module contains all code and data structures for the logical document used in Opera. The logical document is the representation of the markup and data contained in the loaded documents.

The main data structure in the logical document is the document tree, a tree of HTML_Element objects that represent the elements in the markup.

logdoc is supposed to support all web standards usable in documents loaded from the web.

API Overview

The logdoc module "exports" functionality in the following categories:

API Details (Doxygen)

API details generated by Doxygen here

Memory usage documentation

Used OOM policies

The logdoc module uses three different OOM strategies: OP_STATUS, TRAP/LEAVE and flagging.

The most common strategy is the OP_STATUS method. Most functions that allocate memory return an OP_STATUS to indicate whether the allocation succeeded. That is the preferred choice.
Some functions rely on functions from other modules that use the TRAP/LEAVE mechanism. In those cases it is sometimes easier to let the method leave instead of using OP_STATUS. Try to limit that to cases where you have a single point to TRAP the LEAVEs.
During parsing the logdoc module uses a flag in HLDocProfile that is set if we run out of memory in an operation. When the parsing pass is finished we check the flag and signal an OOM error. This is used so that we don't have to check the return value of every function call, and to make it easier to continue parsing as long as possible after having trouble allocating (potentially) large blocks of memory.
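The flagging strategy can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: Status, Node, ParseProfile, NewNode, AddTextNode and ParsePass are invented for the example and are not the real OP_STATUS, HTML_Element or HLDocProfile declarations. Allocation failure is simulated by rejecting text longer than a given limit.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the real Opera declarations.
enum class Status { OK, ERR_NO_MEMORY };

struct Node { std::string text; };

// Stand-in for the per-document profile that carries the OOM flag.
struct ParseProfile {
    bool is_out_of_memory = false;
};

// Simulated allocation: fails (returns nullptr) for text longer than max_len.
inline Node* NewNode(const std::string& text, std::size_t max_len) {
    if (text.size() > max_len) return nullptr;
    return new (std::nothrow) Node{text};
}

// One parsing step. On failure it raises the flag and returns, so callers
// do not have to inspect a return value after every call.
inline void AddTextNode(ParseProfile& profile,
                        std::vector<std::unique_ptr<Node>>& tree,
                        const std::string& text, std::size_t max_len) {
    Node* node = NewNode(text, max_len);
    if (!node) {
        profile.is_out_of_memory = true;
        return;
    }
    tree.push_back(std::unique_ptr<Node>(node));
}

// The whole pass: keep parsing past failures, then check the flag once.
inline Status ParsePass(const std::vector<std::string>& chunks,
                        std::size_t max_len,
                        std::vector<std::unique_ptr<Node>>& tree) {
    ParseProfile profile;
    for (const std::string& chunk : chunks) {
        AddTextNode(profile, tree, chunk, max_len);
    }
    return profile.is_out_of_memory ? Status::ERR_NO_MEMORY : Status::OK;
}
```

In this sketch a pass over three chunks where the middle one fails to allocate still adds the other two nodes to the tree, and the caller learns about the failure from the single status returned at the end.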

Who is handling OOM

The only handling of an OOM case in this module is to signal that we have one. There are no active attempts to recover, other than not crashing.

Description of flow

The logical tree is contained in a logical document. The logical document is responsible for freeing the memory used by the logical tree. When there are ECMAScript references to any element in the tree, ECMAScript takes over the responsibility of making sure the element is freed.

Heap memory usage

All elements in the logical tree are allocated on the heap. There is roughly one element in the tree per element in the source document.
The logical document that holds the tree is also allocated on the heap. There is one logical document per document and frame.

Stack memory usage

The conventional HTML parser is recursive: after encountering a start tag, the Load() method calls itself to parse the content. It recurses as deep as the number of unclosed open tags in the document. For some malformed documents that can be a very deep structure, and the recursive calls will fill up the stack and crash Opera.
To fix that problem we will probably at some point rewrite the parser to be iterative instead of recursive. A quick solution is to use the MAX_TREE_DEPTH constant to set the maximum nesting level for the Load() method.
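As a rough illustration of capping recursion depth, the sketch below uses '(' as a stand-in for a start tag and ')' for an end tag. The constant kMaxTreeDepth and its value are hypothetical (the actual MAX_TREE_DEPTH setting is not given here), and the choice to keep consuming input at the current level once the cap is reached is an assumption made for the example, not a description of what the real Load() does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical value chosen for the example only.
constexpr int kMaxTreeDepth = 8;

// Consumes `src` starting at `pos` and returns the deepest recursion level
// reached. '(' plays the role of a start tag, ')' of an end tag.
inline int Load(const std::string& src, std::size_t& pos, int depth = 0) {
    assert(depth >= 0 && depth <= kMaxTreeDepth);
    int deepest = depth;
    while (pos < src.size()) {
        const char c = src[pos++];
        if (c == '(') {
            if (depth < kMaxTreeDepth) {
                // Normal case: parse the content one level down.
                deepest = std::max(deepest, Load(src, pos, depth + 1));
            }
            // At the cap: do not recurse; keep consuming at this level.
        } else if (c == ')') {
            if (depth > 0) return deepest;  // end tag closes the current level
            // A stray end tag at the top level is ignored.
        }
    }
    return deepest;  // input exhausted with tags still open
}
```

With this cap, an input of 100 unclosed start tags uses at most kMaxTreeDepth nested calls instead of 100. The cost in this sketch is that nesting beyond the cap is flattened, so the resulting structure no longer mirrors the source exactly.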

Static memory usage

All global variables have been moved to the LogdocModule object. Initialization of const arrays of strings is handled specially for platforms that don't support complex globals.
Logdoc uses quite a few lookup tables for mapping element and attribute names to numeric constants and back. The string tables are initialized in the InitializeL() method of LogdocModule.
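A name-to-constant table that is filled in at module initialization, rather than declared as a global initialized array, might look roughly like the sketch below. The ElementType values, the ModuleTables struct and its Initialize() method are all hypothetical and stand in for the real generated tables and for LogdocModule::InitializeL(); the binary search is simply one plausible lookup method.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical element codes, ordered to match the sorted name table below.
enum ElementType { ELM_UNKNOWN = 0, ELM_A, ELM_BODY, ELM_DIV, ELM_P, ELM_COUNT };

// Hypothetical module object that owns the table instead of a global array.
struct ModuleTables {
    const char* names[ELM_COUNT];

    // Fills the table at startup, avoiding a global array with an initializer.
    void Initialize() {
        names[ELM_UNKNOWN] = "";
        names[ELM_A] = "a";
        names[ELM_BODY] = "body";
        names[ELM_DIV] = "div";
        names[ELM_P] = "p";
    }

    // Name to constant: binary search over the alphabetically sorted names.
    ElementType NameToType(const char* name) const {
        int lo = 1;
        int hi = ELM_COUNT - 1;
        while (lo <= hi) {
            const int mid = lo + (hi - lo) / 2;
            const int cmp = std::strcmp(name, names[mid]);
            if (cmp == 0) return static_cast<ElementType>(mid);
            if (cmp < 0) hi = mid - 1; else lo = mid + 1;
        }
        return ELM_UNKNOWN;
    }

    // Constant to name: direct index into the same table.
    const char* TypeToName(ElementType type) const {
        return (type > ELM_UNKNOWN && type < ELM_COUNT) ? names[type] : "";
    }
};
```

Using one table for both directions keeps the footprint down: the reverse mapping is a plain array index, and only the forward mapping needs a search.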

Caching and freeing memory

The logical document and the tree are freed by the FramesDocument they belong to, so when the document is cleared from the document cache the logical document is cleared as well.
There are no local caching mechanisms.

Freeing memory on exit

The memory that is allocated in the LogdocModule object at startup is freed on exit.
All memory allocated to logical documents or trees is deleted either when the FramesDocument is deleted or when the ECMAScript objects referencing it are deleted.

Temp buffers

The shared temp buffers in the MemoryManager are used extensively throughout the logdoc module. There are no real checks to ensure that they are not in use by someone else. Use of those temp buffers is discouraged, and code that uses them should be rewritten.

Memory tuning

There is no real memory tuning.

Tests

None

Design choices

In earlier versions we used to allocate all data used by the logical document by using our own memory allocation mechanism that used large preallocated blocks. These blocks would live as long as the document and would be deleted when the document was thrown out.
When we started to allow pages to modify the tree using script and other mechanisms, we realized that this could cause higher memory usage: nothing was deleted until the document was deleted, so changes would just allocate more memory without freeing any. We then switched to normal new/delete allocation of all data used in the logical document.

Suggestions of improvements

Some changes that should be made:

[footprint] Store string constants as single-byte

A lot of the constant strings used in the module are stored as Unicode strings, requiring 2 bytes for every character. Most of them could probably be stored as single-byte characters.
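The potential saving can be shown with a small sketch. The string constant and the Widen() helper are hypothetical and only illustrate the idea; the sketch also assumes a platform where char16_t occupies 2 bytes, and the widening cast is only valid for constants restricted to ASCII (or Latin-1).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The same hypothetical constant stored as 2-byte and as single-byte characters.
static const char16_t kWideName[] = u"background-color";
static const char kNarrowName[] = "background-color";

// On platforms where char16_t is 2 bytes, the wide copy is twice as large.
static_assert(sizeof(kWideName) == 2 * sizeof(kNarrowName),
              "wide constant takes twice the storage of the narrow one");

// Widens a single-byte constant on demand for callers that need 16-bit text.
inline std::u16string Widen(const char* narrow) {
    std::u16string wide;
    for (; *narrow != '\0'; ++narrow) {
        wide.push_back(static_cast<char16_t>(static_cast<unsigned char>(*narrow)));
    }
    return wide;
}
```

The trade-off in this sketch is a small conversion cost at the point of use in exchange for halving the static storage of each constant.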

Design Issues

Special topics

Appendix

Scripts

Character entities

The script entities.pl is used to generate the list of character entities supported by Opera. It uses a list extracted from the HTML 4.01 specification, stored in the file entities.txt, plus a few extras and compatibility aliases listed in the script itself to generate the list in the file entities.inl.

Named colors

The script mkcolornames.pl is used to generate the list of named colors supported by Opera. It uses a list of RGB values, stored in the file colors.hex, to generate the list in the file html_col.h.

Supported tags and attributes

The scripts mktags_html.pl, mktags_omf.pl, mktags_svg.pl and mktags_wml.pl are used to generate the lists of tags and attributes supported by Opera. They use lists of strings, stored in the files tags_html.txt, tags_omf.txt, tags_svg.txt and tags_wml.txt, to generate the lists in the files specified in the script.