Doc

This module contains the code related to the FramesDocument class, the top-level class for a document. A FramesDocument is managed by a DocumentManager in the dochand module.

History

The doc module in core-2 is what remains of the doc complex, which once made up half of Opera's core and included the layout engine and document handling. Its purpose is to handle the loading, the state and the operations of one document, defined as one resource loaded from a URL together with a number of sub-resources ("inlines"). A typical example is an HTML document with images, scripts and stylesheets as inlines.

The main class as seen from the outside is FramesDocument, which inherits from Document. HTML_Document also inherits from Document, which can cause confusion between the two classes, since they are otherwise unrelated. This is an artifact from the time when Document was designed to be the base class of many more than these two classes. Since then, FramesDocument has evolved into the semi-official external API for a document, with HTML_Document being a part of FramesDocument when the FramesDocument represents an HTML document.

Note the use of the word "evolved". It is significant as a sign of the lack of design and long-term planning behind this code. Some cleanup has been done, but a full rewrite isn't planned until core-3, since the current code is both widely used and risky to change.

The position among other modules

The code in doc is used by many other modules, but it is controlled mainly by the DocumentManager and other code in the dochand module, while receiving events from the display module.

It mainly uses the logdoc module to handle HTML documents, the url module to load resources and, to some extent, the display module to handle framesets and views.

Externally usable classes

See the API documentation for details. The available methods are spread out over Document, FramesDocument and HTML_Document without any externally obvious logic, and often exist in several of the classes.
FramesDocument
A trash heap of information, data and methods.
Document
A trash heap of information, data and methods.
HTML_Document
A trash heap of information, data and methods.

Externally usable data types

See API Documentation for details.

Memory management

OOM policy

Out-of-memory (OOM) conditions are mainly handled by propagating OP_STATUS values, or by setting the global "OOM" flag. The main goal is to always be able to load a new page, even after OOM.
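
The propagation style described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the OP_STATUS enum, its values, the SKETCH_RETURN_IF_ERROR macro, SketchBuffer and LoadSketch are all hypothetical stand-ins invented for this example, and the simulate_oom parameter exists only so the failure path can be exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Stand-in status type (assumption): the real OP_STATUS is defined elsewhere.
enum OP_STATUS { OP_OK = 0, OP_ERR_NO_MEMORY = -1 };

// Hypothetical helper: return early when a callee reports a failure.
#define SKETCH_RETURN_IF_ERROR(expr)          \
    do {                                      \
        OP_STATUS sketch_status_ = (expr);    \
        if (sketch_status_ != OP_OK)          \
            return sketch_status_;            \
    } while (0)

struct SketchBuffer {
    char* data = nullptr;
    ~SketchBuffer() { delete[] data; }

    // Allocates without throwing and reports failure through the return value.
    // The old buffer is only replaced once the new allocation has succeeded.
    OP_STATUS Allocate(std::size_t size, bool simulate_oom) {
        char* fresh = simulate_oom ? nullptr : new (std::nothrow) char[size];
        if (!fresh)
            return OP_ERR_NO_MEMORY;
        delete[] data;
        data = fresh;
        return OP_OK;
    }
};

// A caller that does not handle OOM itself but passes the status upwards,
// leaving the buffer in a state where a later load can still succeed.
OP_STATUS LoadSketch(SketchBuffer& buffer, bool simulate_oom) {
    SKETCH_RETURN_IF_ERROR(buffer.Allocate(64, simulate_oom));
    buffer.data[0] = '\0';
    return OP_OK;
}
```

In this sketch a failed LoadSketch call leaves the buffer untouched, so a later call can still succeed, which mirrors the stated goal of being able to load a new page after OOM.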

Heap memory usage

Document objects are allocated on the heap. Individual objects are not huge, but they can exist in fair numbers, since they are stored in history and there is one for every frame or iframe. The size has not really been measured and depends somewhat on the features a document uses, but might be in the range of 1-10 KB in addition to the document tree (LogicalDocument in the logdoc module) and the layout tree (owned by the logical tree).

For documents not in history a DocumentState is saved, which includes form data entered by the user. Its size is in theory unlimited, but since only data that has actually changed is saved, it is normally not that large. In the expected worst case it will be the size of the text in a wiki article or blog post.

Memory usage for a document is normally minimized by calling the Free method on FramesDocument, which deletes the document tree (and everything it owns), though parts might be kept alive by the script engine. The DocumentState object is kept through Free calls.
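
The relationship between Free and the saved state can be pictured with a small model. This is only a sketch under assumptions: SketchDocument, SketchTree and SketchState are hypothetical stand-ins for FramesDocument, the document tree and DocumentState, and the methods other than Free are invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct SketchTree { std::string content; };     // stand-in for the document tree
struct SketchState { std::string form_data; };  // stand-in for DocumentState

class SketchDocument {
public:
    // (Re)builds the tree, as a load from network or cache would.
    void Load(const std::string& content) {
        tree_.reset(new SketchTree{content});
    }

    // Records data the user has entered.
    void SaveState(const std::string& form_data) {
        state_.reset(new SketchState{form_data});
    }

    // Drops the tree and everything it owns; the saved state survives.
    void Free() { tree_.reset(); }

    bool HasTree() const { return tree_ != nullptr; }
    const SketchState* State() const { return state_.get(); }

private:
    std::unique_ptr<SketchTree> tree_;
    std::unique_ptr<SketchState> state_;
};
```

After Free, the model has no tree and needs a new Load before it can be shown again, while the form data is still available to be restored.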

Stack memory usage

Some recursive methods could theoretically exhaust the stack, but no such methods are known to have caused problems in the doc module. In widely run code, stack memory usage is limited to less than a couple of KB (an estimate, not a measurement).

Memory ownership

FramesDocument owns an HTML_Document, which owns a LogicalDocument. Much is owned by the logical document and the HTML_Elements, and FramesDocument/HTML_Document only keep pointers to those objects. It is therefore important to clear those pointers when the objects (often HTML_Elements) are destroyed. That has caused problems historically, so care must be taken not to add, for instance, HTML_Element pointers without a clear cleanup routine.
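
One way to picture the "clear cleanup routine" mentioned above is a notification that runs before an element is destroyed. The sketch below is an assumption-laden illustration, not the module's mechanism: SketchElement, SketchObserver, SketchTreeOwner and every method name are hypothetical, with SketchObserver playing the role of a class that keeps non-owning pointers and SketchTreeOwner the role of the owner of the elements.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

struct SketchElement { int id; };

// Keeps a non-owning pointer to an element, so it must be told
// when that element goes away.
class SketchObserver {
public:
    void SetFocused(SketchElement* element) { focused_ = element; }
    SketchElement* Focused() const { return focused_; }

    // Cleanup routine: forget the element if we were pointing at it.
    void OnElementRemoved(const SketchElement* element) {
        if (focused_ == element)
            focused_ = nullptr;
    }

private:
    SketchElement* focused_ = nullptr;  // non-owning
};

// Owns the elements and notifies the observer before destroying one.
class SketchTreeOwner {
public:
    explicit SketchTreeOwner(SketchObserver& observer) : observer_(observer) {}

    SketchElement* Add(int id) {
        elements_.push_back(std::unique_ptr<SketchElement>(new SketchElement{id}));
        return elements_.back().get();
    }

    void Remove(SketchElement* element) {
        observer_.OnElementRemoved(element);  // clear dangling candidates first
        elements_.erase(
            std::remove_if(elements_.begin(), elements_.end(),
                           [element](const std::unique_ptr<SketchElement>& p) {
                               return p.get() == element;
                           }),
            elements_.end());
    }

private:
    SketchObserver& observer_;
    std::vector<std::unique_ptr<SketchElement>> elements_;
};
```

The point of the design is that every new non-owning pointer added to the observer needs a matching line in OnElementRemoved; a pointer without one is exactly the kind of addition the text warns against.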

For an external user, deleting the FramesDocument object should be enough to free all memory allocated by a document, with the exception of memory owned by the script engine and its garbage collector.

Temporary buffers

The global temporary buffers are avoided, except in some code used only by the speech feature. This should probably be fixed.

Memory tuning

No specific memory tuning can be done in the doc module. Either there is a document or there isn't, and it's up to other modules like dochand or the document cache to decide which documents should exist. If the memory used by a document needs to be reduced and the document is not displayed, Free() can be called, but that will force a reload (from network or cache) the next time the document is visited when walking in history.

Tests and coverage

The tests are yet to be written. There are no specific selftests for the code and functionality in the doc module, but every page load in Opera runs through many of the central code paths in the doc module.

Design choices

As described in the history section above, not all of the code was written with a higher-level design in mind, so some parts are messy, and no one has full knowledge of all the side effects of changing them. That is one of the reasons for the long-term plan to replace the doc code with a cleaned-up and redesigned version better adapted to how documents are used today. Such a design exists and is documented on the wiki pages, but is still under development.

Improvements

Most of the document handling code will be rewritten in core-3. Until then, apart from the usual bug fixes and feature additions, only limited cleanup will be performed.