Doc
This module consists of code related to the FramesDocument class. That
is the top level class for a document. A FramesDocument is handled by
a DocumentManager in the dochand module.
History
The doc module in core-2 is what remains of the doc complex, which was
half of Opera's core, including the layout engine and document
handling. Its purpose is to handle the loading, the state and the
operations on one document, defined as one resource loaded
from a URL with a number of sub-resources ("inlines"), typically an
HTML document with images, scripts and stylesheets as inlines.
The main class as seen from the outside is FramesDocument, which
inherits from Document. HTML_Document also inherits from Document,
which can cause some confusion between those two classes since they're
really not related. This is an artifact from the time when Document
was designed to be the base class of much more than these two classes.
Since then FramesDocument has evolved into the semi-official external
API for a document, and HTML_Document became a part of FramesDocument
when the FramesDocument represents an HTML document.
Note the use of the words "evolved" and "became". Those are
significant as a sign of the lack of design and long term plans for
this code. Some cleanup has been done, but a full rewrite isn't
planned until core-3 since the current code is both widely used and
risky to change.
The position among other modules
The code in doc is used by many other modules, but it's controlled
mainly by the DocumentManager and code in the dochand module while
getting events from the display module.
It mainly uses the logdoc module to handle HTML documents, the
url module to handle loading of resources and, to some extent, the
display module to handle framesets and views.
Externally usable classes
See API Documentation for details. The methods that
are available are spread out over Document, FramesDocument and
HTML_Document without any externally obvious logic, and often exist in
several of the classes.
- FramesDocument
- A trash heap of information, data and methods.
- Document
- A trash heap of information, data and methods.
- HTML_Document
- A trash heap of information, data and methods.
Externally usable data types
See API Documentation for details.
Memory management
OOM policy
Out of memory conditions are mainly handled by propagating OP_STATUS
values, or by setting the global "OOM" flag. The main goal is to
always be able to load a new page even after OOM.
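
The status propagation described above can be sketched roughly as
follows. This is a minimal illustration, not the module's actual code:
the Status enum stands in for OP_STATUS, and Inline, CreateInline,
FailingCreateInline and LoadTwoInlines are hypothetical names invented
for the example. The point is only that every caller checks the status,
releases what it already holds and hands the failure upwards.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for OP_STATUS; the real type is not shown here.
enum class Status { OK, ERR_NO_MEMORY };

// Hypothetical sub-resource of a document.
struct Inline { int id; };

// Allocates without throwing and reports failure through the status value.
Status CreateInline(int id, Inline*& out)
{
    assert(out == nullptr);  // the caller must pass an empty slot
    out = new (std::nothrow) Inline{id};
    return out ? Status::OK : Status::ERR_NO_MEMORY;
}

// Simulates an allocation failure so the error path can be exercised.
Status FailingCreateInline(int, Inline*& out)
{
    out = nullptr;
    return Status::ERR_NO_MEMORY;
}

using InlineFactory = Status (*)(int, Inline*&);

// Creates two inlines. On failure it frees what it already holds and
// passes the status upwards unchanged, so nothing leaks and the caller
// is still in a state where a new page can be loaded.
Status LoadTwoInlines(InlineFactory create, int& created)
{
    created = 0;

    Inline* first = nullptr;
    Status status = create(1, first);
    if (status != Status::OK)
        return status;

    Inline* second = nullptr;
    status = create(2, second);
    if (status != Status::OK)
    {
        delete first;
        return status;
    }

    created = 2;
    delete second;
    delete first;
    return Status::OK;
}
```

Passing the factory as a parameter is only there to make the failure
path easy to trigger in the sketch.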
Heap memory usage
Document objects are allocated on the heap. While they are not huge,
a fair number of them might exist since they're stored in
history and there is one for every frame or iframe. The size isn't
really measured and depends somewhat on the features a document uses,
but might be in the range of 1-10 KB in addition to the document tree
(LogicalDocument in the logdoc module) and the layout tree (owned by
the logical tree).
For documents not in history a DocumentState is saved, which includes
form data entered by the user. The size of that is in theory unlimited,
but since only data actually changed will be saved, it's normally not
that much. In the expected worst case it will be the size of the text
in a wiki article or blog post.
Memory usage for a document is normally minimized by calling the
Free method on FramesDocument which deletes the document tree (and
everything it owns), though parts might be kept alive by the script
engine. The DocumentState object is kept through Free calls.
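
As a rough illustration of the behaviour described above, the sketch
below drops the tree on Free() while keeping the saved state. It is
written under assumptions: FreeableDoc, Tree and SavedState are
hypothetical stand-ins for FramesDocument, the document tree and
DocumentState, and Reload() only imitates a reload from network or
cache. The script engine keeping parts alive is not modelled.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the document tree and the saved state.
struct Tree { std::string markup; };
struct SavedState { std::string form_data; };

// Hypothetical document that can shed its tree but keeps its state.
class FreeableDoc
{
public:
    explicit FreeableDoc(const std::string& markup)
        : tree_(new Tree{markup}), state_(new SavedState{}) {}

    void SetFormData(const std::string& text) { state_->form_data = text; }

    // Deletes the tree and everything it owns; the saved state survives.
    void Free() { tree_.reset(); }

    // Rebuilds the tree, standing in for a reload from network or cache.
    void Reload(const std::string& markup)
    {
        assert(!tree_);  // in this sketch only a freed document is reloaded
        tree_.reset(new Tree{markup});
    }

    bool HasTree() const { return tree_ != nullptr; }
    const std::string& FormData() const { return state_->form_data; }

private:
    std::unique_ptr<Tree> tree_;
    std::unique_ptr<SavedState> state_;
};
```

After Free() the form data is still available, so it can be restored
once the tree has been rebuilt.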
Stack memory usage
There might be some recursive methods that could theoretically exhaust
the stack, but no such methods are known to have caused problems in the
doc module. In widely run code, stack memory usage is limited to less
than a couple of KB (an estimate, not measured).
Memory ownership
FramesDocument owns an HTML_Document which owns a
LogicalDocument. Much is owned by the logical document and the
HTML_Elements and FramesDocument/HTML_Document only keep pointers to
those objects. As such it's important to clear those pointers when
objects (often HTML_Elements) are destroyed. That has caused problems
historically so care must be taken to not add for instance
HTML_Element pointers without a clear cleanup routine.
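
One possible shape for such a cleanup routine is sketched below. All
names are hypothetical and chosen for the example only: Element stands
in for HTML_Element, ElementOwner for the logical document that owns
the elements, and PointerHolder for a document object that merely keeps
a pointer. The owner notifies the holder before the element is
destroyed, so the non-owning pointer never dangles.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for HTML_Element.
struct Element { int id; };

// Hypothetical holder of a non-owning element pointer.
class PointerHolder
{
public:
    void SetHovered(Element* element) { hovered_ = element; }
    Element* Hovered() const { return hovered_; }

    // Cleanup routine: must be called before an element is destroyed.
    void OnElementRemoved(Element* element)
    {
        assert(element != nullptr);
        if (hovered_ == element)
            hovered_ = nullptr;
    }

private:
    Element* hovered_ = nullptr;  // not owned
};

// Hypothetical owner of the elements.
class ElementOwner
{
public:
    explicit ElementOwner(PointerHolder& holder) : holder_(holder) {}

    Element* Add(int id)
    {
        elements_.push_back(std::unique_ptr<Element>(new Element{id}));
        return elements_.back().get();
    }

    // Destroys the element, telling the holder first so it can clear
    // its pointer while the element is still valid.
    void Remove(Element* element)
    {
        for (auto it = elements_.begin(); it != elements_.end(); ++it)
        {
            if (it->get() == element)
            {
                holder_.OnElementRemoved(element);
                elements_.erase(it);
                return;
            }
        }
    }

    std::size_t Count() const { return elements_.size(); }

private:
    PointerHolder& holder_;
    std::vector<std::unique_ptr<Element>> elements_;
};
```

Every new pointer member added to the holder needs a matching line in
the cleanup routine, which is exactly the discipline asked for above.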
For an external user, deleting the FramesDocument object should be
enough to free all memory allocated by a document, with the exception
of memory owned by the script engine and its garbage collector.
Temporary buffers
The global temporary buffers are generally avoided, but they are still
used in some code that serves only the speech feature. This should
probably be fixed.
Memory tuning
No specific memory tuning can be done in the doc module. Either there
is a document or there isn't and it's up to other modules like dochand
or the document cache to decide which documents should exist. If
memory in a document needs to be decreased and the document is not
displayed, Free() can be called, but that will force a reload (from
network or cache) the next time the document is visited if walking in
history.
Tests and coverage
The tests are yet to be written. There are no specific selftests for
the code and functionality in the doc module, but every page load
in Opera runs through many of the central code paths in the doc
module.
Design choices
As written in the history section above, not all code has been
written with a higher level design in mind, so some parts of the
code are messy and no one has full knowledge of all side effects of
changing them. That is one of the reasons for the long term plan to
replace the doc code with a cleaned up and redesigned version more
adapted to how documents are used today. Such a design exists and is
documented on the wiki pages, but is still under development.
Improvements
Most of the document handling code will be rewritten in core-3. Until
then only limited cleanup operations will be performed, except for bug
fixes, feature additions and all the usual.