Copyright © 1999-2012 Opera Software ASA. All rights reserved. This file is part of the Opera web browser. It may not be distributed under any circumstances.
For platforms with specific footprint requirements, you may want to
re-implement the converter code.
By setting FEATURE_TABLEMANAGER to NO you can
remove the code for the internal converters that are table-driven.
A few algorithmic converters will still be included;
see below.
Disabling support for the table-driven converters also disables the
default factories, which means that you will need to implement
them for your platform.
There are two factories:
InputConverter::CreateCharConverter_real() creates instances of
InputConverter, i.e. decoders, and
OutputConverter::CreateCharConverter() creates instances of
OutputConverter, i.e. encoders.
The Opera core code assumes that when it requests a decoder for
iso-8859-1 it receives one for
windows-1252 instead, because a number of web sites are
mislabeled as iso-8859-1 while actually using windows-1252.
However, when requesting an encoder for iso-8859-1,
it must receive such an encoder, and not the 1252 variant.
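The asymmetry between the two factories can be sketched as a pair of lookup helpers. This is only an illustration: ResolveDecoderCharset, ResolveEncoderCharset and LowerCaseLabel are hypothetical names that do not exist in the Opera code, and a real factory would go on to instantiate a converter object rather than return a name.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of the Opera API): lower-case a charset
// label so that the comparisons below are case-insensitive.
static std::string LowerCaseLabel(std::string label)
{
    std::transform(label.begin(), label.end(), label.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return label;
}

// Charset that a platform decoder factory should actually instantiate.
// A request for iso-8859-1 is deliberately answered with windows-1252.
std::string ResolveDecoderCharset(const std::string &requested)
{
    const std::string label = LowerCaseLabel(requested);
    if (label == "iso-8859-1")
        return "windows-1252";
    return label;
}

// Charset that a platform encoder factory should instantiate.
// No substitution takes place: iso-8859-1 stays iso-8859-1.
std::string ResolveEncoderCharset(const std::string &requested)
{
    return LowerCaseLabel(requested);
}
```

A decoder factory built this way would call ResolveDecoderCharset() before choosing a converter, while the encoder factory would use the requested label unchanged.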
The decoder for UTF-8 must support passing NULL
as the destination buffer parameter, in which case it must return the
number of bytes needed to perform the conversion.
The Opera core implementation for UTF-8 does this, and it is
recommended that you use this implementation rather than
replacing it.
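If you do have to replace the UTF-8 decoder, the NULL-destination behaviour could look roughly like the sketch below. It assumes that the decoder produces UTF-16 code units and that sizes are counted in bytes. Utf8ToUtf16 is a hypothetical stand-alone function, not the signature of the Opera converter interface, and for brevity it does not reject overlong forms.

```cpp
#include <cstddef>
#include <cstdint>

// Decodes UTF-8 from src into UTF-16 code units at dest.
// If dest is null, nothing is written and the return value is the number
// of bytes that the complete conversion would need. Otherwise the return
// value is the number of bytes actually written to dest.
// Malformed input is replaced with U+FFFD instead of aborting.
std::size_t Utf8ToUtf16(const unsigned char *src, std::size_t srcLen,
                        char16_t *dest, std::size_t destBytes)
{
    const std::size_t capacity = dest ? destBytes / sizeof(char16_t) : 0;
    std::size_t units = 0; // UTF-16 code units written (or counted)
    std::size_t i = 0;

    while (i < srcLen)
    {
        std::uint32_t cp = src[i++];
        int trail = 0;
        if (cp < 0x80)                 { /* ASCII, nothing to do */ }
        else if ((cp & 0xE0) == 0xC0) { cp &= 0x1F; trail = 1; }
        else if ((cp & 0xF0) == 0xE0) { cp &= 0x0F; trail = 2; }
        else if ((cp & 0xF8) == 0xF0) { cp &= 0x07; trail = 3; }
        else                          { cp = 0xFFFD; } // stray trail byte or invalid lead

        for (; trail > 0; --trail)
        {
            if (i < srcLen && (src[i] & 0xC0) == 0x80)
                cp = (cp << 6) | (src[i++] & 0x3F);
            else
            {
                cp = 0xFFFD; // truncated or broken sequence
                break;
            }
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD; // out of range or encoded surrogate

        const std::size_t needed = (cp >= 0x10000) ? 2 : 1;
        if (dest)
        {
            if (units + needed > capacity)
                break; // destination full; stop writing
            if (needed == 2)
            {
                cp -= 0x10000;
                dest[units]     = static_cast<char16_t>(0xD800 + (cp >> 10));
                dest[units + 1] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
            }
            else
                dest[units] = static_cast<char16_t>(cp);
        }
        units += needed;
    }
    return units * sizeof(char16_t);
}
```

A caller would typically invoke the function once with a null destination to size the buffer, and a second time with the allocated buffer to perform the conversion.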
See the Wiki page for various other quirks that are implemented in the Opera encodings support, together with the reasoning behind them. It should not be necessary to mimic most of those in re-implementations.
For best results, the platform converters should be forgiving of input errors, as Opera will most certainly encounter pages with misidentified encodings or pages containing garbage data. Opera converters never throw exceptions (leave) or give up when encountering faulty data; they simply flag these conversion errors using the internal APIs and continue as if nothing had happened. Since it is the converters themselves that know what proper data looks like, they are much better at performing error recovery than the client code.
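The flag-and-continue approach can be illustrated with a deliberately simple US-ASCII decoder. This is a minimal sketch: DecodeResult and DecodeAsciiLeniently are hypothetical names, the use of U+FFFD as the replacement is an assumption, and the real converters report errors through their own internal APIs, which are not shown here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct DecodeResult
{
    std::u16string text;              // decoded output, always complete
    std::size_t invalidCount = 0;     // number of bytes that were not valid
    std::ptrdiff_t firstInvalid = -1; // offset of the first bad byte, -1 if none
};

// Decodes US-ASCII without ever failing: bytes outside the ASCII range are
// replaced with U+FFFD, recorded in the result, and decoding carries on.
DecodeResult DecodeAsciiLeniently(const unsigned char *src, std::size_t len)
{
    DecodeResult result;
    result.text.reserve(len);
    for (std::size_t i = 0; i < len; ++i)
    {
        if (src[i] < 0x80)
        {
            result.text.push_back(static_cast<char16_t>(src[i]));
        }
        else
        {
            // Flag the error, substitute a replacement character, continue.
            if (result.invalidCount == 0)
                result.firstInvalid = static_cast<std::ptrdiff_t>(i);
            ++result.invalidCount;
            result.text.push_back(static_cast<char16_t>(0xFFFD));
        }
    }
    return result;
}
```

The caller always gets usable text back and can inspect the error fields afterwards if it cares about the quality of the input.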
Converters inherit from the generic CharConverter interface
via the InputConverter and OutputConverter classes.
Because their fields of use are slightly different, the set of additional
APIs differs between them.
To properly support FEATURE_USE_ENTITIES_IN_FORMS or
API_ENC_UNCONVERTIBLE, your encoders will need
to identify missing codepoints and to signal them using the proper APIs.
For stateful encodings, you will need to switch back to ASCII before
outputting an entity.
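For a stateful encoding such as ISO-2022-JP, the switch back to ASCII could be sketched as follows. Iso2022State and AppendEntity are hypothetical names, and the decimal numeric character reference format used here is an assumption about what the entity should look like.

```cpp
#include <string>

// Hypothetical encoder state for an ISO-2022-JP style encoder.
enum class Iso2022State { Ascii, JisX0208 };

// Appends a numeric character reference for a code point that the encoding
// cannot represent. The entity consists of ASCII characters only, so the
// encoder must first return to the ASCII state if it is in another one.
void AppendEntity(std::string &out, Iso2022State &state, unsigned long codepoint)
{
    if (state != Iso2022State::Ascii)
    {
        out += "\x1B(B"; // ESC ( B selects ASCII in ISO-2022-JP
        state = Iso2022State::Ascii;
    }
    out += "&#" + std::to_string(codepoint) + ";";
}
```

Without the escape sequence, a decoder still in double-byte mode would interpret the bytes of the entity as pairs of JIS X 0208 codes.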
The feature can safely be turned off if you do not support it; using entities
in forms is a non-standard extension.
Take care to support the reporting of unconvertible characters, because
API_ENC_UNCONVERTIBLE may be turned on indirectly by other features or tweaks.
It is known to be imported by API_XMLUTILS_XMLTOSTRINGSERIALIZER and Opera
Mail.
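As an illustration of what reporting unconvertible characters involves, the sketch below shows an ISO 8859-1 encoder that records every code unit it cannot represent. The class and its accessors (Latin1Encoder, UnconvertibleCount, UnconvertibleCharacters) are hypothetical and only stand in for the proper APIs, which are not reproduced here. The '?' substitute is likewise an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encoder that keeps track of what it could not convert.
class Latin1Encoder
{
public:
    // Encodes UTF-16 code units to ISO 8859-1. Anything above U+00FF has
    // no representation and is replaced with '?' and recorded. Each half
    // of a surrogate pair is recorded separately in this sketch.
    std::string Encode(const std::u16string &src)
    {
        std::string out;
        out.reserve(src.size());
        for (std::size_t i = 0; i < src.size(); ++i)
        {
            if (src[i] <= 0xFF)
            {
                out += static_cast<char>(src[i]);
            }
            else
            {
                ++m_count;
                m_characters += src[i];
                out += '?';
            }
        }
        return out;
    }

    // Number of code units that could not be converted so far.
    std::size_t UnconvertibleCount() const { return m_count; }

    // The code units that could not be converted, in order of appearance.
    const std::u16string &UnconvertibleCharacters() const { return m_characters; }

private:
    std::size_t m_count = 0;
    std::u16string m_characters;
};
```

Client code can then query the encoder after conversion and decide, for example, whether to warn the user or to fall back to a different encoding.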
Regardless of configuration, the encodings module will supply converters
to and from ISO 8859-1, UTF-8 and UTF-16.
There is also an implementation of UTF-7 that will be included
if the corresponding API_ENC_UTF7 is enabled (please note that
UTF-7 is not used on web pages, but may still be used in email
and other contexts).
All these conversions are purely algorithmic, and they are the recommended
converters to use even when platform converters are used for other encodings.
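"Purely algorithmic" means that the conversion needs no mapping tables, only arithmetic on the code points. A UTF-16 to UTF-8 encoder is a typical case and can be sketched as below. EncodeUtf8 is a hypothetical function written for this illustration and is not the Opera implementation. Replacing lone surrogates with U+FFFD is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Encodes UTF-16 to UTF-8 using bit arithmetic only, with no lookup tables.
std::string EncodeUtf8(const std::u16string &src)
{
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i)
    {
        std::uint32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < src.size()
            && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF)
        {
            // Combine a surrogate pair into one supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // lone surrogate
        }

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Because no tables are involved, converters of this kind add very little to the footprint, which is why they can remain enabled on constrained platforms.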