Supported protocols:
API documentation generated by Doxygen contains all necessary information for the external APIs.
URLs are contained in URL objects, which represent entire documents. All actions involving a URL are initiated through the URL class API.
A URL object consists of two pointers: one to a URL_Rep object that contains all the URL information, and one to a URL_RelRep object (owned by the URL_Rep object) that identifies the fragment identifier part of the URL (the "#name" part of the URL name). Reference counters are updated in these objects each time a URL object is created or destroyed.
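The two-pointer, reference-counted layout described above can be pictured with the following simplified C++ sketch. Only the class names URL, URL_Rep and URL_RelRep come from the text; the members, the manual counters and the constructor signature are hypothetical simplifications, not the module's actual implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the fragment ("#name") part; owned by URL_Rep.
struct URL_RelRep {
    std::string name;
    int refcount = 0;  // number of URL objects pointing at this fragment
};

// Hypothetical stand-in for the object holding all URL information.
struct URL_Rep {
    std::string name;
    int refcount = 0;  // number of URL objects pointing at this rep
};

// A URL is just two pointers; copying it shares the same URL_Rep.
class URL {
public:
    URL() = default;
    URL(URL_Rep* rep, URL_RelRep* rel) : rep_(rep), rel_(rel) { Acquire(); }
    URL(const URL& other) : rep_(other.rep_), rel_(other.rel_) { Acquire(); }
    URL& operator=(const URL& other) {
        if (this != &other) {
            Release();
            rep_ = other.rep_;
            rel_ = other.rel_;
            Acquire();
        }
        return *this;
    }
    ~URL() { Release(); }

    const URL_Rep* Rep() const { return rep_; }

private:
    void Acquire() {
        if (rep_) ++rep_->refcount;
        if (rel_) ++rel_->refcount;
    }
    void Release() {
        // In the real module a counter reaching zero would allow cleanup.
        if (rel_) --rel_->refcount;
        if (rep_) --rep_->refcount;
    }
    URL_Rep* rep_ = nullptr;       // shared URL information
    URL_RelRep* rel_ = nullptr;    // optional fragment identifier
};
```

A "copy" of a URL made this way is only a new reference: both objects point at the same URL_Rep, and the counters drop back when the copies go out of scope.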
URL objects are primarily created by the g_url_api->GetURL() function call, which has several options, such as creating a completely new URL, or a URL referenced relative to an already known URL.
Additionally, URL constructors can be used to create "copies" of URLs (they are not actual copies, just references), or to add fragment identifiers ("#name").
Documents intending to use or display the data contained by a URL must lock it, using a URL_InUse object, which will prevent destruction of the URL's data while it is being used.
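The locking rule can be sketched as a scope-bound (RAII) lock. The interface of the real URL_InUse class is not described here, so CachedDoc, InUseLock and TryPurge are hypothetical names used only to show the idea: data may be freed only while no lock is held.

```cpp
#include <cassert>
#include <string>

// Hypothetical cached document whose data may be purged only when no
// document holds a lock on it.
struct CachedDoc {
    std::string data;
    int lock_count = 0;

    // Returns true if the data was actually freed.
    bool TryPurge() {
        if (lock_count > 0) return false;  // still in use: keep the data
        data.clear();
        return true;
    }
};

// Hypothetical lock in the spirit of URL_InUse: holds the data in place
// for as long as the lock object is alive.
class InUseLock {
public:
    explicit InUseLock(CachedDoc& doc) : doc_(doc) { ++doc_.lock_count; }
    ~InUseLock() { --doc_.lock_count; }
    InUseLock(const InUseLock&) = delete;
    InUseLock& operator=(const InUseLock&) = delete;

private:
    CachedDoc& doc_;
};
```

Tying the lock to an object's lifetime means a document cannot forget to release it on an early return.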
There are two types of URLs: normal URLs, which are always retrievable through the cache and the visited URL list, and unique URLs, which are not accessible through the cache and the visited URL list. Unique URLs are primarily used for POST form requests. They are created the same way as normal URLs, but with the unique flag set to TRUE. Unlike normal URLs, they cannot be accessed through new calls to g_url_api->GetURL(); new references can only be created by copying an existing URL object. When no document locks a unique URL, the cached document is deleted, and when no URL objects reference it, the URL_Rep is destroyed.
There are several functions available to load URLs.
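The difference between normal and unique URLs can be sketched with a small lookup table. UrlTable, Rep and this GetURL signature are hypothetical simplifications (the real g_url_api->GetURL() has more options); shared_ptr stands in for the module's own reference counting.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical simplified URL information.
struct Rep {
    std::string name;
    bool unique;
};

// Hypothetical URL table: normal URLs are shared through the table,
// unique URLs are never entered into it.
class UrlTable {
public:
    std::shared_ptr<Rep> GetURL(const std::string& name, bool unique = false) {
        if (unique) {
            // Always a fresh Rep, unreachable by later GetURL calls; it is
            // destroyed when the last reference to it goes away.
            return std::make_shared<Rep>(Rep{name, true});
        }
        auto it = table_.find(name);
        if (it != table_.end()) return it->second;  // same Rep as before
        auto rep = std::make_shared<Rep>(Rep{name, false});
        table_[name] = rep;
        return rep;
    }

private:
    std::map<std::string, std::shared_ptr<Rep>> table_;
};
```

Two POST submissions to the same address therefore get separate, independent objects, while repeated lookups of a normal URL return the same one.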
The preferred function is URL::LoadDocument, which will perform all necessary cache validation checks before deciding how to load, based on a caller specified policy. The other functions are now deprecated.
To resume a download, URL::ResumeLoad is used. If possible, this function restarts the load at the position where loading was previously aborted.
Additionally, it is possible to create URLs whose content is written using the URL::WriteDocumentData() functions, but these should be used only in special cases, such as email and news decoding.
A loading URL can post several messages, which the caller must listen for.
The messages are of the form (msg, par1, par2). par1 is ALWAYS the Id() of the URL posting the message (retrieved by URL::Id()); par2 depends on the message (msg) posted.
The data from a URL is retrieved by requesting a URL_DataDescriptor object from the URL by calling URL::GetDescriptor. The data can be retrieved in raw binary form with any content-encoding removed (in case it is compressed), or in UTF-16 form (converted from the document's original encoding).
Data descriptors can be message driven (messages are posted when more data is available) or polling based (the caller is responsible for retrieving all pending data).
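Because par1 always carries the Id() of the posting URL, a listener can discard messages from URLs it is not interested in before looking at msg or par2. The message codes, the std::uint32_t id type and the LoadListener class below are hypothetical; the module defines its own message set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical message codes, for illustration only.
enum Msg { kMsgHeaderLoaded, kMsgDataLoaded, kMsgLoadingFailed };

// Hypothetical listener following the (msg, par1, par2) convention.
class LoadListener {
public:
    explicit LoadListener(std::uint32_t url_id) : url_id_(url_id) {}

    void HandleMessage(Msg msg, std::uint32_t par1, std::uint32_t par2) {
        if (par1 != url_id_) return;  // posted by some other URL: ignore
        switch (msg) {
            case kMsgHeaderLoaded:  headers_seen = true; break;
            case kMsgDataLoaded:    ++data_messages; break;
            case kMsgLoadingFailed: error_code = par2; break;  // par2 is message specific
        }
    }

    bool headers_seen = false;
    int data_messages = 0;
    std::uint32_t error_code = 0;

private:
    std::uint32_t url_id_;  // Id() of the URL this listener follows
};
```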
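A polling-based reader has to keep pulling until no data is pending. The class below is a hypothetical stand-in; its method names are loosely modeled on a retrieve/consume pattern and are not the actual URL_DataDescriptor signatures. A message-driven descriptor would instead run one iteration of the loop each time a "more data" message arrives.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical polling-style descriptor that hands out data in chunks.
class PollingDescriptor {
public:
    PollingDescriptor(std::string source, std::size_t chunk)
        : source_(std::move(source)), chunk_(chunk) {}

    // Moves up to one chunk from the source into the buffer and sets
    // `more` if unread data remains. Returns the number of buffered bytes.
    std::size_t RetrieveData(bool& more) {
        std::size_t n = std::min(chunk_, source_.size() - pos_);
        buffer_.append(source_, pos_, n);
        pos_ += n;
        more = pos_ < source_.size();
        return buffer_.size();
    }
    const std::string& Buffer() const { return buffer_; }
    void ConsumeData(std::size_t n) { buffer_.erase(0, n); }

private:
    std::string source_;
    std::size_t chunk_;  // assumed to be non-zero
    std::size_t pos_ = 0;
    std::string buffer_;
};

// Polling loop: the caller is responsible for draining all pending data.
std::string ReadAll(PollingDescriptor& desc) {
    std::string out;
    bool more = true;
    while (more) {
        desc.RetrieveData(more);
        out += desc.Buffer();
        desc.ConsumeData(desc.Buffer().size());  // mark the data as used
    }
    return out;
}
```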
A number of information attributes exist inside a URL object. While some of these can be retrieved by dedicated functions, the primary API for retrieving and updating the attributes is the GetAttribute/GetAttributeL and SetAttribute/SetAttributeL functions.
These functions take as argument an enumerated value that selects the actual attribute, and either return the value or update it, depending on the function.
The enumerated values are grouped into lists according to the type used to represent the corresponding attribute: unsigned integers (including enums and signed integers, which must be typecast), strings, URLs and general "void" pointers (which must also be typecast). The enums are named URL::KNameOfAttribute; most names follow the same form as the older accessor functions, so GetNameOfAttribute(); can be replaced by [(typecast)] GetAttribute(URL::KNameOfAttribute);.
Strings can be retrieved both as const OpStringC* objects (with a few exceptions), which may be accessed directly, and by copying them into a separate OpString* object. The results from the version returning unsigned ints must be typecast to the appropriate type of the attribute before being used. (This approach was chosen to avoid having to create too many implementations of this function.) Unless specified otherwise, the default return values are 0 (for integers), an empty string (for string attributes), an empty URL (for URL attributes) and NULL (for void pointer attributes). Optionally, these functions can follow redirects.
URL url;
URLStatus status1 = (URLStatus) url.GetAttribute(URL::KLoadStatus); // Get the load status of the URL
URLStatus status2 = (URLStatus) url.GetAttribute(URL::KLoadStatus, TRUE); // Follow the redirect chain and get the load status of the URL at the end of the chain
OpStringC8 name1 = url.GetAttribute(URL::KName_Escaped); // Access the %XX escaped name of the URL as a const string
OpStringC name2 = url.GetAttribute(URL::KUniNamed); // Access the UTF-8 deescaped name of the URL as a const string
OpString8 name3;
url.GetAttributeL(URL::KName_Escaped, name3); // Retrieve the %XX escaped name of the URL and store it in the "name3" string object
OpString name4;
url.GetAttributeL(URL::KUniNamed, name4); // Retrieve the UTF-8 deescaped name of the URL and store it in the "name4" string object
url.SetAttributeL(URL::KLoadStatus, URL_LOADED); // Set the load status of the URL
url.SetAttributeL(URL::KMIME_ForceContentType, "text/plain; charset=iso-8859-1"); // Force the MIME type (and in this case, the charset) of the URL
g_url_api is primarily used to construct new URLs, but also contains some cookie-related functions and some UI action functions.
ServerName objects contain information about:
API documentation generated by Doxygen contains information about the internal organization of the module.
General layout of the URL_Rep and related classes (The list is abbreviated):
The URL manager maintains the following
The module is fairly large, as it requires a lot of functionality.
Various features can be enabled or disabled, either through feature defines or specific defines; one example is the HTTP stack.
Due to the requirements of various modules (including the url module) and platforms, it is very difficult to reduce the footprint.
Most internal module functions handle OOM locally: they signal an OOM by raising the OOM signal in the memory manager and abort the current action. If appropriate, a message is posted to the document.
However, much of the public API is now LEAVE based, and in those cases the caller must TRAP errors and handle them. Some internal functions will also LEAVE, but these are TRAPed internally.
In the case of LEAVE functions, the caller must TRAP the errors and handle the OOM situations. Internal functions usually abort their operation with an error message and a raised status flag, which must be handled either by the caller or by the document.
Much of the module is message-callback based, and these functions are not able to report OOM situations directly to the documents or UI. In these cases the current operation is terminated and error messages are sent.
Much of the external API is based on direct calls, but some classes do use virtual functions. In many cases these are LEAVE based, and callers must TRAP them and handle them appropriately.
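The LEAVE/TRAP pattern can be illustrated with standard C++ exceptions as a stand-in for the module's own LEAVE and TRAP macros, which are not reproduced here. The "L" suffix marks a leaving function, as in GetAttributeL; BuildNameL, BuildName and the Status codes are hypothetical.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical status codes returned by a trapping caller.
enum Status { kOk, kErrNoMemory };

// A "leaving" function: on OOM it throws (leaves) instead of returning an
// error code. Allocation failure is simulated with a flag for the sketch.
std::string BuildNameL(const std::string& host, bool simulate_oom) {
    if (simulate_oom) throw std::bad_alloc();
    return "http://" + host + "/";
}

// The caller "traps" the leave and turns it into a status code it can
// act on or report, rather than letting the error propagate further.
Status BuildName(const std::string& host, bool simulate_oom, std::string& out) {
    try {
        out = BuildNameL(host, simulate_oom);
        return kOk;
    } catch (const std::bad_alloc&) {
        return kErrNoMemory;
    }
}
```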
NOTE: these numbers tend to be estimates, not actual measurements
An unloaded URL will usually consume approximately 40 bytes, plus the URL's path segment.
Loaded URL_Reps will probably use 300-400 bytes on average, depending on the length of the URL's name. URL_Reps that use the RAM cache additionally store the entire document in RAM.
ServerName objects will usually consume less than 200 bytes per unique servername, but actual consumption depends on servername size, and to what extent authentication and secure session information is used (session information can consume at least 1 KB per port, depending on the certificate and encryption key sizes).
Cookies can consume up to 4 KB per cookie, but should usually average less than 300 bytes.
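As a rough worked example using the estimates above, consider a hypothetical session with 5,000 unloaded URLs averaging 50-byte paths, 500 loaded URL_Reps at about 350 bytes each, 100 servers at the 200-byte upper bound and 1,000 cookies at 300 bytes each (all counts are assumptions chosen for illustration):

```latex
\begin{aligned}
\text{unloaded URLs}    &\approx 5000 \times (40 + 50) = 450\,000 \text{ bytes}\\
\text{loaded URL\_Reps} &\approx 500 \times 350 = 175\,000 \text{ bytes}\\
\text{ServerNames}      &\le 100 \times 200 = 20\,000 \text{ bytes}\\
\text{cookies}          &\approx 1000 \times 300 = 300\,000 \text{ bytes}\\
\text{total}            &\approx 945\,000 \text{ bytes} \approx 0.95 \text{ MB}
\end{aligned}
```

This excludes documents held in the RAM cache and secure session information, which can add at least 1 KB per port.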
Sequence splitter and upload elements are usually not kept for long, and their allocated size depends on the number of elements and the actual body size.
Large objects are usually heap allocated. In some cases sizeable objects are placed on the stack, but only for short periods.
In most cases stack consumption should be less than 300 bytes.
The module uses several global pointers, and several static members. These are, for the most part pointers:
Several of these are buffers that (alongside buffers in URL_Manager) will grow as longer URLs are encountered.
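A grow-only buffer of the kind described above can be sketched as follows. GrowBuffer, the 64-byte starting size and the doubling policy are hypothetical choices for illustration; the text only states that the buffers grow as longer URLs are encountered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical grow-only scratch buffer: reused for every URL name and
// enlarged (never shrunk) when a longer name than before is encountered.
class GrowBuffer {
public:
    // Returns a buffer of at least len + 1 bytes, growing it if needed.
    char* Reserve(std::size_t len) {
        if (len + 1 > capacity_) {
            std::size_t new_cap = capacity_ ? capacity_ : 64;
            while (new_cap < len + 1) new_cap *= 2;
            data_.reset(new char[new_cap]);  // old contents are discarded
            capacity_ = new_cap;
        }
        return data_.get();
    }
    std::size_t Capacity() const { return capacity_; }

private:
    std::unique_ptr<char[]> data_;
    std::size_t capacity_ = 0;
};
```

The trade-off is that repeated allocations are avoided, but the largest size ever needed stays allocated until the buffer is freed on exit.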
Most of the allocated objects are deleted by URL_Manager or URL_API on exit.
In addition a number of compiled const arrays exists. These may be automatically converted to allocated arrays on some platforms.
URL_Manager and URLs provide calls to free unused resources, which the memory manager can invoke when needed.
Additionally, the URL_Manager, either directly or through the Cache_Manager (from the cache module), keeps the number of URLs, ServerNames, connections, cookies, etc. within the specified total number and size limits.
URL_Manager and URL_API destroy all allocated URLs, connections, etc.
Additionally, several places use the memory manager's temp buffers, primarily TempBuf2 and TempBuf2k.
There is no check for external use of these buffers; using different buffers should prevent internal collisions, unless implementations also use them in calls to or from these functions.
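Keeping a collection within a count limit can be sketched as below. The text does not say which eviction policy is used, so least-recently-used eviction is an assumption made for this sketch, and LimitedStore is a hypothetical class; a real implementation would also skip entries that are still in use.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical count-limited store with least-recently-used eviction.
class LimitedStore {
public:
    explicit LimitedStore(std::size_t max_items) : max_items_(max_items) {}

    // Inserts or refreshes a key, evicting the oldest entries over the limit.
    void Touch(const std::string& key) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            order_.splice(order_.begin(), order_, it->second);  // most recent
            return;
        }
        order_.push_front(key);
        index_[key] = order_.begin();
        while (order_.size() > max_items_) {  // enforce the limit
            index_.erase(order_.back());
            order_.pop_back();
        }
    }
    bool Contains(const std::string& key) const { return index_.count(key) != 0; }
    std::size_t Size() const { return order_.size(); }

private:
    std::size_t max_items_;
    std::list<std::string> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<std::string>::iterator> index_;
};
```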
At present there are no opportunities to tune memory use.
Selftests, but they do not check memory usage.
Selftests, ordinary surfing.
URL_Rep, URL_DataStorage and several other classes are independent objects owned by other objects, to avoid unnecessarily large objects. Common information about scheme/servername/port is stored in a single database linked from the URLs.
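Storing scheme/servername/port information once and linking it from each URL amounts to interning. ServerInfo, ServerDatabase and SketchURL are hypothetical names, and shared_ptr stands in for whatever ownership scheme the module actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Hypothetical record for information shared by all URLs on one server.
struct ServerInfo {
    std::string scheme;
    std::string host;
    unsigned short port;
};

// Hypothetical interning table: one ServerInfo per (scheme, host, port).
class ServerDatabase {
public:
    std::shared_ptr<ServerInfo> Get(const std::string& scheme,
                                    const std::string& host,
                                    unsigned short port) {
        auto key = std::make_tuple(scheme, host, port);
        auto it = servers_.find(key);
        if (it != servers_.end()) return it->second;  // reuse existing record
        auto info = std::make_shared<ServerInfo>(ServerInfo{scheme, host, port});
        servers_.emplace(key, info);
        return info;
    }
    std::size_t Count() const { return servers_.size(); }

private:
    std::map<std::tuple<std::string, std::string, unsigned short>,
             std::shared_ptr<ServerInfo>> servers_;
};

// A URL then stores only its own path plus a link to the shared record.
struct SketchURL {
    std::shared_ptr<ServerInfo> server;
    std::string path;
};
```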
Possible improvements