Multimedia Cache
Multimedia Content
We wanted to improve, as much as possible, the experience of users watching a video (a movie, but also a YouTube-like clip) or listening to an audio file. We identified these requirements:
- Support for multiple, out-of-order segments (a segment is a part of the file), so the user can move backward and forward in the video without having to download the same content multiple times.
- Persist these segments in the cache, so they can be available even across reboots.
- Support for a size limit.
- Support for a way to identify the "coverage" of the content, to be able to show graphically to the user what parts are still available.
- A crash should have as little impact as possible on the cached content. Corruption is not acceptable, but the loss of a single segment can be tolerated.
One limitation of the current solution is that only a finite (and relatively small) number of segments is supported.
Also, the current solution does not allow overwriting content that has already been downloaded. This should not be needed in practice: if the file changes, the cache object should be recreated.
How to create a Multimedia Cache Storage
It is usually better to have a dedicated context for multimedia files, so as not to disrupt the work of the normal disk cache. The suggested way to create the context is Context_Manager_Multimedia::CreateManager().
A URL is managed as multimedia when the cache storage back-end is of type Multimedia_Storage. This requires the URL::KMultimedia attribute to be set to TRUE, which should happen as soon as possible (before the Cache_Storage object is created).
How to download part of a file
To specify what part of the file to download, the URL::KHTTPRangeStart and URL::KHTTPRangeEnd attributes need to be set. The expectation is that the code using URL will download at least 1 or 2 KB of data at a time. Downloading very small parts can affect performance, as, depending on the situation, a disk operation could be required.
How to retrieve the coverage
To know if a part of a file is available in the cache, the suggested method is URL::GetPartialCoverage().
In some cases, it could be required to fully download the URL, for example to save it to a file. The method to use is URL::GetNextMissingCoverage().
The complete situation, with all the segments available, is exposed by URL_DataDescriptor::GetSortedCoverage(), which is the right function to use to provide UI feedback on what has already been downloaded.
The internal object that exposes these methods is Cache_Storage.
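To make the coverage semantics concrete, here is a hypothetical, self-contained model (not the actual Opera implementation) of what "sorted coverage" and "next missing coverage" compute over a set of cached segments. The names `Segment`, `SortedCoverage()` and `NextMissing()` are illustrative only:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative model of a multimedia cache segment: the byte range
// [start, start + length) of the original file that is present in the cache.
struct Segment { size_t start; size_t length; };

// Model of URL_DataDescriptor::GetSortedCoverage(): the cached segments,
// sorted by start offset and with adjacent/overlapping ranges merged,
// ready to be drawn as a "coverage bar" in the UI.
std::vector<Segment> SortedCoverage(std::vector<Segment> segs)
{
    std::sort(segs.begin(), segs.end(),
              [](const Segment &a, const Segment &b) { return a.start < b.start; });
    std::vector<Segment> merged;
    for (const Segment &s : segs)
    {
        if (!merged.empty() &&
            s.start <= merged.back().start + merged.back().length)
        {
            // Overlaps or touches the previous range: extend it.
            size_t end = std::max(merged.back().start + merged.back().length,
                                  s.start + s.length);
            merged.back().length = end - merged.back().start;
        }
        else
            merged.push_back(s);
    }
    return merged;
}

// Model of URL::GetNextMissingCoverage(): given a position, return the first
// byte range at or after it that is NOT in the cache. A returned length of 0
// means the content is fully covered from pos to file_size.
std::pair<size_t, size_t> NextMissing(const std::vector<Segment> &segs,
                                      size_t pos, size_t file_size)
{
    for (const Segment &s : SortedCoverage(segs))
    {
        if (pos < s.start)               // hole before this segment
            return { pos, s.start - pos };
        if (pos < s.start + s.length)    // pos is covered: skip past the segment
            pos = s.start + s.length;
    }
    if (pos < file_size)
        return { pos, file_size - pos }; // trailing hole
    return { pos, 0 };                   // fully covered from pos onwards
}
```

For example, with segments covering bytes 0-49 and 100-149 of a 200-byte file, the first missing range starting at 0 is bytes 50-99.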
How to retrieve the content
Users of URL can use URL_DataDescriptor as usual, but URL_DataDescriptor::SetPosition() has to be called to choose the starting byte. Of course, it is important that the requested range is available.
For the internal objects (not intended for URL users), the method AccessReadOnly() has been added to File_Storage; it returns an OpFileDescriptor object.
For multimedia content, this object is a logical view, so the content can be accessed in order (or out of order), without worrying about the physical disk position of the requested byte. As usual, the caller must ask only for bytes already present in the cache.
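The "logical view" idea can be sketched with a hypothetical model (not the real File_Storage code): segments are appended to the physical cache file in download order, so a byte's physical position is unrelated to its logical position in the media file, and the view resolves one into the other. The names below are illustrative:

```cpp
#include <cstddef>
#include <vector>

// Illustrative record for a segment as stored on disk: downloaded out of
// order, it is appended at the current end of the physical cache file.
struct StoredSegment
{
    size_t logical_start;   // offset in the original media file
    size_t length;
    size_t physical_start;  // offset in the physical cache file
};

// Translate a logical offset into a physical one; returns false when the
// byte is not in the cache (callers must only ask for bytes already present).
bool LogicalToPhysical(const std::vector<StoredSegment> &segs,
                       size_t logical, size_t &physical)
{
    for (const StoredSegment &s : segs)
        if (logical >= s.logical_start && logical < s.logical_start + s.length)
        {
            physical = s.physical_start + (logical - s.logical_start);
            return true;
        }
    return false;
}
```

For example, if the user seeks ahead and bytes 1000-1099 are downloaded first (stored at physical offset 0), followed by bytes 0-99 (stored at physical offset 100), logical byte 1050 resolves to physical byte 50.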
Some examples
The best place to get working examples is probably modules/cache/selftest/cache_download_multimedia.ot.
In any case, this is a short, non-working, incomplete example with just the critical parts:
// Create the dedicated context
Context_Manager_Multimedia::CreateManager(ctx, multimedia_folder, multimedia_folder, FALSE, PrefsCollectionNetwork::DiskCacheSize);
// Deal with Multimedia content (it should be set as soon as possible, before the Cache_Storage object has been created)
url.SetAttribute(URL::KMultimedia, TRUE);
// Ask for a partial download, after checking with URL::GetPartialCoverage() whether the range is already (even only partially) available
url.SetAttribute(URL::KHTTPRangeStart, &start);
url.SetAttribute(URL::KHTTPRangeEnd, &end);
// [...] Start downloading the content with URL::LoadDocument()
// Retrieve the bytes required (typically in response to a MSG_URL_DATA_LOADED message)
URL_DataDescriptor *dd=rep->GetDescriptor(NULL, URL::KNoRedirect, TRUE);
...
if(OpStatus::IsSuccess(dd->SetPosition(move_pos))) { ... /* Call RetrieveData() and GetBuffer() */ }
// Get the segments available on the cache
OpAutoVector<StorageSegment> sort_seg;
op_err=dd->GetSortedCoverage(sort_seg);
Streaming
On devices that have limited disk capacity, viewing big videos can be problematic.
To solve this problem, the Multimedia Cache provides a streaming mode, meaning that a fixed amount of space is used to cache (in practice, to buffer ahead) part of the content.
Alternatively, streaming can be performed in RAM (but DISK_CACHE_SUPPORT is still required, even if the disk cache itself can be disabled via a preference).
To activate the streaming directly from a "low level user" perspective:
- MultimediaCacheFile::ConstructFile() or MultimediaCacheFile::ConstructMemory() need to be called, specifying a maximum file size
- MultimediaCacheFile::ActivateStreaming() has to be called
These details are automatically managed by the cache subsystem (in particular by the class Multimedia_Storage), provided that API_MULTIMEDIA_CACHE_STREAM is imported.
The following criteria are used:
- If the disk cache is disabled, the streaming is always activated, and it will happen in RAM
- Based on PrefsCollectionNetwork::CacheHTTPS, HTTPS can be streamed in RAM or not
- URLs with KCachePolicy_NoStore set will be streamed
- If PrefsCollectionNetwork::MultimediaStreamAlways is TRUE, streaming will always be used
- If PrefsCollectionNetwork::MultimediaStreamRAM is TRUE, when the streaming is used, it will happen on RAM
- PrefsCollectionNetwork::MultimediaStreamSize provides the size of the file/buffer used to stream
- When streaming, unique URLs should be used. While the cache enforces this by default, the user code should try to create unique URLs as soon as possible. The cache exports the internal logic via the Multimedia_Storage::IsStreamRequired() static method. Please be aware that this information is not 100% accurate, as the Cache-Control header (possibly not yet available when the user calls IsStreamRequired()) can still force streaming at a later stage.
Please note that streamed URLs are meant to be unique.
The cached file will not be reused anyway, as for performance reasons it is not kept in a consistent state.
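As a rough sketch of how the criteria above combine (this is a hypothetical model, not the real Multimedia_Storage::IsStreamRequired() signature or logic; in particular, the handling of the CacheHTTPS preference is an assumption):

```cpp
// Hypothetical model of the streaming decision. Each boolean input mirrors
// one of the conditions listed above; the real code reads them from the
// URL attributes and from PrefsCollectionNetwork.
void StreamDecision(bool disk_cache_enabled,     // disk cache active?
                    bool https,                  // HTTPS URL?
                    bool cache_https_pref,       // PrefsCollectionNetwork::CacheHTTPS
                    bool no_store,               // KCachePolicy_NoStore
                    bool stream_always_pref,     // MultimediaStreamAlways
                    bool stream_ram_pref,        // MultimediaStreamRAM
                    bool &stream, bool &ram)
{
    stream = ram = false;
    if (!disk_cache_enabled)
    {
        // No disk cache: streaming is always activated, in RAM.
        stream = ram = true;
        return;
    }
    if (https && !cache_https_pref)
        stream = ram = true;        // assumption: HTTPS kept off disk
    if (no_store || stream_always_pref)
        stream = true;
    if (stream && stream_ram_pref)
        ram = true;                 // stream in RAM when requested
}
```

The two output flags correspond to what URL::GetCacheInfo(BOOL &streaming, BOOL &ram) reports once the storage has been created.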
A new method is exposed by URL, to provide more information on the cache storage:
void URL::GetCacheInfo(BOOL &streaming, BOOL &ram);
The devil is in the details
A key concept when streaming is that the cache has a maximum size limit, and when the cache is full no new data can be written.
So it is vital to have a mechanism to free space, or to "consume" bytes. For performance reasons, consuming bytes does not recover space on disk and does not move data, but it still makes room for new bytes.
From a user perspective, the cache mainly provides two automatic mechanisms for consuming bytes:
- consume on write (ConsumePolicy::CONSUME_ON_WRITE)
- consume on read (ConsumePolicy::CONSUME_ON_READ)
The preferred way is "Consume on write", as it saves as much bandwidth as possible, and it also forces the caller to take full responsibility for the buffering logic.
This is also the method used automatically by the cache if API_MULTIMEDIA_CACHE_STREAM is imported.
With "Consume on read", the bandwidth consumption can slightly increase (due to an internal operation that can drop more bytes than strictly required), but the caller module can try to use the information provided by the cache to manage the buffering.
So while "Consume on read" can potentially simplify the logic of the caller a bit, it is discouraged.
ActivateStreaming() enables streaming and lets you choose the consume policy to use.
Consume on write
When "Consume on write" is enabled:
- When the cache is full (which, after a while, will always be true), a write operation consumes bytes at the beginning of the cached portion of the content, immediately replacing those bytes with the new content.
- Streaming effectively disables multi-segment support, even if the classes use it internally.
- If a seek is performed during a write operation (meaning that the content is not appended to the end of the stream buffer), the whole content is deleted.
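The behaviour above can be modelled as a sliding window over the logical stream: a hypothetical, self-contained sketch (not the MultimediaCacheFile implementation) where every write past capacity consumes the oldest cached bytes:

```cpp
#include <cstddef>
#include <string>

// Illustrative model of "consume on write" streaming: a fixed-capacity
// buffer where, once full, every write drops the same number of bytes from
// the oldest end of the cached window. Logical offsets keep growing; only
// the most recent `capacity` bytes remain readable.
class ConsumeOnWriteStream
{
public:
    explicit ConsumeOnWriteStream(size_t capacity)
        : cap(capacity), window_start(0) {}

    // Append bytes at the logical end of the stream; when the buffer is
    // full, the oldest bytes are consumed to make room. (A seek would
    // discard the whole content, as described above, so it is not modelled.)
    void Write(const std::string &bytes)
    {
        data += bytes;
        if (data.size() > cap)
        {
            size_t drop = data.size() - cap;
            data.erase(0, drop);       // consume the oldest bytes
            window_start += drop;
        }
    }

    size_t WindowStart() const { return window_start; }
    size_t WindowEnd()   const { return window_start + data.size(); }

    // Reads only succeed inside the currently cached window.
    bool Read(size_t logical_pos, size_t len, std::string &out) const
    {
        if (logical_pos < window_start || logical_pos + len > WindowEnd())
            return false;
        out = data.substr(logical_pos - window_start, len);
        return true;
    }

private:
    size_t cap;
    size_t window_start;  // logical offset of the oldest cached byte
    std::string data;
};
```

Note how the caller keeps full control: writes never fail, but bytes the caller has not yet read may be consumed, which is why the buffering logic is entirely its responsibility.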
Consume on read
When "Consume on read" is enabled:
- When a read operation is performed, the bytes read are "consumed", so they will no longer be available, freeing space for new content.
- In reality, by default these bytes are still available and the cache will try to reuse them if the caller asks for them again. SetEnableEmptySpaceRecover() can change this behaviour.
- When the file/buffer reaches the maximum size, no more writes are allowed, and a read operation is required to free some space. This implies that the user (for example the media module) needs to be aware of the size limit and limit the read-ahead, to avoid downloading bytes that cannot be stored.
- Streaming effectively disables multi-segment support, even if the classes use it internally.
- If a seek is performed during a read operation (meaning that the content is not read from the beginning of the stream buffer), all the skipped bytes are lost, unless the empty space is recovered (the default). Under some conditions, part of these bytes can be lost anyway.
- If a seek is performed during a write operation (meaning that the content is not appended to the end of the stream buffer), the whole content is deleted.
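For contrast with the previous policy, here is a hypothetical model of "consume on read" (again, not the real implementation, and without modelling the empty-space recovery): writes are refused when the buffer is full, and space is only freed when the reader consumes bytes from the front:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Illustrative model of "consume on read": a fixed-capacity buffer where
// writes FAIL when full, and reading from the front consumes the bytes and
// frees space. The caller must limit its read-ahead to the free space, or
// downloaded bytes will have nowhere to go.
class ConsumeOnReadStream
{
public:
    explicit ConsumeOnReadStream(size_t capacity) : cap(capacity) {}

    size_t FreeSpace() const { return cap - data.size(); }

    // Returns false (write refused) when the bytes do not fit; the caller
    // has to wait for the reader to consume something.
    bool Write(const std::string &bytes)
    {
        if (bytes.size() > FreeSpace())
            return false;
        data += bytes;
        return true;
    }

    // Sequential read from the front; the bytes returned are consumed and
    // the freed space becomes available for new writes.
    std::string Read(size_t len)
    {
        len = std::min(len, data.size());
        std::string out = data.substr(0, len);
        data.erase(0, len);
        return out;
    }

private:
    size_t cap;
    std::string data;
};
```

This makes visible why the policy is discouraged: the writer must poll `FreeSpace()` (in the real API, the information exposed by the cache) and throttle the download accordingly, coupling the two sides.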
Streaming with multiple segments
When streaming, if a seek operation outside of the cached content is performed (e.g. the user moves to a point of the video that is not in the cache), the cache drops all the bytes.
It would be nice to keep them around and potentially save some bandwidth. This basically means streaming with multiple segments (internally, the cache uses two segments when streaming, but that is an implementation detail; from a logical point of view it uses a single segment).
Streaming with multiple segments is not supported at the moment, but it would be relatively easy to add. The main problem is deciding which bytes to consume, as there would be multiple segments around, but it is feasible.