Strings module: string database

Copyright © 1995-2011 Opera Software ASA. All rights reserved. This file is part of the Opera web browser. It may not be distributed under any circumstances.

Introduction

Opera uses the OpLanguageManager interface for handling locale strings, as described in the documentation for the Locale module. The strings that can be accessed through this interface are defined by the string database, english.db, which resides in this module.

The language database

The canonical source of language strings in Opera is the english.db file in this module. This is the only file you should change if you want your changes to carry through.

You can use the updatedb.pl script (see the "Scripts" section below) to add strings in a semi-automated way.

Remember that it is not allowed to hard-code user-visible text in the Opera core source code!

Identifier format

Each string defined in the database is assigned an identifier, which is used to refer to this string from the source code. This identifier must follow the pattern described below. Please note that only the string name may be referenced by core code, not the associated number (see below).

There are three logical sections in the database, in which the strings are designated by their initial letter. Please note that they are stored in the same section in the generated language file, so identifiers must be unique throughout all sections.

Naming conventions
Initial Use
D Dialog titles and components
S General strings
M Menus

Please note that some strings have an I as the second letter; these were imported from Opera 6 or earlier.

New strings are to be named in this form:

X_DESCRIPTIVE_TEXT

Where X is one of the initial letters as described above, and DESCRIPTIVE_TEXT is a short text describing the use of the string. Please note that the identifier must be unique.
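The naming rule can be checked mechanically. The C++ sketch below is illustrative only: the helper names are hypothetical, and it assumes DESCRIPTIVE_TEXT consists of upper-case letters, digits and underscores, which the rule above does not spell out. Because all sections end up in one section of the generated file, uniqueness is tested against every existing identifier, whatever its initial letter.

```cpp
#include <regex>
#include <set>
#include <string>

// Hypothetical helper: checks a proposed new identifier against the
// X_DESCRIPTIVE_TEXT convention, where X is D (dialogs), S (general) or M (menus).
// Assumption: DESCRIPTIVE_TEXT uses only A-Z, 0-9 and '_'.
bool is_valid_new_identifier(const std::string& id) {
    static const std::regex pattern("^[DSM]_[A-Z0-9_]+$");
    return std::regex_match(id, pattern);
}

// Identifiers share one namespace across all sections, so a new name is
// compared against every existing identifier, not only those with the same initial.
bool can_add_identifier(const std::string& id, const std::set<std::string>& existing) {
    return is_valid_new_identifier(id) && existing.count(id) == 0;
}
```

Old-style names such as SI_... are rejected by this check, which matches the rule that new strings use the X_DESCRIPTIVE_TEXT form.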

Required information

For each string, three components must be defined in the english.db file, and a fourth (description) should be defined. They are listed in the file in the form

identifier[.component]=data

The first entry is named only after the identifier, and does not have a component part. It is listed as none below. Please note that all these components are required for new strings.
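As a minimal sketch of the entry format, the following C++ splits a line at its first '=' and then at the first '.' of the key. It assumes that identifiers never contain '.', that there is no whitespace around '=', and that any comment or header lines in the real file are filtered out elsewhere; DbEntry and parse_entry are hypothetical names, not part of the strings module.

```cpp
#include <optional>
#include <string>

// One line of english.db in the form identifier[.component]=data.
struct DbEntry {
    std::string identifier;
    std::string component;  // empty for the bare entry ("none": the old number)
    std::string data;
};

// Splits at the first '=', so the data part may itself contain '=' or '.'.
std::optional<DbEntry> parse_entry(const std::string& line) {
    const auto eq = line.find('=');
    if (eq == std::string::npos) return std::nullopt;  // not an entry line
    const std::string key = line.substr(0, eq);
    DbEntry entry;
    entry.data = line.substr(eq + 1);
    const auto dot = key.find('.');
    if (dot == std::string::npos) {
        entry.identifier = key;  // no component part
    } else {
        entry.identifier = key.substr(0, dot);
        entry.component = key.substr(dot + 1);
    }
    if (entry.identifier.empty()) return std::nullopt;
    return entry;
}
```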

Database entry components
Name Description
none The old unique number associated with this string (see below).
caption The actual language text associated with this string.
description A comment about this string to aid translators.
cformat Enable/disable c-format parsing for the string (optional).
scope A list of scopes to aid translators

The components are described in detail below.

Old unique identifier number (obsolete)

Previous versions allocated a hard-coded identifier number to each string when it was created, but this system is no longer in use. New strings use only the name identifier, which means that you should assign them the special number -1. This prevents the string from being output when building language files for older formats.

To keep the size of the language files down, the identifier string is inserted into an enumeration which is used at run time. This means that at run time, all strings are identified by an integer value (the value of the enumeration). The numeric identifier is created by applying a djb2 hash to the name of the string, creating a unique number (see also the next section).
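For reference, djb2 starts from 5381 and, for each byte of the name, multiplies the running value by 33 and adds the byte. The sketch below assumes the additive variant with 32-bit unsigned wrap-around; makelang.pl may use a different width or the XOR variant, so treat this as an illustration of the technique rather than a way to reproduce the generated values.

```cpp
#include <cstdint>
#include <string_view>

// djb2 (Dan Bernstein): hash = hash * 33 + c, starting at 5381.
// Assumptions: additive variant, 32-bit unsigned arithmetic, bytes read as unsigned.
constexpr std::uint32_t djb2(std::string_view name) {
    std::uint32_t hash = 5381;
    for (unsigned char c : name) {
        hash = hash * 33u + c;  // unsigned overflow wraps modulo 2^32
    }
    return hash;
}
```

For example, djb2("a") is 5381 * 33 + 97 = 177670. djb2 by itself does not guarantee distinct values for distinct inputs, so uniqueness is a property of the particular set of names being hashed.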

Whenever a string is changed in an incompatible way, the unique number must be changed, which means that a new string name must be created. The new string receives a new identifier, and the old string is kept so that translated versions etc. can be built for it.

The identifier name is the only form used in the Opera source code. The makelang.pl script, described below, creates the code required to map between the name and the number.

Please note that not all platforms' implementations of OpLanguageManager necessarily use the hashed identifier number; this applies especially to some embedded platforms. This is why the Opera source code must refer to strings only through the enumeration values, never through their numbers.
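The following C++ sketch illustrates the point. LocaleString, its enumerators and get_caption are hypothetical stand-ins for the generated language-enum.inc and for an OpLanguageManager implementation, and the enumerator values use an assumed 32-bit additive djb2. Because core code only ever writes the enumerator names, a platform is free to resolve them through hashes, table indexes or native resource IDs without any change to the callers.

```cpp
#include <cstdint>
#include <string_view>

// Assumed 32-bit additive djb2, used here only to give the enumerators values.
constexpr std::uint32_t djb2(std::string_view name) {
    std::uint32_t hash = 5381;
    for (unsigned char c : name) hash = hash * 33u + c;
    return hash;
}

// Hypothetical stand-in for the generated language-enum.inc.
enum class LocaleString : std::uint32_t {
    D_EXAMPLE_DIALOG_TITLE = djb2("D_EXAMPLE_DIALOG_TITLE"),
    S_EXAMPLE_MESSAGE = djb2("S_EXAMPLE_MESSAGE"),
};

// Hypothetical lookup. How the enumerator is resolved stays hidden in here.
std::string_view get_caption(LocaleString id) {
    switch (id) {
        case LocaleString::D_EXAMPLE_DIALOG_TITLE: return "Example";
        case LocaleString::S_EXAMPLE_MESSAGE: return "Hello, world";
    }
    return {};  // not reached for valid enumerators
}

// Core code names the string, never its numeric value:
//   std::string_view text = get_caption(LocaleString::S_EXAMPLE_MESSAGE);
```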

Caption

This is the actual text that is to be displayed (and translated). This string should be written according to the Style Guide.

The language database is encoded using UTF-8, but most strings should be restricted to Basic Latin ("ASCII") text.

Description

A short, descriptive comment. This comment is visible in most of the generated language files and in the translation files that are sent to the translators, where it helps avoid errors and misunderstandings in the translations.

Runs of several strings with the same comment will be grouped in the language file, as long as they are kept together in the database. It is always a good idea to keep related strings together, to make it easier for translators to spot them.

Cformat

This optional parameter can be used to disable c-format parsing for strings that the makelang.pl parser incorrectly flags as c-format (printf style). If you find that the parser does not flag a string as c-format when it should, file a bug.
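To show how a false positive can arise, here is an illustrative C++ heuristic of the kind such a parser might use; it is not makelang.pl's actual rule, and looks_like_cformat is a hypothetical name. It skips "%%", then accepts printf flags, width, precision and length modifiers before a conversion letter. A caption such as "50% done" is flagged because "% d" is a valid conversion (space flag followed by d), which is the kind of string the cformat component exists to exempt.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

namespace {
bool in_set(std::string_view set, char c) { return set.find(c) != std::string_view::npos; }
}  // namespace

// Returns true if `s` contains something a printf-style parser would treat
// as a conversion specification.
bool looks_like_cformat(std::string_view s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != '%') continue;
        std::size_t j = i + 1;
        if (j < s.size() && s[j] == '%') {  // "%%" is a literal percent sign
            i = j;
            continue;
        }
        while (j < s.size() && in_set("-+ #0", s[j])) ++j;  // flags
        while (j < s.size() &&
               (std::isdigit(static_cast<unsigned char>(s[j])) || s[j] == '.' || s[j] == '*'))
            ++j;  // width and precision
        while (j < s.size() && in_set("hlLqjzt", s[j])) ++j;  // length modifiers
        if (j < s.size() && in_set("diouxXeEfFgGaAcspn", s[j])) return true;
    }
    return false;
}
```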

Scope (translator hints)

The scope parameter is used as a hint to translators to indicate which platforms and components use each string.

The current scopes are listed below. A full list can also be found in makelang.pl.

String scopes
Scope Description
Core Components
base A string always required by Opera core
coremswin MSWIN-specific string in core code
corequick Quick-specific string in core code
java A string used by Java support
fileupload A string used by file upload functionality
externalapps A string used by support for external applications
wml A string used by WAP WML support
ssl A string used by SSL support
ftp A string used by FTP support
ecmascript A string used by ECMAScript support
operaurl A string used by opera:* (about:*) URL support
xml A string used by XML support
prefsdownload A string used by preference download support
webforms2 A string used by Web Forms 2 support
ssp A string used by Site Specific Preferences or User Profiles
internal A string not meant for public builds
Non-core Components
ui A string used by the Quick user interface
m2 A string used by the M2 client
im A string used by the IM client
voice A string used by the voice browser
Platforms
mac A string used by Opera for Macintosh
qt A string used by Opera Qt
windows A string used by Opera for Windows
kyocera A string used by Opera for Kyocera
symbian A string used by Opera for Symbian OS
nanox A string used by Opera for NanoX
brew A string used by Opera for BREW
ezx A string used by Opera for EZX
juix A string used by Opera for Juix
pp A string used by Opera for Powerparts
Deprecated scopes
unsorted An unclassified string used by Opera core code

A string may list several scopes (comma separated), which means that it is valid in ALL of these scopes. Thus, a string defining the scope "mac,windows" is used on both Macintosh and Windows.
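A minimal C++ sketch of reading a scope component, assuming the data is a comma-separated list that may contain spaces after the commas; parse_scopes and used_in_scope are hypothetical helpers. A string is used in a scope if that scope appears anywhere in its list.

```cpp
#include <set>
#include <sstream>
#include <string>

// Splits a scope component such as "mac,windows" into individual scopes,
// trimming surrounding spaces and skipping empty items.
std::set<std::string> parse_scopes(const std::string& data) {
    std::set<std::string> scopes;
    std::istringstream in(data);
    std::string item;
    while (std::getline(in, item, ',')) {
        const auto first = item.find_first_not_of(' ');
        if (first == std::string::npos) continue;  // empty item
        const auto last = item.find_last_not_of(' ');
        scopes.insert(item.substr(first, last - first + 1));
    }
    return scopes;
}

// True if the string's scope list contains `scope`.
bool used_in_scope(const std::string& data, const std::string& scope) {
    return parse_scopes(data).count(scope) > 0;
}
```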

Scopes should be added for all components that can be taken out of the program.

To create a new scope, edit the makelang.pl script, described below, to list your new scope; otherwise, rebuilding the language files will fail. Look for the variable %scope in the script and add your new scope in the appropriate place.

Adding strings or updating existing ones

Please note that only the module owner may add strings to the work branch of the strings module. More information is available in the main documentation.

To add new strings to the database, please read the information above on the format of the database and then edit the english.db file appropriately. You can also use the updatedb.pl script, described below.

Updating existing strings is allowed to correct spelling errors and grammatical mistakes. Changing a string to mean something else is not allowed: once an id and number have been allocated to a string, that id and number may never be used for anything else. For the same reason, strings should not be removed from the database; instead, they should be "abandoned" so that they are not included in the language files that are built. A string is abandoned by removing the references to it from the module.strings files.
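The relationship between the database and the module.strings files can be sketched as a set difference. This C++ is a hypothetical illustration, not code from the build system: it assumes the identifiers referenced by each module.strings file have already been read into sets, and it reports the database strings that no module references, that is, the abandoned ones. Nothing is deleted from the database.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the identifiers present in the database but referenced by no
// module.strings file. Those strings stay in english.db but are not built.
std::set<std::string> abandoned_strings(
    const std::set<std::string>& database_ids,
    const std::vector<std::set<std::string>>& module_references) {
    std::set<std::string> referenced;
    for (const auto& module : module_references) {
        referenced.insert(module.begin(), module.end());
    }
    std::set<std::string> abandoned;
    for (const auto& id : database_ids) {
        if (referenced.count(id) == 0) abandoned.insert(id);
    }
    return abandoned;
}
```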

To check that the database format is valid, run makelang.pl.

Database version

To make sure the correct version of a language file is used with Opera, LanguageManager uses a database version number. Whenever a new release of the language database is made, the database version number must be increased.

Scripts

See also the corresponding section in the main documentation.

If you modify the scripts, there is a selftest script in the selftest subdirectory that can be used to verify that there are no regressions.

updatedb.pl

This script is used to add new strings to the string database. It provides a slightly less user-hostile way to add them than editing the database file manually. It lets you interactively enter new strings, and performs some basic consistency checks. It also allocates the new string id numbers automatically, and updates the database version for you when it writes the file out.

makelang.pl

This script serves several purposes. For a list of valid parameters please issue the command

perl makelang.pl -?
  1. To generate the source code header files required to compile the Opera source code, since the source code needs to be able to reference the strings that are listed in the language database.

    In core-2 and later, the script is run automatically from the operasetup script from the hardcore module, which generates a build.strings file appropriate for your build and feeds it to this script. You will need to keep a copy of the build.strings for building language files (see below).

    When the script has been run by the build system, the following include files, used by the locale module and by the code in Opera that accesses language strings, are created:

    File Description
    modules/locale/language-dbversion.h Header files used for accessing information from the language database in Opera.
    modules/locale/language-enum.inc
    modules/locale/language-map.inc

    None of these files are checked in to CVS. The internal test version of the language file can be used for debugging builds, but it is not a proper language file for distribution.

  2. To generate platform-specific output, allowing integration with each platform's preferred format for string resources and/or translations.

    See documentation on the -l (ell) command line parameter for more details on what formats can be created.

  3. The makelang.pl script is also used to generate the run-time language files, either the English version embedded in the english.db file, or a translated version by specifying a PO file on the command line. This usage is described in the translations module documentation. Please note that you will need to keep a copy of the build.strings file (core-2 and later) for reference when creating the corresponding language files, including the English one, since it contains the actual list of language strings used for your build.

  4. To generate translation template files that are sent to translators. This mode is usually not invoked directly, but instead the makepot.pl wrapper script from the translations module is used. Please see the documentation for this script for more details.

updatescope.pl

This script takes a string list for a build (the build.strings file) and tags all the strings listed in it with the given scope. This can be used by translators and the strings module maintainer to get a hint on where and by whom a certain string is used.
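In outline, and assuming the database's scopes have been loaded into a map from identifier to scope set, the tagging step amounts to the C++ sketch below. tag_with_scope is a hypothetical name, and skipping identifiers that are missing from the database is an assumption about how the real script behaves.

```cpp
#include <map>
#include <set>
#include <string>

// Adds `scope` to every identifier listed in build.strings. Identifiers not
// found in the database are skipped; std::set ignores duplicate scopes.
void tag_with_scope(std::map<std::string, std::set<std::string>>& scopes_by_id,
                    const std::set<std::string>& build_strings,
                    const std::string& scope) {
    for (const auto& id : build_strings) {
        auto it = scopes_by_id.find(id);
        if (it != scopes_by_id.end()) it->second.insert(scope);
    }
}
```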

makespell.pl

This script creates a text file with just the captions from the database, to make it possible to run a spelling checker on the contents.