cppgen - Code generator

Introduction

cppgen is a Python package which takes care of generating the necessary C++ classes/files from the proto files. The package is designed to be reused in custom tools but is also integrated with the operasetup system. This makes it easy to use proto files as the C++ files are generated when a new build is started.

Requirements

cppgen requires Python 2.4 or higher (untested in Python 3). No external libraries are needed.

Structure

The Python package is located in modules/protobuf/cppgen. The main entry point is the script.py file; both the CLI script and the operasetup call go through this file.

The files in the package are:

Additionally, the package contains selected files from the external tool hob; these files handle parsing of the .proto files into Python structures. Avoid modifying these files; if they are modified, the same updates must also be applied to hob. They are located in the ext/hob subfolder.

The tests subfolder contains unit tests (run by nose), but they are currently outdated.

Running

The generator can be started in three ways. The first is by calling the script.py file using Python; it has several options, use -h to get an overview. Calling the script is safe, as it only checks which files need to be created or updated. The second is via the operasetup process, which generates or updates the code when run. The third is as an extension to hob; this is how the tool was originally used, but it is no longer relevant and should be removed in the future.

Example run - Display files which are to be created or updated.

python modules/protobuf/cppgen/script.py

A useful option is --diff: if supplied, the output shows the difference between the current codebase and what will be updated. This is handy when modifying the cppgen generator, as mistakes are usually quick to spot.

python modules/protobuf/cppgen/script.py --diff

Running it via operasetup is normally done when starting a new build, but it is also possible by calling modules\hardcore\scripts\operasetup.py

Configuration

The generator can be configured on a per-module basis by creating a file called cpp.conf in the root of the module. It controls what the generated code looks like for that module, e.g. to add additional helper methods or change names to avoid conflicts. For more details see cpp.conf.

How it works

The generator consists of several steps, they are:

Discovery

The first step is to scan the modules in the source tree for proto files. This is done by looking for modules.proto files in each module. This file contains a list of filenames or directories which are to be included; if a directory is specified, all files in it which look like proto files (names ending in .proto) are included. The system will also find proto files in non-core setups, for instance Desktop adds new scope services.

Each module.proto is scanned by stripping away comments (which start with a #) and inspecting each remaining line in the file. An example:


# The following files are included
# The ecmascript service
ecmascript.proto

# Include all .proto files in the services sub-folder
services

This is handled by script.ProtobufSystem.scan.
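
The scanning rules above can be sketched as follows. This is not the code from script.ProtobufSystem.scan; the function names are hypothetical, and the sketch assumes that a comment always occupies a whole line.

```python
import os


def parse_proto_list(text):
    """Return the entries of a proto list file, skipping comments and blanks."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: comments occupy a whole line and start with '#'.
        if not line or line.startswith("#"):
            continue
        entries.append(line)
    return entries


def expand_entries(module_root, entries):
    """Turn list entries into file paths, expanding directories."""
    files = []
    for entry in entries:
        path = os.path.join(module_root, entry)
        if os.path.isdir(path):
            # A directory contributes every file whose name ends in .proto.
            for name in sorted(os.listdir(path)):
                if name.endswith(".proto"):
                    files.append(os.path.join(path, name))
        else:
            files.append(path)
    return files
```

Applied to the example file above, parse_proto_list yields the two entries ecmascript.proto and services, and expand_entries then replaces services with the .proto files found in that directory.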

Parsing

Once all files are known the system parses the proto files and builds a structure for each one. The result will be placed in cache files by pickling the Python structure to speed up parsing the next time it is run. The parsing is handled by script.ProtoFile.loadPackage or script.ProtoFile.loadCache.
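
A minimal sketch of such a pickle cache is shown here. It assumes the cache is considered fresh when its mtime is not older than that of the proto file; load_with_cache and the parse callback are hypothetical names and do not reflect the actual loadPackage/loadCache signatures.

```python
import os
import pickle


def load_with_cache(proto_path, cache_path, parse):
    """Return the parsed structure for proto_path, reusing a fresh pickle."""
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(proto_path)):
        # The cache is at least as new as the source: skip parsing.
        with open(cache_path, "rb") as cache_file:
            return pickle.load(cache_file)

    # Cache missing or stale: parse the file and store the result.
    with open(proto_path) as proto_file:
        package = parse(proto_file.read())
    with open(cache_path, "wb") as cache_file:
        pickle.dump(package, cache_file, pickle.HIGHEST_PROTOCOL)
    return package
```

Note that a check based only on the mtime of the proto file does not notice when the parser or generator code itself has changed, which is relevant to the issue described under Troubleshooting.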

After the proto structure is available it builds the corresponding C++ structure by attaching objects representing the C++ classes, enums and other types (see cpp.py). This is initiated by script.ProtoFile.scan by calling cpp.buildCppElements on the package object. It will iterate over all proto objects and attach a builder object (subclass of cpp.CppBuilder). Each builder object contains a set of properties and methods which the generators will later on use, for instance they can return the names of C++ types or objects representing types (e.g. an enum). The builder is accessible on the proto object via the cpp property.
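
The idea of attaching a builder to each proto object can be illustrated like this. ProtoMessage, MessageBuilder and build_cpp_elements are hypothetical stand-ins and not the real classes from cpp.py; only the cpp attribute name is taken from the description above.

```python
class ProtoMessage(object):
    """Hypothetical stand-in for a parsed proto message."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.cpp = None  # set by build_cpp_elements


class MessageBuilder(object):
    """Hypothetical builder answering C++ questions about one message."""

    def __init__(self, message):
        self.message = message

    def class_name(self):
        return self.message.name


def build_cpp_elements(message):
    """Attach a builder to the message and, recursively, to its children."""
    message.cpp = MessageBuilder(message)
    for child in message.children:
        build_cpp_elements(child)
```

A generator that receives a proto object can then ask message.cpp for C++ names instead of computing them itself.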

Any message which is marked as an OpMessage will be added to a script.OpMessageSystem which will be processed after the scan is done. This is needed to generate globally unique IDs for each OpMessage.
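
The reason for deferring this step is that IDs can only be handed out once every OpMessage is known. A sketch of one possible assignment scheme follows; the sorting rule, the starting value and the function name are assumptions made for illustration, since the scheme used by script.OpMessageSystem is not described here.

```python
def assign_message_ids(message_names, first_id=1):
    """Map each distinct message name to a unique integer ID."""
    ids = {}
    # Assumption: sorting makes the result independent of scan order.
    for offset, name in enumerate(sorted(set(message_names))):
        ids[name] = first_id + offset
    return ids
```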

Generate

The final step is to generate the C++ code and write it out to the .cpp and .h files. The system is designed to avoid writing to the files if no changes are found. This is very important as the build systems used by Opera will always recompile any changed C++ file, and in some cases the entire source tree will be recompiled. The system also tries to figure out if it even needs to parse and generate the code to speed up the process. This has in some cases in the past led to problems where code was not updated, see Troubleshooting below for more details.

What gets generated depends on what is found in the proto files. For instance if a service declaration is found it assumes it is a scope service and will use scope_service.py; if it finds messages marked as OpMessage it will use op_message.py. Both of them will use message.py to generate the C++ classes for the messages but will embed the generated code in different ways. They are currently mutually exclusive. This is something that could be improved in the future, for instance by having the message classes in a separate namespace/class and generating the service or OpMessage classes if needed.

Additionally a global service manager class is created, which keeps track of all service classes in the system and manages object lifetime and service registration. This is generated by scope_manager.py and the scope_manager.CppManagerGenerator class. This manager is used in the scope module only.
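
Avoiding unnecessary writes can be sketched as follows. The helper name write_if_changed is hypothetical and the real implementation in cppgen may differ.

```python
import os


def write_if_changed(path, content):
    """Write content to path only when it differs; return True if written."""
    if os.path.exists(path):
        with open(path) as existing:
            if existing.read() == content:
                # Identical content: leave the file and its mtime untouched.
                return False
    with open(path, "w") as output:
        output.write(content)
    return True
```

Because an unchanged file is never reopened for writing, its mtime stays the same and the build system has no reason to recompile it.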

Generators

The various generators used in the system are explained in more detail below.

Proto messages

Each message found in the proto files (including nested ones) will be converted to a C++ class which represents the message. The C++ class will for the most part reflect the message by having members with similar names to the fields. All messages are placed in an outer class called the message-set; ideally this should have been a namespace but that cannot be used yet.

The nested structure of the messages is not reflected in the C++ code as there are problems referencing classes which have not yet been defined. Instead a flat structure is used with the name of the C++ class reflecting the nested structure, for instance consider the following structure:

message Points
{
	message Point
	{
	}
}

The inner message Point will get a name reflecting the entire path of the message, i.e. the class would be named something like Points_Point. To make it easier for the C++ developers additional typedefs are added to mimic the original structure, e.g. a typedef is placed inside the Points class to the Points_Point class but with the name Point, this makes it possible to reference the class with just Points::Point.
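
The naming scheme can be sketched like this. Messages are modelled as hypothetical (name, children) tuples, and the choice to emit nested classes before their parent is an assumption made so that each typedef refers to an already defined class; the output layout is illustrative only and not what message.py produces.

```python
def flatten(message, parents=()):
    """Yield (flat_class_name, typedefs) for a message and its nested ones.

    typedefs maps the short nested name to the flat class it aliases.
    """
    name, children = message
    path = parents + (name,)
    typedefs = dict(
        (child[0], "_".join(path + (child[0],))) for child in children)
    # Assumption: nested classes come first so typedefs can refer to them.
    for child in children:
        for item in flatten(child, path):
            yield item
    yield "_".join(path), typedefs


def render(message):
    """Render the flat classes as skeleton C++ declarations."""
    lines = []
    for flat_name, typedefs in flatten(message):
        lines.append("class %s" % flat_name)
        lines.append("{")
        if typedefs:
            lines.append("public:")
        for short in sorted(typedefs):
            lines.append("\ttypedef %s %s;" % (typedefs[short], short))
        lines.append("};")
    return "\n".join(lines)
```

For the Points example, render(("Points", [("Point", [])])) produces an empty Points_Point class followed by a Points class containing the line typedef Points_Point Point;.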

The generator is handled by message.py and the Python class CppMessageGenerator. It generates a list of C++ methods by inspecting the message fields, their types and their options.

Service generation

The services are generated by the scope_service.CppServiceGenerator class, which generates both the header and implementation code. The class scope_service.CppScopeServiceGenerator handles configuration and knows which files to generate.

OpMessage generation

Each marked proto message will be converted to a C++ class which is a subclass of OpTypedMessage. It will first generate the class for the proto message itself using message.py, then it will generate the OpMessage wrapper class which contains the specific methods and constructors; the actual data is stored in a private member using the generated proto class. The Python class CppOpMessageGenerator is responsible for generating the files, while the classes CppOpMessageDeclaration and CppOpMessageImplementation handle the actual code generation. All generated classes for a single module are placed in a message-set; this is handled by CppOpMessageSetGenerator.

All OpMessages in the system are assigned a unique enum value, which is used to determine the type of a serialized message. The Python class CppOpMessageGlobalGenerator handles the generated files for this system, and the code generation is handled by the classes CppOpTypedMessageBaseDeclaration and CppOpTypedMessageBaseImplementation.

cpp

The cppgen module contains lots of functionality for representing C++ code as Python objects and generating text output from them. All C++ elements inherit from the CppElement class, which maintains a child/parent relationship and has convenience properties for docblocks, comments, spacing and a link to the Proto object (if any) that it was created from. Several subclasses exist, such as Class, Enum, Method and TypeDef; however, they do not cover all C++ types and keywords (yet).

Troubleshooting

Corrupt code

There have been issues in the past with generated code being corrupted. The problem comes from the caching and mtime checking, which has caused only some files to be updated. As a result the compiler recompiles only the modified files and leaves the others untouched. When the code is linked, different offsets may then be used for fetching attributes in objects, leading to memory corruption. The main cause of this issue has been switching between changesets in git which contain different versions of the protobuf Python module.

The easy fix to this is to clear the cache folder/files and run operasetup again. The proper fix is to change the build system as outlined in Future Improvements.

Future Improvements

Python cleanup

The current Python code is a mix of coding styles and should be standardized to just one. The best would be to use the PEP8 standard.

C++ generation

The code generation to C++ is a mix of C++ Python objects and direct text output. When the generator was first created it relied only on text/template output, and the C++ objects were added little by little afterwards. Ideally the generation should be done by creating the Python C++ objects only and then serializing them to text. Doing that will require adding more CppElement subclasses and improving existing ones, as well as converting a lot of the old manual text output to use the new system.

For instance it could all start with a File object which contains CppElements, such as:

File
  - Comment
  - MacroIf
      - Class
        - Method
        - Method
        ...

Then all that needs to be done is serialize the File object.
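
A sketch of what such a tree could look like is given below. The classes are simplified, hypothetical versions written for this example and do not match the existing CppElement API; the macro name EXAMPLE_FEATURE is made up as well.

```python
class Element(object):
    """Hypothetical node: optional header/footer lines around children."""
    header = None
    footer = None

    def __init__(self, *children):
        self.children = list(children)

    def lines(self):
        result = []
        if self.header is not None:
            result.append(self.header)
        for child in self.children:
            result.extend(child.lines())
        if self.footer is not None:
            result.append(self.footer)
        return result


class File(Element):
    def serialize(self):
        return "\n".join(self.lines()) + "\n"


class Comment(Element):
    def __init__(self, text):
        Element.__init__(self)
        self.header = "// " + text


class MacroIf(Element):
    def __init__(self, condition, *children):
        Element.__init__(self, *children)
        self.header = "#if " + condition
        self.footer = "#endif // " + condition


class Class(Element):
    def __init__(self, name, *children):
        Element.__init__(self, *children)
        self.header = "class %s {" % name
        self.footer = "};"


class Method(Element):
    def __init__(self, declaration):
        Element.__init__(self)
        self.header = "\t" + declaration + ";"


tree = File(
    Comment("Generated file"),
    MacroIf("EXAMPLE_FEATURE",
            Class("Foo", Method("int GetBar() const"))))
```

Calling tree.serialize() walks the tree once and returns the complete file text, so no generator needs to produce text directly.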

Unified message-sets

Currently the scope service generator creates the message-set a bit differently from the one created when no service is defined. The best solution would be to always create the message-set outside the service class and make the service file reuse that. Ideally all message-sets should be managed by the protobuf module; currently the scope module manages the message-sets found in services.

Improved build process

Currently script.py partially implements its own build system by keeping track of dependencies and managing mtime and pickle caches. This makes the codebase bigger than it needs to be and it also contains bugs from time to time. The better approach would be to separate the process. The first run would only take care of discovering proto files and writing out a set of text files which contain information about which proto files are to be used. For instance there could be one file listing all proto files which are used to generate message-sets, another for scope services and a third for OpMessages; these files could also contain dependency information. The real build system would then use that information to generate the files if needed (mtime, dependencies etc.) by calling specific Python scripts, e.g. "generate_messages.py".

Fix imports

Currently the protobuf import mechanism is a bit lacking. The system will import proto files from the current working directory instead of either relative to the current file or from the root of the opera source tree. This could potentially be fixed by setting up a list of include paths and using these paths to find the file to include.
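
One way the proposed lookup could work is sketched here. This is not existing cppgen code; resolve_import is a hypothetical helper, and the search order (directory of the importing file first, then each include path) is an assumption.

```python
import os


def resolve_import(name, current_file, include_paths):
    """Return the first existing path for an imported proto file, or None."""
    # Look next to the importing file first, then in each include path.
    candidates = [os.path.join(os.path.dirname(current_file), name)]
    candidates.extend(os.path.join(root, name) for root in include_paths)
    for candidate in candidates:
        if os.path.isfile(candidate):
            return os.path.normpath(candidate)
    return None
```

With the root of the source tree as the only include path, an import can be written either relative to the importing file or relative to the tree root, and the result no longer depends on the current working directory.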

More C++ customizations

The customization of the generated code could be improved, for instance by supporting a list of C++ include files to be added either to all generated files or only to those for a given .proto file.