Nintendo NX CPU Profiler 0.11
===============================================================================

1. Overview
2. Requirements
3. Installation
4. Runtime Integration
5. PC GUI
6. Troubleshooting
7. Known Issues
8. Revision History

===============================================================================

1. Overview

This is the Nintendo NX CPU Profiler.
The NX CPU Profiler is a statistical sampling profiler that helps detect which
functions are using significant CPU resources. It currently uses time-based
interruption to collect information about which functions are active at the
time of interruption. The tool is tuned to help you to understand your
application's use of CPU and OS resources. It automatically detects high-level
logic loops, can give general advice on how to improve performance, and allows
you to inspect the sampled data in many different ways.
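
As an illustration of the sampling idea only (this is not the profiler's own
code), the sketch below tallies which function was active at each timer
interruption and converts the tallies into percentages. The Sample struct and
the ComputePercentages function are hypothetical names made up for this
example.

```cpp
#include <map>
#include <string>
#include <vector>

// One record per timer interruption: the function that was active.
// Hypothetical type, not part of the profiler API.
struct Sample {
    std::string activeFunction;
};

// Returns the share of samples (0 to 100) attributed to each function.
std::map<std::string, double> ComputePercentages(const std::vector<Sample>& samples) {
    std::map<std::string, double> result;
    if (samples.empty()) {
        return result;  // nothing sampled, nothing to report
    }
    for (const Sample& s : samples) {
        result[s.activeFunction] += 1.0;  // count hits per function
    }
    for (auto& entry : result) {
        entry.second = 100.0 * entry.second / static_cast<double>(samples.size());
    }
    return result;
}
```

Because the data is sampled, a function that runs only briefly between
interruptions may not appear at all; longer profiles reduce that effect.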

===============================================================================

2. Requirements

 * .NET 4.6.1 or later
 * DirectX End-User Runtime (June 2010)
     https://www.microsoft.com/en-us/download/details.aspx?id=8109

===============================================================================

3. Installation

The Nintendo NX CPU Profiler is automatically installed to the correct location
by NDI. You can find the PC executable 'Nintendo CPU Profiler.exe' inside the
'Tools\NintendoNxCpuProfiler' folder.

===============================================================================

4. Runtime Integration

These steps are required to support In-Process profiling, to take an
Instrumented profile, or to make use of the nn::profiler API functions.

Integration Steps:
 1. Link against the profiler library (do this for both Debug and Release):
    a. Right click on your project and click "Properties".
    b. Under "Configuration Properties", go to "Linker",
       and then go to "Input".
    c. In the "Additional Dependencies" field, at the beginning of the line,
       add "libnn_profiler.a;" to the list of libraries.

 2. Initialize the profiler with a memory buffer:
    a. Find your application initialization. Include the profiler library:
          #include <nn/profiler.h>
    b. Allocate a memory buffer for the profiler to use.
       This buffer must be at least the size specified by
       nn::profiler::MinimumBufferSize.
       The following line would accomplish this:
          void* profBuffer = (void*) new char[nn::profiler::MinimumBufferSize];
    c. Initialize the profiler with nn::profiler::Initialize().
       The first argument is the buffer and the second is the buffer size.
       The following line would accomplish this:
          nn::profiler::Initialize(profBuffer, nn::profiler::MinimumBufferSize);
    d. IMPORTANT: If you initialize HTCS, ensure that you do so before
       initializing the profiler.

 3. Log the framerate:
    a. Find your main rendering loop. Include the profiler library:
          #include <nn/profiler.h>
    b. Inside the main rendering loop, at the top of the loop, insert the line:
          nn::profiler::RecordHeartbeat(nn::profiler::Heartbeats_Main);
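
Steps 2 and 3 can be combined into a minimal sketch such as the following. So
that it builds without the SDK, the nn::htcs and nn::profiler namespaces at
the top are stand-in stubs written for this example. Only the names
Initialize, MinimumBufferSize, RecordHeartbeat, and Heartbeats_Main come from
this document; the enum type name, the placeholder buffer size, and the
argument list of nn::htcs::Initialize are assumptions, so take the real
declarations from the SDK headers. InitializeApp and RunFrames are
hypothetical application functions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Stub-only bookkeeping so the sketch can be built and checked off-target.
namespace stub {
    static bool htcsReady = false;
    static bool profilerReady = false;
    static int heartbeats = 0;
}

// Stand-ins for the SDK. In a real project, delete these two namespaces and
// use #include <nn/profiler.h> (plus your HTCS header) instead.
namespace nn { namespace htcs {
    inline void Initialize() { stub::htcsReady = true; }  // real signature may differ
}}
namespace nn { namespace profiler {
    const std::size_t MinimumBufferSize = 1024;  // placeholder, not the SDK value
    enum Heartbeats { Heartbeats_Main };         // enum type name is assumed
    inline void Initialize(void* buffer, std::size_t size) {
        assert(buffer != nullptr && size >= MinimumBufferSize);
        stub::profilerReady = true;
    }
    inline void RecordHeartbeat(Heartbeats) { ++stub::heartbeats; }
}}

// Buffer handed to the profiler. This sketch assumes it must stay alive for
// as long as the profiler is in use, so it is owned at file scope.
static std::unique_ptr<char[]> g_ProfilerBuffer;

// Step 2: initialize HTCS first (if the application uses it), then the profiler.
void InitializeApp(bool useHtcs) {
    if (useHtcs) {
        nn::htcs::Initialize();
    }
    g_ProfilerBuffer.reset(new char[nn::profiler::MinimumBufferSize]);
    nn::profiler::Initialize(g_ProfilerBuffer.get(),
                             nn::profiler::MinimumBufferSize);
}

// Step 3: record one heartbeat at the top of every pass through the main loop.
void RunFrames(int frameCount) {
    for (int frame = 0; frame < frameCount; ++frame) {
        nn::profiler::RecordHeartbeat(nn::profiler::Heartbeats_Main);
        // ... update and render the frame ...
    }
}
```

Keeping the HTCS call and the profiler call in the same function makes the
required ordering from step 2d easy to see and hard to break later.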

===============================================================================

5. PC GUI

If you have used either the Nintendo 3DS CPU Profiler or the Wii U CPU Profiler
then you will be familiar with how the GUI behaves.

Take a Profile:
 1. (Optional) Integrate with the profiler library, described in Section 4.
 2. Run the application.
 3. Launch the Nintendo CPU Profiler GUI. Run 'Nintendo CPU Profiler.exe'
    found inside the 'Tools\NintendoNxCpuProfiler' folder.
 4. Click the Sync button. This button will only be clickable if there is a
    profile-able application running on the NX hardware.
 5. When the "Select a Scenario" dialog box appears, locate the NSS file
    corresponding with the running application. Be careful to choose the
    correct Debug or Release file.
 6. On the "Sampled Profiler" tab, select the settings you wish to use while
    profiling. The default selection is good for a quick test.
 7. Click the Start button to begin profiling.
 8. Click the Stop button to finish. If the internal profiler buffer fills
    completely, the profile will be automatically stopped.
 9. Wait for the data to be sent to the PC and for it to be parsed. This may
    take up to a couple of minutes.

Explore the Profile:
 1. Inspect the left window to see the top functions by percentage. You can
    click to select functions you are interested in examining. Selected
    functions will be highlighted in the Call Tree tab and drawn in the
    Sample Graph tab.
 2. Inspect the Call Tree tab in the right window. This tab only appears if
    there is callstack information in the profile.
 3. Inspect the Sample Graph tab in the right window. As you select functions
    in the left window, they will be drawn here. Use the Sample Graph toolbar
    to turn various graphs on and off (Icicle Graph, Load Graph, System Load).

===============================================================================

6. Troubleshooting

The following are common issues:
 1. The "Sync" button in the GUI is not enabled.
    Solution 1: Close any other open instances of Nintendo NX CPU Profiler.
    Solution 2: Restart the profiler GUI. This might fix the problem.
    Solution 3: Restart the dev kit.
    Solution 4: Restart Target Manager.

 2. Sync works correctly, but as soon as I start a profile, the GUI crashes.
    Solution: Ensure that you chose the correct .nss file when you synced.
    Ensure that the correct Debug/Release and NX32/NX64 version was selected.

 3. Adding the call to nn::profiler::Initialize causes a crash.
    Solution: The profiler attempts to start HTCS communication if it is not
    already initialized. If your application then attempts to initialize HTCS
    again, that call will assert and stop execution. Move the call to
    nn::htcs::Initialize so that it occurs before initializing the profiler.

 4. White boxes with no text appear on the Sample Graph tab.
    This issue will occur when the required version of DirectX is not
    installed. Please see the Requirements section of this document for a link
    to the required version of DirectX.

===============================================================================

7. Known Issues

The following are known issues:
 1. Callstacks may contain invalid data. If you see BAD FRAME POINTER in the
    callstack, this indicates that a library or module being used was built
    without frame pointers. Rebuild the library or module with the latest
    SDK and compiler as frame pointers are now enabled by default. Any data
    under this marker is likely invalid.

    Similarly, if you see SENTINEL SAVE in the callstack, this indicates
    that the profiler needed to abort walking the callstack in order to avoid
    walking off the end of a buffer. All of the data until this marker may
    be considered valid.

 2. Assembly view for 32-bit applications may show "Unknown Instruction".
    The current 32-bit encyclopedias used by the profiler support ARMv7.
    ARMv8 contains some new instructions which have not yet been added to the
    encyclopedias used in assembly view. If one of these instructions is
    generated by the compiler, it will result in the "Unknown Instruction"
    appearing.

 3. When taking a new profile, or loading an existing profile, existing data
    in the GUI remains until the entire load has finished. It is generally
    not possible to interact with the data while in this state.

 4. Relocatable modules are determined at the time that profiling starts or
    ends. If a relocatable module is both loaded and unloaded during profiling
    the GUI will not be able to track the module.

 5. Function instrumentation only works for 64-bit applications that have
    been started by Target Manager and have called nn::profiler::Initialize().
    The instrumentation works by overwriting portions of the instrumented
    function's code. This means that certain functions may not be able to be
    instrumented.

    a. Functions that take more than 7 parameters cannot be instrumented.
       In order to function properly, the profiler modifies the stack between
       the function call and the function's use of the stack. This means that
       any attempt to read a value from the stack will read the wrong value.

       The AARCH64 ABI states that the first 8 registers can be used to pass
       parameters. Any additional parameters will be pushed onto the stack.
       However, in C++, the first parameter will be a this pointer if it is a
       member function. Similarly, floating-point values use a different set of
       registers, and passing structures by value can introduce a stack push.

       Currently, the GUI only checks the number of parameters, not what kind
       of value is used. It also assumes that all functions are member functions
       since the determination is being made solely by looking at the function
       name with arguments visible in the GUI. This check may be improved in
       the future.

    b. Functions that read or write the PC in their first instruction cannot
       be instrumented.
       The profiler copies the first instruction of the function being
       instrumented into its own trampoline function. It then overwrites the
       first instruction with a branch to the trampoline. Attempting to read
       or write the PC in the first instruction will generally cause problems
       to occur.

    c. NX CPU Profiler functions cannot be instrumented.

    d. Functions that branch without linking (jump) to the first instruction
       should not be profiled.
       The profiler does not currently check for this behavior. If a function
       is instrumented and it leads to system destabilization, ensure that
       it does not jump to the first instruction in the function.
       Note: Recursive functions can be instrumented.

    e. The code overwriting used has a maximum branch offset of 128MB.
       Functions that require a further offset cannot be instrumented. This
       may mean that only functions located in the primary application can be
       instrumented.

 6. The Performance Counters include counts generated by the profiler threads.
    This is especially evident for the 'System Calls' group. For this group,
    the profiler adds an overhead of 2 for the value 'System Calls' and 1 for
    the value 'IRQ interrupt'. As 'System Exceptions' is the sum of the other
    two, the base value is 3.

    The overhead cannot be subtracted out automatically because, occasionally,
    the 'System Calls' value does not increment both expected times. This
    usually only occurs at higher sample rates, 1000x or higher.

 7. In NX Addon 3.0.0 a change to the thread scheduling was introduced to
    allow threads to migrate cores for which they had an affinity more
    frequently. Threads that are allowed to migrate are far more likely to
    do so while profiling. For this reason, it is strongly recommended that
    you disable multiple core affinity for threads when profiling. Failure to
    do so can result in misreported System Idle Thread time.

 8. Some NVN functions that are API entry points may not appear in the Call
    Tree. This is because those functions internally have a tail-call
    optimization built into them, removing them from the return callstack.
    
 9. The following issue only applies to in-process profiling:
    To fix an issue where migrating threads could execute while sampling,
    the profiler now takes some control of the thread. As this increases
    overhead, this is only done if migrating threads have been detected.
    If profiling begins before any migrating threads have been created, the
    original issue can still occur. To work around this limitation, ensure
    that at least one thread with core affinity to more than one core is
    running when profiling begins if your application will start threads
    during the scene being profiled.
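
The limits described in items 5a and 5e can be expressed as a simple
pre-check. The sketch below is not the GUI's implementation. CountParameters,
FitsRegisterBudget, and WithinBranchRange are hypothetical helpers, and the
parsing only handles simple demangled signatures (for example, it does not
handle operator()). It mirrors the described behavior: count the parameters in
the displayed name, assume a this pointer occupies one of the 8 registers, and
reject branch distances beyond 128MB. Treating the 128MB limit as applying in
both directions is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Counts top-level parameters in a signature such as "Foo::Bar(int, float)".
// Commas nested inside <> or () (templates, function pointers) are ignored.
int CountParameters(const std::string& signature) {
    const std::size_t open = signature.find('(');
    if (open == std::string::npos) {
        return 0;
    }
    int depth = 0;
    int commas = 0;
    std::string args;
    for (std::size_t i = open + 1; i < signature.size(); ++i) {
        const char c = signature[i];
        if (c == ')' && depth == 0) {
            break;  // end of the parameter list
        }
        if (c == '(' || c == '<') {
            ++depth;
        } else if (c == ')' || c == '>') {
            --depth;
        } else if (c == ',' && depth == 0) {
            ++commas;
        }
        args += c;
    }
    if (args.empty() || args == "void") {
        return 0;
    }
    return commas + 1;
}

// Item 5a: 8 parameter registers, one assumed to be taken by "this".
bool FitsRegisterBudget(int parameterCount) {
    const int kParameterRegisters = 8;
    return parameterCount + 1 <= kParameterRegisters;
}

// Item 5e: the branch from the function to its trampoline must be in range.
bool WithinBranchRange(std::uint64_t from, std::uint64_t to) {
    const std::int64_t kMaxOffset = 128ll * 1024 * 1024;
    const std::int64_t offset = static_cast<std::int64_t>(to - from);
    return offset >= -kMaxOffset && offset < kMaxOffset;
}
```

Like the GUI check described in item 5a, this counts parameters without
looking at their types, so it can be too strict for free functions and too
lenient for functions that pass structures by value.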


===============================================================================

8. Revision History

Version 0.11
 * Added support for out-of-process profiling.
 * Added new Scenario Selection window when Syncing to the dev kit.
 * Added crash reports when profiler is Attached to a process.
 * Added auto-expansion of calltree nodes when they contain a single child.
 * Added a new Readme button to the Home tab.
 * Added VSync button and drawing into the Sample Graph.
 * Added additional Module and Thread information to Info tab.
 * Added new scripting commands.
 * Added new Checkers.
 * Added new Warning when a Critical Issue Checker triggers.
 * Added keyboard shortcuts for Open (Ctrl+O), Save (Ctrl+S), and Manual (F1).
 * Moved Migrating Thread warning into a Checker.
 * Code Blocks and User Data now use string hashes as their identifiers
   (improves usability when there are more strings than buffer space).
 * Changed the Manual button to open the Manual rather than Readme.
 * Changed width and text alignment of the Address Column in the Assembly tab.
 * Changed dev kit reporting to show port along with serial number.
 * Changed Checkers to use a blacklist for removing SDK-provided NSOs.
 * Changed behavior so that Instrumented tab only automatically selects if
   there is no Functions tab.
 * Changed defaults to disable Margin of Error. Does not affect existing users.
 * Fixed demangling issue that could prevent an NSS from loading properly.
 * Fixed issue where Code Blocks on migrating threads could corrupt data.
 * Fixed issue where migrating threads could execute while in-process sampling.
 * Fixed crash when saving profiles if the last used drive was removed.
 * Fixed crash that could occur when using Show Source on the Assembly tab.
 * Fixed crash that could occur if a disconnect occurred while syncing.
 * Fixed crash that could occur on first boot if no other Nintendo tools had
   been run on the computer.
 * Fixed rare crash that could occur when communicating with dev kit.
 * Fixed issue where a null dereference could occur after problems occurred
   while loading profile data.
 * Fixed bug that could cause saving profiles to fail.
 * Fixed bug where command-line options didn't check length.
 * Fixed 'Do Not Profile Debug Builds' Checker.
 * Fixed bug that could cause Unicode characters copied to clipboard to be
   copied as ASCII rather than Unicode.
 * Fixed bug that caused non-errors to put profiler into an error state.
 * Fixed logs to show .NET version 4.6.1 and later.

Version 0.10
 * Switched sampling back to individual thread sampling. Threads with a core
   affinity of more than one core will not profile properly.
 * Added a warning when threads with multiple core affinity exist.
 * Fix for referencing the LogClientLibrary DLL to support multiple SDKs.

Version 0.9.1
 * Added automatic recording of Vsync events as a Heartbeat. Requires SDK 5.0.0
   or later.
 * Added new NVN functions to list of NVN API entry points.
 * Fixed library crash and deadlock that could occur when calling Finalize().
 * Fixed GUI crash that could occur when Target Manager was closed while the
   GUI was synced to an application.

Version 0.9
 * Added support for showing function names for NVN API entry points.
   See Known Issues for limitations surrounding this.
 * Added better handling of null strings passed to Code Blocks and User Data.
 * Added support for Target Manager 4.0.0 and later.
 * Fixed issue with profiler reporting warnings about the Console tab.
 * Fixed issue with building 'detected' calls of functions that were not seen
   during profiling.
 * Fixed slowdown that occurred with changing Units.
 * Fixed issue with 'long double' in Demangler which caused NSS files to be
   shown as unusable. These names are now just not demangled.
 * Fixed runtime crash that could occur when the profiler was finalized and
   then reinitialized.
 * Fixed some Japanese localization issues.
 * Fixed crash that could occur when profiling and having multiple Initialize
   and Sync operations without restarting the profiled application.
 * Fixed issue that would cause the profiler to crash when a newer version of
   PowerShell was installed and a second script was run.
 * Fixed issue where command line arguments were ignored the first time a new
   version was run.
 * Fixed issue where Instrumented profiles recorded more data than requested.

Version 0.8
 * Added ability to use dynamic strings in Code Blocks and User Data.
 * Updated target .NET version to 4.6.1.
 * User Data now uses 64-bit ids instead of 32-bit ids.
 * Improved load speed and responsiveness with large numbers of Code Blocks.
 * Reduced memory footprint in some use-cases.
 * Fixed issue where Thread filtering didn't work for some threads.
 * Fixed issues with thread IDs that used more than 32 bits.
 * Fixed issue where PLT functions weren't being loaded properly.

Version 0.7
 * Fixed issue that started with NX Addon 3.0.0 where threads with an affinity
   for more than one core could profile as Idle time.
 * Fixed issue where Code Blocks on threads with an affinity for more than one
   core could lead to bad profile data.
 * Fixed issue where Code Blocks on threads that changed cores would still
   record as being on the original core.
 * Fixed issue where sometimes this readme could not be found by the profiler.
 * Fixed issue where SYSTEM functions could show a 0% Total when calling
   another SYSTEM function.

Version 0.6.1
 * Fixed issue where a DLL was erroneously requiring .NET 2.0.
 * Fixed freezing issue when there are large numbers of different Code Blocks.
 * Fixed issue where symbols were undefined when using RunOnTarget.
 * Fixed issue where GUI could not Sync to dev kit if Target Manager is closed
   and reopened while GUI is open.

Version 0.6
 * Added memory tracking API and HeapInspector tool.
 * Removed "error" outputs when expected errors occur.
 * Removed Power Stat displays if that stat could not be obtained.
 * Fixed bug that prevented Syncing when a constructor or destructor is the
   last symbol of the executable section.
 * Fixed a memory leak that occurred when taking multiple profiles in
   a single profiler session.

Version 0.5.1
 * Fixed bug in interaction with Target Manager that could cause the profiler
   to crash when stopping a profile.
 * Fixed bug that could cause closing the profiler while collecting power stats
   to keep the instance of the power running without a window.
 * Fixed potential buffer overflow issue.

Version 0.5
 * Added support for gathering Performance Counter data when sampling.
 * Added support for taking Instrumented profiles in 64-bit applications run
   from Target Manager.
 * Added support for showing function disassembly in the Assembly tab.
 * Added additional power statistics to be recorded.
 * Initial support for Checkers.
 * Switched power statistics to show medians rather than means.
 * Thread name, core, and ID are now cached the first time a thread is seen
   during a profile.
 * Improved handling of relocatable modules.
 * Relocatable module list is now also checked when profiling ends.
 * Renamed 'Internal Profiler Thread' to 'SYSTEM THREAD' to accurately state
   where the execution time was spent.
 * Renamed 'IDLE: <function>' to 'SYSTEM: <function>' to accurately state
   where the execution time was spent.
 * Split the visible time in the GUI for time spent in the System Idle Thread
   vs. another process; previously everything was allocated to the idle thread.
 * Reduced the profiler library's footprint for both code and data.
 * Increased the amount of memory buffer being used for sampling, allowing
   for longer profiles.
 * Fixed a major memory leak that occurred when taking multiple profiles in
   a single profiler session.
 * Fixed issue where assertions in the profiler library were not removed in
   Release builds.
 * Fixed some potential crashing and running problems.

Version 0.4.1
 * Initial support for relocatable modules.
 * Fixed version numbering and how it is used throughout the application.

Version 0.4
 * Added ability to record power statistics.
 * Added additional information when a system call is actively happening.
   The function is now preceded with the word "IDLE".
 * Added a progress bar to Sync operation while opening the application NSS.
 * Improved loading times of NSS files.
 * Moved the location of the readme.
 * Fixed an issue that could occur if the profiler was initialized before the
   connection to Target Manager was possible.
 * Fixed a null dereference crash bug that would rarely occur after profiling
   had started.
 * Fixed a warning that would occur with StringLiteral if extra warnings
   were enabled in the compiler.

Version 0.3
 * Added Manual button.
 * Fixed issue where NSS files for NSOs other than nnSdk would not load.
 * Fixed issues where 32-bit NSS files would fail to load for Sync.

Version 0.2.2
 * Renamed the minimum buffer size variable to nn::profiler::MinimumBufferSize.

Version 0.2.1
 * Functions in the PLT are now properly decoded and are marked with
   the text '[import]'.
 * The Functions and Threads tabs now clear when a new profile is loaded or
   being transferred from the dev kit.
 * Moved contents of nn::profiler::codeblocks into nn::profiler namespace.
 * Fixed bug with anonymous namespace functions not displaying correctly.
 * Fixed bug where contents of Counter tab were not always selectable.
 * Fixed bug where Script button didn't properly disable.
 * Fixed bug with heartbeats "Apply to All Cores" button causing hover text
   in the Sample Graph to sometimes show the wrong data.
 * Fixed bug with "Apply to All Cores" and "Use as Frame Rate" buttons not
   resetting when opening or taking a new profile.
 * Fixed bug with the Negate and RegEx buttons in regards to Filter Lock.
 * Fixed bug on Last tab where Negate and RegEx buttons did not clear.
 * Fixed bug where data in the Functions tab could overwrite itself.

Version 0.2
 * Renamed namespace from nn::prof to nn::profiler.
 * Added method to try to automatically locate the running application's NSS.
 * Added ability to use string literals for code blocks and user logs.
 * Errors due to HTCS failures are now more explicit.
 * Removed macro NN_PROF_DO_NOT_USE.
 * Units selection in GUI is now saved between runs.
 * Fixed nn::profiler::Finalize() to properly shut down the profiler library.
 * Fixed out-of-memory crash that could occur when opening NSS file on Sync.
 * Fixed crash that could occur if dev-kit connection was lost while
   selecting an NSS file during Sync.
 * Fixed text on Functions tab regarding number of results shown.

Version 0.1.1
 * Added support for displaying thread names in the GUI.
 * Dev kit name is now the serial number of the kit instead of IP address and
   port number.
 * Reduced the amount of debug output generated by the Develop library.
 * Fixed issue where nnSdkEn.nss and nnSdkJp.nss would not load correctly.
 * Fixed issue where a profiler thread was not properly destroyed.

Version 0.1
 * Initial version
