Runtime Performance Ideas

Runtime performance matters to our users. We care about startup performance in particular, and about overall runtime performance as well.

Measurements

Types of measurements we use to analyze runtime performance:

Regular test measurements with time and size plots
Profiler - we take manual measurements with the profiler to analyze the performance of managed code, usually using the calls, alloc and sample reports.
- NDK native profiler can be used to profile Mono + Xamarin.Android native runtimes
JIT time measurements - we can now measure the JIT time per method, using the debug.mono.log=timing property and the on-device methods.txt output.
- Example of JIT times on Pixel XL 2, when running Xamarin.Forms test

Findings

Not preloading all app assemblies during runtime init (Java_mono_android_Runtime_init in monodroid-glue.cc) buys us 100ms.
An application started via an Intent is ~2.5x slower than one started "normally" by tapping the app icon in the launcher, and sometimes slower still (e.g. logger initialization in managed code takes ~4ms with a normal startup and 28ms with the "Intent" one). Measured on Pixel 3 XL. The cause is not yet known.
Xamarin.Forms apparently attempts to reflection-load all the assemblies in the app, which causes our savings in the native init code to disappear once we reach managed code.
It is important to consider JIT time, as it accounts for about 50% of the app startup time.

Ideas

New JNI marshal methods to speed up startup by avoiding System.Reflection.Emit use. That would hopefully provide faster marshal method registration and reduce the JIT time and apk size.
Profiled AOT - the idea is to AOT just the startup code, keeping the apk size smaller while preserving fast startup (AOT startup is about 2 times faster).
Run some of the regular tests with profiling. Process the data and use it in the time plots.
Assemblies loading
- Do not try to load AOT'ed assemblies when we know they are not in the apk (Release configuration).
- Let Mono know where it can look for assemblies in general (e.g. skip the GAC, skip compile-time host locations). We know exactly where the assemblies may be. This would limit calls to monodroid_dlopen and dlopen.
Measure performance regularly on devices as well - today the build bots run the tests only on the Android emulator. It might also be worth running performance measurements on a dedicated machine and collecting data on both mobile devices and emulators. That might be more stable than the build bots.
Improve the way we use the typemap.{jm,mj} files. Currently the lookup uses bsearch with strcmp, so every probe compares strings character by character, and the lookup function is called very often. It might be faster to extend the typemap format with a precalculated hash of each string and then use an int -> typemap_entry dictionary/map, which would cut the number of hash calculations in half. With C++ we could, for instance, use sparsehash to make the comparisons faster than they are now.
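The hashed lookup idea above could look roughly like the following C++17 sketch. It is only an illustration under stated assumptions: the TypeMapEntry layout, the FNV-1a hash and the function names are hypothetical and do not describe the real typemap.{jm,mj} format, and it uses a hash-sorted array with an integer binary search rather than a dictionary or sparsehash.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical entry layout, NOT the real typemap.{jm,mj} format.
struct TypeMapEntry {
    std::uint64_t hash;     // would be precalculated at build time
    std::string_view from;  // e.g. a Java type name
    std::string_view to;    // e.g. the corresponding managed type name
};

// FNV-1a, picked only because it is short; any stable hash would do.
constexpr std::uint64_t fnv1a(std::string_view s) {
    std::uint64_t h = 14695981039346656037ull;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

inline bool hash_less(const TypeMapEntry &a, const TypeMapEntry &b) {
    return a.hash < b.hash;
}

// Stand-in for the build step: fill in the hashes and sort by hash.
inline std::vector<TypeMapEntry> build_typemap(std::vector<TypeMapEntry> entries) {
    for (auto &e : entries) {
        e.hash = fnv1a(e.from);
    }
    std::sort(entries.begin(), entries.end(), hash_less);
    return entries;
}

// Runtime lookup: hash the key once, binary-search on integers, and only
// compare strings for entries whose hash matches (to rule out collisions).
inline const TypeMapEntry *typemap_lookup(const std::vector<TypeMapEntry> &map,
                                          std::string_view name) {
    assert(std::is_sorted(map.begin(), map.end(), hash_less));  // debug builds only
    const std::uint64_t h = fnv1a(name);
    auto it = std::lower_bound(
        map.begin(), map.end(), h,
        [](const TypeMapEntry &e, std::uint64_t key) { return e.hash < key; });
    for (; it != map.end() && it->hash == h; ++it) {
        if (it->from == name) {
            return &*it;
        }
    }
    return nullptr;
}
```

With this layout only the lookup key is hashed at run time, and a full string comparison happens at most once per candidate with a matching hash, instead of once per bsearch probe.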
p/invoke optimization - the Mono runtime uses dlsym to look up native functions in DSOs whenever a p/invoke is used. The same thing happens on Android, especially for all the __Internal externals. However, this is completely unnecessary, since by the time we initialize the Mono runtime we already know the addresses of all the exported functions. The idea is to inform Mono about the addresses of those functions so that it can skip the lookup. This requires changes to the Mono runtime.
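As a rough illustration of the p/invoke idea, the sketch below keeps a table of already-known function addresses and consults it before any dynamic lookup. Everything in it is an assumption made for the example: sample_add, sample_get_answer, KnownSymbol and lookup_known_symbol are hypothetical names, and the sketch does not show how Mono itself would be told about the table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical native functions standing in for the __Internal exports.
extern "C" int sample_add(int a, int b) { return a + b; }
extern "C" int sample_get_answer() { return 42; }

// Hypothetical table entry: an exported name and its already-known address.
struct KnownSymbol {
    const char *name;
    void *address;
};

// Must stay sorted by name so that a binary search can be used.
static const KnownSymbol known_symbols[] = {
    {"sample_add", reinterpret_cast<void *>(&sample_add)},
    {"sample_get_answer", reinterpret_cast<void *>(&sample_get_answer)},
};

// Returns the known address, or nullptr to tell the caller to fall back
// to the regular dlsym-based lookup.
inline void *lookup_known_symbol(const char *name) {
    assert(name != nullptr);
    std::size_t lo = 0;
    std::size_t hi = sizeof(known_symbols) / sizeof(known_symbols[0]);
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        const int cmp = std::strcmp(name, known_symbols[mid].name);
        if (cmp == 0) {
            return known_symbols[mid].address;
        }
        if (cmp < 0) {
            hi = mid;
        } else {
            lo = mid + 1;
        }
    }
    return nullptr;
}
```

A caller would cast the returned pointer back to the expected function type, and use dlsym only when the table has no entry for the requested name.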
The Xamarin.Android runtime uses JNI's FindClass method quite a lot to look up Java classes during startup. This might not be necessary if we store references to those classes in the Java mono.android.Runtime class, which calls into our native initialization sequence. This way we can skip the "reflection" part and let ART do the job for us before our native code runs.
Limit unnecessary logging. We currently call e.g. log_info (LOG_DEFAULT, ...) a lot without checking whether the DEFAULT category is enabled. This is costly because logcat is costly, and also because code sometimes has to run to prepare the parameters for the logger call, only for everything to be discarded because the category is disabled. We need to actively check whether a category is enabled before logging. This doesn't apply to log_warn or higher.
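One possible shape for such a check is a macro that tests the category before the log arguments are evaluated. This is a minimal sketch, not the runtime's actual logging code: apart from the LOG_DEFAULT name taken from the text above, the category bitmask, log_enabled, log_info_unchecked and LOG_INFO_IF are hypothetical, and the sink writes to stderr instead of logcat.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Hypothetical category bitmask; only the LOG_DEFAULT name comes from the text.
enum LogCategories : unsigned {
    LOG_NONE = 0,
    LOG_DEFAULT = 1u << 0,
    LOG_ASSEMBLY = 1u << 1,
};

static unsigned enabled_log_categories = LOG_NONE;
static int emitted_messages = 0;  // counts messages that reached the sink

inline bool log_enabled(LogCategories category) {
    return (enabled_log_categories & category) != 0;
}

// Stand-in sink that performs no category check of its own.
inline void log_info_unchecked(LogCategories /*category*/, const char *fmt, ...) {
    assert(fmt != nullptr);
    va_list args;
    va_start(args, fmt);
    std::vfprintf(stderr, fmt, args);
    va_end(args);
    ++emitted_messages;
}

// The arguments are only evaluated when the category is enabled, so any
// expensive parameter preparation is skipped together with the log call.
#define LOG_INFO_IF(category, ...)                      \
    do {                                                \
        if (log_enabled(category)) {                    \
            log_info_unchecked((category), __VA_ARGS__); \
        }                                               \
    } while (0)
```

A macro is used here instead of a plain function because function arguments would be evaluated before the callee could check the category.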
Stop reading Android system properties on startup (and in general - we shouldn't need to use them at all in our runtime code). Android implements the property store as a kernel driver, and accessing the store involves many context switches, uses socket polling and puts up write barriers. We should replace the use of system properties with a set of files stored in the application directory. This might still be relevant, but on Pixel 3 we spend a total of 33462ns (max 41357ns, min 30105ns) on this during our runtime init, averaged over 20 runs (app rebuild, install, cold start). That's about 0.03ms, which doesn't seem worth spending time on right now.
Interpreter? Rumor has it that mono's new interpreter is "reasonably fast," in that it can execute some methods in less time than it would take to normally JIT + execute that same method. This makes it potentially interesting during process startup, when many methods are executed only once, e.g. JNIEnv.Initialize(), the AndroidRuntime constructor, and some others. This would have a "cost" in larger .apk sizes (to include the interpreter), but the tradeoff may very well be worth it.
We generate two type map files, for managed-to-Java and reverse lookups. They are stored in the apk and subsequently loaded into memory during runtime startup (and kept in memory). We can do better than that: we can generate a native assembler file with the data, compile it with as (which we'll have to ship, but it's small and standalone) and relink the XA runtime when building the APK (the Android SDK ships with the native linker). The data would be placed in a read-only section, loaded by the system loader for us and immediately available whenever it is required, without incurring any runtime overhead whatsoever. We can also reuse the generated assembly for other things (environment variables, flags - for instance whether the app uses embedded DSOs, LLVM, AOT etc). MSBuild-side caching would make sure that we don't relink the runtime unnecessarily. The gains could well be worth the effort of developing this. Assembly generation can easily be implemented by creating a custom Stream implementation, which can then be used with TypeNameMapGenerator without changing the latter at all.
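To make the generated-assembler idea more concrete, here is a small sketch of what the emitted file could contain. It is written in C++17 purely for illustration (the text above proposes doing this on the build side through a custom Stream), and the symbol naming scheme, the flat pair-of-strings layout and both function names are hypothetical. The only directives used are standard GNU as ones: .section, .global, .long and .asciz.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Escape characters that would terminate or corrupt an .asciz string literal.
inline std::string escape_asm_string(const std::string &s) {
    std::string out;
    for (char c : s) {
        if (c == '"' || c == '\\') {
            out += '\\';
        }
        out += c;
    }
    return out;
}

// Emits GNU as source that places the entry count and the name pairs into
// .rodata. The <symbol>_count / <symbol>_data names are hypothetical.
inline std::string generate_typemap_asm(
    const std::string &symbol,
    const std::vector<std::pair<std::string, std::string>> &entries) {
    assert(!symbol.empty());
    std::ostringstream os;
    os << "\t.section .rodata\n";
    os << "\t.global " << symbol << "_count\n";
    os << symbol << "_count:\n";
    os << "\t.long " << entries.size() << "\n";
    os << "\t.global " << symbol << "_data\n";
    os << symbol << "_data:\n";
    for (const auto &entry : entries) {
        os << "\t.asciz \"" << escape_asm_string(entry.first) << "\"\n";
        os << "\t.asciz \"" << escape_asm_string(entry.second) << "\"\n";
    }
    return os.str();
}
```

Under these assumptions the native runtime would declare the two symbols as extern read-only data and read them directly, with no file loading or parsing at startup.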

Improvements in progress

PR 1886 Do not fallback in GetJavaToManagedType
PR 2515 Runtime startup performance improvements
PR 2454 jnimarshalmethod-gen.exe Windows support
Profiled AOT Project 12

Planned improvements

Fix Bcl test measurements and add them to the plots again
Generate and use JNI marshal methods for constructors used in the ConstructorBuilder to avoid the last System.Reflection.Emit use

Finished improvements

PR 2153 Generate JNI marshal methods
PR 2008 Faster native members registration
PR 1890 Use the faster java type name mapping
PR 1872 Avoid using Guid.NewGuid () for dynamic constructor names
PR 1868 Avoid using Guid.NewGuid () for dynamic method names
