Android NDK & ARM NEON instruction set extension support Introduction: ==== Android NDK r3 added support for the new 'armeabi-v7a' ARM-based ABI that allows native code to use two useful instruction set extensions: - Thumb-2, which provides performance comparable to 32-bit ARM instructions with similar compactness to Thumb-1 - VFPv3, which provides hardware FPU registers and computations, to boost floating point performance significantly. More specifically, by default 'armeabi-v7a' only supports VFPv3-D16 which only uses/requires 16 hardware FPU 64-bit registers. More information about this can be read in docs/CPU-ARCH-ABIS.html The ARMv7 Architecture Reference Manual also defines another optional instruction set extension known as "ARM Advanced SIMD", nick-named "NEON". It provides: - A set of interesting scalar/vector instructions and registers (the latter are mapped to the same chip area as the FPU ones), comparable to MMX/SSE/3DNow! in the x86 world. - VFPv3-D32 as a requirement (i.e. 32 hardware FPU 64-bit registers, instead of the minimum of 16). Not all ARMv7-based Android devices will support NEON, but those that do may benefit in significant ways from the scalar/vector instructions. The NDK supports the compilation of modules or even specific source files with support for NEON. What this means is that a specific compiler flag will be used to enable the use of GCC ARM Neon intrinsics and VFPv3-D32 at the same time. The intrinsics are described here: > http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html Using LOCAL_ARM_NEON: --------------------- Define LOCAL_ARM_NEON to 'true' in your module definition, and the NDK will build all its source files with NEON support. This can be useful if you want to build a static or shared library that specifically contains NEON code paths. Using the .neon suffix: ----------------------- When listing sources files in your LOCAL_SRC_FILES variable, you now have the option of using the .neon suffix to indicate that you want to corresponding source(s) to be built with Neon support. For example: LOCAL_SRC_FILES := foo.c.neon bar.c Will only build 'foo.c' with NEON support. Note that the .neon suffix can be used with the .arm suffix too (used to specify the 32-bit ARM instruction set for non-NEON instructions), but must appear after it. In other words, 'foo.c.arm.neon' works, but 'foo.c.neon.arm' does NOT. Build Requirements: ------------------ Neon support only works when targeting the 'armeabi-v7a' or 'x86' ABI, otherwise the NDK build scripts will complain and abort. Neon is partially supported on x86 via translation header (To learn more about it, see docs/CPU-X86.html). It is important to use checks like the following in your Android.mk: # define a static library containing our NEON code ifeq ($(TARGET_ARCH_ABI),$(filter $(TARGET_ARCH_ABI), armeabi-v7a x86)) include $(CLEAR_VARS) LOCAL_MODULE := mylib-neon LOCAL_SRC_FILES := mylib-neon.c LOCAL_ARM_NEON := true include $(BUILD_STATIC_LIBRARY) endif # TARGET_ARCH_ABI == armeabi-v7a || x86 Runtime Detection: ------------------ As said previously, NOT ALL ARMv7-BASED ANDROID DEVICES WILL SUPPORT NEON ! It is thus crucial to perform runtime detection to know if the NEON-capable machine code can be run on the target device. To do that, use the 'cpufeatures' library that comes with this NDK. To learn more about it, see docs/CPU-FEATURES.html. You should explicitly check that android_getCpuFamily() returns ANDROID_CPU_FAMILY_ARM, and that android_getCpuFeatures() returns a value that has the ANDROID_CPU_ARM_FEATURE_NEON flag set, as in: #include <cpu-features.h> ... ... if (android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM && (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) != 0) { // use NEON-optimized routines ... } else { // use non-NEON fallback routines instead ... } ... Sample code: ------------ Look at the source code for the "hello-neon" sample in this NDK for an example on how to use the 'cpufeatures' library and Neon intrinsics at the same time. This implements a tiny benchmark for a FIR filter loop using a C version, and a NEON-optimized one for devices that support it.