<?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en"> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <link rel="stylesheet" href=".resources/doc.css" charset="UTF-8" type="text/css" /> <link rel="stylesheet" href="../coverage/.resources/prettify.css" charset="UTF-8" type="text/css" /> <link rel="shortcut icon" href=".resources/report.gif" type="image/gif" /> <script type="text/javascript" src="../coverage/.resources/prettify.js"></script> <title>JaCoCo - Control Flow Analysis</title> </head> <body onload="prettyPrint()"> <div class="breadcrumb"> <a href="../index.html" class="el_report">JaCoCo</a> > <a href="index.html" class="el_group">Documentation</a> > <span class="el_source">Control Flow Analysis</span> </div> <div id="content"> <h1>Control Flow Analysis for Java Methods</h1> <p class="hint"> Implementing a coverage tool that supports statement (C0) as well as branch coverage coverage (C1) requires detailed analysis of the internal control flow of Java methods. Due to the architecture of JaCoCo this analysis happens on the bytecode of compiled class files. This document describes JaCoCo's strategies for inserting probes into the control flow at runtime and analyzing the actual code coverage. Marc R. Hoffmann, November 2011 </p> <h2>Control Flow Graphs for Java Bytecode</h2> <p> As an starting point we take the following example method that contains a single branching point: </p> <pre class="source lang-java linenums"> public static void example() { a(); if (cond()) { b(); } else { c(); } d(); } </pre> <p> A Java compiler will create the following bytecode from this example method. Java bytecode is a linear sequence of instructions. Control flow is implemented with <i>jump</i> instructions like the conditional <code>IFEQ</code> or the unconditional <code>GOTO</code> opcode. The jump targets are technically relative offsets to the target instruction. For better readability we use symbolic labels (<code>L1</code>, <code>L2</code>) instead (also the ASM API uses such symbolic labels): </p> <pre class="source linenums"> public static example()V INVOKESTATIC a()V INVOKESTATIC cond()Z IFEQ L1 INVOKESTATIC b()V GOTO L2 L1: INVOKESTATIC c()V L2: INVOKESTATIC d()V RETURN </pre> <p> The possible control flow in the bytecode above can be represented by a graph. The nodes are byte code instruction, the edged of the graph represent the possible control flow between the instructions. The control flow of the example is shown in the left box of this diagram: </p> <img src=".resources/flow-example.png" alt="Bytecode Control Flow"/> <h3>Flow Edges</h3> <p> The control flow graph of a Java method defined by Java byte code may have the following Edges. Each edge connects a source instruction with a target instruction. In some cases the source instruction or the target instruction does not exist (virtual edges for method entry and exit) or cannot be exactly specified (exception handlers). </p> <table class="coverage"> <thead> <tr> <td>Type</td> <td>Source</td> <td>Target</td> <td>Remarks</td> </tr> </thead> <tbody> <tr> <td>ENTRY</td> <td>-</td> <td>First instruction in method</td> <td></td> </tr> <tr> <td>SEQUENCE</td> <td>Instruction, except <code>GOTO</code>, <code>xRETURN</code>, <code>THROW</code>, <code>TABLESWITCH</code> and <code>LOOKUPSWITCH</code></td> <td>Subsequent instruction</td> <td></td> </tr> <tr> <td>JUMP</td> <td><code>GOTO</code>, <code>IFx</code>, <code>TABLESWITCH</code> or <code>LOOKUPSWITCH</code> instruction</td> <td>Target instruction</td> <td><code>TABLESWITCH</code> and <code>LOOKUPSWITCH</code> will define multiple edges.</td> </tr> <tr> <td>EXHANDLER</td> <td>Any instruction in handler scope</td> <td>Target instruction</td> <td></td> </tr> <tr> <td>EXIT</td> <td><code>xRETURN</code> or <code>THROW</code> instruction</td> <td>-</td> <td></td> </tr> <tr> <td>EXEXIT</td> <td>Any instruction</td> <td>-</td> <td>Unhandled exception.</td> </tr> </tbody> </table> <p> The current JaCoCo implementation ignores edges caused by implicit exceptions and the the method entry. This means we consider SEQUENCE, JUMP, EXIT. </p> <h2>Probe Insertion Strategy</h2> <p> Probes are additional instructions that can be inserted between existing instructions. They do not change the behavior of the method but record the fact that they have been executed. One can think probes are placed on edges of the control flow graph. Theoretically we could insert a probe at every edge of the control flow graph. As a probe implementation itself requires multiple bytecode instructions this would increase the size of the class files several times and significantly slow down execution speed of the instrumented classes. Fortunately this is not required, in fact we only need a few probes per method depending on the control flow of the method. For example a method without any branches requires a single probe only. The reason for this is that starting from a certain probe we can back-trace the execution path and typically get coverage information for multiple instructions. </p> <p> If a probe has been executed we know that the corresponding edge has been visited. From this edge we can conclude to other preceding nodes and edges: </p> <ul> <li>If a edge has been visited, we know that the source node of the this edge has been executed.</li> <li>If a node has been executed and the node is the target of only one edge we know that this edge has been visited.</li> </ul> <p> Recursively applying these rules allows to determine the execution status of all instructions of a method – given that we have probes at the right positions. Therefore JaCoCo inserts probes </p> <ul> <li>at every method exit (return or throws) and</li> <li>at every edge where the target instruction is the target of more than one edge.</li> </ul> <p> We recall that a probe is simply a small sequence of additional instructions that needs to be inserted at a control flow edge. The following table illustrates how this extra instructions are added in case of different edge types. </p> <table class="coverage"> <thead> <tr> <td>Type</td> <td>Before</td> <td>After</td> <td>Remarks</td> </tr> </thead> <tbody> <tr> <td>SEQUENCE</td> <td><img src=".resources/flow-sequence.png" alt="Sequence"/></td> <td><img src=".resources/flow-sequence-probe.png" alt="Sequence with Probe"/></td> <td> In case of a simple sequence the probe is simply inserted between the two instructions. </td> </tr> <tr> <td>JUMP (unconditional)</td> <td><img src=".resources/flow-goto.png" alt="Unconditional Jump"/></td> <td><img src=".resources/flow-goto-probe.png" alt="Unconditional Jump with Probe"/></td> <td> As an unconditional jump is executed in any case, we can also insert the probe just before the GOTO instruction. </td> </tr> <tr> <td>JUMP (conditional)</td> <td><img src=".resources/flow-cond.png" alt="Conditional Jump"/></td> <td><img src=".resources/flow-cond-probe.png" alt="Conditional Jump with Probe"/></td> <td> Adding a probe to an conditional jump is little bit more tricky. We invert the semantic of the opcode and add the probe right after the conditional jump. With a subsequent <code>GOTO</code> instruction we jump to the original target. Note that this approach will not introduce a backward jump, which would cause trouble with the Java verifier if we have an uninitialized object on the stack. </td> </tr> <tr> <td>EXIT</td> <td><img src=".resources/flow-exit.png" alt="Exit"/></td> <td><img src=".resources/flow-exit-probe.png" alt="Exit with Probe"/></td> <td> As is is the nature of RETURN and THROW statements to actually leave the method we add the probe right before these statements. </td> </tr> </tbody> </table> <p> Now let's see how this rules apply to the example snippet above. We see that <code>INVOKE d()</code> instruction is the only node with more than one incoming edge. So we need to place probes on those edges and another probe on the only exit node. The result is shown the the right box of the diagram above. </p> <h2>Additional Probes Between Lines</h2> <p> The probe insertion strategy described so far does not consider implicit exceptions thrown for example from invoked methods. If the control flow between two probes is interrupted by a exception not explicitly created with a <code>throw</code> statement all instruction in between are considered as not covered. This leads to unexpected results especially when the the block of instructions spans multiple lines of source code. </p> <p> Therefore JaCoCo adds an additional probe between the instructions of two lines whenever the subsequent line contains at least one method invocation. This limits the effect of implicit exceptions from method invocations to single lines of source. The approach only works for class files compiled with debug information (line numbers) and does not consider implicit exceptions from other instructions than method invocations (e.g. <code>NullPointerException</code> or <code>ArrayIndexOutOfBoundsException</code>). </p> <h2>Probe Implementation</h2> <p> Code coverage analysis is a runtime metric that provides execution details of the software under test. This requires detailed recording about the instructions (instruction coverage) that have been executed. For branch coverage also the outcome of decisions has to be recorded. In any case execution data is collected by so called probes: </p> <p class="hint"> A <b>probe</b> is a sequence of bytecode instructions that can be inserted into a Java method. When the probe is executed, this fact is recorded and can be reported by the coverage runtime. The probe must not change the behavior of the original code. </p> <p> The only purpose of the probe is to record that it has been executed at least once. The probe does not record the number of times it has been called or collect any timing information. The latter is out of scope for code coverage analysis and more in the objective of a performance analysis tool. Typically multiple probes needs to be inserted into each method, therefore probes needs to be identified. Also the probe implementation and the storage mechanism it depends on needs to be thread safe as multi-threaded execution is a common scenario for java applications (albeit not for plain unit tests). Probes must not have any side effects on the original code of the method. Also they should add minimal overhead. </p> <p> So to summarize the requirements for execution probes: </p> <ul> <li>Record execution</li> <li>Identification for different probes</li> <li>Thread safe</li> <li>No side effects on application code</li> <li>Minimal runtime overhead</li> </ul> <p> JaCoCo implements probes with a <code>boolean[]</code> array instance per class. Each probe corresponds to a entry in this array. Whenever the probe is executed the entry is set to <code>true</code> with the following four bytecode instructions: </p> <pre class="source"> ALOAD probearray xPUSH probeid ICONST_1 BASTORE </pre> <p> Note that this probe code is thread safe, does not modify the operand stack or modify local variables and is also thread safe. It does also not leave the method though an external call. The only prerequisite is that the probe array is available as a local variable. For this at the beginning of each method additional instrumentation code needs to be added to obtain the array instance associated with the belonging class. To avoid code duplication the initialization is delegated to a static private method <code>$jacocoinit()</code> which is added to every non-interface class. </p> <p> The size of the probe code above depends on the position of the probe array variable and the value of the probe identifier as different opcodes can be used. As calculated in the table below the overhead per probe ranges between 4 and 7 bytes of additional bytecode: </p> <table class="coverage"> <thead> <tr> <td>Possible Opcodes</td> <td>Min. Size [bytes]</td> <td>Max. Size [bytes]</td> </tr> </thead> <tfoot> <tr> <td>Total:</td> <td>4</td> <td>7</td> </tr> </tfoot> <tbody> <tr> <td><code>ALOAD_x</code>, <code>ALOAD</code> <sup>1</sup></td> <td>1</td> <td>2</td> </tr> <tr> <td><code>ICONST_x</code>, <code>BIPUSH</code>, <code>SIPUSH</code>, <code>LDC</code>, <code>LDC_W</code> <sup>2</sup></td> <td>1</td> <td>3</td> </tr> <tr> <td><code>ICONST_1</code></td> <td>1</td> <td>1</td> </tr> <tr> <td><code>BASTORE</code></td> <td>1</td> <td>1</td> </tr> </tbody> </table> <p> <sup>1</sup> The probe array is the first variable after the arguments. If the method arguments do not consume more that 3 slots the 1-byte opcode can be used.<br/> <sup>2</sup> 1-byte opcodes for ids 0 to 5, 2-byte opcode for ids up to 127, 3-byte opcode for ids up to 32767. Ids values of 32768 or more require an additional constant pool entry. For normal class files it is very unlikely to require more than 32,000 probes. </p> <h2>Performance</h2> <p> The control flow analysis and probe insertion strategy described in this document allows to efficiently record instruction and branch coverage. In total classes instrumented with JaCoCo increase their size by about 30%. Due to the fact that probe execution does not require any method calls, only local instructions, the observed execution time overhead for instrumented applications typically is less than 10%. </p> <h2>References</h2> <ul> <li><a href="http://asm.objectweb.org/">ASM byte code library</a> by Eric Bruneton at al.</li> <li><a href="http://andrei.gmxhome.de/bytecode/index.html">Bytecode Outline Plug-In</a> by Andrei Loskutov</li> <li><a href="http://en.wikipedia.org/wiki/Glossary_of_graph_theory">Wikipedia: Glossary of Graph Theory</a></li> </ul> </div> <div class="footer"> <div class="right"><a href="@jacoco.home.url@">JaCoCo</a> @qualified.bundle.version@</div> <a href="license.html">Copyright</a> © @copyright.years@ Mountainminds GmbH & Co. KG and Contributors </div> </body> </html>