Verification todo
~~~~~~~~~~~~~~~~~
check that illegal insns on all targets don't cause the _toIR.c's to
assert. [DONE: amd64 x86 ppc32 ppc64 arm s390]
check also with --vex-guest-chase-cond=yes
check that all targets can run their insn set tests with
--vex-guest-max-insns=1.
all targets: run some tests using --profile-flags=... to exercise
function patchProfInc_<arch> [DONE: amd64 x86 ppc32 ppc64 arm s390]
figure out if there is a way to write a test program that checks
that event checks are actually getting triggered
Cleanups
~~~~~~~~
host_arm_isel.c and host_arm_defs.c: get rid of global var arm_hwcaps.
host_x86_defs.c, host_amd64_defs.c: return proper VexInvalRange
records from the patchers, instead of {0,0}, so that transparent
self hosting works properly.
host_ppc_defs.h: is RdWrLR still needed? If not delete.
ditto ARM, Ld8S
Comments that used to be in m_scheduler.c:
tchaining tests:
- extensive spinrounds
- with sched quantum = 1 -- check that handle_noredir_jump
doesn't return with INNER_COUNTERZERO
other:
- out of date comment w.r.t. bit 0 set in libvex_trc_values.h
- can VG_TRC_BORING still happen? if not, rm
- memory leaks in m_transtab (InEdgeArr/OutEdgeArr leaking?)
- move do_cacheflush out of m_transtab
- more economical unchaining when nuking an entire sector
- ditto w.r.t. cache flushes
- verify case of 2 paths from A to B
- check -- is IP_AT_SYSCALL still right?
Optimisations
~~~~~~~~~~~~~
ppc: chain_XDirect: generate short form jumps when possible
ppc64: immediate generation is terrible .. should be able
to do better
arm codegen: Generate ORRS for CmpwNEZ32(Or32(x,y))
all targets: when nuking an entire sector, don't bother to undo the
patching for any translations within the sector (nor with their
invalidations).
(somewhat implausible) for jumps to disp_cp_indir, have multiple
copies of disp_cp_indir, one for each of the possible registers that
could have held the target guest address before jumping to the stub.
Then disp_cp_indir wouldn't have to reload it from memory each time.
Might also have the effect of spreading out the indirect mispredict
burden somewhat (across the multiple copies.)
Implementation notes
~~~~~~~~~~~~~~~~~~~~
T-chaining changes -- summary
* The code generators (host_blah_isel.c, host_blah_defs.[ch]) interact
more closely with Valgrind than before. In particular the
instruction selectors must use one of 3 different kinds of
control-transfer instructions: XDirect, XIndir and XAssisted.
All archs must use them in the same way; ad-hoc control-transfer
instructions are no longer allowed.
(more detail below)
* With T-chaining, translations can jump between each other without
going through the dispatcher loop every time. This means that the
event check (counter dec, and exit if negative) that the dispatcher
loop previously performed must now be compiled into each translation.
* The assembly dispatcher code (dispatch-arch-os.S) is still
present. It still provides table lookup services for
indirect branches, but it also provides a new feature:
dispatch points, to which the generated code jumps. There
are 5:
VG_(disp_cp_chain_me_to_slowEP):
VG_(disp_cp_chain_me_to_fastEP):
These are chain-me requests, used for Boring conditional and
unconditional jumps to destinations known at JIT time. The
generated code calls these (doesn't jump to them) and the
stub recovers the return address. These calls never return;
instead the call is done so that the stub knows where the
calling point is. It needs to know this so it can patch
the calling point to the requested destination.
VG_(disp_cp_xindir):
Old-style table lookup and go; used for indirect jumps
VG_(disp_cp_xassisted):
Most general and slowest kind. Can transfer to anywhere, but
first returns to the scheduler to handle some other event (eg a
syscall) before continuing.
VG_(disp_cp_evcheck_fail):
Code jumps here when the event check fails.
* new instructions in backends: XDirect, XIndir and XAssisted.
XDirect is used for chainable jumps. It is compiled into a
call to VG_(disp_cp_chain_me_to_slowEP) or
VG_(disp_cp_chain_me_to_fastEP).
XIndir is used for indirect jumps. It is compiled into a jump
to VG_(disp_cp_xindir)
XAssisted is used for "assisted" (do something first, then jump)
transfers. It is compiled into a jump to VG_(disp_cp_xassisted)
All 3 of these may be conditional.
More complexity: in some circumstances (no-redir translations)
all transfers must be done with XAssisted. In such cases the
instruction selector will be told this.
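The choice between the three transfer kinds described above can be
sketched as a small decision function. This is only an illustration:
the enum, the function name and the three boolean parameters are
hypothetical and are not taken from the VEX sources.

```c
#include <stdbool.h>

/* Hypothetical names for this sketch; not VEX's own types. */
typedef enum { XFER_XDIRECT, XFER_XINDIR, XFER_XASSISTED } XferKind;

/* needs_assist:     something must be done first (eg a syscall).
   dst_is_known:     the destination is a constant known at JIT time.
   chaining_allowed: false for no-redir translations. */
static XferKind choose_transfer(bool needs_assist, bool dst_is_known,
                                bool chaining_allowed)
{
   /* No-redir translations must do every transfer with XAssisted. */
   if (!chaining_allowed || needs_assist)
      return XFER_XASSISTED;
   /* Known destination: chainable.  Otherwise: table lookup. */
   return dst_is_known ? XFER_XDIRECT : XFER_XINDIR;
}
```

Whether the transfer is conditional is orthogonal to this choice,
since all 3 kinds may be conditional.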
* Patching: XDirect is compiled basically into
%r11 = &VG_(disp_cp_chain_me_to_{slow,fast}EP)
call *%r11
Backends must provide a function (eg) chainXDirect_AMD64
which converts it into a jump to a specified destination
jmp $delta-of-PCs
or
%r11 = 64-bit immediate
jmpq *%r11
depending on branch distance.
Backends must provide a function (eg) unchainXDirect_AMD64
which restores the original call-to-the-stub version.
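The "depending on branch distance" decision above can be sketched as
follows. The sketch assumes the short form is a 5-byte jump (one
opcode byte plus a 32-bit displacement measured from the end of the
jump). The function and macro names are hypothetical, and a real
backend may well use a more conservative range than the full signed
32-bit one.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHORT_JMP_LEN 5   /* assumed: opcode byte + 32-bit displacement */

/* Returns true and writes the displacement to *disp_out if a short
   jump placed at place_to_chain can reach dst.  Returns false if the
   long form (64-bit immediate into a register, then an indirect
   jump) would be needed instead. */
static bool chain_fits_short_form(uint64_t place_to_chain, uint64_t dst,
                                  int32_t *disp_out)
{
   /* The displacement is relative to the end of the jump insn. */
   int64_t delta = (int64_t)(dst - (place_to_chain + SHORT_JMP_LEN));
   if (delta < INT32_MIN || delta > INT32_MAX)
      return false;
   *disp_out = (int32_t)delta;
   return true;
}
```

Unchaining needs no such decision, since it always restores the same
call-to-the-stub sequence.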
* Event checks. Each translation now has two entry points,
the slow one (slowEP) and fast one (fastEP). Like this:
slowEP:
counter--
if (counter < 0) goto VG_(disp_cp_evcheck_fail)
fastEP:
(rest of the translation)
slowEP is used for control flow transfers that are or might be
a back edge in the control flow graph. Insn selectors are
given the address of the highest guest byte in the block so
they can determine which edges are definitely not back edges.
The counter is placed in the first 8 bytes of the guest state,
and the address of VG_(disp_cp_evcheck_fail) is placed in
the next 8 bytes. This allows very compact checks on all
targets, since no immediates need to be synthesised, eg:
decq 0(%baseblock-pointer)
jns fastEP
jmpq *8(%baseblock-pointer)
fastEP:
On amd64 a non-failing check is therefore 2 insns; all 3 occupy
just 8 bytes.
On amd64 the event check is created by a special single
pseudo-instruction AMD64_EvCheck.
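As a rough C model of the two ideas in this item: the first function
picks an entry point for an edge, given the highest guest address in
the block, and the second mimics the decrement-and-test at slowEP.
All names here are hypothetical. The rule that a destination above
the block's highest byte cannot be a back edge is an interpretation
of the text above, not code copied from an instruction selector.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ENTER_SLOW_EP, ENTER_FAST_EP } EntryPoint;

/* A destination above the highest guest byte of this block is
   definitely not a back edge, so it can skip the event check.
   Anything else might be a back edge and must use slowEP. */
static EntryPoint entry_for_edge(uint64_t dst_ga, uint64_t max_ga)
{
   return dst_ga > max_ga ? ENTER_FAST_EP : ENTER_SLOW_EP;
}

/* Models "counter--; if (counter < 0) goto evcheck_fail".
   Returns true if execution may fall through to fastEP. */
static bool event_check_passes(int64_t *counter)
{
   *counter -= 1;
   return *counter >= 0;
}
```

In the generated code the failing case is an indirect jump through
the address stored next to the counter, not a function return; the
boolean result stands in for that jump.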
* BB profiling (for --profile-flags=). The dispatch assembly
dispatch-arch-os.S no longer deals with this and so is much
simplified. Instead the profile inc is compiled into each
translation, as the insn immediately following the event
check. Again, on amd64 a pseudo-insn AMD64_ProfInc is used.
Counters are now 64 bit even on 32 bit hosts, to avoid overflow.
One complexity is that at JIT time the address of the counter is
not yet known. To solve this, VexTranslateResult
now returns the offset of the profile inc in the generated
code. When the counter address is known, VEX can be called
again to patch it in. Backends must supply eg
patchProfInc_AMD64 to make this happen.
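The two-step "translate first, patch the counter address later"
scheme can be illustrated like this. The sketch assumes the generated
code holds an 8-byte placeholder immediate at a known byte offset;
the function name, its signature and the placeholder layout are
assumptions made for illustration and do not describe the actual
patchProfInc_<arch> interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Overwrite the 8-byte placeholder at code[imm_offset] with the
   address of the 64-bit counter.  Returns 0 on success, or -1 if
   the placeholder would not fit inside the code buffer. */
static int patch_counter_address(uint8_t *code, size_t code_len,
                                 size_t imm_offset,
                                 const uint64_t *counter)
{
   uint64_t addr = (uint64_t)(uintptr_t)counter;
   if (imm_offset > code_len || code_len - imm_offset < sizeof addr)
      return -1;
   /* Host byte order is what the host CPU expects in its own code. */
   memcpy(code + imm_offset, &addr, sizeof addr);
   return 0;
}
```

On a real host the patched range would also have to be reported for
invalidation (compare the VexInvalRange item under Cleanups).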
* Front end changes (guest_blah_toIR.c)
The way the guest program counter is handled has changed
significantly. Previously, the guest PC was updated (in IR)
at the start of each instruction, except for the first insn
in an IRSB. This is inconsistent and doesn't work with the
new framework.
Now, each instruction must update the guest PC as its last
IR statement -- not its first. There is no longer a special
exemption for the first insn in the block. As before most of these are
optimised out by ir_opt, so no concerns about efficiency.
As a logical side effect of this, exits (IRStmt_Exit) and the
block-end transfer are both considered to write to the guest state
(the guest PC) and so need to be told the offset of it.
IR generators (eg disInstr_AMD64) are no longer allowed to set
IRSB::next to specify the block-end transfer address. Instead they
now indicate, to the generic steering logic that drives them (iow,
guest_generic_bb_to_IR.c), that the block has ended. The steering
logic then generates effectively "goto GET(PC)" (which, again, is
optimised away). What this does mean is that if the IR generator function
ends the IR of the last instruction in the block with an incorrect
assignment to the guest PC, execution will transfer to an incorrect
destination -- making the error obvious quickly.