
Capstone 4.0 New API to provide explicit registers

Nguyen Anh Quynh edited this page Apr 30, 2015 · 7 revisions

Update:

  • This proposal has already been implemented for X86 & ARM. See http://capstone-engine.org/op_access.html.

  • We have found a way to generate the desired mapping tables automatically, but we still need to verify their correctness. Contact us if you are willing to help.


Short version

This new API can tell you that this X86 instruction

ADD EAX, EBX

reads registers EAX & EBX, and modifies EAX & EFLAGS.

Our intention is for this API to support all 8 architectures, but that is a lot of work, so please contact us if you can help.

Long version

At the moment, Capstone can provide the list of registers implicitly read or written by disassembled instructions (see this C tutorial, section 3). But this is not enough, as we always want to know about all registers being accessed, not only the implicit ones.

Here is a feature that has been requested by many people for the next version of Capstone (v4.0): a new API that lists all the registers explicitly read or written by a disassembled instruction. Information like this is very helpful for building binary analysis tools.

For example, this X86 instruction:

ADD EAX, EBX

would read EAX & EBX, then update EAX.

Combined with the existing information about implicitly accessed registers (the EFLAGS register is modified in this case), Capstone would be able to report all the registers being read or written, both implicit & explicit.
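As a rough illustration of that combination step, the sketch below merges an explicit register list with an implicit one into a single duplicate-free list. It does not call Capstone: the `enum reg` IDs and the `merge_regs` helper are hypothetical names invented for this example, and a real tool would take both lists from the disassembler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register IDs for illustration only; these are NOT
 * Capstone's register constants. */
enum reg { REG_INVALID = 0, REG_EAX, REG_EBX, REG_EFLAGS };

/* Append src[0..src_count) to dst, skipping registers already present.
 * dst holds dst_count entries and has room for dst_cap in total.
 * Returns the new number of entries in dst. */
static size_t merge_regs(enum reg *dst, size_t dst_count, size_t dst_cap,
                         const enum reg *src, size_t src_count)
{
    size_t i, j;

    assert(dst != NULL);
    assert(src != NULL || src_count == 0);

    for (i = 0; i < src_count; i++) {
        int seen = 0;
        for (j = 0; j < dst_count; j++) {
            if (dst[j] == src[i]) {
                seen = 1;
                break;
            }
        }
        /* Silently drop the register if dst is already full. */
        if (!seen && dst_count < dst_cap)
            dst[dst_count++] = src[i];
    }
    return dst_count;
}
```

For ADD EAX, EBX, merging the explicit write list { REG_EAX } with the implicit write list { REG_EFLAGS } gives the two written registers named in the text, EAX and EFLAGS.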

To provide this information, we need a big table that maps each instruction to an array holding one attribute per operand: READ, WRITE or READ_WRITE (meaning the same operand is both read & written).

This new feature requires a huge amount of work: we have to fill in the mapping table for each instruction, where each line of the table is an array.

Taking the above example ADD EAX, EBX, the table starts with this line:

{ 0 },  /* X86_ADD32rr, X86_INS_ADD: add{l}     $src1, $src2 */

(Note that everything between /* */ is just a comment describing the instruction.)

According to the Intel manual entry for ADD (ADD32rr adds one 32-bit register to another), we need to change this line to:

{ READ_WRITE, READ, 0 },  /* X86_ADD32rr, X86_INS_ADD: add{l}     $src1, $src2 */

which means the first operand (EAX) is read & written, and the second operand (EBX) is read. The last 0 marks the end of the array.
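To make the row format concrete, here is a minimal sketch of how such a 0-terminated row could be read back. The names READ, WRITE and READ_WRITE come from the table line above, but their numeric values, the fixed row width MAX_OPERANDS, the one-entry `access_table` and the two helper functions are all assumptions made for this example, not the actual Capstone implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Access attributes. The values are assumed for this sketch; 0 is
 * reserved as the end-of-array marker. */
enum { READ = 1, WRITE = 2, READ_WRITE = READ | WRITE };

/* Assumed upper bound on entries per row, including the terminator. */
#define MAX_OPERANDS 8

/* Hypothetical instruction index standing in for the real opcode IDs. */
enum { INSN_ADD32rr = 0, INSN_COUNT };

/* One row per instruction, one entry per operand, terminated by 0. */
static const uint8_t access_table[INSN_COUNT][MAX_OPERANDS] = {
    { READ_WRITE, READ, 0 },  /* ADD32rr: add{l} $src1, $src2 */
};

/* Number of operands described by a row: entries before the first 0. */
static size_t operand_count(const uint8_t *row)
{
    size_t n = 0;

    assert(row != NULL);
    while (n < MAX_OPERANDS && row[n] != 0)
        n++;
    return n;
}

/* Access attribute of operand `index`, or 0 if the row has no such operand. */
static uint8_t operand_access(const uint8_t *row, size_t index)
{
    return index < operand_count(row) ? row[index] : 0;
}
```

With this layout, `operand_access(access_table[INSN_ADD32rr], 0) & WRITE` is non-zero because EAX is written, while the same test on operand 1 (EBX) is zero. Encoding READ_WRITE as the bitwise OR of READ and WRITE is a design choice of the sketch that lets a caller test each direction with a single mask.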


The question is: how do we build this mapping table? Unfortunately, the only answer is that we have to fill in the table manually for every single instruction. And to be complete, this must be done for all 8 architectures supported by Capstone.

This is a lot of hard manual work, so if you are able and willing to help build this API, please contact us.