Disassembling the Disassembler: Fixing a Bug in IDA

The Issue

While analyzing a Digital Rights Management (DRM) system, I encountered a function with instructions that the IDA disassembler failed to handle properly.

Examining the start of the function, I was greeted with bytes that IDA would not decode as an instruction, starting with 66 0F 18 C6.

The byte 66 is an operand-size override prefix. I tried to force IDA to disassemble starting from the second byte, 0F, but this was unsuccessful, which led me to believe that the issue is not related to the prefix.

To get a glimpse of what might be going wrong, I consulted the Intel instruction manual and looked up the byte sequence 0F 18 .., which corresponds to the PREFETCH instruction.

Decoding the byte sequence with ZydisInfo, I found that it was interpreted as... a NOP? That was rather unexpected.

A more thorough examination of the entry for 'PREFETCH' reveals the following information:

The source operand is a byte memory location. (The locality hints are encoded into the machine level instruction using bits 3 through 5 of the ModR/M byte.)

The ModR/M byte here is C6, so bits 3 through 5 (the reg field) are:

11000110
  ^^^

Those bits are 000, and according to the manual, 0F 18 /0 corresponds to PREFETCHNTA. PREFETCHNTA expects a memory operand, as indicated by m8 in the manual.

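However, the mod field of C6 (bits 6 and 7) is 11, so the operand is not a memory location but a register: the r/m field (bits 0 through 2) is 110, which selects (e)si.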
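To make the field split concrete, here is a minimal standalone C++ sketch that decodes a ModR/M byte into its mod, reg and r/m fields and checks the values for C6 at compile time. The names ModRM, decode_modrm and has_memory_operand are my own, hypothetical ones, not taken from any disassembler.

```cpp
#include <cassert>
#include <cstdint>

// The three fields of an x86 ModR/M byte.
struct ModRM {
    uint8_t mod;  // bits 7-6: 11b means register operand, anything else memory
    uint8_t reg;  // bits 5-3: register or opcode extension (the "/digit")
    uint8_t rm;   // bits 2-0: register, or base of the memory operand
};

// Hypothetical helper: split a ModR/M byte into its fields.
constexpr ModRM decode_modrm(uint8_t b) {
    return ModRM{static_cast<uint8_t>(b >> 6),
                 static_cast<uint8_t>((b >> 3) & 0x7),
                 static_cast<uint8_t>(b & 0x7)};
}

// True if the ModR/M byte describes a memory operand (mod != 11b).
constexpr bool has_memory_operand(uint8_t b) {
    return decode_modrm(b).mod != 0x3;
}

// C6 = 11 000 110b: register form, /0 (PREFETCHNTA), r/m 110 = (e)si.
static_assert(decode_modrm(0xC6).mod == 3, "register operand");
static_assert(decode_modrm(0xC6).reg == 0, "/0 selects PREFETCHNTA");
static_assert(decode_modrm(0xC6).rm == 6, "r/m 110 selects (e)si");
static_assert(!has_memory_operand(0xC6), "no memory operand");
```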

Changing the ModR/M byte to 86 (mod = 10, a memory operand with a 32-bit displacement) and encoding 0F 18 86 00 00 00 00 seems to support this theory: disassembling this byte sequence results in PREFETCHNTA BYTE PTR ds:[rsi].
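The four trailing zero bytes are there because mod = 10 means a 32-bit displacement follows the ModR/M byte. The sketch below computes how many bytes follow a ModR/M byte, assuming 32/64-bit addressing and ignoring prefixes, VEX/EVEX and 16-bit addressing; bytes_after_modrm and prefetch_length are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes (SIB + displacement) that follow the ModR/M byte under
// 32/64-bit addressing. `sib` is only consulted when a SIB byte is present.
std::size_t bytes_after_modrm(uint8_t modrm, uint8_t sib = 0) {
    const uint8_t mod = static_cast<uint8_t>(modrm >> 6);
    const uint8_t rm = static_cast<uint8_t>(modrm & 0x7);
    if (mod == 3) return 0;                 // register operand: nothing follows
    const bool has_sib = (rm == 4);
    std::size_t n = has_sib ? 1 : 0;
    if (mod == 1) n += 1;                   // disp8
    else if (mod == 2) n += 4;              // disp32
    else if (!has_sib && rm == 5) n += 4;   // mod 00, r/m 101: disp32 (RIP-relative in 64-bit mode)
    else if (has_sib && (sib & 0x7) == 5) n += 4;  // mod 00, SIB base 101: disp32, no base
    return n;
}

// Total length of a 0F 18 /r (PREFETCHh) encoding without prefixes.
std::size_t prefetch_length(uint8_t modrm, uint8_t sib = 0) {
    return 2 /* 0F 18 */ + 1 /* ModR/M */ + bytes_after_modrm(modrm, sib);
}
```

With these helpers, 0F 18 86 plus its four displacement bytes comes out at 7 bytes, while the register form 0F 18 C6 is only 3 bytes long.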

Wrapping up the issue

Technically, 0F 18 C6 encodes PREFETCHNTA RSI, which does not seem to be a valid form. Executing it, however, works fine on my CPU and has no easily observable effects, which supports the idea that it is in fact a NOP.

Logically, this sort of makes sense. Prefetching a register, which is not a memory location, is nonsensical, because register access is supposed to be as fast as or faster than any cache access. So even if the CPU gave this encoding some behavior, doing nothing would be fine.

The Intel manual even states:

The PREFETCH instruction is merely a hint and does not affect program behavior. If executed, this instruction moves data closer to the processor in anticipation of future use.
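To see the "merely a hint" part in practice, the following sketch issues software prefetches while summing a buffer and relies on the result being identical to a plain sum. It assumes GCC or Clang, because __builtin_prefetch is a compiler builtin that MSVC does not provide; with a locality argument of 0, GCC typically emits PREFETCHNTA on x86-64, but the exact lowering is up to the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a buffer while prefetching a few elements ahead with "no temporal
// locality" (locality = 0). The prefetch is only a hint, so the result must
// be identical to a plain sum.
long long sum_with_prefetch(const std::vector<int>& v) {
    const std::size_t ahead = 16;  // arbitrary prefetch distance
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i + ahead < v.size())
            __builtin_prefetch(&v[i + ahead], /*rw=*/0, /*locality=*/0);
        total += v[i];
    }
    return total;
}

// Reference implementation without any prefetching.
long long sum_plain(const std::vector<int>& v) {
    long long total = 0;
    for (int x : v) total += x;
    return total;
}
```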

Potential fixes?

Writing a script that patches all occurrences of 66 0F 18 C6 to a known-good 4-byte NOP instruction seems rather error-prone. For example, if this byte sequence is part of an immediate value inside another instruction, such a patch would alter the program's behavior.

Aside from that, it also isn't that simple: as outlined earlier, the instruction fails to decode even without the operand-size override prefix 66, and I'm pretty sure encoding the same instruction with registers other than (e)si would also trigger the bug.
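The following standalone C++ sketch illustrates both problems. A naive byte scan reports a match inside the immediate of mov eax, 0xC6180F66, which encodes as B8 66 0F 18 C6. The second, hypothetical helper broadens the pattern to any register form of 0F 18 (mod = 11), which would cover other registers and the prefix-less variant, but it is just as blind to instruction boundaries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive scan: every offset where the raw bytes 66 0F 18 C6 appear,
// regardless of whether an instruction actually starts there.
std::vector<std::size_t> find_all(const std::vector<uint8_t>& code) {
    static const uint8_t pat[] = {0x66, 0x0F, 0x18, 0xC6};
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i + sizeof(pat) <= code.size(); ++i) {
        bool match = true;
        for (std::size_t j = 0; j < sizeof(pat); ++j) {
            if (code[i + j] != pat[j]) { match = false; break; }
        }
        if (match) hits.push_back(i);
    }
    return hits;
}

// Broader (still naive) match at offset i: 0F 18 followed by any
// register-form ModR/M byte, i.e. mod == 11b, for any register.
bool is_register_form_prefetch(const std::vector<uint8_t>& code, std::size_t i) {
    return i + 3 <= code.size() && code[i] == 0x0F && code[i + 1] == 0x18 &&
           (code[i + 2] & 0xC0) == 0xC0;
}
```

Calling find_all on B8 66 0F 18 C6 returns offset 1, which lies in the middle of the MOV; patching it would silently change the immediate.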

Given the absence of optimal solutions, I opted to disassemble the disassembler in an attempt to find a workaround for the issue.

IDA's Program structure

Given IDA's broad range of use cases, designing such software is not an easy task. Its ability to load diverse binary formats and disassemble code for various processor types necessitates a modular architecture.

Reading through the SDK and its documentation provides some insight into how this is done:

Loaders deal with binary formats that contain code.
Processor modules contain the actual disassembler.

The loaders aren't of much concern in our case, so we will focus on processor modules.

Processor modules are essentially plugins that use the same event-driven API provided by the SDK.

Looking at procs\pc64.dll, I noticed that the entry point doesn't do anything interesting; there is, however, a data export called 'LPH'.

The sources of the various processor modules included in the SDK reveal that LPH is of type processor_t. This struct has a field called _notify, which is used for handling events.

Decompiling the notify function referenced by the module's LPH reveals code that looks like this:

if ( ev != ev_get_procmod ) return 0LL;
construct_x86procmod(&procmod);

The notify() function appears to perform setup, specifically initializing the x86 implementation of procmod_t through the construct_x86procmod() function, which looks like a constructor.

According to the SDK, procmod_t inherits from event_listener_t, whose vtable has on_event() as its first function.

Following on_event() led me to a lengthy function containing a large switch statement.

Reading the switch statement alongside the event_t enum from the SDK led me to the ev_ana_insn case, which is responsible for disassembling a single instruction.
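The control flow described so far (an exported processor_t whose notify callback hands out a procmod object, whose on_event() switches on the event code) can be sketched in a few lines of standalone C++. This is a heavily simplified mock: the types only borrow the SDK's names for orientation, and their layouts and signatures are invented. For instance, the real event handlers receive their arguments through a va_list rather than a single pointer, and fake_insn_t is a hypothetical stand-in for insn_t.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Invented subset of event codes, named after the SDK's.
enum class event_t { ev_get_procmod, ev_ana_insn, ev_out_insn };

// Hypothetical stand-in for insn_t.
struct fake_insn_t { uint64_t ea = 0; uint16_t size = 0; };

struct event_listener_t {
    // Declared first, so on common compilers it occupies the first vtable slot.
    virtual long on_event(event_t code, void* arg) = 0;
    virtual ~event_listener_t() = default;
};

struct x86_procmod_t : event_listener_t {
    long on_event(event_t code, void* arg) override {
        switch (code) {
        case event_t::ev_ana_insn:  // "disassemble one instruction"
            return ana(static_cast<fake_insn_t*>(arg));
        default:
            return 0;  // event not handled
        }
    }
    // Placeholder decoder: pretends every instruction is one byte long.
    long ana(fake_insn_t* insn) { insn->size = 1; return insn->size; }
};

// Stand-in for the notify callback reached through LPH: it only reacts to
// ev_get_procmod by constructing and handing out a module instance.
std::unique_ptr<event_listener_t> notify(event_t ev) {
    if (ev != event_t::ev_get_procmod) return nullptr;
    return std::make_unique<x86_procmod_t>();
}
```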

Setting a breakpoint on the function and forcing IDA to disassemble an instruction confirmed my assumption: the breakpoint was hit, and the parameter looked like a pointer to an instance of insn_t.

Constructing a fix

With this new understanding, I began developing a solution to address the issue.

I hooked this function, which disassembles a single instruction, with a replacement of the following signature:

uint64 ana_insn_replacement(void *module_data, insn_t *out, int64 unk)
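The hook follows the usual detour pattern: the replacement receives the same arguments as the original and can call the original through a saved pointer. The sketch below mimics that with plain function pointers instead of patched machine code. fake_insn_t, original_ana_insn and the simulated failure at 0x1000 are hypothetical stand-ins, and casting between void* and function pointers is only conditionally supported in C++, although mainstream compilers accept it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for insn_t with just the fields used here.
struct fake_insn_t {
    uint64_t ea = 0;
    uint16_t size = 0;
};

// Stand-in for the original routine; pretends decoding fails at 0x1000.
uint64_t original_ana_insn(void* /*module_data*/, fake_insn_t* out, int64_t /*unk*/) {
    if (out->ea == 0x1000) return 0;
    out->size = 3;
    return 3;
}

// Stand-in for what a hooking library hands back: an untyped pointer
// to the original code (here it simply points at the function above).
void* ana_insn = reinterpret_cast<void*>(&original_ana_insn);

uint64_t ana_insn_replacement(void* module_data, fake_insn_t* out, int64_t unk) {
    // Reuse the replacement's own signature to type the saved pointer.
    auto original = reinterpret_cast<decltype(&ana_insn_replacement)>(ana_insn);
    uint64_t res = original(module_data, out, unk);
    if (res != 0) return res;  // original succeeded: pass its result through

    // Original failed: this is where a workaround would fill in the instruction.
    out->size = 4;
    return 4;
}
```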

As a preliminary proof of concept, I simply compared the bytes being disassembled to the known bad sequence; if they matched, I filled out the instruction as a NOP.

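// read up to 16 bytes at the address being disassembled
uint8 instr_bytes[16];
uint8 bad_instr[4] = {0x66, 0x0F, 0x18, 0xC6};
auto read = get_bytes(instr_bytes, sizeof(instr_bytes), out->ea);

// only compare if at least the 4 bytes of the pattern could be read
if (read >= 4 && memcmp(instr_bytes, bad_instr, sizeof(bad_instr)) == 0)
{
  // describe the instruction as a 4-byte NOP
  out->itype = NN_nop;
  out->size = 4;
  return 4;
}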

...And sure enough, it worked!

The function is now disassembled correctly.

So, we're done here?

Well, not quite. I called my solution a 'preliminary proof-of-concept solution' because it still suffers from some of the issues mentioned above.

Given my time constraints and the need for a reliable solution, I opted to bring Zydis into the equation as a workaround rather than invest more time in identifying the root cause of the issue, which would be both tedious and time-consuming given that I am doing this externally.

(Most disassemblers internally use bitmask tables to match instructions. Those tables can be complex, and understanding how they are constructed and how they interact with the core logic is not straightforward. The bug might be in the table definitions rather than in the core logic.)
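To illustrate what a bug in the table definitions could look like, here is a toy table-driven matcher. It is purely hypothetical and says nothing about how IDA's tables are actually laid out: the made-up rows cover 0F 18 /0 with memory operands, but no row covers the register form, so 0F 18 C6 falls through and is reported as invalid, the same symptom seen in IDA.

```cpp
#include <cassert>
#include <cstdint>

// One row of a toy decode table: it matches when (key & mask) == value,
// where key packs the first three bytes as b0 << 16 | b1 << 8 | b2.
struct TableEntry {
    uint32_t mask;
    uint32_t value;
    const char* name;
};

// Made-up rows for 0F 18 /0 with a memory operand. mod 00/01 is matched by
// requiring bit 7 == 0, mod 10 by a second row. Nothing covers mod 11.
constexpr TableEntry kTable[] = {
    {0xFFFFB8, 0x0F1800, "prefetchnta"},  // mod 0x, reg 000
    {0xFFFFF8, 0x0F1880, "prefetchnta"},  // mod 10, reg 000
};

// Returns the name of the first matching row, or nullptr if none matches.
const char* lookup(uint8_t b0, uint8_t b1, uint8_t b2) {
    const uint32_t key = (uint32_t{b0} << 16) | (uint32_t{b1} << 8) | b2;
    for (const TableEntry& e : kTable) {
        if ((key & e.mask) == e.value) return e.name;
    }
    return nullptr;  // no row: the decoder reports an invalid instruction
}
```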

The new approach is to let the IDA disassembler run as normal first and, only if it fails, fall back to Zydis.

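// read the raw bytes up front, as in the proof of concept;
// get_bytes() returns the number of bytes read, or -1 on failure
uint8 instr_bytes[16];
auto read = get_bytes(instr_bytes, sizeof(instr_bytes), out->ea);

// back up the instruction struct before the original function sets any fields
insn_t prefill_instruction = *out;

// call the original ana_insn through the pointer saved by the hook
auto res = decltype(&ana_insn_replacement)(ana_insn)(module_data, out, unk);

// if IDA's disassembler fails (and we have bytes to work with), fall back to Zydis
if (res == 0 && read > 0)
{
  LOG("disasm fail %llx\n", out->ea);

  ZydisDisassembledInstruction zy_ins;
  if (ZYAN_SUCCESS(ZydisDisassembleIntel(
      ZYDIS_MACHINE_MODE_LONG_64,
      out->ea, instr_bytes, static_cast<ZyanUSize>(read), &zy_ins
  )))
  {
    // set a comment in any case, so the spot is easy to find later
    set_cmt(out->ea, zy_ins.text, true);

    switch (zy_ins.info.mnemonic) {
      case ZYDIS_MNEMONIC_NOP:
      {
        // discard whatever the failed attempt left behind, then describe a NOP
        *out = prefill_instruction;
        out->itype = NN_nop;
        out->size = zy_ins.info.length;
        return zy_ins.info.length;
      }

      default: break;
    }
  }
}

// IDA succeeded, or the fallback could not help: pass the original result through
return res;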

For now, I have only implemented the mapping from ZydisDisassembledInstruction to insn_t for NOP, but it could easily be extended if more instructions turn out to be broken.
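If more instructions need mapping, one option is to replace the growing switch with a lookup table. The sketch below uses stand-in enums with invented members rather than the real ZydisMnemonic values and IDA's NN_* constants, so treat it as the shape of the idea rather than drop-in code.

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>

// Stand-in enums with invented members; in the real plugin these would be
// Zydis' mnemonic enum and IDA's NN_* instruction types.
enum class FallbackMnemonic { Nop, Pause, Ud2, Other };
enum class IdaItype { nop, pause, ud2 };

// Mnemonics the fallback knows how to translate into an itype.
const std::unordered_map<FallbackMnemonic, IdaItype> kMnemonicMap = {
    {FallbackMnemonic::Nop, IdaItype::nop},
    {FallbackMnemonic::Pause, IdaItype::pause},
    {FallbackMnemonic::Ud2, IdaItype::ud2},
};

// Returns the mapped itype, or an empty optional if there is no mapping yet
// (in which case the hook would leave IDA's failed result untouched).
std::optional<IdaItype> map_mnemonic(FallbackMnemonic m) {
    auto it = kMnemonicMap.find(m);
    if (it == kMnemonicMap.end()) return std::nullopt;
    return it->second;
}
```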

Whenever Zydis successfully disassembles an instruction that IDA's disassembler failed to handle, I also add a comment containing Zydis's disassembly text, which helps with further analysis and debugging.