Friday, September 28, 2012

Fun with Constrained Programming

Believe it or not, RAR files can contain bytecode for a simple x86-like virtual machine called the RarVM. This is designed to provide filters (preprocessors) to perform some reversible transformation on input data to increase redundancy, and thus improve compression.

For example, one filter (likely inspired by LZX, an earlier scheme with a similar feature) is called "Intel E8 preprocessing", which is designed to increase redundancy in x86 code.

If you imagine a program like this:

mov   foo, bar 
cmp   bar, baz
push  foo 
call  write
push  bar
call  write

The two calls are relative to the branch location, so there will be two different encodings of the same instruction. At compress time, we can translate them to absolute addresses. This way, the same instruction appears twice, and can therefore be compressed much more efficiently. When the archive is decompressed, this transformation can be reversed to restore the original.

WinRAR includes around a dozen standard filters that improve compression of several common inputs, but surprisingly also allows new filters to be defined at runtime by archives!

As far as I'm aware, no tool exists to explore this functionality. This is just too tempting to play with, so I've written some. So far, I have a RAR toolchain working, as well as some documentation on how to write programs.

You can checkout the code and some example programs on github,

$ cat
#include <constants.rh>
#include <util .rh>

; vim: syntax=fasm
; Test RAR assembly file.
    ; Install our message in the output buffer
    mov     r3, #0x1000                 ; Output buffer.
    mov     [r3+#0], #0x6c6c6548        ; 'lleH'
    mov     [r3+#4], #0x57202c6f        ; 'W ,o'
    mov     [r3+#8], #0x646c726f        ; 'dlro'
    mov     [r3+#12], #0x00000a21       ; '!\n'
    mov     [VMADDR_NEWBLOCKPOS],  r3   ; Pointer
    mov     [VMADDR_NEWBLOCKSIZE], #14  ; Size
    call    $_success
$ make helloworld.rar
cpp -I../stdlib < > helloworld.ri
../raras -o helloworld.ri
../rarld > helloworld.rar
rm helloworld.ri
$ unrar p -inul helloworld.rar
Hello, World!

At the moment, the toolchain consists of a minimal linker and assembler. I've been toying with the idea of porting a compiler (possibly an llvm backend, or gcc target), but I'm not sure if I'm that crazy.

Needless to say, programming something as ridiculous as WinRAR presents some interesting challenges, but the fact that it's so widely deployed presents some some fun possibilities. Here is a quick overview of the architecture.

Familiarity with x86 (and preferably Intel assembly syntax) would be an advantage, as the VM was clearly inspired by x86. There are 8 named registers, called r0 to r7.

r7 is used as a stack pointer for stack related operations (such as push, call, pop, etc), and grows down. A RarVM filter may execute for at most 250,000,000 instructions, at which
point it will be terminated. However there are no limits on the number of filters included in a file, so you can simply split your task into multiple 250M cycle chunks. If the instruction pointer ever moves outside the code segment, the program is considered to have completed

Filters have a 0x40000 byte address space, and access to 3 status flags : ZF (Zero), CF (Carry) and SF (Sign) which can be accessed via the conditional branch instructions, or via pushf/popf (as on x86).

These are the operand types supported

  • Registers (r0, r1, ..., r7)
  • Memory references ([#0x12345])
  • Register indirect references ([r0])
  • Base + Index indirect references ([r4+#0x1234])
  • Immediate (#0x12312)

As a convenience, the assembler also supports symbolic references that are resolved to immediates at compile time, just prefix them with a $.

  • Symbolic references (jnz $next_loop)

There are no restrictions on how many memory references you can make (like on Intel), so this is perfectly legal:

mov [r0+#0x1234], [r7-#12]

RAR programs can either supply their own initial register values, or use the default set which includes some status information. If you don't specify your own registers, these will be set:
  • r3: Address of Global Memory Buffer
  • r4: Filter Block Length
  • r5: Filter Execution Count
  • r7: Top of address space (and thus the stack grows down).

Dynamic Output

So here's the fun part, because RAR is intended to be an archiver and not a programming environment, the Windows version of WinRAR requires the output from your program matches the CRC stored in the volume header....Ouch.

That would normally mean that you can't write a program whose output you cannot predict at compile time, making things a little less interesting. However, there's no need to give up so easily.

Julien wrote a paper on dynamic CRC compensation algorithms (french, but the code is understandable). I've ported his algorithm to RarVM, so I now have a self-hosted RAR file that can produce dynamic output and compensate it's own output at runtime to match a fixed CRC. 

Yes, that's pretty ridiculous.

Here is a portion of the CRC compensation code:

; Modifies a few bytes within the first 7 bytes of buffer to force the CRC32 to
; match the requested value. You should reserve the first few bytes of buffer
; for this process.
; Original implementation based on a paper by Julien Tinnes.
        ; [r6+#16]  target     // Desired CRC value.
        ; [r6+#12]  length     // Size of buffer.
        ; [r6+#8]   buffer     // Pointer to buffer to compensate.
        ; [r6+#4]   r
        ; [r6+#0]   s
        ; [r6-#4]   modlen     // Length % 4
        push    r6                          ; save frame pointer
        mov     r6, r7                      ; create new frame
        sub     r7, #4                      ; allocate variables
        push    [r6+#12]                    ; length
        push    [r6+#8]                     ; buffer
        call    $_crc_block                 ; _crc_block(buffer, length)
        xor     [r6+#16], r0                ; target ^= currentcrc
        push    #4                          ; divisor
        push    [r6+#12]                    ; length
        call    $_mod                       ; _mod(length, 4)
        mov     [r6-#4], r0                 ; save this calculation
        sub     [r6+#12], [r6-#4]           ; length -= modlen
        div     [r6+#12], #4                ; divide by 4
        sub     [r6+#12], #1                ; calculate row for _find_vector_x
        push    [r6+#12]                    ; row
        push    [r6+#16]                    ; target
        call    $_find_vector_x             ; _find_vector_x(target, row)
        push    r0                          ; value
        call    $_bswap                     ; byte swap result
        add     [r6+#8], [r6-#4]            ; calculate offset to modify
        mov     r1, [r6+#8]                 ; load address
        xor     [r1], r0                    ; compensate crc
        xor     r0, r0                      ; return code
        mov     r7, r6                      ; clear frame
        pop     r6

And the result:

$ cat 
; vim: syntax=fasm

; Test RAR assembly file that just demonstrates the syntax.

    ; Install our message in the output buffer
    mov     r3,       #0x1000            ; Output buffer.
    mov     [r3+#0],  #0x41414141        ; Padding for compensation
    mov     [r3+#4],  #0x0a414141        ; Padding for compensation
    mov     [r3+#8],  #0x6c6c6548        ; 'lleH'
    mov     [r3+#12], #0x57202c6f        ; 'W ,o'
    mov     [r3+#16], #0x646c726f        ; 'dlro'
    mov     [r3+#20], #0x00000a21        ; '!\n'
    mov     [VMADDR_NEWBLOCKPOS],  #0x00001000
    mov     [VMADDR_NEWBLOCKSIZE], #22

    ; Compensate to required CRC
    push    RAR_FILECRC
    call    $_compensate_crc
    test    r0, r0
    jz      $finished
    call    $_error

    call    $_success
$ make sample.rar
cpp -Istdlib < > sample.ri
./raras -o sample.ri
./rarld > sample.rar
rm sample.ri
$ unrar p -idq sample.rar 
Hello, World!


This is all pretty useless, but that's what makes it fun :-)


Riccardo said...

Considering the code is automatically executed during unrar...I wonder if it is possible to break out the sandbox? How constrained is the VM?

dodysw said...

Good hacking.

Unknown said...

What other calls are available unless 'write'?

Sune Mølgaard said...

I'm not entirely sure how or why, but after posting this to Facebook, when people visit the link, they get a security warning from facebook...

Anonymous said...

Not to tout my own horn too much, but I was a co-author on a paper that discusses achieving a given CRC in more abstract detail, but also gives a concrete C implementation for CRC-32 – in English, for those, like me, who don't speak French. :-)

Spiff said...

I can't figure out how you're generating that output I don't see any calls to a print function. Is output supposed to reside at a fixed location? Could you explain?

Spiff said...

Never mind, I see the output buffer now. How can you do anything but write to stdout though, for instance, how would you output decompressed data?

R4ndom said...

I have been messing with this for a while, though you have surpassed me. Can you please write me? I cannot find a link to your email address anywhere. I just wanted to discuss your amazing find. Thanks. Random

You can email me thru my site:

Unknown said...

Bravo, sir. Bravo! Madness like this is great fun, and you have earned mad kudos!

Jeff Karrels said...
This comment has been removed by the author.