OS: Ubuntu 18.04
As training for a competition I prepared a setup for firmware and IoT analysis.

Arduino (ATmega328P, ATmega2560)

I’m starting with reversing programs for Arduino boards, of course: it’s probably the simplest target, and a lot of sources are available in the wild.

I downloaded the latest software package from their homepage and installed the toolchain; it works out of the box.

Requirements I needed for the installation:

  • freeglut3-dev

To find out where the compiled sketches are stored, we have to enable verbose mode in the Arduino IDE.

File->Preferences->Show verbose output

If we now compile a simple sketch, we see where the .eep, .hex and .elf files are stored. Even nicer, we see which commands are used.

Since the IDE uses the binaries from its install directory, the output contains all the (possibly long) paths, but we can trim those.

-> I will skip the full verbose output here and only point to the interesting parts. It is also very nice that we see the avr-gcc commands and so on, so we don’t have to google for the correct command-line arguments.

Compiling sketch…

avr-g++ -c -g -Os -std=gnu++11 -fpermissive -fno-exceptions -ffunction-sections -fdata-sections -fno-threadsafe-statics -Wno-error=narrowing -MMD -flto -mmcu=atmega2560 -DF_CPU=16000000L -DARDUINO=10809 -DARDUINO_AVR_MEGA2560 -DARDUINO_ARCH_AVR -I/avr/cores/arduino -I/avr/variants/mega /tmp/arduino_build_478369/sketch/MYPROG.ino.cpp -o /tmp/arduino_build_478369/sketch/MYPROG.ino.cpp.o

Here we can see the compilation, and that the output is stored in the /tmp directory.

Compiling core… Using precompiled core:

/tmp/arduino_cache_682155/core/core_arduino_avr_mega_cpu_atmega2560_f09daf148f324672450511b9824ed27b.a

(The core is the precompiled Arduino core library: it provides functions such as digitalWrite and delay, and it gets linked into every sketch.)

Linking everything together…

avr-gcc -Os -g -flto -fuse-linker-plugin -Wl,--gc-sections,--relax -mmcu=atmega2560 -o /tmp/arduino_build_478369/MYPROG.ino.elf /tmp/arduino_build_478369/sketch/MYPROG.ino.cpp.o /arduino_cache_682155/core/core_arduino_avr_mega_cpu_atmega2560_f09daf148f324672450511b9824ed27b.a -L/tmp/arduino_build_478369 -lm

At this step we have our executable.

avr-objcopy -O ihex -j .eeprom --set-section-flags=.eeprom=alloc,load --no-change-warnings --change-section-lma .eeprom=0 /tmp/arduino_build_478369/MYPROG.ino.elf /tmp/arduino_build_478369/MYPROG.ino.eep

avr-objcopy -O ihex -R .eeprom /tmp/arduino_build_478369/MYPROG.ino.elf /tmp/arduino_build_478369/MYPROG.ino.hex

avr-size -A /tmp/arduino_build_478369/MYPROG.ino.elf

Interestingly, we don’t stop once we have the .elf file: we also generate a .hex file. If we hit the upload button instead of the compile button, we get one more command, which uploads the .hex file (not the .elf file) to the Arduino.

avrdude -C/avrdude.conf -v -patmega2560 -cwiring -P/dev/ttyACM0 -b115200 -D -Uflash:w:/tmp/arduino_build_478369/MYPROG.ino.hex:i

The first question, of course: if we get either the .hex file or the .elf file, can we analyse it?

Analysis

The .elf file we can simply throw into IDA or radare2 and analyse. We have to know which kind of AVR it is to map the memory sections properly, but that should be quite simple.

Binary for the Analysis:

file BlinkSample.ino.elf
BlinkSample.ino.elf: ELF 32-bit LSB executable, Atmel AVR 8-bit, version 1 (SYSV), statically linked, with debug_info, not stripped

How do we know which kind of Atmel it is?

Looking at the strings in the binary works on non-stripped and stripped binaries alike.

strings BlinkSample.ino.stripped.elf
GCC: (GNU) 5.4.0
atmega2560
.shstrtab
.data
.text
.bss
.comment
.note.gnu.avr.deviceinfo
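
The same lookup can be scripted when there are many binaries to triage. The following is a minimal sketch, assuming the part name appears as plain ASCII in the file, as in the strings output above. The function name find_mcu_names and the regular expression are my own illustration and only cover common ATmega, ATtiny and ATxmega names.

```python
import re

# Assumed pattern for common AVR part names; extend it for other families.
MCU_PATTERN = re.compile(rb"(atmega|attiny|atxmega)\d+[a-z]*", re.IGNORECASE)


def find_mcu_names(blob):
    """Return the distinct AVR part names found in a binary blob, in order of appearance."""
    names = []
    for match in MCU_PATTERN.finditer(blob):
        name = match.group(0).decode("ascii").lower()
        if name not in names:
            names.append(name)
    return names


def find_mcu_names_in_file(path):
    """Convenience wrapper: read a file and scan it for part names."""
    with open(path, "rb") as handle:
        return find_mcu_names(handle.read())
```

Calling find_mcu_names_in_file on the stripped ELF should report the same atmega2560 string that strings shows, provided the .note.gnu.avr.deviceinfo section is still present.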

Knowing the architecture, we can choose the correct IDA config.

Note:
To use the AVR toolchain, we need to add the installation path to PATH, or always call the tools from the install directory:

export PATH="$PATH:/PathTo/Arduino/Software/arduino-1.8.9/hardware/tools/avr/bin"


using radare2

Loading it into r2 without any config already leads to nice and easy-to-understand results:

  • entry0 found
  • main function
  • library functions

Let’s look into main. On an embedded board we always have a setup function which is called once and a main loop which never exits, and this is exactly how the main function is structured: first comes code which runs only once, followed by a loop which ends with an rjmp, building our endless loop. Looking at the main graph in r2 we get something like:

[0x470]
(fcn) main
            f t 
            │ │           0x470: start of Setup 
   ╭────────╯ ╰╮
   │           │
  __x560__[ob] │
   v           │
   │           │
   ╰──────╮    │
          │ ╭──╯
          │ │
       __x586__[oc]
          v
          ╰─╮
            │
   ╭──────╮ │
   │      │ │
   │   __x58a__[of]        0x58A: start of MainLoop
   │      │ 
   ╰──────╯

Even if the binary is stripped, the main function is easy to find. First we take a look at the interrupt vector table, which is located at the very beginning of the binary. The first entry in the table is the reset vector, which restarts execution: it points to a handler (the startup code) which eventually calls the main function.
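
Following the reset vector by hand is just decoding one jump instruction. Here is a minimal sketch that does it for a raw flash image, assuming the vector holds either an rjmp (as in builds linked with --relax) or a jmp. The function name reset_target is hypothetical, and the sample bytes 07 C1 00 00 are the first four data bytes of the .hex record shown later in this post.

```python
import struct


def reset_target(flash):
    """Return the byte address that the reset vector (first flash word) jumps to."""
    (w1,) = struct.unpack_from("<H", flash, 0)       # AVR opcodes are little-endian words
    if (w1 & 0xF000) == 0xC000:                      # rjmp: 1100 kkkk kkkk kkkk
        k = w1 & 0x0FFF
        if k & 0x800:                                # sign-extend the 12-bit word offset
            k -= 0x1000
        # Target = PC + 1 + k (in words); the vector sits at word address 0.
        # A backwards jump from address 0 wraps around flash; that is not handled here.
        return (1 + k) * 2
    if (w1 & 0xFE0E) == 0x940C:                      # jmp: 1001 010k kkkk 110k + 16-bit word
        (w2,) = struct.unpack_from("<H", flash, 2)
        k = (((w1 >> 4) & 0x1F) << 17) | ((w1 & 1) << 16) | w2
        return k * 2                                 # absolute word address -> byte address
    raise ValueError("reset vector is neither rjmp nor jmp: 0x%04x" % w1)


if __name__ == "__main__":
    print(hex(reset_target(bytes.fromhex("07C10000"))))
```

The returned address is where the startup code begins; the call to main is usually one of the last instructions of that code.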

Further analysis of the main function:
If it is not stripped it’s easy (if you can read AVR assembly), since the functions are named:

  • sym.digitalWrite.constprop
  • sym.delay.constprop

We still have the source code, which makes it easy anyway to see what sym.digitalWrite.constprop is doing; in a stripped binary we don’t have the names. The constprop suffix comes from GCC’s constant propagation: when a function is always called with the same constant argument, the compiler creates a specialised clone with that constant baked in. Here the clone only takes one parameter, the value (HIGH or LOW), because the pin is always the same.

It is interesting what the compiler is doing here. Adding more pins to the simple program changes sym.digitalWrite.constprop: now it takes two parameters instead of one, the pin and the value (HIGH/LOW), and it no longer has the constprop suffix.

0x000005a8  61e0  ldi r22, 0x01				; value
0x000005aa  8de0  ldi r24, 0x0d				; pin
0x000005ac  62de  rcall sym.digitalWrite
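
To get a feeling for the encoding, the three instructions above can be decoded by hand. This is a small sketch covering only ldi and rcall; the helper names are my own. avr-gcc passes the first argument in r24 and the second in r22, which matches the pin and value comments in the listing.

```python
def word(hex_bytes):
    """Turn the two opcode bytes from the listing (e.g. '61e0') into a 16-bit word."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")


def decode_ldi(opcode):
    """Decode 'ldi Rd, K' (1110 KKKK dddd KKKK) into (register number, constant)."""
    if (opcode & 0xF000) != 0xE000:
        raise ValueError("not an ldi instruction")
    reg = 16 + ((opcode >> 4) & 0x0F)                # ldi only works on r16..r31
    const = ((opcode >> 4) & 0xF0) | (opcode & 0x0F)
    return reg, const


def rcall_target(addr, opcode):
    """Decode 'rcall k' (1101 kkkk kkkk kkkk) at byte address addr into its target."""
    if (opcode & 0xF000) != 0xD000:
        raise ValueError("not an rcall instruction")
    k = opcode & 0x0FFF
    if k & 0x800:                                    # sign-extend the 12-bit word offset
        k -= 0x1000
    return addr + 2 + 2 * k                          # next instruction + offset in bytes


if __name__ == "__main__":
    print(decode_ldi(word("61e0")))                  # value argument
    print(decode_ldi(word("8de0")))                  # pin argument
    print(hex(rcall_target(0x5AC, word("62de"))))    # address of digitalWrite
```

Decoding the rcall this way gives the address of digitalWrite even in a stripped binary, where r2 has no symbol name to show.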

Taking a look into the digitalWrite function, we can analyse further and see at the beginning of the function that the pin gets hardcoded into it if we only use one pin.

mov     r19, r24        ; r19 = value
ldi     r30, 0xD5       ; r30 = hardcoded pin

vs.

mov     r19, r22        ; r19 = value
mov     r30, r24        ; r30 = pin

The rest of the function does nothing special: it saves SREG, disables interrupts, clears or sets the pin, restores SREG and returns. We can also look up the source in wiring_digital.c.

What if we only have the .hex file?

The .hex file is the actual code before it gets loaded onto the microcontroller, stored in Intel HEX format. One line looks like the following; if we open the original .elf file in radare2 and compare, we can see a pattern.

:1000000007C1000033C1000031C100002FC1000052

can be split into

:10 0000 00 07C1000033C1000031C100002FC10000 52
 |   |   |                 |                 |
 |   |   |            Data bytes          Checksum
 |   |   Record type (00 = data)
 |   Offset (load address)
 Byte count (0x10 = 16 data bytes)
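
The field layout can be checked with a few lines of Python. The sketch below parses a single record and verifies its checksum, which in Intel HEX is chosen so that all bytes of the record, including the checksum itself, sum to zero modulo 256. The names Record and parse_record are my own.

```python
from collections import namedtuple

Record = namedtuple("Record", "count address rtype data checksum")


def parse_record(line):
    """Split one Intel HEX line into its fields and verify the checksum."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    if len(raw) < 5 or len(raw) != raw[0] + 5:       # count + 2 addr + type + data + checksum
        raise ValueError("record length does not match byte count")
    if sum(raw) & 0xFF != 0:                         # all bytes must sum to 0 mod 256
        raise ValueError("bad checksum")
    count = raw[0]
    address = int.from_bytes(raw[1:3], "big")        # the offset field is big-endian
    return Record(count, address, raw[3], raw[4:4 + count], raw[-1])


if __name__ == "__main__":
    print(parse_record(":1000000007C1000033C1000031C100002FC1000052"))
```

Record type 00 carries data and 01 marks the end of the file; types 02 and 04 set a base address for the records that follow, which matters for parts with more than 64 KB of flash.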

With this knowledge we can write a python script and convert the .hex file back into a binary

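import binascii

# Convert an Intel HEX file back into a raw binary image.
with open("code.hex") as hexfile:
    hexcontent = hexfile.readlines()

with open("generated.bin", "wb") as binaryfile:
    for line in hexcontent:
        line = line.strip()
        # Only data records (type 00) carry code; skip EOF and address records.
        if line[7:9] != "00":
            continue
        # Drop the 9-character header and the 2-character checksum.
        binaryfile.write(binascii.unhexlify(line[9:-2]))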
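
The script above simply concatenates the data bytes, which is fine as long as the records are contiguous and start at offset 0. A slightly more careful variant, sketched below, places every record at its stated offset and honours the extended address records (types 02 and 04) that show up once the image grows beyond 64 KB. The function name hex_to_image is my own, gaps are filled with 0xFF (the value of erased flash), and checksums are not verified here.

```python
def hex_to_image(lines, fill=0xFF):
    """Build a flat flash image from Intel HEX lines, placing data at its load address."""
    image = bytearray()
    base = 0                                         # set by extended address records
    for line in lines:
        line = line.strip()
        if not line.startswith(":"):
            continue
        raw = bytes.fromhex(line[1:])
        count, offset, rtype = raw[0], int.from_bytes(raw[1:3], "big"), raw[3]
        payload = raw[4:4 + count]
        if rtype == 0x00:                            # data record
            start = base + offset
            end = start + count
            if len(image) < end:                     # grow the image, padding any gap
                image.extend(bytes([fill]) * (end - len(image)))
            image[start:end] = payload
        elif rtype == 0x02:                          # extended segment address
            base = int.from_bytes(payload, "big") << 4
        elif rtype == 0x04:                          # extended linear address
            base = int.from_bytes(payload, "big") << 16
        elif rtype == 0x01:                          # end of file
            break
    return bytes(image)


if __name__ == "__main__":
    sample = [":1000000007C1000033C1000031C100002FC1000052", ":00000001FF"]
    print(hex_to_image(sample).hex())
```

Because the image is addressed from 0, offsets in the resulting file match the flash addresses r2 shows for the original .elf.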

We can analyse the generated binary using r2. We notice some differences because the whole ELF header and so on is missing: for example, the Linux file command only detects it as data, and r2 doesn’t know where to load it. But analysing it with aaa detects the entry, which calls the main function, so it is the same procedure as before.

Analysing AVR code can be hard: it’s a different ISA, and if you are not familiar with it, it will require a lot of instruction googling…


Dynamic Analysis

simavr is the simulator of choice. It works out of the box and is easy to use, since the ATmega2560 and ATmega328P are preconfigured.

simavr -m atmega2560 program.elf

If we want to debug the program we can append -g to wait for a gdb connection.

simavr -m atmega2560 -g program.elf

Here we have to use avr-gdb from the AVR toolchain or, and that’s my favourite, we use r2 to debug the program :)

r2 -a avr -d gdb://127.0.0.1:1234

We just have to specify the architecture and we are good to go. The next awesome thing simavr can do is simulate a .hex file, which is a really cool feature.

simavr -m atmega2560 -f 16MHz -g program.hex

The -m and -f values depend on the controller and its frequency. Again we can simply debug it with r2; if we really want to use avr-gdb, we need the following gdb commands.

avr-gdb
(gdb) target remote :1234

Running the code is nice, but like this we cannot give input unless we modify registers on our own. The simavr repository also contains a functional emulator: in the examples folder we find a simduino folder, which is an Arduino emulator. Be careful, it emulates the ATmega328P and not the ATmega2560. I didn’t check how much work it is to change this to another controller; I just recompiled my program for the ATmega328P.

If you run ATmega2560 code in this emulator, it will segfault.

OK, so now it’s getting a bit hacky. I am not sure whether r2 really shows the memory and so on correctly, and I did not find a way to look at SRAM values, which is quite annoying. The debugging itself works fine, and using picocom to give serial input also works fine.

For comparison I tried the same using gdb. To use a Python plugin with avr-gdb, we first have to compile gdb with AVR and Python support.

I had to install texinfo as well to make it work.

I really cannot say that this is nice to use. gef and peda do not support AVR; the gdbinit file which finally made it work was: https://github.com/cyrus-and/gdb-dashboard/blob/master/.gdbinit

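Using this I was able to step through the binary with gdb. Setting breakpoints was still a challenge, because I always got a leading 0x80… in the address, which is not the correct code address (avr-gdb places the SRAM address space at offset 0x800000, so the address was presumably treated as a data address). I could solve this by using b *($pc+0xOffset) instead of b *<addr>.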
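
AVR has separate address spaces for flash, SRAM and EEPROM, and the GNU toolchain flattens them into one number line by adding a fixed offset per space. The sketch below converts between the two views, assuming the usual offsets of 0x800000 for SRAM and 0x810000 for EEPROM. The helper names are my own, and the special regions above 0x820000 (fuses and so on) are deliberately rejected.

```python
# Offsets the GNU AVR toolchain uses to flatten the separate address spaces.
SRAM_BASE = 0x800000
EEPROM_BASE = 0x810000
SPECIAL_BASE = 0x820000      # fuses, lock bits, signature: not handled in this sketch


def split_gdb_address(addr):
    """Map a flat gdb address to (address space, offset inside that space)."""
    if addr < 0 or addr >= SPECIAL_BASE:
        raise ValueError("address outside flash/SRAM/EEPROM: 0x%x" % addr)
    if addr >= EEPROM_BASE:
        return "eeprom", addr - EEPROM_BASE
    if addr >= SRAM_BASE:
        return "sram", addr - SRAM_BASE
    return "flash", addr


def sram_to_gdb(addr):
    """Return the flat address to use in gdb for an SRAM location."""
    return SRAM_BASE + addr


if __name__ == "__main__":
    print(split_gdb_address(0x800100))
    print(hex(sram_to_gdb(0x200)))
```

With this mapping, an SRAM location such as 0x200 would be inspected in avr-gdb with x/b 0x800200, while code addresses stay below 0x800000.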

GDB commands I always forget…

Command       Action
ni            step over
si            step into
x/b <addr>    show 1 byte at addr
x/h <addr>    show 2 bytes at addr
x/6i $pc-3    show 6 disassembled instructions
i r           info registers
b *<addr>     create breakpoint at addr
i b           list breakpoints
d <num>       delete breakpoint num


My final commands to emulate an Arduino board using the simduino emulator (first we have to build it, of course):

1: ./obj-x86_64-linux-gnu/simduino.elf -d /pathTo/program.atmega328p.hex
2: picocom /tmp/simavr-uart0

then either avr-gdb:

3a: avr-gdb -q
3b: (gdb) target remote :1234

or r2:

3: r2 -a avr -d gdb://127.0.0.1:1234

All commands should be executed in different terminals, since you need the picocom terminal for input and another one for the debugger.
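
For repeatable input (for example when trying many candidate passwords) it can be handy to script the serial side instead of typing into picocom. The following is a minimal sketch that writes a payload to the UART device, assuming /tmp/simavr-uart0 from the picocom command above can be opened like a normal serial device. The function name send_serial is my own, and no terminal settings are changed.

```python
import os


def send_serial(path, payload):
    """Write payload (bytes) to a serial device or pty and return the number of bytes written."""
    # O_NOCTTY keeps the device from becoming our controlling terminal.
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    try:
        written = 0
        while written < len(payload):                # os.write may write only part of the data
            written += os.write(fd, payload[written:])
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        # Usage: script.py /tmp/simavr-uart0 "some input"
        send_serial(sys.argv[1], sys.argv[2].encode() + b"\n")
```

Whether the firmware expects a trailing newline, carriage return or nothing at all depends on how the sketch reads its input, so the terminator is something to adapt.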


Debugging is still not an easy task, even though we now have all the tools needed and a working debugging setup. If, for example, we don’t know which instruction reads the input from the UART, or that we have to search for ld r24, X, we will not succeed, and we will not be able to name functions and break the program down into smaller parts. Therefore, some AVR basics now:

We can find the pin, port and register numbers in iomxx0_1.h (TODO: find a better listing). During debugging our assembly has no labels, so we don’t know whether an address is UBRRnH or RXENn or something else…
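
A tiny lookup table already helps a lot when staring at unlabeled addresses. The sketch below covers only USART0, whose register addresses and bit positions are the same on the ATmega328P and the ATmega2560; the table and helper names are my own, and a complete listing would have to be generated from the header or the datasheet.

```python
# Memory-mapped USART0 registers (data-space addresses).
USART0_REGS = {
    0xC0: "UCSR0A",
    0xC1: "UCSR0B",
    0xC2: "UCSR0C",
    0xC4: "UBRR0L",
    0xC5: "UBRR0H",
    0xC6: "UDR0",
}

# Bit names, index 0 = bit 0.
UCSR0A_BITS = ["MPCM0", "U2X0", "UPE0", "DOR0", "FE0", "UDRE0", "TXC0", "RXC0"]
UCSR0B_BITS = ["TXB80", "RXB80", "UCSZ02", "TXEN0", "RXEN0", "UDRIE0", "TXCIE0", "RXCIE0"]


def register_name(addr):
    """Return the register name for a data-space address, or the address in hex if unknown."""
    return USART0_REGS.get(addr, "0x%02x" % addr)


def set_bits(bit_names, value):
    """List the names of the bits that are set in value, lowest bit first."""
    return [name for bit, name in enumerate(bit_names) if value & (1 << bit)]


if __name__ == "__main__":
    print(register_name(0xC6))
    print(set_bits(UCSR0B_BITS, 0x18))
```

A load from 0xC6 (UDR0) is a received byte being read, and a loop polling bit 7 of 0xC0 (RXC0) is the firmware waiting for input, which makes the serial read routine easy to spot.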

Knowing this, the rest will hopefully be reading simple code :)


RISCV

Sometimes it might also be necessary to debug and analyse RISCV programs and binaries. This is again way easier with dynamic analysis than with static analysis only.

There are two different types of programs we have to distinguish. If it is a program for a RISCV OS, then we can use a user-mode emulator; if it is a program for a RISCV embedded system, then we need a full system emulator. For both cases there are good tools which help us.

RISCV Program

There is a just-in-time interpreter for RISCV binaries, which can be found at https://rv8.io including all the installation instructions.

It also needs the riscv-gnu-toolchain, which is quite big and will take its time to compile.

It really works nicely out of the box. I didn’t find a debug listener which allows connecting gdb or r2 to it, but I found a quite nice instruction tracer which prints all instructions. This may be quite helpful and can be enabled using the -l switch.

rv-jit program

(I didn’t play around with the debug cli for now)

RISCV Embedded System

To run or simulate embedded system software we would need the corresponding hardware; if we don’t have that, we can use qemu. Luckily it has RISCV support.

Download the latest qemu version from https://download.qemu.org, unpack it and install it with:

./configure --target-list=riscv64-softmmu,riscv32-softmmu,riscv64-linux-user,riscv32-linux-user
make -j4
sudo make install

Start the program/kernel using:

qemu-system-riscv32 -nographic -machine sifive_e -kernel program

There are a lot of extra options available which make the usage very nice, and of course we can also debug it using the -s switch, which is shorthand for -gdb tcp::1234.
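
Whether qemu-system-riscv32 or qemu-system-riscv64 is the right binary can be read from the ELF header. This is a minimal sketch, assuming the program is an ELF file: it reads the class byte and the e_machine field (243 is RISC-V, 83 is AVR). The function names are my own, and the sketch does not try to tell a Linux user program from bare-metal firmware.

```python
import struct

EM_AVR = 83
EM_RISCV = 243


def elf_target(header):
    """Return (e_machine, word size in bits) from the first bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[header[4]]                 # EI_CLASS
    endian = "<" if header[5] == 1 else ">"          # EI_DATA: 1 = little-endian
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return machine, bits


def qemu_system_binary(header):
    """Suggest the qemu system emulator matching a RISC-V ELF header."""
    machine, bits = elf_target(header)
    if machine != EM_RISCV:
        raise ValueError("not a RISC-V binary (e_machine=%d)" % machine)
    return "qemu-system-riscv%d" % bits


def qemu_system_binary_for_file(path):
    """Convenience wrapper: read the header from a file on disk."""
    with open(path, "rb") as handle:
        return qemu_system_binary(handle.read(20))
```

The same elf_target helper also works for the AVR binaries from the first part, where it reports machine 83 and 32 bits, matching the file output shown there.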


IDA

still a TODO