Introduction

The goal was to enable JTAG and write a JTAG debugger to debug an Arduino from another Arduino.

Information Gathering

First it needs to be checked whether the JTAGEN bit in the highFuse byte is set to zero. For fuses, zero means programmed. Finding this out already took me a while.

In the ATmega2560 datasheet the fuse byte description can be found in section 30, Memory Programming. Section 30.2 explains the fuse bits. There are 3 fuse bytes: Extended, Low and High. Bit 6 of the high byte (counting from bit 0) is JTAGEN and enables JTAG.

How to write to it? It can't simply be written from a user program uploaded by the Arduino IDE.

Burn a new BootLoader

The Arduino software pack has a lot of files in it, including the bootloader source code. In my case it's: arduino-1.8.10/hardware/arduino/avr/bootloaders/stk500v2/stk500boot.c. A bit of grepping should give you the correct one.

The bootloader itself is only needed to upload the Arduino sketches to the board. If you don't need that feature, the bootloader can be removed and an extra 2k of flash memory is gained. Deleting the bootloader also removes the startup delay from the board, it would immediately start executing the stored program.

Replacing the bootloader needs to be done via the onboard ICSP (In-Circuit Serial Programming) interface. Those are the extra 6 pins somewhere in the center of the board, with ICSP written next to them. They can be written to via SPI, examples are shown here [1].

The bootloader can also be burned using the Arduino IDE itself, it has a Burn Bootloader function. This however needs a second Arduino, which connects to the target Arduino via the ICSP interface, obviously.

(Trying to give a drawing, but there are super nice online resources available).

PC ---- USB cable ---- programmer Arduino
                              |
                connected via 6 pins to the ICSP
                              |
                        target Arduino
             (powered via the 6 pins going to the ICSP)

Check online for the correct pin mapping, be careful with the reset pin.

NOTE: By looking at different mappings online, you might find pins used interchangeably. This is because (talking about the Arduino Uno) PIN11 is directly connected to the MOSI pin of the ICSP. The same goes for the other ICSP pins.

Now back to the original question of how to write the highFuse bit: this can be done by burning the bootloader, because the Burn Bootloader function takes the fuse values from the boards.txt file in the hardware folder. The file specifies the bootloader configuration for every Arduino. For the ATmega2560 for example:

##############################################################

mega.name=Arduino/Genuino Mega or Mega 2560

mega.vid.0=0x2341
mega.pid.0=0x0010
mega.vid.1=0x2341
mega.pid.1=0x0042
mega.vid.2=0x2A03
mega.pid.2=0x0010
mega.vid.3=0x2A03
mega.pid.3=0x0042
mega.vid.4=0x2341
mega.pid.4=0x0210
mega.vid.5=0x2341
mega.pid.5=0x0242

mega.upload.tool=avrdude
mega.upload.maximum_data_size=8192

mega.bootloader.tool=avrdude
mega.bootloader.low_fuses=0xFF
mega.bootloader.unlock_bits=0x3F
mega.bootloader.lock_bits=0x0F

mega.build.f_cpu=16000000L
mega.build.core=arduino
mega.build.variant=mega
# default board may be overridden by the cpu menu
mega.build.board=AVR_MEGA2560

## Arduino/Genuino Mega w/ ATmega2560
## -------------------------
mega.menu.cpu.atmega2560=ATmega2560 (Mega 2560)

mega.menu.cpu.atmega2560.upload.protocol=wiring
mega.menu.cpu.atmega2560.upload.maximum_size=253952
mega.menu.cpu.atmega2560.upload.speed=115200

mega.menu.cpu.atmega2560.bootloader.high_fuses=0xD8
mega.menu.cpu.atmega2560.bootloader.extended_fuses=0xFD
mega.menu.cpu.atmega2560.bootloader.file=stk500v2/stk500boot_v2_mega2560.hex

mega.menu.cpu.atmega2560.build.mcu=atmega2560
mega.menu.cpu.atmega2560.build.board=AVR_MEGA2560

The highFuse byte is set by default to 0xD8 (1101 1000), which means:

OCDEN   unprogrammed  (On-Chip Debugger)
JTAGEN  unprogrammed
SPIEN   programmed    (Serial Program and Data Downloading)
WDTON   unprogrammed  (Watchdog Timer always on)

EESAVE  unprogrammed  (EEPROM memory is preserved through the Chip Erase)
BOOTSZ1 programmed
BOOTSZ0 programmed
BOOTRST programmed

To enable JTAGEN the byte needs to be set to 0x98.
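
The bit arithmetic above can be double checked with a few lines of C++. This is only a sketch that runs on the PC, not on the Arduino. The enum and the helper names (HighFuseBit, is_programmed, program_bit) are made up for illustration, and the bit positions are the ones listed above for the ATmega2560 high fuse byte.

```cpp
#include <cstdint>

// Bit positions in the ATmega2560 high fuse byte, as listed above.
// The names of this enum and of the helpers are illustrative assumptions.
enum HighFuseBit : uint8_t {
  BOOTRST = 0, BOOTSZ0 = 1, BOOTSZ1 = 2, EESAVE = 3,
  WDTON = 4, SPIEN = 5, JTAGEN = 6, OCDEN = 7
};

// A fuse bit is "programmed" when it reads as 0.
constexpr bool is_programmed(uint8_t fuse, HighFuseBit bit) {
  return (fuse & (1u << bit)) == 0;
}

// Returns the fuse byte with one bit programmed (cleared to 0).
constexpr uint8_t program_bit(uint8_t fuse, HighFuseBit bit) {
  return static_cast<uint8_t>(fuse & ~(1u << bit));
}

// Default high fuse 0xD8 with JTAGEN programmed gives 0x98.
static_assert(program_bit(0xD8, JTAGEN) == 0x98, "JTAGEN on top of 0xD8");
static_assert(!is_programmed(0xD8, JTAGEN), "JTAG is off by default");
static_assert(is_programmed(0xD8, SPIEN), "SPI programming is on by default");
```

The static_assert lines make the compiler do the check, so a wrong value already fails at build time.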

Before writing the highFuse, let's read it first via ICSP

I thought before writing it, it might be smart to read it via ICSP to verify if it is really 0xD8. To do this I used the Atmega_Board_Detector sketch from [1].

This little program is very nice, it talks to the target arduino via the ICSP interface, using SPI obviously. (If SPI is not hardware supported on your Arduino it also has an implementation for Bit Banged SPI [2]).

In the setup function it starts by setting the Serial baudrate to 115200, this is for the serial monitor on which it prints the read data. Ok yea, not gonna explain the full code here, you can read it yourself.
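
For orientation, here is roughly what goes over SPI when fuses are accessed via ICSP. Each command is 4 bytes long. The opcodes below are believed to match the serial programming instruction set table in the AVR datasheets, but they should be checked against the ATmega2560 datasheet before relying on them. The type and function names (IspCommand, read_high_fuse, ...) are made up for this example and are not taken from the Atmega_Board_Detector source.

```cpp
#include <array>
#include <cstdint>

// One ICSP (SPI programming) command is always 4 bytes long.
using IspCommand = std::array<uint8_t, 4>;

// Must be sent first, while RESET is held low, to enter programming mode.
inline IspCommand programming_enable() { return {{0xAC, 0x53, 0x00, 0x00}}; }

// Read commands: the fuse value is returned during the 4th byte.
inline IspCommand read_low_fuse()      { return {{0x50, 0x00, 0x00, 0x00}}; }
inline IspCommand read_high_fuse()     { return {{0x58, 0x08, 0x00, 0x00}}; }
inline IspCommand read_extended_fuse() { return {{0x50, 0x08, 0x00, 0x00}}; }

// Write commands: the 4th byte carries the new fuse value.
// Low and high fuse differ only in the 2nd byte (0xA0 vs 0xA8).
inline IspCommand write_low_fuse(uint8_t value)  { return {{0xAC, 0xA0, 0x00, value}}; }
inline IspCommand write_high_fuse(uint8_t value) { return {{0xAC, 0xA8, 0x00, value}}; }
```

On the programmer Arduino each of the 4 bytes would go through SPI.transfer(), and the return value of the last call would be the fuse byte for the read commands.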

The output of the board detector, however, is:

Atmega chip detector.
Written by Nick Gammon.
Version 1.20
Compiled on Aug  9 2022 at 20:32:50 with Arduino IDE 10810.
Attempting to enter ICSP programming mode ...
Entered programming mode OK.
Signature = 0x1E 0x98 0x01
Processor = ATmega2560
Flash memory size = 262144 bytes.
LFuse = 0xFF
HFuse = 0xD0
EFuse = 0xFD
Lock byte = 0xCF
Clock calibration = 0x9D
Bootloader in use: Yes
EEPROM preserved through erase: Yes
Watchdog timer always on: No
Bootloader is 8192 bytes starting at 3E000
...

Here it can be seen that the highFuse is not the default value, it is 0xD0. This means the EESAVE bit is programmed. With a small modification the Board Detector can be used to set the fuse as well, and this is the point where I kind of bricked my device… I wasn't careful and set the lowFuse to 0x90 instead of the highFuse (0x90 is what the highFuse should have become: 0xD0 with JTAGEN programmed). This means my clock source (CKSEL3..0) is now set to 0000 -> External Clock. This doesn't necessarily mean bricked, because I can attach an external clock to the chip on pin XTAL1, but I can't really solder on the SMD chips… So this project kind of needs to rest until I get to borrow a different board.
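
A small check like the following could have caught the mix-up before the write. It is a sketch under assumptions: the layout of the low fuse byte (CKSEL3..0 in the lower four bits) is taken from the ATmega2560 datasheet, and the function names are made up for illustration.

```cpp
#include <cstdint>

// The lower four bits of the low fuse byte are CKSEL3..0 (clock source select).
constexpr uint8_t clock_select(uint8_t low_fuse) {
  return static_cast<uint8_t>(low_fuse & 0x0F);
}

// CKSEL = 0000 selects an external clock on XTAL1. A board that only has
// a crystal or resonator attached will most likely not run with it.
constexpr bool selects_external_clock(uint8_t low_fuse) {
  return clock_select(low_fuse) == 0x00;
}

// 0xFF is the low fuse from boards.txt, 0x90 is the value written by mistake.
static_assert(!selects_external_clock(0xFF), "default keeps the crystal");
static_assert(selects_external_clock(0x90), "0x90 switches to external clock");
```

Calling selects_external_clock() on the value right before a low fuse write and refusing to continue when it returns true would be one way to use it.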

Measuring execution delay

Playing around with an 8 channel logic analyzer, the goal was to measure instruction execution delay. The analyzer can sample at a rate of 24MHz, which is enough for the 8MHz ATmega328p.

Two little programs have been written, both set a digital pin and the analyzer measures the rising edge.

Using Arduino Library:

void setup() {
  pinMode(12, OUTPUT);
  pinMode(13, OUTPUT);
}

void loop() {
  digitalWrite(12, HIGH);
  digitalWrite(13, HIGH);   
  delay(2);  
  
  digitalWrite(12, LOW);   
  digitalWrite(13, LOW);  
  delay(2);
}

Non Arduino Library:

// same setup code as above
void setup() {
  pinMode(12, OUTPUT);
  pinMode(13, OUTPUT);
}

void loop() {
  PORTB |= (1<<5);
  PORTB |= (1<<4);
  delay(2);

  PORTB &= ~(1<<5);
  PORTB &= ~(1<<4);
  delay(2);
}

Checking the assembly:

Arduino Library:

# taken and annotated in IDA
╭─> loc_1CA:
|   ldi     r22, 1
|   ldi     r24, 12
|   call    digitalWrite_8F
|   ldi     r22, 1
|   ldi     r24, 13
|   call    digitalWrite_8F
|   call    delay200_100
|   ldi     r22, 0
|   ldi     r24, 13
|   call    digitalWrite_8F
|   ldi     r22, 0
|   ldi     r24, 12
|   call    digitalWrite_8F
|   call    delay200_100
|   sbiw    Y, 0
╰─< breq    loc_1CA

The digitalWrite function is not inlined, it's two separate function calls. The digitalWrite function has 40 instructions; the path which is most likely taken, as it's not a PWM pin, has 34 instructions.

Non Arduino Library:

╭─> 0x000002ea      2d9a           sbi 0x05, 5
╎   0x000002ec      2c9a           sbi 0x05, 4
╎   0x000002ee      0e94aa00       call fcn.00000154   // delay function
╎   0x000002f2      2d98           cbi 0x05, 5
╎   0x000002f4      2c98           cbi 0x05, 4
╎   0x000002f6      0e94aa00       call fcn.00000154   // delay function
╎   0x000002fa      2097           sbiw r28, 0x00
╰─< 0x000002fc      b1f3           breq 0x2ea

We can see that writing to PORTB takes exactly one instruction.

Given this, we should be able to measure a roughly 34 times longer delay between the rising edges when comparing the two programs.

[Logic analyzer capture: DigitalWrite]

[Logic analyzer capture: ManualWrite]

And indeed this can be measured using the logic analyzer: the manual write delay is 125ns, which is one instruction cycle at 8MHz. The digitalWrite method has a delay of 3625ns, which is a 29x increase. Including the 3 instructions for the function parameter setup, the expected 37 instruction delay is not reached. Here I would strongly guess that not all instructions have the same delay: maybe the 29x comes from arithmetic instructions, while instructions like cli might have a faster timing. I am not sure, but this can be measured by placing instructions between the two pin writes and measuring the increase in delay. This needs to be done via inline assembly.

It's also worth mentioning that setting the pins without the Arduino library function decreases the program size by about 200 bytes.

Adding a cli instruction via cli(); between the two pin sets gives exactly one extra instruction, and the timing is indeed different: the delay is now 167ns, so the cli instruction adds 42ns. Let's test a few more instructions.

cli 42ns
clr 42ns
nop 42ns
eor 42ns
inc 42ns
mul 125ns

call 250ns
ret  250ns
call & ret  500ns

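This shows some pretty nice timing differences. The values should be read with some care though: 42ns is one sample period of the 24MHz analyzer (1/24MHz = 41.7ns), so the short delays are quantized to that step. According to the AVR instruction set manual, cli, clr, nop, eor and inc take 1 cycle, sbi and mul take 2 cycles, and call and ret take 4 cycles each on the ATmega328p. The 125ns and 250ns values would match a 16MHz clock (62.5ns per cycle) better than 8MHz, so the clock of the board is worth double checking.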
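
The numbers in the table can be put in relation to the analyzer resolution and the CPU clock with a bit of arithmetic. The sketch below converts between CPU cycles, nanoseconds and analyzer samples. It runs on the PC. The function names are made up, and the clock values passed in are assumptions to play with, not measured facts.

```cpp
#include <cmath>

// Duration of `cycles` CPU cycles in nanoseconds at a clock of `f_cpu_hz`.
inline double cycles_to_ns(double cycles, double f_cpu_hz) {
  return cycles * 1e9 / f_cpu_hz;
}

// Number of analyzer samples closest to a duration of `ns` nanoseconds.
inline long ns_to_samples(double ns, double f_sample_hz) {
  return std::lround(ns * f_sample_hz / 1e9);
}

// Duration of `samples` analyzer samples in nanoseconds.
inline double samples_to_ns(long samples, double f_sample_hz) {
  return samples * 1e9 / f_sample_hz;
}
```

For example, samples_to_ns(1, 24e6) gives about 41.7ns, and ns_to_samples(125.0, 24e6) gives 3. Both cycles_to_ns(1, 8e6) and cycles_to_ns(2, 16e6) give 125ns, so the pin write delay alone cannot tell these two cases apart.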

Resources

[1] https://github.com/nickgammon/arduino_sketches
[2] https://circuitdigest.com/article/introduction-to-bit-banging-spi-communication-in-arduino-via-bit-banging

These two are especially nice

http://www.gammon.com.au/forum/?id=11635
http://www.gammon.com.au/forum/?id=11633

unsorted links

https://docs.arduino.cc/built-in-examples/arduino-isp/ArduinoISP
https://www.brainbytez.nl/hardware/arduino-burn-jtag-enabled-bootloader-on-arduino-mega-atmega2560/
https://github.com/sowbug/JTAGWhisperer
https://hsbp.org/fuji
https://sourceforge.net/p/openocd/code/ci/master/tree/
https://learn.adafruit.com/arduino-tips-tricks-and-techniques/bootloader