Speed comparison pyCPU vs. 8bit AVR

The following is a comparison of the execution speed of a very simple port-toggling program on an 8-bit ATmega AVR and on the pyCPU, just to get an idea.

The C program for the AVR:

#include <avr/io.h> // (1)

int main (void) {            

   DDRB  = 0xFF;             
   PORTB = 0x00;             

   while(1) {              
     PORTB=PORTB^1;
   }                        

   return 0;                 
}

The Python program for the pyCPU:


def main():
  global PORTC_OUT

  PORTC_OUT=0

  while 1:
    PORTC_OUT=PORTC_OUT^1 
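
To get a feel for what a stack-based instruction stream for this loop looks like, here is a minimal sketch that lists the bytecode a desktop CPython interpreter generates for the same function, using the standard dis module. Treat it as an illustration only: the opcode names depend on the CPython version (for example BINARY_XOR in older releases, BINARY_OP with a ^ argument in newer ones), and the instruction set actually implemented by pyCPU may differ from what is printed here.

```python
import dis


def main():
    global PORTC_OUT

    PORTC_OUT = 0

    while 1:
        PORTC_OUT = PORTC_OUT ^ 1


# Disassemble main() without running it (the loop never terminates).
instructions = list(dis.get_instructions(main))
opnames = [ins.opname for ins in instructions]

# Print offset, opcode name and human-readable argument for each instruction.
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```

The listing shows the typical stack pattern: the global and the constant are pushed, the xor consumes the two topmost stack items, and the result is stored back to the global before jumping to the start of the loop.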

Comparison of the bytecode instructions vs. the AVR assembler instructions of the main loop:

In this simple example the generated instructions look quite similar, although the pyCPU has a stack-based architecture (the xor instruction takes no parameter and always operates on the items on top of the stack). Here the pyCPU executes the code faster because its jump takes only one cycle. In addition, a normal 8-bit AVR runs in the 0-20 MHz range, whereas the pyCPU can run, for example, on a Cyclone II FPGA at approximately 60 MHz. Admittedly, this example is of limited practical use. But it makes pyCPU look good!
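
The effect of clock frequency and cycles per loop can be put into numbers with a small sketch. The clock figures are the ones quoted above (20 MHz as the upper end for the AVR, roughly 60 MHz for the pyCPU). The AVR cycle counts are assumptions based on the standard AVR instruction timings (in, eor, out: 1 cycle each; rjmp: 2 cycles): 5 cycles for an in/eor/out/rjmp loop with the constant held in a register, and 3 cycles for the out/rjmp loop from the reader comment below. No cycle count is assumed for the pyCPU; the sketch only computes how many cycles per loop it could spend at 60 MHz and still match the AVR. The function and constant names are made up for this example.

```python
def loops_per_second(clock_hz, cycles_per_loop):
    """Loop iterations (one port toggle each) per second."""
    return clock_hz / cycles_per_loop


def break_even_cycles(fast_clock_hz, slow_clock_hz, slow_cycles_per_loop):
    """Cycles per loop the faster-clocked CPU may use to match the slower one."""
    return fast_clock_hz * slow_cycles_per_loop / slow_clock_hz


AVR_CLOCK_HZ = 20_000_000    # upper end of the AVR range quoted in the text
PYCPU_CLOCK_HZ = 60_000_000  # approximate Cyclone II figure quoted in the text

# Assumed AVR loop bodies (not measured):
AVR_CYCLES_XOR_LOOP = 1 + 1 + 1 + 2  # in + eor + out + rjmp
AVR_CYCLES_PIN_LOOP = 1 + 2          # out + rjmp (PINB toggle variant)

for name, cycles in (("xor loop", AVR_CYCLES_XOR_LOOP),
                     ("PINB loop", AVR_CYCLES_PIN_LOOP)):
    rate = loops_per_second(AVR_CLOCK_HZ, cycles)
    limit = break_even_cycles(PYCPU_CLOCK_HZ, AVR_CLOCK_HZ, cycles)
    print(f"AVR {name}: {rate / 1e6:.2f} M loops/s, "
          f"pyCPU break-even at {limit:.0f} cycles per loop")
```

Under these assumptions the higher FPGA clock alone leaves the pyCPU a budget of 15 cycles per loop against the xor version and 9 cycles per loop against the PINB version before the 20 MHz AVR catches up.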

The next thing to look at in some detail is probably code density and resource usage.

  1. Michael Dreher

    For an ATmega168 you would write the following code which saves the IN and EOR instructions:
    DDRB = 0xFF;
    PORTB = 0x00;
    register uint8_t toggleMask = 1; // give the compiler a hint to use a register for the constant

    while(1) {
    PINB = toggleMask; // toggles the bits which are set to 1 in the mask (see chapter 13.2.2 in the document doc8271)
    }

    An optimizing compiler run with -O2 should also move the LDI instruction that loads the constant to before the loop, so the resulting code should be:
    ldi r25, 1
    loop:
    out 0x03, r25 // PINB: I/O address 0x03 (data-space address 0x23)
    rjmp loop

    This code needs 3 cycles per loop.

    Michael
