With this post we will complete (at least for now) the development of the #DIYConsole graphics library. In earlier posts we wrote all the functions we need; what we must do now is check the performance of our library and find ways to optimize it.

 

How fast is our library?

To measure the performance of our library, we will use a slightly modified version of the final example of the post #DIYConsole – Part 5: Sprites:

/* main.cpp */

#include <Arduino.h>
#include "GFX.h"
#include "DefaultFont.h"
#include "bitmaps.h"

#define STARSHIP_COUNT 1

struct StarShip {
    float x;
    float y;
    float theta;
    float previousTheta;
    float xTarget;
    float yTarget;
    uint8_t* backgroundCopy;
    uint8_t* rotatedBitmap;
    uint16_t width;
    uint16_t height;
};

GFX gfx;
StarShip ships[STARSHIP_COUNT];
unsigned long lastUpdate;
unsigned long frames;
double avgFps;

void setup() {
    // Start graphics library
    gfx.begin();
    gfx.setFont(&defaultFont);

    // Draw stars background
    for(int i=0; i<60; i++)
        gfx.drawPixel(random(0,320), random(0,480), 14);
    for(int i=0; i<30; i++)
        gfx.drawPixel(random(0,320), random(0,480), 1);
    for(int i=0; i<40; i++)
        gfx.drawPixel(random(0,320), random(0,480), 2);
    for(int i=0; i<40; i++)
        gfx.drawPixel(random(0,320), random(0,480), 7);
    
    for(int i=0; i<STARSHIP_COUNT; i++) {
        // Generate start data
        ships[i].x = random(30, 290);
        ships[i].y = random(30, 450);
        ships[i].theta = ships[i].previousTheta = random(0, 360);
        ships[i].xTarget = random(0, 320);
        ships[i].yTarget = random(0, 480);
        ships[i].backgroundCopy = (uint8_t*)malloc(4624);
        ships[i].rotatedBitmap = (uint8_t*)malloc(4624);

        // Prepare starship bitmap
        gfx.scaleAndRotateBitmap(ships[i].rotatedBitmap, &ships[i].width, &ships[i].height, starship, 32, 32, 1.5, 1.5, ships[i].theta, 15);

        // Copy background
        gfx.copyScreenBufferRect(ships[i].backgroundCopy, ships[i].x-ships[i].width/2, ships[i].y-ships[i].height/2, ships[i].width, ships[i].height);
    }

    for(int i=0; i<STARSHIP_COUNT; i++) {
        // Draw starship
        gfx.drawTransparentBitmap(ships[i].rotatedBitmap, ships[i].x-ships[i].width/2, ships[i].y-ships[i].height/2, ships[i].width, ships[i].height, 15);
    }

    // Start time
    lastUpdate = micros();
    frames = 0;

    // Draw first frame
    gfx.update();
}

void loop() {
    // Calculate frame delta time
    unsigned long now = micros();
    float deltaTime = (float)(now - lastUpdate) / 1000000;
    lastUpdate = now;

    // Restore background
    for(int i=0; i<STARSHIP_COUNT; i++) {
        gfx.drawBitmap(ships[i].backgroundCopy, ships[i].x-ships[i].width/2, ships[i].y-ships[i].height/2, ships[i].width, ships[i].height);
    }

    // Update starship position, rotation and target
    for(int i=0; i<STARSHIP_COUNT; i++) {
        // Update starship rotation
        float thetaTarget = atan2(ships[i].yTarget - ships[i].y, ships[i].xTarget - ships[i].x) * 57.2957795 + 90;

        // Turn the starship to the target in the direction of the smallest angle
        if(fabs(thetaTarget - ships[i].theta) > 180) {
            if(ships[i].theta < 0)
                ships[i].theta += 360;
            else
                ships[i].theta -= 360;
        }
        ships[i].theta += max(min(3 * (thetaTarget - ships[i].theta), 180), -180) * deltaTime;

        // Update starship position
        float xSpeed = 0.5 * (ships[i].xTarget - ships[i].x) * (1 / (1 + 0.1 * fabs(thetaTarget - ships[i].theta)));
        float ySpeed = 0.5 * (ships[i].yTarget - ships[i].y) * (1 / (1 + 0.1 * fabs(thetaTarget - ships[i].theta)));
        ships[i].x += xSpeed * deltaTime;
        ships[i].y += ySpeed * deltaTime;

        // If near the target, choose another random target
        if((fabs(ships[i].x - ships[i].xTarget) < 40) && (fabs(ships[i].y - ships[i].yTarget) < 40)) {
            ships[i].xTarget = random(0, 320);
            ships[i].yTarget = random(0, 480);
        }
    }

    for(int i=0; i<STARSHIP_COUNT; i++) {
        // Prepare starship bitmap
        if(fabs(ships[i].theta - ships[i].previousTheta) >= 2) {
            gfx.scaleAndRotateBitmap(ships[i].rotatedBitmap, &ships[i].width, &ships[i].height, starship, 32, 32, 1.5, 1.5, ships[i].theta, 15);
            ships[i].previousTheta = ships[i].theta;
        }

        // Copy background
        gfx.copyScreenBufferRect(ships[i].backgroundCopy, ships[i].x-ships[i].width/2, ships[i].y-ships[i].height/2, ships[i].width, ships[i].height);
    }

    for(int i=0; i<STARSHIP_COUNT; i++) {
        // Draw starship
        gfx.drawTransparentBitmap(ships[i].rotatedBitmap, ships[i].x-ships[i].width/2, ships[i].y-ships[i].height/2, ships[i].width, ships[i].height, 15);
    }

    // Draw frame rate
    double fps = 1.0 / deltaTime;
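    // Read and increment the frame counter in separate statements
    // (modifying it inside the expression is undefined behavior)
    avgFps = (avgFps * frames + fps) / (frames + 1);
    frames++;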
    gfx.drawFilledRectangle(0, 0, 40, 16, 15);
    String fpsString(avgFps, 1);
    gfx.drawString(0, 0, fpsString.c_str(), 3);

    // Update screen
    gfx.update();
}

Remember to copy bitmaps.h from the other post.

In this new version of the program, we can decide the number of spaceships that move simultaneously on the screen. In addition, the average frame rate at which the program is running will be displayed in the upper left corner.
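The average shown on screen is a cumulative mean: each new frame's rate is folded into the average of all previous frames. A minimal sketch of that update follows, with the counter read and incremented in separate statements. `RunningAverage` is a hypothetical helper used only for illustration; it is not part of the GFX library.

```cpp
#include <cstdint>

// Hypothetical helper (not part of GFX): cumulative mean of all samples
// seen so far.
struct RunningAverage {
    double mean = 0.0;     // average of the samples added so far
    uint32_t count = 0;    // number of samples added so far

    // Fold one new sample into the mean and return the updated value.
    double add(double sample) {
        mean = (mean * count + sample) / (count + 1);
        count++;
        return mean;
    }
};
```

For example, adding the samples 10, 20 and 30 in that order leaves the mean at 20.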

Let’s try to run it first with a single spacecraft and then with 10. These are the average frame rates I got:

As you can see, the frame rate is quite low in both cases. The two frame rates are not very different because currently the update speed is limited by the transfer via SPI of the entire screen buffer to the display. So, if we can optimize the transfer of data to the screen, we can achieve a significant increase in performance.
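To get a feeling for this limit, we can estimate the best possible full-screen refresh rate from the SPI clock alone. The sketch below assumes 16 bits per pixel on the wire and ignores command overhead and CPU time. The function name is hypothetical, and the 40 MHz clock in the note after the code is purely illustrative: the real value of HX8357D_SPI_FREQUENCY is not shown in this post.

```cpp
#include <cstdint>

// Upper bound on full-screen refreshes per second over SPI.
// Ignores command overhead and the time spent preparing the pixels.
// All arguments are values you must fill in for your own setup.
constexpr double maxFullFrameRate(uint32_t width, uint32_t height,
                                  uint32_t bitsPerPixel, double spiClockHz) {
    // One frame needs width * height * bitsPerPixel clock cycles.
    return spiClockHz / (static_cast<double>(width) * height * bitsPerPixel);
}
```

A 320x480 frame at 16 bits per pixel is 2,457,600 bits, so with an assumed 40 MHz clock `maxFullFrameRate(320, 480, 16, 40e6)` gives a ceiling of about 16.3 full frames per second, no matter how fast the drawing code is.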

 

Dirty rectangles

But is it really necessary to transfer the entire screen buffer to the display every time we need to update a frame? If we look at our example, we see that there are parts of the screen that do not change from one frame to the next (where there are no spaceships). In general, even in the case of fast paced games, there are parts of the screen that change little, such as the UI.

Thus, the first optimization we will make is to redraw only the parts of the screen that have changed from the previous frame. One technique used for this purpose is that of dirty rectangles: the graphics library keeps a list of rectangular areas of the screen that have been changed; when it comes time to redraw the screen, only the areas of these rectangles are processed.

We will use a simpler version of this technique. Instead of storing a list of rectangles, we will divide the screen into a certain number of fixed rectangles; when at least one pixel is changed within one of these rectangles, the entire rectangle is marked as “dirty” and will be redrawn during the next update.

 

Interlaced mode

If we need to redraw the entire screen, we get no benefit from the use of dirty rectangles (in fact, we would be penalized due to the additional code). Is there a way to take advantage of the partial screen redraw even in this case?

There is a way: when the portion of the screen to be redrawn exceeds a certain percentage, we can switch to an interlaced update mode, in which only the even or only the odd rows are redrawn at each frame. This way, each update transfers at most half of the rows, which roughly doubles the frame rate; the other side of the coin is that we introduce motion artifacts (known as combing), especially visible in fast-moving images.

To easily combine the dirty rectangles technique with interlaced mode, I have chosen to divide each line of the screen into 5 rectangles of 64×1 pixels (so, we have a total of 480 x 5 = 2400 rectangles). When we exceed the percentage threshold, we will redraw only the rectangles on the odd or even rows.
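As a rough illustration of how such a grid behaves, here is a standalone sketch that marks the 64x1 tiles touched by a rectangle and counts them. `DirtyGrid`, `markRect` and `count` are hypothetical names chosen for this example, and the rectangle is assumed to be already clipped to the screen.

```cpp
#include <cstdint>

constexpr int TILE_SHIFT = 6;      // tiles are 64 pixels wide (64 = 2^6)
constexpr int TILES_PER_ROW = 5;   // 320 / 64
constexpr int ROWS = 480;

// Hypothetical standalone grid: one flag per 64x1 tile.
struct DirtyGrid {
    uint8_t tiles[ROWS][TILES_PER_ROW] = {};

    // Mark every tile touched by the rectangle (x, y, width, height).
    // Assumes the rectangle is already clipped to the 320x480 screen.
    void markRect(int x, int y, int width, int height) {
        int first = x >> TILE_SHIFT;
        int last = (x + width - 1) >> TILE_SHIFT;
        for (int row = y; row < y + height; row++)
            for (int t = first; t <= last; t++)
                tiles[row][t] = 1;
    }

    // Number of tiles currently marked as dirty.
    int count() const {
        int n = 0;
        for (int row = 0; row < ROWS; row++)
            for (int t = 0; t < TILES_PER_ROW; t++)
                n += tiles[row][t];
        return n;
    }
};
```

A 48x48 region at x = 100 spans tile columns 1 and 2 on 48 rows, so it marks 96 of the 2400 tiles (4% of the screen).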

To begin, let’s add to GFX.h:

…
#define DIRTY_RECT_X(x)  ((x) >> 6)
…

class GFX {
    …
    private:
    …
    uint8_t dirtyRects[480][5];
    int16_t scanLine;
    …
};

 

Setting dirty rectangles

In every function that modifies the screen buffer, we need to add the code to set the dirty rectangles.

In the fillScreen function, add:

memset(dirtyRects, true, sizeof(dirtyRects));

At the beginning of the drawPixel function, add:

// Set dirty rectangle
dirtyRects[y][DIRTY_RECT_X(x)] = true;

In the drawHorizontalLine function, add:

// Set dirty rectangles
int r1 = DIRTY_RECT_X(x);
int r2 = DIRTY_RECT_X(x + width - 1);
for(int i = r1; i <= r2; i++)
    dirtyRects[y][i] = true;

At the beginning of the drawVerticalLine function, add:

// Set dirty rectangles
int r = DIRTY_RECT_X(x);
for(int i = y; i <= y + height - 1; i++)
    dirtyRects[i][r] = true;

In the drawBitmap function, change the code that copies the bitmap to the screen:

…
// Copy the bitmap to screen buffer line by line
uint16_t u = x + uOffset;
uint16_t v = y + vOffset;
int r1 = DIRTY_RECT_X(x);
int r2 = DIRTY_RECT_X(x + width - 1);
for( ; y <= yEnd ; y++, v++) {
    int bitmapOffset = bitmapWidth * v + u;
    if(SCREENBUFFER_SECTOR(y)) {
        int ysb = y - 256;
        int screenBufferOffset = 320 * ysb + x;
        memcpy(screenBuffer[1] + screenBufferOffset, bitmap + bitmapOffset, width);
    } else {
        int screenBufferOffset = 320 * y + x;
        memcpy(screenBuffer[0] + screenBufferOffset, bitmap + bitmapOffset, width);
    }
    for(int i = r1; i <= r2; i++)
        dirtyRects[y][i] = true;
}
…

In the drawTransparentBitmap function, change the code that copies the bitmap to the screen:

…
// Copy the bitmap to screen buffer
uint16_t uStart = x + uOffset;
int r1 = DIRTY_RECT_X(x);
int r2 = DIRTY_RECT_X(xEnd);
for(uint16_t v = y + vOffset; y <= yEnd; y++, v++) {
    uint16_t u = uStart;
    for(uint16_t xp = x; xp <= xEnd; xp++, u++) {
        int bitmapOffset = bitmapWidth * v + u;
        if(bitmap[bitmapOffset] != transparentColor) {
            if(SCREENBUFFER_SECTOR(y)) {
                int ysb = y - 256;
                int screenBufferOffset = 320 * ysb + xp;
                screenBuffer[1][screenBufferOffset] = bitmap[bitmapOffset];
            } else {
                int screenBufferOffset = 320 * y + xp;
                screenBuffer[0][screenBufferOffset] = bitmap[bitmapOffset];
            }
        }
    }
    for(int i = r1; i <= r2; i++)
        dirtyRects[y][i] = true;
}
…

 

Updating the screen

Now that the drawing functions correctly set the rectangles that have been modified, we must use this information in the update function. Also, we need to add support for interlaced mode.

First, at the end of the begin function, add this:

// Starting scan line
scanLine = 0;

and then modify the update method:

void GFX::update() {
    uint16_t buffer[64];
    
    // By default, check if all lines need to be redrawn
    int startY = 0;
    int stepY = 1;
    int y;

    // If the number of rectangles to redraw is greater than 900, switch
    // to interlaced mode: only the odd or even rows are checked, based
    // on the current scan line.
    int dirtyRectsCount = 0;
    for(int rect = 0; rect < 5; rect++)
        for(int v = 0; v < 480; v++)
            dirtyRectsCount += dirtyRects[v][rect];
    if(dirtyRectsCount > 900) {
        stepY = 2;
        startY = scanLine;
    }
    
    // Start SPI transaction
    SPI.beginTransaction(SPISettings(HX8357D_SPI_FREQUENCY, MSBFIRST, SPI_MODE0));
    digitalWrite(GPIO_HX8357D_CS, LOW);

    // Top framebuffer sector
    for(y = startY; y < 256; y = y + stepY) {
        for(int rect = 0; rect < 5; rect++) {
            if(dirtyRects[y][rect]) {
                int xStart = rect << 6;
                int offset = 320 * y;
                for(int x = 0; x < 64; x++)
                    buffer[x] = palette[screenBuffer[0][offset + xStart + x]];
                setAddressWindow(xStart, y, 64, 1);
                SPI.writePixels(buffer, 128);
            }
        }
        // Reset dirty rects for this line
        memset(&dirtyRects[y][0], false, 5);
    }

    // Bottom framebuffer sector
    for(y = y - 256; y < 224; y = y + stepY) {
        int realY = y + 256;
        for(int rect = 0; rect < 5; rect++) {
            if(dirtyRects[realY][rect]) {
                int xStart = rect << 6;
                int offset = 320 * y;
                for(int x = 0; x < 64; x++)
                    buffer[x] = palette[screenBuffer[1][offset + xStart + x]];
                setAddressWindow(xStart, realY, 64, 1);
                SPI.writePixels(buffer, 128);
            }
        }
        // Reset dirty rects for this line
        memset(&dirtyRects[realY][0], false, 5);
    }

    // End SPI transaction
    digitalWrite(GPIO_HX8357D_CS, HIGH);
    SPI.endTransaction();

    // Switch the scan line between even and odd
    scanLine = (scanLine + 1) % 2;
}
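One detail of update that is easy to miss is that the loop variable y is carried from the top sector into the bottom one, so in interlaced mode the even/odd phase survives the sector boundary. The following host-side sketch reproduces only the loop structure (no SPI, no screen buffer) and returns the rows a single pass would visit. `rowsVisited` is a hypothetical function written for this check.

```cpp
#include <vector>

// Returns the screen rows that one update pass would visit, following the
// same loop structure as update(): a top sector of 256 rows and a bottom
// sector of 224 rows, with the loop variable carried across the boundary.
std::vector<int> rowsVisited(bool interlaced, int scanLine) {
    int startY = interlaced ? scanLine : 0;
    int stepY = interlaced ? 2 : 1;
    std::vector<int> rows;
    int y;

    // Top sector: rows 0..255
    for (y = startY; y < 256; y += stepY)
        rows.push_back(y);

    // Bottom sector: y restarts at 0 or 1 depending on where the top loop
    // stopped, and is offset by 256 to get the real screen row.
    for (y = y - 256; y < 224; y += stepY)
        rows.push_back(y + 256);

    return rows;
}
```

With scanLine = 0 the pass visits rows 0, 2, ..., 478; with scanLine = 1 it visits rows 1, 3, ..., 479; without interlacing it visits all 480 rows.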

Testing the new performance

Try to run the program again, first with one and then with 10 spaceships. These are the results I got:

 

The increase in performance, especially in the case of a single spaceship, is crazy!

 

Other optimizations

There are other optimizations we can make to the graphics library to gain a few more tenths of a frame per second.

For example, scaleAndRotateBitmap uses the sin and cos trigonometric functions. Floating point math functions, especially on microcontrollers, are not very fast, so we can replace them with precomputed lookup tables.

In GFX.h, add these declarations:

float fastSin(int deg);
float fastCos(int deg);

and in GFX.cpp add this code:

float sinTable[91] = {
    0,
    0.017452406,
    0.034899497,
    0.052335956,
    0.069756474,
    0.087155743,
    0.104528463,
    0.121869343,
    0.139173101,
    0.156434465,
    0.173648178,
    0.190808995,
    0.207911691,
    0.224951054,
    0.241921896,
    0.258819045,
    0.275637356,
    0.292371705,
    0.309016994,
    0.325568154,
    0.342020143,
    0.35836795,
    0.374606593,
    0.390731128,
    0.406736643,
    0.422618262,
    0.438371147,
    0.4539905,
    0.469471563,
    0.48480962,
    0.5,
    0.515038075,
    0.529919264,
    0.544639035,
    0.559192903,
    0.573576436,
    0.587785252,
    0.601815023,
    0.615661475,
    0.629320391,
    0.64278761,
    0.656059029,
    0.669130606,
    0.68199836,
    0.69465837,
    0.707106781,
    0.7193398,
    0.731353702,
    0.743144825,
    0.75470958,
    0.766044443,
    0.777145961,
    0.788010754,
    0.79863551,
    0.809016994,
    0.819152044,
    0.829037573,
    0.838670568,
    0.848048096,
    0.857167301,
    0.866025404,
    0.874619707,
    0.882947593,
    0.891006524,
    0.898794046,
    0.906307787,
    0.913545458,
    0.920504853,
    0.927183855,
    0.933580426,
    0.939692621,
    0.945518576,
    0.951056516,
    0.956304756,
    0.961261696,
    0.965925826,
    0.970295726,
    0.974370065,
    0.978147601,
    0.981627183,
    0.984807753,
    0.987688341,
    0.990268069,
    0.992546152,
    0.994521895,
    0.996194698,
    0.99756405,
    0.998629535,
    0.999390827,
    0.999847695,
    1
};

float fastSin(int deg) {
    while(deg < 0)
        deg += 360;
    while(deg > 360)
        deg -= 360;

    if(deg <= 90) {
        return sinTable[deg];
    } else if(deg <= 180) {
        return sinTable[180-deg];
    } else if(deg <= 270) {
        return -sinTable[deg-180];
    } else {
        return -sinTable[360-deg];
    }
}

float fastCos(int deg) {
    while(deg < 0)
        deg += 360;
    while(deg > 360)
        deg -= 360;
    
    if(deg <= 90) {
        return sinTable[90-deg];
    } else if(deg <= 180) {
        return -sinTable[deg-90];
    } else if(deg <= 270) {
        return -sinTable[270-deg];
    } else {
        return sinTable[deg-270];
    }
}

Finally, change the scaleAndRotateBitmap function to use the new functions:

…
float sinTheta = fastSin(rotation);
float cosTheta = fastCos(rotation);
…
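A lookup table trades accuracy for speed, so it is worth checking how much we lose. The host-side sketch below is a variation on the code above, not a copy of it: it fills the quarter-wave table with std::sin instead of hard-coding it, and normalizes the angle with the % operator instead of loops. `QuarterSine` and `maxSineError` are hypothetical names. Note also that the functions take an int, so if the rotation is a float its fractional part is discarded and the angular resolution is one degree.

```cpp
#include <cmath>

// Quarter-wave sine table for 0..90 degrees, filled at startup with
// std::sin (hypothetical helper, used here only to keep the sketch short).
struct QuarterSine {
    float table[91];

    QuarterSine() {
        const double degToRad = std::acos(-1.0) / 180.0;
        for (int d = 0; d <= 90; d++)
            table[d] = static_cast<float>(std::sin(d * degToRad));
    }

    // Quadrant folding for whole degrees, any sign.
    float sinDeg(int deg) const {
        deg %= 360;
        if (deg < 0) deg += 360;
        if (deg <= 90)  return table[deg];
        if (deg <= 180) return table[180 - deg];
        if (deg <= 270) return -table[deg - 180];
        return -table[360 - deg];
    }

    // cos(x) = sin(x + 90 degrees)
    float cosDeg(int deg) const { return sinDeg(deg + 90); }
};

// Largest absolute difference between the table and std::sin over a range
// of whole degrees.
double maxSineError(const QuarterSine& q, int fromDeg, int toDeg) {
    const double degToRad = std::acos(-1.0) / 180.0;
    double worst = 0.0;
    for (int d = fromDeg; d <= toDeg; d++) {
        double err = std::fabs(q.sinDeg(d) - std::sin(d * degToRad));
        if (err > worst) worst = err;
    }
    return worst;
}
```

At whole degrees the table matches std::sin to within float rounding; the visible error comes from truncating the angle to an integer, which is why the example program only regenerates the sprite when the angle has changed by at least 2 degrees.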

Because floating point functions are slow, it is also better to avoid the ceil function when calculating the width in bytes of a monochrome bitmap; an integer add and shift gives the same result for non-negative widths. In drawMonochromeBitmap and drawMonochromeBitmap2x, make this change:

int widthBytes = (width + 7) >> 3;

and in setFont:

fontSize = ((font->width + 7) >> 3) * font->height;
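If you want to convince yourself that the shift really rounds up to the next whole byte, a small host-side comparison is enough. The ceil-based reference below is an assumption about what the earlier code computed, and all function names are hypothetical.

```cpp
#include <cmath>

// Bytes needed to store `width` 1-bit pixels, integer-only version.
constexpr int widthBytesShift(int width) {
    return (width + 7) >> 3;
}

// Floating point reference, assumed to be equivalent to the ceil-based
// calculation that the shift replaces.
int widthBytesCeil(int width) {
    return static_cast<int>(std::ceil(width / 8.0));
}

// True if both formulas agree for every width from 0 to maxWidth.
bool widthFormulasAgree(int maxWidth) {
    for (int w = 0; w <= maxWidth; w++)
        if (widthBytesShift(w) != widthBytesCeil(w))
            return false;
    return true;
}
```

For instance, a 5-pixel-wide glyph needs 1 byte per row, an 8-pixel one still needs 1, and a 9-pixel one needs 2.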

Another small optimization is to rewrite the SCREENBUFFER_SECTOR macro to use a simple bit shift instead of the ternary ?: operator. Testing both versions, I noticed that which one is faster depends on the context in which it’s used (probably due to compiler optimizations).

So, I defined a second version of the macro:

#define SCREENBUFFER_SECTOR_2(y)  ((y) >> 8)

and I used it only where I measured a performance improvement, that is, in the drawPixel, drawHorizontalLine and drawVerticalLine functions.
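The two forms are interchangeable only as long as y stays on the screen. The sketch below checks this for rows 0 to 479. The ternary macro is an assumed shape, since the original SCREENBUFFER_SECTOR definition is not shown in this post, and the macro and function names are hypothetical.

```cpp
// Assumed shape of the original macro (its definition is not shown in this
// post): rows 0..255 live in sector 0, rows 256..479 in sector 1.
#define SECTOR_TERNARY(y)  ((y) < 256 ? 0 : 1)

// Shift-based version, same expression as SCREENBUFFER_SECTOR_2.
#define SECTOR_SHIFT(y)    ((y) >> 8)

// True if both macros select the same sector for every screen row.
bool sectorMacrosAgree() {
    for (int y = 0; y < 480; y++)
        if (SECTOR_TERNARY(y) != SECTOR_SHIFT(y))
            return false;
    return true;
}
```

Outside that range the results differ: for y = 512 the shift returns 2, which would index past the two screen buffer sectors, so the shift version relies on y having been clipped beforehand.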

You can find the complete code of this post here.

 

Conclusions

With this post we have completed the development of the #DIYConsole graphics library. With a rather simple technique (dirty rectangles) we have achieved a great improvement in the performance of our library, and now it’s ready for showtime! I will probably add some new drawing functions in the future, but for now I prefer to focus on other things.

Next time we will start to read the input from the touchscreen and we will finally make our console interactive!

