Assembler Lab(s) | PenguinPro.ca

In our most recent lab for the Software Portability and Optimization class we were taught to work directly with the assembler. This lab was split over two lab sessions, although for simplicity and to keep information together I will be creating a single post covering the topics learned, including social interactions. I mention those separately from the main technical course material because it has been amazing to meet people with a genuine passion for the way computers work, not just the work that can be done with them. I personally get frustrated with so many people always trying to find an application and overlooking the beauty of the complexity. There was a reason why pocket watches used to have glass cases; it’s okay to just appreciate the machine itself.

My personal experience with assembler is rooted in the MOS 6502/6510 processors; because of this I couldn’t help but notice some similarities to the AArch64 chip. For those readers who might not be totally familiar with the 6502 family of processors, they are older 8-bit CPUs used in many early computers, most notably the PET. The 6502 itself had three 8-bit registers that the programmer works with directly (the accumulator A and the index registers X and Y) and a clock speed of about 1 MHz. The modern processors that we are working with in this class are almost incomprehensibly more powerful, and include features that programmers of these early chips could only have dreamed of.

For more info about the PET: https://en.wikipedia.org/wiki/Commodore_PET

The reason I bring up a now obscure piece of hardware is that for anyone who really (like, really) wants to understand how a computer works, this is an excellent place to start. It is also where I began to learn more than just relating the syntax of a programming language to the output it produces. Again, at this level try not to think about the machine as a tool, but as a work of art. That is the real beauty of assembly language: it can be a window into the machine.

About:

For our labs we worked with two different machines with different CPU architectures: the first was x86_64 and the second was an AArch64-based machine. The task was to create a program that outputs text in a loop, with a counter appended showing which loop iteration each line of output came from.

For each of the two labs I worked with a different group of people, which personally made the experience more interesting. The first group I worked with started from scratch; we wrote our code and then reworked it. The next group was working on the other architecture (AArch64) and their design was already half complete, which was an opportunity to try and wrap my head around blocks of already written code on an unfamiliar platform.

First Lab:

For the first lab I worked with one group where we started from scratch. We created the source file which output the text and then wrapped it in the instructions that created the loop. Contrasting this with a higher-level programming language shows how program control is actually modified. For example, in a language such as C, a control statement such as an “if” or “while” statement boils down to essentially a compare, which sets the appropriate flags in the status register, followed by a conditional jump that sets the instruction pointer (RIP on x86_64; EIP is its 32-bit predecessor) to an offset from its current position. These offsets are the entry points of blocks of logic that are defined to handle the data manipulation as appropriate. The jumps can go both backward and forward: a backward jump creates a loop, where a block of instructions is executed multiple times, while a forward jump moves RIP past blocks of instructions that should not be executed in the current context, essentially “bypassing” them.
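To make the compare-then-jump idea a bit more concrete, here is a minimal sketch in Python of a toy machine whose loop is built only from a compare and a conditional backward jump. The instruction names (print, inc, cmp, jne, halt) and the run function are made up for illustration; this is not our lab code, just a model of the control flow.

```python
def run(program):
    """Execute a list of toy instructions; pc plays the instruction pointer."""
    regs = {"counter": 0}
    flags = {"zero": False}
    output = []
    pc = 0  # index of the next instruction to execute
    while pc < len(program):
        op, *args = program[pc]
        pc += 1  # by default, fall through to the next instruction
        if op == "print":
            output.append(f"{args[0]}{regs['counter']}")
        elif op == "inc":
            regs[args[0]] += 1
        elif op == "cmp":
            # A compare only sets flags; it does not change control flow.
            flags["zero"] = (regs[args[0]] == args[1])
        elif op == "jne":
            # Jump if the last compare was "not equal": overwrite pc.
            if not flags["zero"]:
                pc = args[0]
        elif op == "halt":
            break
    return output


# Rough equivalent of: do { print; counter++; } while (counter != 3);
program = [
    ("print", "Loop: "),     # 0: loop body starts here
    ("inc", "counter"),      # 1
    ("cmp", "counter", 3),   # 2: sets the zero flag when counter == 3
    ("jne", 0),              # 3: backward jump creates the loop
    ("halt",),               # 4: reached once the jump is not taken
]

lines = run(program)
print("\n".join(lines))
```

The only thing that makes this a loop is the jne at index 3 writing a smaller value into pc; pointing it at a larger index instead would skip instructions, which is the “bypassing” case.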

In regards to structure, this group wrote their code into labeled routines that could be jumped to by using the call instruction. A call instruction is different from a jump in that it pushes the return address onto the stack before setting RIP to the entry point of the called label. The great thing about using a call instruction is that, paired with a return instruction in the routine, the return address on the stack can be used to restore RIP so that execution continues at the instruction after the call. This creates a very maintainable and straightforward way to handle program execution.
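The difference between a plain jump and a call/return pair can be sketched the same way. The following is a small Python model, not real x86_64 code: the instruction names, the labels dictionary and the routine name print_line are all hypothetical and only there to show the return address being pushed and popped.

```python
def run(program, entry):
    """Toy model of call/ret: the stack holds return addresses."""
    stack = []   # stands in for the hardware stack
    trace = []   # records what the "program" did, for inspection
    pc = entry
    while True:
        op, *args = program[pc]
        pc += 1                  # pc now points at the next instruction
        if op == "call":
            stack.append(pc)     # push the return address
            pc = args[0]         # jump to the routine's entry point
        elif op == "ret":
            pc = stack.pop()     # restore pc from the saved return address
        elif op == "emit":
            trace.append(args[0])
        elif op == "exit":
            return trace


labels = {"print_line": 0, "main": 2}  # hypothetical label names

program = [
    ("emit", "in print_line"),       # 0: print_line
    ("ret",),                        # 1
    ("emit", "start"),               # 2: main
    ("call", labels["print_line"]),  # 3
    ("emit", "back in main"),        # 4: first call returns here
    ("call", labels["print_line"]),  # 5
    ("exit",),                       # 6: second call returns here
]

trace = run(program, entry=labels["main"])
print(trace)
```

Because the return address is saved at call time, the same routine can be called from two different places and still return to the right one each time.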

Second Lab:

For our second lab there was a lot of new material, and much of what I felt I knew about the computer dissolved quickly, leaving me somewhat lost. However, this is where my past interest in the old MOS chip comes in, as mentioned above. At this point in the exercise our group was trying to work with a load instruction, the name of which got my attention. When we were instructed to change it to a store instruction, I started to see an even stronger relation. Then finally the group showed me that the load instruction brought values in from an address accessed by an offset, and I was fascinated.

The reason that the AArch64 CPU did this is that it can only handle certain values in immediate mode, which is a consequence of the limitations introduced by the fixed 32-bit instruction encoding. This very quickly reminded me of some of the complex addressing used in some vintage computers (although somewhat reversed). The vintage MOS 65xx-based machines that I used to “play” with had 8-bit registers with a 16-bit address bus. This meant that it was not possible to index past 256 addresses from a single loaded address, which covers only about a quarter of a 40x25 (1000-character) text screen from any single point. The solution was to reload the address once the 8-bit index wrapped around (after 255) and then advance through another 256 addresses from the newly loaded address.
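For anyone who has not seen the 65xx trick before, here is a rough Python sketch of it. It assumes a 1000-byte (40x25) screen starting at address 0x0400, which is an assumption on my part for the example; the function name fill_screen is also just illustrative. The 8-bit index is modelled by masking with 0xFF.

```python
SCREEN_BASE = 0x0400   # assumed screen address for this sketch
SCREEN_SIZE = 1000     # 40 columns x 25 rows


def fill_screen(memory, value):
    """Fill screen memory using an 8-bit index and a reloadable base address."""
    base = SCREEN_BASE       # 16-bit base address
    y = 0                    # 8-bit index register
    reloads = 0
    for _ in range(SCREEN_SIZE):
        memory[base + y] = value
        y = (y + 1) & 0xFF   # an 8-bit register wraps from 255 back to 0
        if y == 0:
            base += 0x100    # index wrapped: advance the base by 256
            reloads += 1
    return reloads


memory = bytearray(0x10000)  # 64 KiB, the range of a 16-bit address bus
reloads = fill_screen(memory, 0x20)
print(reloads)
```

Covering 1000 bytes this way needs the base address to be advanced three times, since each base only reaches 256 bytes.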

I apologize if I deviated too much from the main topic, but it is difficult for me to overlook. This relation gives me a new appreciation for the AArch64 architecture, specifically the unique ways of getting around its limitations.

Now back to what we were doing before personal passions took over. The task was very similar: create a program in assembler that printed “Loop: xx”, where xx is the index of the iteration the program was currently outputting. This was accomplished using two labeled routines. The first output the first digit; the second was reached after comparing the index to see if it required another digit (e.g. greater than 9), with execution branching forward to the second label, which had the ability to handle the second digit of the number. Finally, instead of jumping to a routine that ended the program, execution was simply allowed to fall through to the next instruction past b.ne (branch if not equal), where registers were set up for a system call to exit. This created a type of end point similar to running out of track, as opposed to the x86_64 version we created, where execution was bounced to a routine that made an exit system call.
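The digit handling is easier to see in a higher-level sketch. The Python below is my own illustration of the idea rather than a translation of the group's source: divide the index by 10, turn each part into an ASCII character by adding it to the character code of "0", and only emit the tens digit when the index is greater than 9. The function name format_line is hypothetical, and the sketch only handles indexes from 0 to 99.

```python
def format_line(index):
    """Build the "Loop: N" text for an index in the range 0..99."""
    quotient, remainder = divmod(index, 10)
    text = "Loop: "
    if index > 9:
        # Second digit required: emit the tens digit first.
        text += chr(ord("0") + quotient)
    # The ones digit is always emitted.
    text += chr(ord("0") + remainder)
    return text


for i in (0, 9, 10, 29):
    print(format_line(i))
```

In assembler the "if" becomes a compare against 9 followed by a conditional branch around the tens-digit code, which is the forward branch described above.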

Final Overview

In regards to the x86_64 architecture, it seemed quite straightforward; it is about as user/developer friendly as assembler can get. There are many ways to accomplish a task, as well as many instructions that provide all the functionality that would be required at this level. The only major complexity with this chip design comes when working with word sizes, both in registers and on the stack; however, this is not unique to x86_64.

The AArch64 architecture required some personal study to grasp, and I feel it is important to note that x86_64 knowledge does not translate easily, not just in instruction encoding but also in basic operations. Some examples would be how the stack is maintained and how subroutine calls return. For example, as far as I can tell, the AArch64 branch-with-link instruction (bl) places the return address in the link register (x30) instead of pushing it onto the stack; a complication this creates is that a routine which calls another routine has to save that register itself, typically storing it on the stack as a pair with the saved frame pointer (x29) so that the stack can be properly reset afterwards.

After playing with the AArch64 source I needed to dig a little deeper into understanding how this architecture handles the stack as well as calling, or branching execution to, subroutines within the program. Admittedly I have been sitting on a resource that I’ve been meaning to read for more than a year, if not longer. These lab exercises finally forced me to read it while trying to understand this new concept.

https://azeria-labs.com/writing-arm-assembly-part-1/

The above link is to a tutorial on ARM assembly, and although I am not fully clear on the difference between ARM64 and AArch64, it does provide a lot of information and excellent graphics that illustrate beautifully what is happening during execution on ARM-based chips.

Final Resources

During my attempt to understand ARM versioning and the exact meaning of AArch64 I found this anchored link to a Wikipedia article that may provide more info, or at least a jumping-off point for further study into the naming/versions:

https://en.wikipedia.org/wiki/ARM_architecture#AArch64

And for anyone interested in 6502 ASM, here is my fav assembler:

https://www.cc65.org/

https://github.com/cc65/cc65/issues

As always, thank you for reading!