Assembly Programming for Beginners

What is programming in its essence, independent of any specific language? The variety of answers is astounding. The most common definition you’ll hear is that programming is the creation of instructions or commands for a machine to sequentially execute in order to solve a particular problem.

This definition is quite accurate, but in my opinion, it doesn’t capture the full scope, much like defining literature as the composition of sentences for a reader’s sequential consumption. I tend to believe that programming is closer to creativity, to art. Like any form of art—an expression of creative thought and ideas—programming reflects human thought. And thought can range from brilliant to utterly mediocre.

Regardless of the type of programming we engage in, success relies on practical skills combined with an understanding of fundamental principles and theory. Theory and practice, learning and effort are the cornerstones on which success is built.

In recent times, assembly language has unjustly been overshadowed by other programming languages. This is primarily due to the global commercialization that focuses on maximizing profits in the shortest possible time. In other words, mass appeal has triumphed over elitism. From my perspective, assembly aligns more with the latter. It’s more expedient to quickly train a student in languages such as C++, C#, PHP, Java, JavaScript, or Python, making them capable of developing mainstream software without questioning the deeper reasons behind their actions, rather than nurturing an expert in assembly language. A telling example of this is the expansive market for programming courses in nearly every language except assembly.

This same trend is evident in university education and academic literature. To this day, a significant portion of study material is based on the early 8086 series processors, focusing on the so-called “real” 16-bit mode and the MS-DOS operating environment. One reason might be that with the advent of IBM PCs, educators had no choice but to switch to this platform due to the unavailability of others. Over time, the 80x86 series continued to support this mode, which saved money by avoiding the need for new computers and new textbooks covering new processor architectures. However, this choice of platform for learning is now completely unacceptable. MS-DOS had become obsolete by the mid-1990s, and with the transition to 32-bit processors, starting with the 80386, the instruction set became much more logical. Therefore, it’s pointless to waste time learning and explaining the quirks of a real-mode architecture that no new processor design is likely to repeat.

When it comes to choosing an operating environment for studying assembly language, especially regarding the 32-bit instruction set, the options are relatively limited. You can choose between Windows operating systems and the UNIX family.

It is also worth discussing which assembler to choose for different operating environments. As you know, there are two types of assembler syntax used with x86 processors: AT&T syntax and Intel syntax. These syntaxes represent the same instructions in completely different ways. For example, an instruction in Intel syntax looks like this:

mov eax,ebx

In AT&T syntax, the operands are swapped and register names take a % prefix:

movl %ebx,%eax
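
The operand-order difference can be made concrete with a small Python sketch. This converter is a toy written for illustration, not part of any real assembler: the names intel_to_att and att_operand are made up, and it only understands two-operand instructions on 32-bit registers and decimal constants, always appending the l (32-bit) suffix.

```python
# Toy Intel -> AT&T converter for lines of the form "mnemonic dst, src".
# Only 32-bit registers and decimal constants are handled (an assumption
# made to keep the sketch short).
REGS32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}

def att_operand(op):
    """Rewrite one Intel-style operand in AT&T style."""
    op = op.strip()
    if op.lower() in REGS32:
        return "%" + op.lower()          # registers get a % prefix
    if op.lstrip("-").isdigit():
        return "$" + op                  # constants get a $ prefix
    raise ValueError(f"unsupported operand: {op!r}")

def intel_to_att(line):
    """Swap the operands and add the 'l' size suffix."""
    mnemonic, _, rest = line.strip().partition(" ")
    dst, src = rest.split(",")
    return f"{mnemonic.lower()}l {att_operand(src)}, {att_operand(dst)}"

print(intel_to_att("mov eax,ebx"))   # movl %ebx, %eax
print(intel_to_att("add ecx, 10"))   # addl $10, %ecx
```

In both notations the instruction copies EBX into EAX; only the way it is written changes.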

In UNIX environments, the AT&T syntax is more popular, but beginner-friendly tutorials for it are scarce; it is described mostly in reference and technical literature. Therefore, it makes sense to choose an assembler based on Intel syntax. For UNIX systems, two popular choices are NASM (Netwide Assembler) and FASM (Flat Assembler). In the Windows family, FASM and MASM (Macro Assembler) from Microsoft are popular, and there was also TASM (Turbo Assembler) from Borland, which the company stopped supporting quite some time ago.

In this series of articles, we will explore Windows development using the MASM assembly language—simply because I prefer it. Many authors, when introducing assembly language, integrate it into a C language framework. They do this under the assumption that starting with practical examples in an operating environment is quite difficult: you need to understand both basic programming principles and processor commands. However, even this approach requires some preliminary knowledge of C. This series will focus solely on assembly from the very beginning, without confusing the reader with unfamiliar concepts, although connections with other languages will be discussed later on.

It’s important to note that when learning the fundamentals of programming, not just in assembly language, it’s incredibly beneficial to have an understanding of console application culture. It’s not ideal to start learning by immediately creating windows, buttons, and other elements of graphical applications. There’s a misconception that the console is an archaic remnant of the past, but that’s not the case. A console application is almost free of external dependencies from a graphical interface and is primarily focused on accomplishing a specific task. This provides an excellent opportunity to concentrate on learning the basic principles of programming and assembly language without distractions, including becoming familiar with algorithms and developing them to solve practical problems. By the time you’re ready to explore graphical applications, you’ll have a substantial knowledge base, a clear understanding of how the processor works, and, most importantly, a consciousness of your actions: how and what works, and why.

What is Assembly Language?

The term assembler comes from the English verb “to assemble”, that is, to collect or put together. It is actually the name of a translator program that takes text containing human-friendly symbolic representations of machine instructions and converts them into the corresponding sequence of machine code understood by the processor. Unlike machine instructions, their symbolic representations, also known as mnemonics, are relatively easy to remember since they are abbreviations of English words. For simplicity, we will refer to mnemonics as assembler commands. The language of these symbols is called assembly language.
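
To show what “converting mnemonics into machine code” means in practice, here is a minimal sketch of a translator in Python. It is a teaching toy, not how MASM or any real assembler is built: the function name assemble_line and the tiny supported subset (nop, ret, hlt, push/pop of a 32-bit register, and mov of a constant into a 32-bit register) are choices made for this example. The byte values are the standard 32-bit x86 encodings of those instructions.

```python
# Register numbers as used in x86 instruction encodings.
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
       "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
# One-byte instructions without operands.
NO_OPERAND = {"nop": 0x90, "ret": 0xC3, "hlt": 0xF4}

def assemble_line(line):
    """Translate one line of a tiny x86 subset into machine-code bytes."""
    text = line.split(";", 1)[0].strip().lower()    # drop the comment
    if not text:
        return b""
    mnemonic, _, rest = text.partition(" ")
    operands = [o.strip() for o in rest.split(",")] if rest.strip() else []
    if mnemonic in NO_OPERAND and not operands:
        return bytes([NO_OPERAND[mnemonic]])
    if mnemonic in ("push", "pop") and len(operands) == 1 and operands[0] in REG:
        base = 0x50 if mnemonic == "push" else 0x58  # opcode + register number
        return bytes([base + REG[operands[0]]])
    if mnemonic == "mov" and len(operands) == 2 and operands[0] in REG:
        value = int(operands[1], 0) & 0xFFFFFFFF     # constants only in this toy
        return bytes([0xB8 + REG[operands[0]]]) + value.to_bytes(4, "little")
    raise ValueError(f"unsupported instruction: {line!r}")

source = """
mov eax, 1      ; load the constant 1 into EAX
push eax
pop ebx
ret
"""
code = b"".join(assemble_line(l) for l in source.splitlines())
print(code.hex(" "))   # b8 01 00 00 00 50 5b c3
```

Each mnemonic maps onto one fixed byte pattern, which is why assembly is described as a symbolic notation for machine code rather than a language that is compiled in the usual sense.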

In the early days of computing, the first electronic computers occupied entire rooms and weighed several tons, with memory capacities smaller than a bird’s brain. The only way to program back then was to enter instructions directly into the computer’s memory in binary form by toggling switches, wires, and buttons. The number of such toggles could reach several hundred and grew as programs became more complex, raising concerns about time and cost efficiency. Thus, the next step in development was the introduction of the first assembler translator in the late 1940s, which allowed machine commands to be conveniently written in human-readable language, automating and simplifying the programming process, speeding up development and debugging. This paved the way for high-level languages and compilers (smarter generators of code from a language more understandable to humans) and interpreters (executors of a written program in real-time). These tools evolved over time, eventually leading to the point where programming can be done simply with a mouse.

In essence, assembly language is a machine-oriented programming language that enables direct interaction with a computer, one-on-one. As such, it’s fully described as a second-generation low-level programming language (following machine code). Assembly instructions directly correspond to processor commands. Given that there are various processor models, each with its own set of instructions, there are different versions or dialects of assembly language. Therefore, using the term “assembly language” might mistakenly imply the existence of a unified low-level language or a standard for such languages. No such standard exists. When referencing the language in which a particular program is written, it is necessary to specify which architecture it is intended for and which dialect of the language is used. Since assembly is tied to the processor’s architecture, and the type of processor strictly determines the set of available machine language commands, assembly programs cannot be transferred to different computer architectures.

Since an assembler is just a program written by a person, there’s nothing stopping another programmer from creating their own assembler, which often happens. In reality, it doesn’t matter much which specific assembly language you choose to study. The key is to understand the principles of working at the processor command level, and then it won’t be difficult to learn not only another assembler but also any other processor with its set of instructions.

Syntax Overview

There is no universally accepted standard for the syntax of assembly languages. However, most assembler developers follow a few traditional conventions. The two main ones are Intel syntax and AT&T syntax.

The general format for writing instructions is the same for both:

[label:] opcode [operands] [;comment]

An opcode is essentially an assembly command, a mnemonic for a processor instruction. It can be accompanied by prefixes (such as repetitions or changes in addressing type). Operands can include constants, register names, memory addresses, and so on. The differences between the Intel and AT&T conventions primarily involve the order of operands and their syntax under different addressing methods.
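
The [label:] opcode [operands] [;comment] layout can be illustrated with a short Python sketch that splits a source line into those four fields. The function name parse_line is hypothetical, and the sketch deliberately ignores prefixes and segment overrides, so it is a simplification rather than a real assembler front end.

```python
def parse_line(line):
    """Split one source line into label, opcode, operands and comment."""
    code, _, comment = line.partition(";")       # everything after ';' is a comment
    tokens = code.split(None, 1)
    label = None
    if tokens and tokens[0].endswith(":"):       # a leading "name:" is a label
        label = tokens[0][:-1]
        tokens = tokens[1].split(None, 1) if len(tokens) > 1 else []
    opcode = tokens[0] if tokens else None
    operands = [o.strip() for o in tokens[1].split(",")] if len(tokens) > 1 else []
    return {
        "label": label,
        "opcode": opcode,
        "operands": operands,
        "comment": comment.strip() or None,
    }

print(parse_line("start: mov eax, 5 ; initialise the counter"))
print(parse_line("ret"))
```

The first call yields the label start, the opcode mov, the operands eax and 5, and the comment text; the second shows that every field except the opcode is optional.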

The set of commands typically remains consistent across all processors within the same architecture or family of architectures (well-known examples include processors and controllers from Motorola, ARM, and x86). These commands are detailed in the processor specifications.

For instance, the Zilog Z80 processor inherited the instruction set of the Intel i8080, expanded on it, and altered some commands (as well as register names) to suit its own needs. For example, it replaced the Intel command mov with ld. Motorola’s Fireball processors inherited the Z80 instruction set, albeit in a slightly reduced form. At the same time, Motorola officially reverted to Intel commands, and currently, half of the assemblers for the Fireball work with Intel commands, while the other half use Zilog commands.

Assembler Directives

In addition to assembler commands, a program may include directives: commands that are not translated into machine instructions but control the operation of the assembler itself. The set and syntax of these directives vary significantly and depend not on the hardware platform, but on the assembler being used. Examples of directives include:

Definition of data (constants and variables)
Managing program structure in memory and output file settings
Setting the assembler’s operating mode
Various abstractions (elements of high-level languages), from defining procedures and functions (to simplify parameter passing) to conditional structures and loops
Macros
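
As a rough illustration of how directives steer the translator without producing instructions, the Python sketch below handles three of them: org (set the current address), equ (name a constant) and db (place data bytes). The syntax is only loosely modelled on common assemblers, numbers are written in Python notation (0x100 rather than 100h), and the function name layout is made up for this example.

```python
def layout(source):
    """Process ORG, EQU and DB lines; return (symbols, memory image)."""
    symbols, image, counter = {}, {}, 0
    for raw in source.splitlines():
        text = raw.split(";", 1)[0].strip()
        if not text:
            continue
        parts = text.split()
        if parts[0].lower() == "org" and len(parts) == 2:
            counter = int(parts[1], 0)                 # move the location counter
        elif len(parts) >= 3 and parts[1].lower() == "equ":
            symbols[parts[0]] = int(parts[2], 0)       # constant, occupies no memory
        elif len(parts) >= 3 and parts[1].lower() == "db":
            symbols[parts[0]] = counter                # the name becomes an address
            for item in " ".join(parts[2:]).split(","):
                item = item.strip()
                value = symbols[item] if item in symbols else int(item, 0)
                image[counter] = value & 0xFF          # store one byte
                counter += 1
        else:
            raise ValueError(f"unsupported line: {raw!r}")
    return symbols, image

symbols, image = layout("""
org 0x100           ; data starts at address 0x100
SIZE equ 3          ; a named constant
buf  db 1, 2, SIZE  ; three data bytes
flag db 0xFF
""")
print(symbols)   # {'SIZE': 3, 'buf': 256, 'flag': 259}
print(image)     # {256: 1, 257: 2, 258: 3, 259: 255}
```

None of the four lines produced a processor instruction: they only told the translator where to place data and what names to remember.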

Pros and Cons

The advantages include the following:

Minimal redundant code (using fewer commands and memory references). Consequently, faster execution and smaller program size.
Direct hardware access: I/O ports and special processor registers.
Capability for self-modifying code (allowing an application to create or modify parts of its own code during runtime without the need for a software interpreter).
Maximum optimization for the target platform (utilizing special instructions and hardware specifics).

The drawbacks may include:

Large amounts of code and a high number of additional small tasks
A limited number of available libraries and their poor compatibility
Poor code readability and difficulty in maintenance (debugging, adding features)
Lack of portability to other platforms (except those that are binary compatible)

Why Learn Assembly Language?

In modern industrial programming practice, assembly languages are used very rarely. For low-level software development, the C language is predominantly used because it allows achieving the same goals with significantly less effort and often with equal or even greater efficiency in the resulting executable code (thanks to optimizers). Nowadays, assembly is used for very specific parts of operating system kernels and system libraries. Moreover, assembly programming has been largely replaced even in traditionally assembly-centric areas like microcontroller programming, where most firmware is also written in C. Nonetheless, assembly programming is still frequently employed for tasks that leverage processor capabilities not possible with high-level languages or for implementing various non-standard programming tricks. Separate assembly modules and inline assembly within code written in other languages are found in operating system kernels and in the system libraries of C and other high-level languages. Today, few would conceive the insane idea of writing a large program entirely in pure assembly.

So, why spend time studying it? There are several compelling reasons, and here’s one of them: Assembly language is the cornerstone upon which the entire vast domain of programming is built, dating back to the inception of the first processor. Just as every physicist dreams of unraveling the mysteries of the universe by identifying its fundamental, indivisible (low-level) elements beyond merely a vague understanding through quantum theory, assembly language is the fundamental material composing the universe of processors. It is the tool that enables us to think in terms of machine instructions, a skill essential for any professional programmer, even if they never write a single line of assembly code. It’s akin to becoming a mathematician; one cannot truly achieve this without an understanding of basic arithmetic. Regardless of the programming language you use, it’s crucial to have at least a basic understanding of what the processor actually does when executing your high-level commands. Without this understanding, a programmer may start using all available operations mindlessly, unaware of what they are actually creating.

A professional computer user, whether a systems administrator or programmer, can afford not to know something, but under no circumstances can they afford not to understand the essence of what is happening. They need to know how the computational system is structured on all its levels, from electronic logic circuits to complex application programs. A lack of understanding can lead to a subconscious sense of mystery, of an incomprehensible magic happening as if by the wave of a wand. Such a feeling is categorically unacceptable for a professional. They must have complete confidence, even in the deepest layers of their subconscious, that the device they are dealing with holds no magic or mysteries.

In other words, as long as processors exist, assembly language will be essential.

In this context, it doesn’t really matter which specific architecture or assembly language you choose to study. Once you know one assembly language, you can successfully start writing in any other, spending just a bit of time to review reference materials. The most important thing is that by thinking in the language of the processor, you will always understand what is happening, why, and for what purpose. This goes beyond just programming with a mouse—it’s a path to creating software that embodies great skill and craftsmanship.

Assembly Language: Programming or Art?

Let’s put it this way: it all depends on who is using it. Assembly language is the fundamental building block of the processor’s world; it forms its essence and consciousness. Just like all music in human history is made from combinations of seven notes, the assembly language commands bring the digital world to life. If you only know three chords, that’s “pop music”; if you know the entire range, that’s classical.

Why is science so eager to delve into quantum depths and grasp the elusive fundamental building block of matter? To gain control over it, to alter it at will, to ascend to a level akin to the Creator of the Universe. Who will wield such power remains an open question. Unlike science, the world of programming holds no mysteries; we know its building blocks and, consequently, the power over the processor that comes with understanding assembly language.

To elevate assembly language programming to the level of art, one must grasp its beauty hidden behind the stream of ones and zeros. As with any field of human endeavor, in programming, one can either be mediocre or become a Master. What differentiates the two is the level of culture, education, effort, and, most importantly, the amount of soul the author invests in their creation.

Assembler and the Terminator

Recently, James Cameron released a 3D version of the second “Terminator.” As an intriguing historical tidbit, there’s an interesting moment from the life of the cyborg assassin…

Scene from the movie “Terminator”.

Here we see the “vision” of the Terminator, with an assembly code listing displayed on the left. Based on this, it seems the infamous Terminator ran on a MOS Technology 6502 or MOS Technology 6510 processor. The 6502 was introduced in 1975 and used in Apple computers, while close variants of it powered famous gaming consoles of that era such as the Atari 2600 and the Nintendo Entertainment System (better known as Dendy in some regions). It featured only three 8-bit general-purpose registers: an accumulator (A) and two index registers (X and Y). The limited number of registers was offset by the fact that the first 256 bytes of RAM (referred to as the zero page) could be specially addressed, essentially functioning as 8-bit or 16-bit registers. This processor had 13 addressing modes and only 56 instructions. In the Terminator’s operations, you see a sequence of instructions like LDA-STA-LDA-STA… In the 6502 family, programs consisted almost entirely of LDA/LDY/LDX/STA/STX/STY instructions.

LDA — load into accumulator
LDY — load into register Y
LDX — load into register X
STA — store from accumulator
STX — store from register X
STY — store from register Y

Reading from and writing to input-output ports was also done with these commands, so the Terminator’s program looks like plausible code rather than a scriptwriter’s nonsensical invention (see the MOS Technology 6502 instruction set).
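
The load/store rhythm of 6502 code can be modelled with a few lines of Python. This is a deliberately small sketch, not an emulator: the function name run is made up, only the six instructions listed above are supported, only immediate (#$nn) and plain address ($nnnn) operands are understood, and processor flags are not modelled at all.

```python
OPS = {"LDA", "LDX", "LDY", "STA", "STX", "STY"}

def run(program):
    """Execute a list of 6502-style load/store lines; return (registers, memory)."""
    reg = {"A": 0, "X": 0, "Y": 0}
    mem = bytearray(0x10000)               # 64 KB address space
    for line in program:
        op, arg = line.split()
        if op not in OPS:
            raise ValueError(f"unsupported instruction: {op}")
        target = op[2]                     # third letter names the register
        if op.startswith("LD"):
            if arg.startswith("#$"):       # immediate: the value is in the instruction
                reg[target] = int(arg[2:], 16) & 0xFF
            else:                          # address: read the byte from memory
                reg[target] = mem[int(arg[1:], 16)]
        else:                              # ST*: write the register to memory
            mem[int(arg[1:], 16)] = reg[target]
    return reg, mem

reg, mem = run([
    "LDA #$2A",     # A = 0x2A
    "STA $0200",    # memory[0x0200] = A
    "LDX $0200",    # X = memory[0x0200]
    "STX $10",      # zero-page location 0x10 = X
])
print(reg, hex(mem[0x0200]), hex(mem[0x10]))
```

Because the 6502 has so few registers, almost every useful step is a value travelling between a register and memory, which is exactly the LDA-STA pattern visible on the Terminator’s display.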

Practical Applications

It was previously mentioned that assembly language has almost been replaced by high-level programming languages in modern times. However, it still finds its use even today. Here are a few examples.

Development of embedded software. These are small programs that do not require significant memory on devices such as phones, car ignition systems, security systems, audio and video cards, modems, and printers. Assembler is the ideal tool for these tasks.
Optimization in computer gaming consoles to reduce code size and improve performance.
Use of new instructions available on new processors. While a high-level compiler optimizes code during compilation, it is often unable to generate instructions from extended instruction sets such as AVX, CVT16, or XOP, because processors gain new commands faster than compilers learn to emit them.
A significant portion of programs for graphics processing units (GPUs) is written in assembly language, alongside high-level languages such as HLSL or GLSL.
Writing code that is difficult or impossible to create in high-level languages, for example, obtaining a memory/stack dump. Even when a high-level language equivalent is possible, assembly language can offer significant benefits. For example, computing the average of two unsigned numbers without losing the overflow bit takes only two commands on x86 processors: an addition, which leaves the overflow in the carry flag, and a one-bit rotate right through carry. The high-level equivalent ((long) x + y) >> 1 may not work on platforms where sizeof(long) == sizeof(int), or it may be compiled into many more processor commands.
Writing viruses and antivirus software. Assembly is practically the only language suitable for creating serious file infectors such as CIH, Sality, and Sinowal.
And, of course, the other side of the coin: hacking, cracking, and its more legal variant, reverse engineering. Knowledge of assembler is a powerful tool in the hands of a reverse engineer. Neither disassembling nor debugging programs is possible without understanding it.
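
The averaging example from the list above can be checked with a Python model of 32-bit registers. The sketch assumes the two commands are add followed by rcr (rotate right through carry) applied to unsigned operands; the article does not name them, so treat that pairing as an assumption. The helper names add32, rcr1_32, average_u32 and naive_average_u32 are invented for this example.

```python
MASK32 = 0xFFFFFFFF

def add32(a, b):
    """Model x86 `add` on 32-bit registers: return (result, carry flag)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32

def rcr1_32(value, carry):
    """Model `rcr reg, 1`: the carry flag enters as the new top bit."""
    new_carry = value & 1
    return ((carry << 31) | (value >> 1)) & MASK32, new_carry

def average_u32(a, b):
    total, carry = add32(a, b)           # add eax, ebx
    result, _ = rcr1_32(total, carry)    # rcr eax, 1
    return result

def naive_average_u32(a, b):
    """What happens when the 33rd bit of the sum is simply lost."""
    return ((a + b) & MASK32) >> 1

print(hex(average_u32(0xFFFFFFFF, 0xFFFFFFFF)))        # 0xffffffff
print(hex(naive_average_u32(0xFFFFFFFF, 0xFFFFFFFF)))  # 0x7fffffff
```

The carry flag acts as a free 33rd bit of the sum, which is the piece of hardware state a high-level expression cannot reach directly.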

We will continue to dive into assembly language in the upcoming articles.
