Guide: About the 80386 architecture

Introduction

This section presents an overview of the 80386 processor as it applies to assembly language programming. Most books you will find on this subject are geared toward PC-based assembly language products that use Intel syntax for the 80x86 instruction set. The GNU assembler (GAS), which DJGPP uses, uses AT&T syntax instead. This user guide presents the 80x86 instruction set in AT&T syntax, which can be used to create DJGPP GAS programs that run on 80x86 PCs.

The information in this text should apply to any 32-bit x86-based processor. This includes the 386, 486, 586 and 686 processors from Intel, AMD and Cyrix. The Pentium and its successors execute 80386 code natively, so the material here applies to them as well.

This overview should give you enough information about the 80386 to start programming using GAS. However, advanced 80386 topics such as protected mode, virtual memory, and multitasking will not be presented. A reference section at the end of this page can be used to obtain more in-depth information.

If you have any comments, suggestions, or questions, feel free to contact the author of this section via email: schwarz@escmail.orl.mmc.com

GAS 80386 Syntax

There are some minor differences between normal 80386 assembly language (e.g., MASM, TASM, NASM) and the GNU Assembler (GAS). These are listed in the 80386 Machine Dependent Information section of the GAS online info docs. I repeat them here for completeness. In the text below, I use the GAS syntax in all assembly language examples. You can also use this information when referencing 80386 assembly manuals.

Memory Model

The 80386 has two memory models to choose from: a "flat" model and a segmented model. The "flat" memory model is basically a single segment, which can address up to 4 GB (2^32 bytes) of memory. The segmented model can consist of up to 16,383 segments. The "flat" memory model is similar to the memory model of the Motorola 680x0 processors, which the GNU tools were originally designed to support. The rest of this section describes the "flat" model in a little more detail. The segmented model will not be discussed.

The "flat" model presents the programmer with a single array of up to 2^32 bytes (4 GB). A program running under the "flat" model will be running in the 80386's protected mode under DJGPP. This basically means that while you can access any memory address within the "flat" model's linear address space, you cannot directly access memory locations outside that address space.

Data Types

Bytes, words, and double words are the fundamental data types within the 80386. A byte is eight contiguous bits starting at any logical address. Bits are numbered 0 through 7, with bit zero being the least significant bit.

A word is two contiguous bytes, or 16 bits, starting at any byte address. The bits are numbered 0 through 15, with bit 0 being the least significant. Each byte within a word has its own address, with the lower byte's address representing the word's address.

A doubleword is two contiguous words, or 32 bits, starting at any byte address. Bits are numbered 0 through 31, with bit 0 being the least significant.

The two bytes in a word can be referred to as the low byte and the high byte. Similarly, the two words in a doubleword can be referred to as the low word and the high word, each having a low byte and a high byte. The low byte, or low word, is at a lower memory location than the high byte, or high word. Bits follow the same ordering: bit 0 is the least significant bit of the byte at the lowest address.

The 80386 is a little-endian processor. That is, the lower byte of a word is at a lower memory location than the higher byte of the word. Likewise, lower-numbered bits (e.g., bit 0) belong to lower memory locations than higher-numbered bits (e.g., bit 12).
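The byte ordering described above can be modelled in a few lines of C. This is only a sketch: the function names store_le32 and load_le32 are hypothetical, and the byte array stands in for memory rather than being a real 80386 address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a doubleword the way the 80386 lays it out in memory:
   the least significant byte goes to the lowest address. */
void store_le32(uint8_t *mem, uint32_t value)
{
    assert(mem != NULL);
    mem[0] = (uint8_t)(value & 0xFFu);          /* low byte of the low word   */
    mem[1] = (uint8_t)((value >> 8) & 0xFFu);   /* high byte of the low word  */
    mem[2] = (uint8_t)((value >> 16) & 0xFFu);  /* low byte of the high word  */
    mem[3] = (uint8_t)((value >> 24) & 0xFFu);  /* high byte of the high word */
}

/* Reassemble the doubleword from its four bytes. */
uint32_t load_le32(const uint8_t *mem)
{
    assert(mem != NULL);
    return (uint32_t)mem[0]
         | ((uint32_t)mem[1] << 8)
         | ((uint32_t)mem[2] << 16)
         | ((uint32_t)mem[3] << 24);
}
```

Storing 0x12345678 this way leaves the bytes 0x78, 0x56, 0x34, 0x12 at increasing addresses.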

Note that there is no requirement to align words on even-numbered addresses, nor do doublewords need to be aligned on addresses evenly divisible by four. However, you should attempt to do this when designing data structures. If you don't, the 80386 may need extra memory cycles to fetch or store the misaligned data.
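When laying out data by hand, a check like the following can help confirm that an offset is suitably aligned. This is a minimal sketch in C; the helper names is_aligned and align_up are hypothetical, and size is assumed to be a power of two (2 for words, 4 for doublewords).

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if addr is a multiple of size, 0 otherwise.
   size must be a power of two. */
int is_aligned(uint32_t addr, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* Round addr up to the next multiple of size (a power of two). */
uint32_t align_up(uint32_t addr, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr + size - 1) & ~(size - 1);
}
```

For example, address 0x1002 is word aligned but not doubleword aligned; rounding it up for a doubleword gives 0x1004.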

The 80386 also supports additional data types, based on the instruction used. The following types are understood:

Packed and unpacked BCD is also supported.

Hexadecimal numbers are represented by prefixing the number with 0x. For example, decimal 15 would be hex 0xF. Note that this is different from standard 80386 assembly language, which uses an h suffix instead (0Fh).

Processor Registers

There are sixteen registers that are of use to general purpose programmers. (There are several other registers for system level programming that are not discussed in this guide.)

General Registers

There are 8 general purpose, 32-bit registers in the 80386. They are EAX, EBX, ECX, EDX, EBP, ESP, ESI, and EDI. Each register can hold a doubleword containing any of the data types listed above.

A backwards compatible feature is built into the 80386. The lower word of each register can be addressed as a separate unit. This is useful for handling 16-bit data items and for compatibility with older 8086 and 80286 programs. The registers are named AX, BX, CX, DX, BP, SP, SI and DI. Note, the upper word of each register cannot be addressed separately.

Furthermore, each byte of the four 16-bit registers AX, BX, CX, and DX can be separately addressed. The high bytes are named AH, BH, CH and DH. The lower bytes are named AL, BL, CL and DL.
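The way the 16-bit and 8-bit names overlay a 32-bit register can be modelled with masks and shifts. The following C sketch uses hypothetical helper names (read_x, write_x and so on) and a plain uint32_t to stand in for a register such as EAX; it illustrates the overlay and is not how the processor is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read views of a 32-bit register value such as EAX. */
uint16_t read_x(uint32_t e) { return (uint16_t)(e & 0xFFFFu); }      /* AX */
uint8_t  read_h(uint32_t e) { return (uint8_t)((e >> 8) & 0xFFu); }  /* AH */
uint8_t  read_l(uint32_t e) { return (uint8_t)(e & 0xFFu); }         /* AL */

/* Write views: only the named part changes, all other bits are kept. */
void write_x(uint32_t *e, uint16_t v)   /* write AX */
{
    assert(e != NULL);
    *e = (*e & 0xFFFF0000u) | v;
}

void write_h(uint32_t *e, uint8_t v)    /* write AH */
{
    assert(e != NULL);
    *e = (*e & 0xFFFF00FFu) | ((uint32_t)v << 8);
}

void write_l(uint32_t *e, uint8_t v)    /* write AL */
{
    assert(e != NULL);
    *e = (*e & 0xFFFFFF00u) | v;
}
```

With EAX holding 0x12345678, AX reads as 0x5678, AH as 0x56 and AL as 0x78. Writing to AX leaves the upper word 0x1234 untouched.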

All of the general purpose registers are available for addressing calculations and for the results of most calculations. However, a number of instructions expect their data to be in specific registers. This allows for more compact and efficient instructions.

Segment Registers

There are six segment registers: CS, DS, SS, ES, FS, and GS. They are used to identify the six segments currently in use by a program. The CS register contains the address of the currently running code segment. The DS register contains the address of the currently accessible data segment. The SS register contains the address of the current stack segment. The ES, FS, and GS registers contain additional segments as required by the program.

The flat memory model used by GAS has only one segment, and therefore programmers don't normally need to worry about the segment registers. They are usually loaded with selectors for descriptors that cover the entire 32-bit linear address space. Once loaded, there is no need to change them, and 32-bit pointers can address the entire program. You don't need to worry about this initialization since defaults are selected when your program is loaded. (Selectors and descriptors are part of protected mode programming.)

While GAS does not use the segmentation model of the 80386, it has its own segmentation model using sections. As a minimum there are 3 sections: the text section, which contains code; the data section, which contains initialized data; and the bss section, which contains uninitialized data. You can also create your own sections. All sections are contained within the same segment. Sections with the same name are combined together during linking.

Stack Implementation

The 80386 allows for multiple stacks, with each stack being a separate segment. The stack pointer (ESP) register points to the top of the stack. The stack is a push-down stack, and is referenced implicitly by PUSH and POP instructions, subroutine calls and returns and interrupt operations. When an item is pushed on the stack, ESP is first decremented, then the data is written to the new ESP location. The opposite occurs when data is popped off the stack. The data is first copied out of the stack, then the ESP is incremented. The stack grows down in memory toward lesser addresses.
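The push and pop sequence described above can be sketched in C. The type stack386 and the functions below are hypothetical names used for illustration; the stack is modelled as a small byte array, with esp holding an offset into it instead of a real address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_BYTES 64u

typedef struct {
    uint8_t  mem[STACK_BYTES];  /* stands in for the stack segment   */
    uint32_t esp;               /* offset of the current top of stack */
} stack386;

/* An empty stack: ESP sits just past the highest usable address. */
void stack_init(stack386 *s)
{
    assert(s != NULL);
    s->esp = STACK_BYTES;
}

/* PUSH: decrement ESP first, then write the data at the new ESP. */
void push32(stack386 *s, uint32_t value)
{
    assert(s != NULL && s->esp >= 4);
    s->esp -= 4;
    for (uint32_t i = 0; i < 4; i++) {
        s->mem[s->esp + i] = (uint8_t)(value >> (8 * i));  /* little endian */
    }
}

/* POP: copy the data out first, then increment ESP. */
uint32_t pop32(stack386 *s)
{
    uint32_t value = 0;
    assert(s != NULL && s->esp + 4 <= STACK_BYTES);
    for (uint32_t i = 0; i < 4; i++) {
        value |= (uint32_t)s->mem[s->esp + i] << (8 * i);
    }
    s->esp += 4;
    return value;
}
```

Each push lowers esp by four, so successive items land at lesser addresses, and values come back off in the reverse of the order they were pushed.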

The stack frame base pointer (EBP) register is the best register to use for accessing data within the stack. It typically identifies the base address of the current stack frame in use by the current procedure. When this register is used in an offset calculation, the data is automatically fetched from the stack segment. This means the stack segment does not have to be included in the instruction, making for a more compact instruction.

Flags Register

The flags register is a 32-bit register named EFLAGS. The low-order 16 bits of EFLAGS are named FLAGS for compatibility with older 8086 and 80286 code. There are three basic groups of flags: status flags, control flags, and system flags. The flags are as follows:

  16  15                0
----  ---- ---- ---- ---X  CF  Carry Flag
----  ---- ---- ---- -X--  PF  Parity Flag
----  ---- ---- ---X ----  AF  Auxiliary Carry
----  ---- ---- -X-- ----  ZF  Zero Flag
----  ---- ---- X--- ----  SF  Sign Flag
----  ---- ---X ---- ----  TF  Trap Flag
----  ---- --X- ---- ----  IF  Interrupt Flag
----  ---- -X-- ---- ----  DF  Direction Flag
----  ---- X--- ---- ----  OF  Overflow Flag
----  --XX ---- ---- ----  IOPL  I/O Privilege Level
----  -X-- ---- ---- ----  NT  Nested Task Flag
---X  ---- ---- ---- ----  RF  Resume Flag
--X-  ---- ---- ---- ----  VM  Virtual 8086 Mode

The remaining bits are reserved for future Intel use. A flag is considered cleared when it is zero, set when it is 1.
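The bit positions in the table above translate directly into masks. The following C sketch shows one way to name and test them; the constant and function names are hypothetical, and the eflags value is an ordinary integer here, not a value read from the processor.

```c
#include <assert.h>
#include <stdint.h>

/* Bit masks taken from the positions in the flags table. */
enum {
    FLAG_CF = 0x00001,  /* bit 0:  Carry           */
    FLAG_PF = 0x00004,  /* bit 2:  Parity          */
    FLAG_AF = 0x00010,  /* bit 4:  Auxiliary Carry */
    FLAG_ZF = 0x00040,  /* bit 6:  Zero            */
    FLAG_SF = 0x00080,  /* bit 7:  Sign            */
    FLAG_TF = 0x00100,  /* bit 8:  Trap            */
    FLAG_IF = 0x00200,  /* bit 9:  Interrupt       */
    FLAG_DF = 0x00400,  /* bit 10: Direction       */
    FLAG_OF = 0x00800,  /* bit 11: Overflow        */
    FLAG_NT = 0x04000,  /* bit 14: Nested Task     */
    FLAG_RF = 0x10000,  /* bit 16: Resume          */
    FLAG_VM = 0x20000   /* bit 17: Virtual 8086    */
};

/* Return 1 if any flag in mask is set (1), 0 if all are cleared (0). */
int flag_is_set(uint32_t eflags, uint32_t mask)
{
    assert(mask != 0);
    return (eflags & mask) != 0;
}

uint32_t flag_set(uint32_t eflags, uint32_t mask)   { return eflags | mask; }
uint32_t flag_clear(uint32_t eflags, uint32_t mask) { return eflags & ~mask; }

/* The I/O privilege level is a two-bit field in bits 12 and 13. */
unsigned get_iopl(uint32_t eflags) { return (eflags >> 12) & 0x3u; }
```

For instance, a value of 0x41 has both the Zero and Carry flags set, and clearing the Carry flag leaves 0x40.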

The status flags used by application programmers are CF, PF, AF, ZF, SF, and OF. These flags hold the results of various instructions that are then used by later instructions. What follows is a brief description of each flag.

The only control flag at this time is the Direction Flag. It is used by string instructions to determine whether to process strings from the end of the string (auto-decrement), or from the beginning of the string (auto-increment).
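The effect of the Direction Flag can be modelled in C as a byte copy that steps its source and destination indexes either up or down. This is a rough sketch of a repeated byte move (REP MOVSB); the function name rep_movsb_model is hypothetical, and esi, edi and ecx are plain array offsets and a counter, not real registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy ecx bytes from mem[esi] to mem[edi].
   df == 0: auto-increment, process from the beginning of the string.
   df != 0: auto-decrement, process from the end of the string. */
void rep_movsb_model(uint8_t *mem, uint32_t esi, uint32_t edi,
                     uint32_t ecx, int df)
{
    assert(mem != NULL);
    while (ecx != 0) {
        mem[edi] = mem[esi];
        if (df) {
            esi--;
            edi--;
        } else {
            esi++;
            edi++;
        }
        ecx--;
    }
}
```

The direction matters when source and destination overlap. Copying four bytes one position up with df cleared overwrites source bytes before they are read, while starting at the end with df set preserves them.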

The other flags are system flags. Refer to the 80386 Programmer's guide for further information on these flags.

Instruction Pointer

The instruction pointer register (EIP) contains the offset of the next instruction within the current code segment. It is a 32-bit value. The lower 16 bits are named IP for backwards compatibility. The programmer cannot read or write this register directly. It is controlled implicitly by instructions that transfer control of the program, such as jump instructions, and by interrupts and exceptions.

Assembly Instruction

The basic format of an instruction in 80386 assembly programming is:

Label: Opcode Operands # Comments

All of these are optional. For example, a line can contain only a label or a comment, or it may have an opcode that requires no operands and is on a line by itself. Note that GAS starts a comment with "#"; a ";" separates statements on the same line instead. Here are a couple of examples:

    start:                   # This is a label
          ret                # An instruction with no operands
                             # Comment line
    here: jmp start          # Instruction with a label, opcode, and a
                             # single operand
          movw %ax, %bx      # Instruction with opcode and two
                             # operands.

It should be noted that you cannot have an operand by itself in an instruction. There must be an opcode.

In the case where there are two operands, one is considered the source operand, and one is considered the destination operand. In GAS the source operand is first and the destination operand is second. This happens to be the opposite of normal 80386 assembly language. For example, the instruction movw %ax, %bx will copy the data from register ax into register bx. The data in the destination operand is usually overwritten with the result of the instruction.

An instruction can include any of the following elements:

Operand Selection

As mentioned above, an instruction can include zero or more operands. The operands represent the data being operated on by the instruction. An operand can be in any of these locations:

Immediate operands and operands in registers are accessed faster than operands in memory, since memory operands have to be fetched from memory. Register operands are available within the CPU. Immediate operands are also available within the CPU since they are pre-fetched as part of the instruction.

Of instructions that have operands, some specify the operands implicitly, requiring the operand to be in a predefined location, usually a register or the stack. Others specify the operands explicitly, requiring the operands to be encoded in the instruction, after the opcode.

Instructions that contain two explicitly specified operands generally overwrite one of the two operands with the result. The operand that is overwritten is called the destination operand. The other is called the source operand. For most instructions, one of the two operands (either the source or the destination) can be in a register or in memory. The other operand must be in a register or be an immediate source operand. Therefore, two-operand instructions permit operations of the following kind:

Some string instructions and stack manipulation instructions transfer data from memory to memory. Both operands of some string instructions are in memory and implicitly specified. Stack operations allow the transfer of data between memory operands and the stack, which is located in memory.

Immediate Operands

As mentioned earlier, immediate operands are data embedded in the instruction itself. Immediate operands can be 8, 16, or 32 bits long. Some examples of these operands follow:

Notice the use of the "$" sign to indicate the immediate operand. This is a GAS requirement that is different from general 80386 assembly language.

Register Operands

Operands may be located in any one of the general registers. In the two examples above, ax and dx are the register operands. Notice the use of the "%" sign to indicate the register operand. This again is a GAS requirement that is different from general 80386 assembly language.

Memory Operands

Data-manipulation instructions that address operands in memory must specify the segment that contains the operand and the offset of the operand within the segment. Since GAS uses the flat memory model, you will not ordinarily need to worry about the segment, since your whole program is in a single segment determined by the operating system. There are two general methods for specifying the offset of a memory operand:

Effective Address Computation

The effective address is calculated by taking the sum of up to three components:

The general form for the operand is:

SECTION:DISP(BASE, INDEX, SCALE)

Note that this is GAS's format, and is different from general 80386 assembly languages.

The offset that results is useful for accessing various kinds of arrays and data structures. The displacement component is useful for fixed aspects of addressing:

The base and index components have similar functions. Both can be used for dynamic addressing, for example:

The uses of the general registers have the following restrictions:

The scaling factor permits efficient indexing into an array whose elements are 2, 4, or 8 bytes wide.
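The arithmetic behind an operand of the form DISP(BASE, INDEX, SCALE) can be written out as a one-line C function. This is a sketch only: the function name effective_address is hypothetical, the register contents are passed in as plain integers, and a scale of 1 is assumed to be what applies when no scale is written.

```c
#include <assert.h>
#include <stdint.h>

/* Offset produced by DISP(BASE, INDEX, SCALE):
   displacement + base + index * scale, wrapping at 32 bits.
   A component that is not used is passed as 0 (scale as 1). */
uint32_t effective_address(int32_t disp, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return (uint32_t)disp + base + index * scale;
}
```

For example, with a base of 0x1000, an index of 3, a scale of 4 and a displacement of 8, the offset is 0x1000 + 8 + 12 = 0x1014, which is the fourth doubleword of an array that starts 8 bytes into a structure at 0x1000. A negative displacement such as -4 with a base of 0x2000 gives 0x1FFC.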

The base, index, and displacement components may be used in any combination. Any of the components may be null. A scale factor can only be used with an index. The following are several examples of effective addresses:

Interrupts and Exceptions

The 80386 has two mechanisms for interrupting program execution:

In most cases the various exceptions are associated with various 80386 instructions. In addition to servicing hardware interrupts, there is an INT instruction that programmers can use to access various hardware and operating system information. The following shows the 80386 reserved exceptions and interrupts of use to application programmers:

    0  Divide Error
    1  Debug Exceptions
    2  NMI Interrupt
    3  Breakpoint
    4  INTO Detected Overflow
    5  BOUND Range Exceeded
    6  Invalid Opcode
    7  Coprocessor Not Available
    8  Double Fault
    9  Coprocessor Segment Overrun
   10  Invalid Task State Segment
   11  Segment Not Present
   12  Stack Fault
   13  General Protection Fault
   14  Page Fault
   15  Reserved
   16  Coprocessor Error
   17  Alignment Check
18-31  Reserved

Instruction Set

And finally, the 80386 instructions. Sorry, but there are way too many instructions to include them all here. If you're going to do a lot of 80386 assembly language programming, you should get a good 80386 assembly book.

What I will show in this section are several instructions that are different in GAS than they are in general 80386 assembly. They perform basically the same operation, but the calling form is different.

References

I first learned assembly programming on the 80286 and have used several books over the years. These are the ones I used for this section:

On a final note, if you don't want to deal with the GAS syntax issues, there is a freeware 80386 assembler named NASM available on the net. I don't know where, but I think it's listed on DJ's site. A quick search of the web should find the home page. NASM uses Intel-style syntax similar to that of MASM and TASM.

This section was provided by Jim Schwarz (a.k.a J.E. Schwarz, Jr. PE)

Email questions or comments to: schwarz@escmail.orl.mmc.com


Copyright 1998 by DJ Delorie. Updated Jan 1998.