CHAPTER 6 THE 86 INSTRUCTION SET


Effective Addresses

Most memory data access in the 86 family is accomplished via
the mechanism of the effective address. Wherever an effective
address specifier "eb", "ew" or "ed" appears in the list of 8086
instructions, you may use a wide variety of actual operands in
that instruction. These include general registers, memory
variables, and a variety of indexed memory quantities.

GENERAL REGISTERS: Wherever an "ew" appears, you can use any of
the 16-bit registers AX,BX,CX,DX,SI,DI,SP, or BP. Wherever an
"eb" appears, you can use any of the 8-bit registers
AL,BL,CL,DL,AH,BH,CH, or DH. For example, the "ADD ew,rw" form
subsumes all the 16-bit register-to-register adds, such as
ADD AX,BX; ADD SI,BP; and ADD SP,AX.

MEMORY VARIABLES: Wherever an "ew" appears, you can use a word
memory variable. Wherever an "eb" appears, you can use a byte
memory variable. Variables are typically declared in the DATA
segment, using a DW declaration for a word variable, or a DB
declaration for a byte variable. For example, you can declare
variables:

    DATA_PTR DW ?
    ESC_CHAR DB ?

Later, you can load or store these variables:

    MOV SI,DATA_PTR     ; load DATA_PTR into SI for use
    LODSW               ; fetch into AX the word that DATA_PTR points to
    MOV DATA_PTR,SI     ; store the SI value incremented by the LODSW
    MOV BL,ESC_CHAR     ; load the byte variable ESC_CHAR

Alternatively, you can address specific unnamed memory locations
by enclosing the location value in square brackets; for example,

    MOV AL,[02000]      ; load contents of location 02000 into AL

Note that A86 discerned from context (loading into AL) that a
BYTE at 02000 was intended. Sometimes this is impossible, and
you must specify byte or word:

    INC B[02000]        ; increment the byte at location 02000
    MOV W[02000],0      ; set the WORD at location 02000 to zero

INDEXED MEMORY: The 86 supports the use of certain registers as
base pointers and index registers into memory. BX and BP are the
base registers; SI and DI are the index registers. You may
combine at most one base register, at most one index register,
and a constant number into a run-time pointer that determines the
location of the effective address memory to be used in the
instruction. These can be given explicitly, by enclosing the
base and index registers in brackets:

    MOV AX,[BX]
    MOV CX,W[SI+17]
    MOV AX,[BX+SI+5]
    MOV AX,[BX][SI]5    ; another way to write the same instruction

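The rule above (at most one base register, at most one index
register, plus a constant) can be modelled in a few lines. The
Python sketch below is illustrative only: the function name
effective_offset and its arguments are hypothetical, and it
assumes the 16-bit sum wraps around modulo 64K, as offsets do on
the 86.

```python
# Hypothetical model (not A86 source code) of how the 86 forms the
# offset for an indexed effective address such as [BX+SI+5].

BASE_REGS = {"BX", "BP"}
INDEX_REGS = {"SI", "DI"}


def effective_offset(regs: dict[str, int], parts: list[str], disp: int = 0) -> int:
    """Return the 16-bit offset for an address built from parts and disp.

    regs  -- current register values, e.g. {"BX": 0x1000, "SI": 0x20}
    parts -- register names inside the brackets, e.g. ["BX", "SI"]
    disp  -- constant displacement, which may be negative
    """
    unknown = [p for p in parts if p not in BASE_REGS | INDEX_REGS]
    if unknown:
        raise ValueError(f"not usable for indexing: {unknown}")
    if sum(p in BASE_REGS for p in parts) > 1:
        raise ValueError("at most one base register (BX or BP)")
    if sum(p in INDEX_REGS for p in parts) > 1:
        raise ValueError("at most one index register (SI or DI)")
    total = disp + sum(regs[p] for p in parts)
    return total & 0xFFFF  # the sum wraps around modulo 64K


regs = {"BX": 0x1000, "SI": 0x0020, "DI": 0x0004, "BP": 0x0100}
print(hex(effective_offset(regs, ["BX", "SI"], 5)))  # 0x1025, as in MOV AX,[BX+SI+5]
print(hex(effective_offset(regs, ["SI"], 17)))       # 0x31, as in MOV CX,W[SI+17]
```

A combination such as [BX+BP] or [SI+DI] is rejected, matching
the "at most one of each" rule stated above.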
Or, indexing can be accomplished by declaring variables in a
based structure (see the STRUC directive in Chapter 9):

    STRUC [BP]          ; NOTE: based structures are unique to A86!
    BP_SAVE  DW ?       ; BP_SAVE is a word at [BP]
    RET_ADDR DW ?       ; RET_ADDR is a word at [BP+2]
    PARM1    DW ?       ; PARM1 is a word at [BP+4]
    PARM2    DW ?       ; PARM2 is a word at [BP+6]
    ENDS
    INC PARM1           ; equivalent to INC W[BP+4]

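The bracketed offsets in the comments above come from simple byte
counting: each field starts where the previous one ended, and each
DW field occupies 2 bytes. A minimal Python sketch of that
bookkeeping follows; struc_offsets is a hypothetical helper that
handles only DB and DW fields, not a description of A86's
internals.

```python
# Hypothetical sketch: field displacements inside a based structure
# such as STRUC [BP]. DB fields take 1 byte, DW fields take 2.

FIELD_SIZES = {"DB": 1, "DW": 2}


def struc_offsets(fields: list[tuple[str, str]]) -> dict[str, int]:
    """Map each field name to its displacement from the base register."""
    offsets = {}
    position = 0
    for name, kind in fields:
        offsets[name] = position          # field starts at the running total
        position += FIELD_SIZES[kind]     # next field starts after this one
    return offsets


frame = struc_offsets([
    ("BP_SAVE", "DW"),
    ("RET_ADDR", "DW"),
    ("PARM1", "DW"),
    ("PARM2", "DW"),
])
for name, disp in frame.items():
    print(f"{name:9} W[BP+{disp}]")
```

The resulting displacements 0, 2, 4 and 6 are the ones given in
the STRUC comments.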
Finally, indexing can be done by mixing explicit components with
declared ones:

    TABLE DB 4,2,1,3,5
    MOV AL,TABLE[BX]    ; load byte number BX of TABLE


Segmentation and Effective Addresses

The 86 family has four segment registers, CS, DS, ES, and SS,
used to address memory. Each segment register points to 64K
bytes of memory within the 1-megabyte memory space of the 86.
(The start of the 64K is calculated by multiplying the segment
register value by 16; i.e., by shifting the value left by one hex
digit.) If your program's code, data and stack areas can all fit
in the same 64K bytes, you can leave all the segment registers
set to the same value. In that case, you won't have to think
about segment registers--no matter which one is used to address
memory, you'll still get the same 64K. If your program needs
more than 64K, you must point one or more segment registers to
other parts of the memory space. In this case, you must take
care that your memory references use the segment registers you
intended.

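The segment arithmetic described above (segment value times 16,
plus the offset within the segment) is easy to check in code. The
Python sketch below assumes a 20-bit address that wraps at 1
megabyte, as on the 8086/8088; physical_address is a hypothetical
name. It also shows that two different segment/offset pairs can
name the same byte.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment value and a 16-bit offset.

    The segment value is multiplied by 16 (shifted left one hex digit)
    and the offset is added. Masking to 20 bits models the wraparound
    at 1 megabyte on the 8086/8088 (later processors can differ).
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1000, 0xFFFF)))  # 0x1ffff, last byte of that 64K
print(hex(physical_address(0x0120, 0x0000)))  # 0x1200
print(hex(physical_address(0x0100, 0x0200)))  # 0x1200, same byte, different pair
```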
Each effective address memory access has a default segment
register, to be used if you do not explicitly specify which
segment register you wish. For most effective addresses, the
default segment register is DS. The exceptions are those
effective addresses that use the BP register for indexing. All
BP-indexed memory references have a default of SS. (This is
because BP is intended to be used for addressing local variables,
stored on the stack.)

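The default-segment rule reduces to a single test: does the
effective address use BP? The Python sketch below states that
rule, together with the effect of a segment override replacing
the default. Both function names are hypothetical, and an empty
register list stands for a simple memory variable.

```python
from typing import Optional


def default_segment(parts: list[str]) -> str:
    """Default segment register for a memory effective address.

    parts lists the registers inside the brackets, e.g. ["BP", "SI"].
    An empty list stands for a simple memory variable (a d16 address).
    """
    return "SS" if "BP" in parts else "DS"


def segment_used(parts: list[str], override: Optional[str] = None) -> str:
    """Segment actually used: an override, if given, replaces the default."""
    return override if override is not None else default_segment(parts)


print(segment_used(["BX", "SI"]))           # DS
print(segment_used(["BP", "DI"]))           # SS
print(segment_used(["BX"], override="CS"))  # CS, as in CS MOV AL,[BX]
```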
If you wish your memory access to use a different segment
register, you provide a segment override byte before the
instruction containing the effective address operand. In the A86
language, you code the override by giving the name of the segment
register you wish before the instruction mnemonic. For example,
suppose you want to load the AL register with the memory byte
pointed to by BX. If you code MOV AL,[BX], the DS register will
be used to determine which 64K segment BX is pointing to. If you
want the byte to come from the CS-segment instead, you code
CS MOV AL,[BX]. Be aware that the segment override byte has effect
only upon the single instruction that follows it. If you have a
sequence of instructions requiring overrides, you must give an
override byte before every instruction in the sequence. (In that
case, you may wish to consider changing the value of the default
segment register for the duration of the sequence.)

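At the byte level, the override is one prefix byte placed in
front of the instruction it affects. The prefix values below (26,
2E, 36 and 3E hex for ES, CS, SS and DS) and the opcode 8A for
MOV rb,eb are taken from the standard Intel 8086 encoding rather
than from this section; the EA bytes 07 and 27 are read from the
effective address table in this chapter. The helper with_override
is hypothetical.

```python
# Segment override prefix bytes (standard 8086 encoding, assumed here).
OVERRIDE_PREFIX = {"ES": 0x26, "CS": 0x2E, "SS": 0x36, "DS": 0x3E}


def with_override(segment: str, instruction: bytes) -> bytes:
    """Prefix ONE already-encoded instruction with a segment override."""
    return bytes([OVERRIDE_PREFIX[segment]]) + instruction


MOV_AL_BX = bytes([0x8A, 0x07])  # MOV AL,[BX]: opcode 8A, EA byte 07 ([BX] row, AL column)
MOV_AH_BX = bytes([0x8A, 0x27])  # MOV AH,[BX]: EA byte 27 ([BX] row, AH column)

print(with_override("CS", MOV_AL_BX).hex(" ").upper())  # 2E 8A 07

# The prefix affects only the next instruction, so a sequence that
# needs CS must repeat it before every instruction.
program = b"".join(with_override("CS", ins) for ins in (MOV_AL_BX, MOV_AH_BX))
print(program.hex(" ").upper())  # 2E 8A 07 2E 8A 27
```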
NOTE: This method for providing segment overrides is unique to
the A86 assembler! The assemblers provided by Intel and IBM
(MS-DOS) attempt to figure out segment allocation for you, and
plug in segment override bytes "behind your back". In order to
do this, those assemblers require you to inform them which
variables and structures are pointed to by which segment
registers. That is what the ASSUME directive in those assemblers
is all about. I wrote Intel's first 86 assembler, ASM86, so I
have been watching the situation since day one. Over the years,
I have concluded that the ASSUME mechanism creates far, far more
confusion than it solves. So I scrapped it; and the result is an
assembler with far less red tape. But if your program needs more
than 64K, you do have to manage those segment registers yourself;
so take care!


Effective Use of Effective Addresses

Remember that all of the common instructions of the 86 family
allow effective addresses as operands. (The only major functions
tied to AL/AX are multiply, divide, and input/output.) This means
that you don't have to funnel values through AL or AX just to do
something with them. You can perform all the common arithmetic,
PUSH/POP, and MOVes from any general register to any general
register; from any memory location (indexed if you like) to any
register; and (this is most often overlooked) from any register
TO memory. The only thing you can't do in general is
memory-to-memory. Among the more common operations that
inexperienced 86 programmers overlook are:

  * setting memory variables to immediate values

  * testing memory variables, and comparing them to constants

  * preserving memory variables by PUSHing and POPping them

  * incrementing and decrementing memory variables

  * adding into memory variables

Encoding of Effective Addresses

Unless you are concerned with the nitty-gritty details of 86
instruction encoding, you don't need to read this section.

Every instruction with an effective address has an encoded byte,
known as the effective address byte, following the 1-byte opcode
for the instruction. (For obscure reasons, Intel calls this byte
the ModRM byte.) If the effective address is a memory variable,
or an indexed memory location with a non-zero constant offset,
then the effective address byte will be immediately followed by
the offset amount. Amounts in the range -128 to +127 are given
by a single signed byte, denoted by "d8" in the table below.
Amounts requiring a 2-byte representation are denoted by "d16" in
the table below. As with all 16-bit memory quantities in the 86
family, the word is stored with the least significant byte FIRST.

The following table of effective address byte values is organized
into 32 rows and 8 columns. The 32 rows give the possible values
for the effective address operand: 8 registers and 24 memory
indexing modes. A 25th indexing mode, [BP] with zero
displacement, has been pre-empted by the simple-memory-variable
case. If you code [BP] with no displacement, you will get
[BP]+d8, with a d8-value of zero.

The 8 columns of the table reflect further information given by
the effective address byte. Usually, this is the identity of the
other (always a register) operand of a 2-operand instruction.
Those instructions are identified by a "/r" following the opcode
byte in the instruction list. Sometimes, the information given
supplements the opcode byte in identifying the instruction
itself. Those instructions are identified by a "/" followed by a
digit from 0 through 7. The digit tells which of the 8 columns
you should use to find the effective address byte.

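The table has a regular structure: each EA byte packs three bit
fields. The group of rows gives a 2-bit mode (no displacement,
+ d8, + d16, or register operand), the column gives the next 3
bits (the register operand or the /digit), and the row within its
group gives the low 3 bits. Intel's documentation calls these
fields mod, reg and r/m. The Python sketch below (hypothetical
names, standard field layout) packs them and reproduces several
entries of the table.

```python
# Row names within each group of 8 rows, in r/m-code order 0..7.
# With mod = 0, code 6 means "d16 (simple var)" rather than [BP].
RM_NAMES = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]  # register rows, mod = 3
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]

# mod: 0 = no displacement, 1 = + d8, 2 = + d16, 3 = register operand.


def ea_byte(mod: int, column: int, rm: int) -> int:
    """Pack mod (bits 7-6), column (bits 5-3) and r/m (bits 2-0)."""
    if not (0 <= mod <= 3 and 0 <= column <= 7 and 0 <= rm <= 7):
        raise ValueError("field out of range")
    return (mod << 6) | (column << 3) | rm


print(f"{ea_byte(1, 5, RM_NAMES.index('BX')):02X}")  # 6F: [BX] + d8, column 5
print(f"{ea_byte(3, 3, REG16.index('SI')):02X}")     # DE: ew=SI, column 3
print(" ".join(f"{ea_byte(2, c, RM_NAMES.index('BP')):02X}" for c in range(8)))
# 86 8E 96 9E A6 AE B6 BE  (the [BP] + d16 row)
```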
For example, suppose you have a perverse wish to know the precise
bytes encoded by the instruction SUB B[BX+17],100. This
instruction subtracts an immediate quantity, 100, from an
effective address quantity, B[BX+17]. By consulting the
instruction list, you find the general form SUB eb,ib. The
opcode bytes given there are 80 /5 ib. The "/5" denotes an
effective address byte, whose value will be taken from column 5
of the following table. The offset 17 decimal, which is 11 hex,
will fit in a single "d8" byte, so we take our value from the
"[BX] + d8" row. The table tells us that the effective address
byte is 6F. Immediately following the 6F is the offset, 11 hex.
Following that is the ib-value of 100 decimal, which is 64 hex.
So the bytes generated by SUB B[BX+17],100 are 80 6F 11 64.

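The worked example above can be mechanized. The Python sketch
below handles only the pattern used there: an "op eb,ib"
instruction whose memory operand is one of the indexed rows of the
table. It picks no displacement, d8 or d16 from the size of the
constant, applies the [BP] special case, and stores a d16 low byte
first. The function encode_eb_ib is hypothetical; simple variables
and register operands are deliberately left out.

```python
RM_NAMES = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]


def encode_eb_ib(opcode: int, digit: int, regs: str, disp: int, imm: int) -> bytes:
    """Encode: opcode byte, EA byte, optional d8/d16, then the ib byte.

    opcode -- first opcode byte (80 hex for SUB eb,ib)
    digit  -- the "/digit" column (5 for SUB eb,ib)
    regs   -- a table row name such as "BX" or "BP+SI"
    disp   -- constant displacement, -32768..65535
    imm    -- the ib value; only its low 8 bits are kept
    """
    rm = RM_NAMES.index(regs)
    if disp == 0 and regs != "BP":
        mod, tail = 0, b""                                    # no displacement bytes
    elif -128 <= disp <= 127:
        mod, tail = 1, (disp & 0xFF).to_bytes(1, "little")    # d8; also [BP] alone
    elif -0x8000 <= disp <= 0xFFFF:
        mod, tail = 2, (disp & 0xFFFF).to_bytes(2, "little")  # d16, low byte first
    else:
        raise ValueError("displacement does not fit in 16 bits")
    ea = (mod << 6) | (digit << 3) | rm
    return bytes([opcode, ea]) + tail + bytes([imm & 0xFF])


print(encode_eb_ib(0x80, 5, "BX", 17, 100).hex(" ").upper())  # 80 6F 11 64
```

Coding [BP] with a zero displacement still produces a d8 byte of
00 (EA byte 6E in column 5), as the text above explains.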
Table of Effective Address byte values

        s =   ES  CS  SS  DS
       rb =   AL  CL  DL  BL  AH  CH  DH  BH
       rw =   AX  CX  DX  BX  SP  BP  SI  DI
    digit =    0   1   2   3   4   5   6   7

                                                Effective
EA byte values:                                 address:

              00  08  10  18  20  28  30  38    [BX + SI]
              01  09  11  19  21  29  31  39    [BX + DI]
              02  0A  12  1A  22  2A  32  3A    [BP + SI]
              03  0B  13  1B  23  2B  33  3B    [BP + DI]

              04  0C  14  1C  24  2C  34  3C    [SI]
              05  0D  15  1D  25  2D  35  3D    [DI]
              06  0E  16  1E  26  2E  36  3E    d16 (simple var)
              07  0F  17  1F  27  2F  37  3F    [BX]

              40  48  50  58  60  68  70  78    [BX + SI] + d8
              41  49  51  59  61  69  71  79    [BX + DI] + d8
              42  4A  52  5A  62  6A  72  7A    [BP + SI] + d8
              43  4B  53  5B  63  6B  73  7B    [BP + DI] + d8

              44  4C  54  5C  64  6C  74  7C    [SI] + d8
              45  4D  55  5D  65  6D  75  7D    [DI] + d8
              46  4E  56  5E  66  6E  76  7E    [BP] + d8
              47  4F  57  5F  67  6F  77  7F    [BX] + d8

              80  88  90  98  A0  A8  B0  B8    [BX + SI] + d16
              81  89  91  99  A1  A9  B1  B9    [BX + DI] + d16
              82  8A  92  9A  A2  AA  B2  BA    [BP + SI] + d16
              83  8B  93  9B  A3  AB  B3  BB    [BP + DI] + d16

              84  8C  94  9C  A4  AC  B4  BC    [SI] + d16
              85  8D  95  9D  A5  AD  B5  BD    [DI] + d16
              86  8E  96  9E  A6  AE  B6  BE    [BP] + d16
              87  8F  97  9F  A7  AF  B7  BF    [BX] + d16

              C0  C8  D0  D8  E0  E8  F0  F8    ew=AX  eb=AL
              C1  C9  D1  D9  E1  E9  F1  F9    ew=CX  eb=CL
              C2  CA  D2  DA  E2  EA  F2  FA    ew=DX  eb=DL
              C3  CB  D3  DB  E3  EB  F3  FB    ew=BX  eb=BL

              C4  CC  D4  DC  E4  EC  F4  FC    ew=SP  eb=AH
              C5  CD  D5  DD  E5  ED  F5  FD    ew=BP  eb=CH
              C6  CE  D6  DE  E6  EE  F6  FE    ew=SI  eb=DH
              C7  CF  D7  DF  E7  EF  F7  FF    ew=DI  eb=BH

d8 denotes an 8-bit displacement following the EA byte, to be
sign-extended and added to the index.

d16 denotes a 16-bit displacement following the EA byte, to be
added to the index.

Default segment register is SS for effective addresses that use
BP (including [BP] + d8 and [BP] + d16); DS for all other memory
effective addresses, including the d16 simple variable.

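Because a d8 byte is sign-extended, the same raw byte can stand
for very different offsets depending on whether it is a d8 or
part of a d16. The Python sketch below (hypothetical helper names)
shows the arithmetic, assuming as before that the sum wraps around
modulo 64K.

```python
def decode_d8(raw: int) -> int:
    """Interpret one displacement byte as a signed value, -128..127."""
    return raw - 0x100 if raw & 0x80 else raw


def offset_with_d8(index_value: int, raw: int) -> int:
    """Sign-extend a d8 byte and add it to the index, modulo 64K."""
    return (index_value + decode_d8(raw)) & 0xFFFF


def offset_with_d16(index_value: int, low: int, high: int) -> int:
    """Add a d16 displacement stored low byte first, modulo 64K."""
    return (index_value + (low | high << 8)) & 0xFFFF


print(hex(offset_with_d8(0x1000, 0xFF)))         # 0xfff: d8 byte FF means -1
print(hex(offset_with_d16(0x1000, 0xFF, 0x00)))  # 0x10ff: d16 bytes FF 00 mean +255
```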
How to Read the Instruction Set Chart

The following chart summarizes the machine instructions you can
program with A86. In order to use the chart, you need to learn
the meanings of the specifiers (each given by 2 lower case
letters) that follow most of the instruction mnemonics. Each
specifier indicates the type of operand (register byte, immediate
word, etc.) that follows the mnemonic to produce the given
opcodes.

"c" means the operand is a code label, pointing to a part of the
program to be jumped to or called. A86 will also accept a
constant offset in this place (or a constant segment-offset
pair in the case of "cd"). "cb" is a label within about 128
bytes (in either direction) of the current location: precisely,
-128 to +127 bytes from the location following the instruction.
"cw" is a label within the same code segment as this program;
"cd" is a pair of constants separated by a colon--the segment
value to the left of the colon, and the offset to the right.
Note that in both the cb and cw cases, the object code generated
is the offset from the location following the current
instruction, not the absolute location of the label operand.
In some assemblers (most notably for the Z-80 processor) you
have to code this offset explicitly by putting "$-" before
every relative jump operand in your source code. You do NOT
need to do so with A86, and you should not.

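The relative offset that A86 computes for you is simply the target
address minus the address of the next instruction. The Python
sketch below assumes the standard 8086 lengths of 2 bytes for a
short jump (opcode EB plus cb) and 3 bytes for a near jump (opcode
E9 plus cw); those opcodes and both function names are assumptions
for illustration, not taken from this chapter.

```python
def short_jump_offset(instr_addr: int, instr_len: int, target: int) -> int:
    """Return the cb byte: target minus the address after the jump."""
    rel = target - (instr_addr + instr_len)
    if not -128 <= rel <= 127:
        raise ValueError(f"target out of short range: {rel}")
    return rel & 0xFF  # stored as a signed byte


def near_jump_offset(instr_addr: int, instr_len: int, target: int) -> int:
    """Return the cw word; the offset wraps around within the 64K segment."""
    return (target - (instr_addr + instr_len)) & 0xFFFF


print(f"{short_jump_offset(0x0100, 2, 0x0110):02X}")  # 0E
print(f"{short_jump_offset(0x0100, 2, 0x0100):02X}")  # FE, a jump to itself
print(f"{near_jump_offset(0x0100, 3, 0x0050):04X}")   # FF4D, a backward near jump
```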
"e" means the operand is an Effective Address. The concept of
an Effective Address is central to the 86 machine
architecture, and thus to 86 assembly language programming.
It is described in detail at the start of this chapter. We
summarize here by saying that an Effective Address is either
a general purpose register, a memory variable, or an indexed
memory quantity. For example, the instruction "ADD rb,eb"
includes the instructions ADD AL,BL; ADD CH,BYTEVAR; and
ADD DL,B[BX+17].

"i" means the operand is an immediate constant, provided as part
of the instruction itself. "ib" is a byte-sized constant; "iw"
is a constant occupying a full 16-bit word. The operand can also
be a label, defined with a colon. In that case, the immediate
constant which is the location of the label is used. Examples:
"MOV rw,iw" includes the instructions: MOV AX,17, or
MOV SI,VAR_ARRAY, where "VAR_ARRAY:" appears somewhere in the
program, defined with a colon. NOTE that if VAR_ARRAY were
defined without a colon, e.g., "VAR_ARRAY DW 1,2,3", then
"MOV SI,VAR_ARRAY" would be a "MOV rw,ew" NOT a "MOV rw,iw".
The MOV would move the contents of memory at VAR_ARRAY (in this
case 1) into SI, instead of the location of the memory. To load
the location, you can code "MOV SI,OFFSET VAR_ARRAY".

"m" means a memory variable or an indexed memory quantity; i.e.,
any Effective Address EXCEPT a register.

"r" means the operand is a general purpose register. The 8 "rb"
registers are AL,BL,CL,DL,AH,BH,CH,DH; the 8 "rw" registers
are AX,BX,CX,DX,SI,DI,BP,SP.


WARNING: Instruction forms marked with "*" by the mnemonic are
part of the extended 186/286/NEC instruction set. Instructions
marked with "#" are unique to the NEC processors. These
instructions will NOT work on the 8088 of the IBM-PC; nor will
they work on the 8086; nor will the NEC instructions work on the
186 or 286. If you wish your programs to run on all PCs, do not
use these instructions!