By Susam Pal on 27 Oct 2005

The following 12-byte program composed of pure x86 machine code
writes itself to standard output when executed in a DOS environment:

fc b1 0c ac 92 b4 02 cd 21 e2 f8 c3

We can write these bytes to a file with the .COM extension and
execute it in DOS. It runs successfully in MS-DOS 6.22, Windows 98,
as well as in DOSBox and writes a copy of itself to standard output.

Demo

On a Unix or Linux system, the following commands demonstrate this
program with the help of DOSBox:

echo fc b1 0c ac 92 b4 02 cd 21 e2 f8 c3 | xxd -r -p > foo.com
dosbox -c 'MOUNT C .' -c 'C:FOO > C:OUT.COM' -c 'EXIT'
diff foo.com OUT.COM

The diff command should produce no output, confirming
that the output of the program is identical to the program itself.
On an actual MS-DOS 6.22 system or a Windows 98 system, we can
demonstrate this program in the following manner:

C:\>DEBUG
-E 100 fc b1 0c ac 92 b4 02 cd 21 e2 f8 c3
-N FOO.COM
-R CX
CX 0000
:C
-W
Writing 0000C bytes
-Q

C:\>FOO > OUT.COM

C:\>FC FOO.COM OUT.COM
Comparing files FOO.COM and OUT.COM
FC: no differences encountered

In the DEBUG session shown above, we use the debugger
command E to enter the machine code at offset 0x100 of
the code segment. Then we use the N command to name the
file we want to write this machine code to. The command R CX
is used to specify that we want to write 0xC (decimal 12)
bytes to this file. The W command writes the 12 bytes
entered at offset 0x100. The Q command quits the
debugger. Then we run the new FOO.COM program while
redirecting its output to OUT.COM. Finally, we use
the FC command to compare the two files and confirm
that they are exactly the same.

Let us disassemble this program now and see what it does. The
output below is generated using the Netwide Disassembler (NDISASM),
a tool that comes with Netwide Assembler (NASM):

$ ndisasm -o 0x100 foo.com
00000100  FC                cld
00000101  B10C              mov cl,0xc
00000103  AC                lodsb
00000104  92                xchg ax,dx
00000105  B402              mov ah,0x2
00000107  CD21              int 0x21
00000109  E2F8              loop 0x103
0000010B  C3                ret

When DOS executes a program in a .COM file, it loads the machine code
in the file at offset 0x100 of the code segment chosen by DOS. That
is why we ask the disassembler to assume a load address of 0x100
with the -o command line option. The first instruction
clears the direction flag. The purpose of this instruction is
explained later. The next instruction sets the register CL to 0xc
(decimal 12). The register CH is already set to 0 by default when a
.COM program starts. Thus setting the register CL to 0xc effectively
sets the entire register CX to 0xc. The register CX is used as a
loop counter for the loop 0x103 instruction that comes
later. Every time this loop instruction executes, it decrements CX
and makes a near jump to offset 0x103 if CX is not 0. This results
in 12 iterations of the loop.
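
The jump target 0x103 shown by the disassembler can be recomputed by
hand from the two bytes e2 f8 at offset 0x109. The Python sketch
below shows the arithmetic. The helper name rel8_target is made up
for this illustration and is not part of NASM or any other tool.

```python
def rel8_target(address: int, rel8: int, length: int = 2) -> int:
    """Return the target offset of a short jump such as loop rel8.

    address is the offset of the jump instruction itself, rel8 is its
    displacement byte and length is the instruction size in bytes.
    """
    # The displacement byte is interpreted as a signed 8-bit value.
    disp = rel8 - 0x100 if rel8 >= 0x80 else rel8
    # It is added to the offset of the next instruction (16-bit wraparound).
    return (address + length + disp) & 0xFFFF


# e2 f8 at offset 0x109: 0x109 + 2 - 8 = 0x103
print(hex(rel8_target(0x109, 0xF8)))
```

The displacement 0xf8 is -8 as a signed byte, and it is counted from
the offset of the instruction that follows the loop instruction.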

In each iteration of the loop, the instructions from offset 0x103 to
offset 0x109 are executed. The lodsb instruction loads
a byte from address DS:SI into AL. When DOS starts executing this
program, DS and SI are set to CS and 0x100 by default, so at the
beginning DS:SI points to the first byte of the program.
The xchg instruction exchanges the values in AX and DX.
Thus the byte we just loaded into AL ends up in DL. Then we set AH
to 2 and generate the software interrupt 0x21 (decimal 33) to write
the byte in DL to standard output. This is how each iteration reads
a byte of this program and writes it to standard output.

The lodsb instruction increments or decrements SI
depending on the state of the direction flag (DF). When DF is
cleared, it increments SI. If DF is set, it decrements SI. We use
the cld instruction at the beginning to clear DF, so
that in each iteration of the loop, SI moves forward to point to the
next byte of the program. This is how the 12 iterations of the loop
write 12 bytes of the program to standard output. In many DOS
environments, the DF flag is already in cleared state when a .COM
program starts, so the CLD instruction could be omitted in such
environments. However, there are some environments where DF may not
be in cleared state when our program starts, so it is a best
practice to clear DF before relying on it.

Finally, when the loop terminates, we execute the RET
instruction to terminate the program.
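
To make the walkthrough above concrete, here is a small Python model
that interprets only the eight opcodes this program uses. It is a
sketch and not a real x86 emulator. The start-up values (CH = 0,
SI = 0x100) are the assumptions stated in the text, ret is simply
treated as program exit, and DF deliberately starts out set so that
the effect of cld is visible.

```python
PROGRAM = bytes.fromhex("fc b1 0c ac 92 b4 02 cd 21 e2 f8 c3")
LOAD = 0x100  # offset at which DOS loads a .COM program


def run(program: bytes) -> bytes:
    """Interpret the opcodes used by PROGRAM and return the bytes written."""
    mem = {LOAD + i: b for i, b in enumerate(program)}
    ax = dx = cx = 0  # assumed start-up values; the program relies on CH = 0
    si = ip = LOAD    # DS:SI is assumed to point at the first program byte
    df = 1            # start with DF set to show why cld is needed
    out = bytearray()
    while True:
        op = mem[ip]
        if op == 0xFC:    # cld
            df = 0
            ip += 1
        elif op == 0xB1:  # mov cl, imm8
            cx = (cx & 0xFF00) | mem[ip + 1]
            ip += 2
        elif op == 0xAC:  # lodsb
            ax = (ax & 0xFF00) | mem[si]
            si += -1 if df else 1
            ip += 1
        elif op == 0x92:  # xchg ax, dx
            ax, dx = dx, ax
            ip += 1
        elif op == 0xB4:  # mov ah, imm8
            ax = (mem[ip + 1] << 8) | (ax & 0x00FF)
            ip += 2
        elif op == 0xCD:  # int imm8; only int 0x21 with AH = 2 is modelled
            if mem[ip + 1] != 0x21 or ax >> 8 != 2:
                raise NotImplementedError("unsupported interrupt")
            out.append(dx & 0xFF)  # write the character in DL
            ip += 2
        elif op == 0xE2:  # loop rel8
            rel = mem[ip + 1]
            if rel >= 0x80:
                rel -= 0x100
            cx = (cx - 1) & 0xFFFF
            ip += 2
            if cx != 0:
                ip += rel
        elif op == 0xC3:  # ret, treated here as program exit
            return bytes(out)
        else:
            raise NotImplementedError(hex(op))


print(run(PROGRAM) == PROGRAM)
```

The model returns exactly the 12 bytes it was given, which mirrors
what the diff and FC comparisons confirm on a real DOS system.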

Quine Conundrums

While reading the description of the self-printing program presented
earlier, one might wonder if it is a quine. While there is no
standardized definition of the term quine, it is generally
accepted that a quine is a computer program that takes no input and
produces an exact copy of its own source code as its output. Since a
quine cannot take any input, tricks involving reading its own source
code or evaluating itself are ruled out.

For example, this shell script is a valid quine:

s='s=\47%s\47;printf "$s" "$s"\n';printf "$s" "$s"

However, the following shell script is not considered a proper
quine:

cat $0

The shell script above reads its own source code, which is considered
cheating. Improper quines like this are often called cheating
quines.

Is our 12-byte x86 program a quine? It turns out that we have a
conundrum. There is no notion of source code for our program. There
would have been one if we had written out the source code of this
program in assembly language. In such a case we would first need to
choose an assembler and a proper quine would need to produce an
exact copy of the assembly language source code (not the machine
code bytes) for the chosen assembler. But we are not doing that
here. We want the machine code to produce an exact copy of itself.
There is no source code involved. We only have machine code. So we
could argue that the whole notion of machine code quine is nonsense.
No machine code quine can exist because there is no source code to
produce as output.

However, we could also argue that the machine code is the input for
the CPU that the CPU fetches, decodes, and converts to a sequence of
state changes in the CPU. If we define a machine code quine to be a
machine code program that writes its own bytes, then we could say
that we have a machine code quine here.

Let us now entertain the thought that our 12-byte program is indeed
a machine code quine. Now we have a new conundrum. Is it a proper
quine? This program reads its own bytes from memory and writes them.
Does that make it a cheating quine? What would a proper quine
written in pure machine code even look like? If we look at the shell
script quine above, we see that it contains parts of the executable
part of the script code embedded in a string as data. Then we format
the string cleverly to produce a new string that looks exactly like
the entire shell script. It is a common pattern followed in many
quines. The quine does not read its own code but it reads some data
defined by the code and formats that data to look like its own code.
However, in pure machine code like this, the lines between data and
code are blurred. Even if we try to keep the bytes we want to read
at a separate place in memory and treat them like data, they would
look exactly like machine instructions, so one might wonder if there
is any point in trying to make a machine code quine that does not
read its own bytes. Nevertheless, the next section shows how to
accomplish this.
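
The data-plus-formatting pattern described above is not specific to
shell scripts. As an illustration only, here is a well-known style
of one-line Python quine, together with a small harness that runs it
and compares its output with its source. The names QUINE and
run_source are made up for this sketch.

```python
import contextlib
import io

# The quine keeps a template of itself in the string s and formats
# that string with its own repr to reproduce the whole line.
QUINE = "s='s=%r;print(s%%s)';print(s%s)"


def run_source(source: str) -> str:
    """Execute source and return whatever it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


# print appends a newline, so the output is the source plus "\n".
print(run_source(QUINE) == QUINE + "\n")
```

Like the shell quine, this program never opens or reads its own
source file. It only formats a string that its own code defines.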

Proper Quines

If the thought of a machine code quine program reading its own bytes
from the memory makes you uncomfortable, here is an adaptation of the
previous program that keeps the machine instructions to be executed
separate from the data bytes to be read by the program.

fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3
fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3

Here is how we can demonstrate this 40-byte program:

echo fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3 | xxd -r -p > foo.com
echo fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3 | xxd -r -p >> foo.com
dosbox -c 'MOUNT C .' -c 'C:FOO > C:OUT.COM' -c 'EXIT'
diff foo.com OUT.COM

Here is the disassembly:

$ ndisasm -o 0x100 foo.com
00000100  FC                cld
00000101  B302              mov bl,0x2
00000103  B114              mov cl,0x14
00000105  BE1401            mov si,0x114
00000108  AC                lodsb
00000109  92                xchg ax,dx
0000010A  B402              mov ah,0x2
0000010C  CD21              int 0x21
0000010E  E2F8              loop 0x108
00000110  4B                dec bx
00000111  75F0              jnz 0x103
00000113  C3                ret
00000114  FC                cld
00000115  B302              mov bl,0x2
00000117  B114              mov cl,0x14
00000119  BE1401            mov si,0x114
0000011C  AC                lodsb
0000011D  92                xchg ax,dx
0000011E  B402              mov ah,0x2
00000120  CD21              int 0x21
00000122  E2F8              loop 0x11c
00000124  4B                dec bx
00000125  75F0              jnz 0x117
00000127  C3                ret

The first 20 bytes are the executable part of the program. The next
20 bytes are the data read by the program. The executable bytes are
identical to the data bytes. The executable part of the program has
an outer loop that iterates twice. In each iteration, it reads the
data bytes and writes them to standard output. Therefore, in two
iterations of the outer loop, it writes the data bytes twice. In
this manner, the output is identical to the program itself.
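
The structure of this program can be checked without running it. The
Python sketch below pulls the loop counts and the data address out
of the instruction operands and rebuilds the expected output from
them. The byte positions used here are read off the disassembly
above, and the variable names are our own.

```python
HALF = bytes.fromhex(
    "fc b3 02 b1 14 be 14 01 ac 92 b4 02 cd 21 e2 f8 4b 75 f0 c3"
)
LOAD = 0x100  # load offset of a .COM program
program = HALF + HALF

outer_count = program[2]  # operand of mov bl,0x2
inner_count = program[4]  # operand of mov cl,0x14
data_offset = int.from_bytes(program[6:8], "little")  # operand of mov si,0x114

# The inner loop copies inner_count bytes starting at data_offset,
# and the outer loop repeats that outer_count times.
start = data_offset - LOAD
data = program[start:start + inner_count]
output = data * outer_count

print(hex(data_offset), inner_count, outer_count)
print(output == program)
```

The data offset 0x114 is exactly 0x100 plus 20, so the program reads
only its second half.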

Here is another simpler 32-byte quine based on this approach:

b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3
b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3

Here are the commands to demonstrate this quine:

echo b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3 | xxd -r -p > foo.com
echo b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3 | xxd -r -p >> foo.com
dosbox -c 'MOUNT C .' -c 'C:FOO > C:OUT.COM' -c 'EXIT'
diff foo.com OUT.COM

Here is the disassembly:

$ ndisasm -o 0x100 foo.com
00000100  B82309            mov ax,0x923
00000103  FEC0              inc al
00000105  A22001            mov [0x120],al
00000108  BA1001            mov dx,0x110
0000010B  CD21              int 0x21
0000010D  CD21              int 0x21
0000010F  C3                ret
00000110  B82309            mov ax,0x923
00000113  FEC0              inc al
00000115  A22001            mov [0x120],al
00000118  BA1001            mov dx,0x110
0000011B  CD21              int 0x21
0000011D  CD21              int 0x21
0000011F  C3                ret

This example too has two parts. The first half has the executable
bytes and the second half has the data bytes. Both parts are
identical. This example sets AH to 9 in the first instruction and
then later uses int 0x21 to invoke the DOS service that
prints a dollar-terminated string beginning at the address specified
in DS:DX. When a .COM program starts, DS already points to the
current code segment, so we don’t have to set it explicitly. The
dollar symbol has an ASCII code of 0x24 (decimal 36). We need to be
careful about not having this value anywhere within the data
bytes or this DOS function would prematurely stop printing our data
bytes as soon as it encounters this value. That is why we set AL to
0x23 in the first instruction, then increment it to 0x24 in the
second instruction, and then copy this value to the end of the data
bytes in the third instruction. Finally, we execute int 0x21
twice to write the data bytes twice to standard output,
so that the output matches the program itself.
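
The dollar-terminator trick can be modelled in a few lines of
Python. This is only a sketch: print_string is a stand-in for the
DOS function selected by AH = 9, not a real API, and memory holds
just the program plus the one byte that follows it at offset 0x120.

```python
HALF = bytes.fromhex("b8 23 09 fe c0 a2 20 01 ba 10 01 cd 21 cd 21 c3")
LOAD = 0x100  # load offset of a .COM program
program = HALF + HALF

# The program occupies 0x100 to 0x11f; one spare byte follows at 0x120.
memory = bytearray(program) + bytearray(1)


def print_string(memory: bytearray, dx: int) -> bytes:
    """Return the bytes from offset dx up to, but excluding, the next 0x24."""
    start = dx - LOAD
    end = memory.index(0x24, start)
    return bytes(memory[start:end])


# The program itself must not contain the terminator.
print(0x24 in program)

al = (0x23 + 1) & 0xFF     # mov ax,0x923 followed by inc al
memory[0x120 - LOAD] = al  # mov [0x120],al places '$' right after the data

# int 0x21 is executed twice with DX = 0x110.
output = print_string(memory, 0x110) + print_string(memory, 0x110)
print(output == program)
```

Loading 0x23 and incrementing it at run time is what keeps the byte
0x24 out of the program file.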

While both these programs take care not to read the same memory
region that is being executed by the CPU, the data bytes they read
look exactly like the executable bytes. This is what I meant when I
mentioned earlier that the lines between code and data are blurred
in an exercise like this. This is why I don’t really see a point in
keeping the executable bytes separate from the data bytes while
writing machine code quines.

A Note on DOS Services

The self-printing programs presented above use int 0x21
which offers DOS services that support various input/output
functions. In the first two programs, we selected the function to
write a character to standard output by setting AH to 2 before
invoking this software interrupt. In the next program, we selected
the function to write a dollar-terminated string to standard output
by setting AH to 9.

The ret instruction in the end too relies on DOS
services. When a .COM program starts, the register SP contains
0xfffe. The stack memory locations at offset 0xfffe and 0xffff
contain 0x00 and 0x00, respectively. Further, the memory address at
offset 0x0000 contains the instruction int 0x20 which
is a DOS service that terminates the program. As a result, executing
the ret instruction pops 0x0000 off the stack at 0xfffe
and loads it into IP. This results in the instruction int 0x20
at offset 0x0000 getting executed. This instruction
terminates the program and returns to DOS.
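
The effect of ret described above can be traced with a tiny Python
model of the 64 KiB segment. The initial contents used here (zeroes
at the top of the stack, int 0x20 at offset 0x0000) are the start-up
conditions stated in the text. The function name simulate_ret is
made up for this sketch.

```python
def simulate_ret(memory: bytearray, sp: int) -> tuple[int, int]:
    """Pop a 16-bit little-endian word from the stack; return (ip, sp)."""
    ip = memory[sp] | (memory[(sp + 1) & 0xFFFF] << 8)
    return ip, (sp + 2) & 0xFFFF


memory = bytearray(0x10000)                      # one 64 KiB segment of zeroes
memory[0x0000:0x0002] = bytes.fromhex("cd 20")   # int 0x20 at offset 0

ip, sp = simulate_ret(memory, 0xFFFE)
print(hex(ip), memory[ip:ip + 2].hex())
```

After the pop, IP is 0 and the next two bytes to be executed are
cd 20, that is, int 0x20.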

Relying on DOS services gives us a comfortable environment to work
with. In particular, DOS implements the notion of standard output
which lets us redirect standard output to a file. This
lets us conveniently compare the original program file and the
output file with the FC command and confirm that they
are identical.

But one might wonder if we could avoid relying on DOS services
completely and still write a program that prints its own bytes to
screen. We definitely can. We could write directly to video memory
at address 0xb800:0x0000 and show the bytes of the program on
screen. We could also forgo DOS completely and let BIOS load our
program from the boot sector and execute it. The next two sections
discuss these things.

Writing to Video Memory Directly

Here is an example of an 18-byte self-printing program that writes
directly to the video memory at address 0xb800:0x0000.

fc b4 b8 8e c0 31 ff b1 12 b4 0a ac ab e2 fc f4 eb fd

Here are the commands to create and run this program:

echo fc b4 b8 8e c0 31 ff b1 12 b4 0a ac ab e2 fc f4 eb fd | xxd -r -p > foo.com
dosbox foo.com

With the default code page active, i.e., with code page 437 active,
the program should display an output that looks approximately like
the following and halt:

ⁿ┤╕Ä└1 ▒↕┤◙¼½Γⁿ⌠δ²

Now of course this type of output looks like gibberish but there is a
quick and dirty way to confirm that this output indeed represents
the bytes of our program. We can use the TYPE command
of DOS to print the program and check if the symbols that appear in
its output seem consistent with the output above. Here is an
example:

C:\>TYPE FOO.COM
ⁿ┤╕Ä└1 ▒↕┤
          ¼½Γⁿ⌠δ²
C:\>

This output looks very similar to the previous one except that the
byte value 0x0a is rendered as a line break in this output whereas
in the previous output this byte value is represented as a circle in
a box. This method would not have worked if there were any control
characters such as backspace or carriage return that result in
characters being erased in the displayed output.

A proper way to verify that the output of the program represents the
bytes of the program would be to find each symbol in the output in a
chart for code page 437 and confirm that the byte value of each
symbol matches each byte value in the program. Here is one such
chart that approximates the symbols in code page 437 with Unicode
symbols: cp437.html.
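
The chart lookup can also be approximated in Python, which ships a
cp437 codec. The sketch below is only an approximation: the codec
maps byte values below 0x20 to control characters rather than to the
graphic symbols shown on screen, so the two control values that
occur in this program are overridden by hand with the symbols that
appear in the output above.

```python
PROGRAM = bytes.fromhex("fc b4 b8 8e c0 31 ff b1 12 b4 0a ac ab e2 fc f4 eb fd")

# Graphic symbols for the only two control codes in PROGRAM, taken
# from the screen output shown earlier.
CONTROL_GLYPHS = {0x0A: "◙", 0x12: "↕"}


def render(data: bytes) -> str:
    """Map each byte to the symbol that code page 437 displays for it."""
    return "".join(
        CONTROL_GLYPHS.get(byte) or bytes([byte]).decode("cp437")
        for byte in data
    )


for byte, symbol in zip(PROGRAM, render(PROGRAM)):
    print(f"0x{byte:02x} {symbol}")
```

Each printed pair can then be compared against the symbols displayed
by the program.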

Here is the disassembly of the above program:

$ ndisasm -o 0x100 foo.com
00000100  FC                cld
00000101  B4B8              mov ah,0xb8
00000103  8EC0              mov es,ax
00000105  31FF              xor di,di
00000107  B112              mov cl,0x12
00000109  B40A              mov ah,0xa
0000010B  AC                lodsb
0000010C  AB                stosw
0000010D  E2FC              loop 0x10b
0000010F  F4                hlt
00000110  EBFD              jmp short 0x10f

This program sets ES to 0xb800 and DI to 0. Thus ES:DI points to the
video memory at address 0xb800:0x0000. DS:SI points to the first
instruction of this program by default. Further AH is set to 0xa.
This is used to specify the colour attribute of the text to be
displayed on screen. Each iteration of the loop in this program
loads a byte of the program and writes it along with the colour
attribute to video memory. The lodsb instruction loads
a byte of the program from the memory address specified by DS:SI
into AL and increments SI by 1. AH is already set to 0xa. The value
0xa (binary 00001010) here specifies black as the background colour
and bright green as the foreground colour. The stosw
instruction stores a word from AX to the memory address specified by
ES:DI and increments DI by 2. In this manner, the byte in AL and its
colour attribute in AH get copied to the video memory.
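
The layout that stosw produces in video memory can be sketched in
Python as follows. The buffer size assumes the usual 80x25 text
mode, and the function names are ours. Each screen cell takes two
bytes: the character first and its colour attribute second.

```python
PROGRAM = bytes.fromhex("fc b4 b8 8e c0 31 ff b1 12 b4 0a ac ab e2 fc f4 eb fd")
ATTRIBUTE = 0x0A  # value placed in AH by mov ah,0xa


def write_to_video_memory(data: bytes, attribute: int) -> bytearray:
    """Model the lodsb/stosw loop: one character/attribute pair per byte."""
    vram = bytearray(80 * 25 * 2)  # assumed 80x25 text mode, 2 bytes per cell
    di = 0
    for byte in data:
        ax = (attribute << 8) | byte  # AH = attribute, AL = character
        vram[di] = ax & 0xFF          # stosw stores the low byte (AL) first
        vram[di + 1] = ax >> 8        # and the high byte (AH) second
        di += 2
    return vram


def decode_attribute(attribute: int) -> tuple[int, int]:
    """Split an attribute byte into (background, foreground) colour numbers."""
    return (attribute >> 4) & 0x7, attribute & 0xF


vram = write_to_video_memory(PROGRAM, ATTRIBUTE)
print(bytes(vram[0:2 * len(PROGRAM):2]) == PROGRAM)
print(decode_attribute(ATTRIBUTE))
```

The even offsets of the buffer hold the program bytes and the odd
offsets all hold 0x0a, which decodes to background 0 (black) and
foreground 10 (bright green).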

Once again, if you are not happy about the program reading its own
executable bytes, we can keep the bytes we read separate from the
bytes the CPU executes. Here is a 54-byte program that does this:

fc b3 02 b4 b8 8e c0 31 ff be 1b 01 b9 1b 00 b4
0a ac ab e2 fc 4b 75 f1 f4 eb fd fc b3 02 b4 b8
8e c0 31 ff be 1b 01 b9 1b 00 b4 0a ac ab e2 fc
4b 75 f1 f4 eb fd

Here is how we can create and run this program:

echo fc b3 02 b4 b8 8e c0 31 ff be 1b 01 b9 1b 00 b4 | xxd -r -p > foo.com
echo 0a ac ab e2 fc 4b 75 f1 f4 eb fd fc b3 02 b4 b8 | xxd -r -p >> foo.com
echo 8e c0 31 ff be 1b 01 b9 1b 00 b4 0a ac ab e2 fc | xxd -r -p >> foo.com
echo 4b 75 f1 f4 eb fd | xxd -r -p >> foo.com
dosbox foo.com

With code page 437 active, the output should look approximately like this:

ⁿ│☻┤╕Ä└1 ╛←☺╣← ┤◙¼½ΓⁿKu±⌠δ²ⁿ│☻┤╕Ä└1 ╛←☺╣← ┤◙¼½ΓⁿKu±⌠δ²

We can clearly see in this output that the first 27 bytes of output
are identical to the next 27 bytes of the output. Like the proper
quines discussed earlier, this one too has two halves that are
identical to each other. The executable code in the first half reads
the data bytes from the second half and prints the data bytes twice
so that the output bytes are an exact copy of all 54 bytes in the
program. Here is the disassembly:

$ ndisasm -o 0x100 foo.com
00000100  FC                cld
00000101  B302              mov bl,0x2
00000103  B4B8              mov ah,0xb8
00000105  8EC0              mov es,ax
00000107  31FF              xor di,di
00000109  BE1B01            mov si,0x11b
0000010C  B91B00            mov cx,0x1b
0000010F  B40A              mov ah,0xa
00000111  AC                lodsb
00000112  AB                stosw
00000113  E2FC              loop 0x111
00000115  4B                dec bx
00000116  75F1              jnz 0x109
00000118  F4                hlt
00000119  EBFD              jmp short 0x118
0000011B  FC                cld
0000011C  B302              mov bl,0x2
0000011E  B4B8              mov ah,0xb8
00000120  8EC0              mov es,ax
00000122  31FF              xor di,di
00000124  BE1B01            mov si,0x11b
00000127  B91B00            mov cx,0x1b
0000012A  B40A              mov ah,0xa
0000012C  AC                lodsb
0000012D  AB                stosw
0000012E  E2FC              loop 0x12c
00000130  4B                dec bx
00000131  75F1              jnz 0x124
00000133  F4                hlt
00000134  EBFD              jmp short 0x133

This disassembly is rather long but we can clearly see that the
bytes from offset 0x100 to offset 0x11a are identical to the bytes
from offset 0x11b to 0x135. These are the bytes we see in the output
of the program too.
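
One detail worth noticing in the disassembly is that jnz jumps back
to offset 0x109, which is after xor di,di. SI is therefore rewound
on each pass of the outer loop but DI is not, so the second copy
lands right after the first one on screen. The Python sketch below
models just this control flow. It assumes BH is 0 at start-up, which
the program relies on when it decrements BX.

```python
HALF = bytes.fromhex(
    "fc b3 02 b4 b8 8e c0 31 ff be 1b 01 b9 1b 00 b4"
    "0a ac ab e2 fc 4b 75 f1 f4 eb fd"
)
LOAD = 0x100  # load offset of a .COM program
program = HALF + HALF
memory = {LOAD + i: b for i, b in enumerate(program)}

cells = []      # characters stored at ES:DI; DI is set to 0 only once
bx = 2          # mov bl,0x2 (BH is assumed to be 0)
while bx:
    si = 0x11B  # mov si,0x11b rewinds to the start of the data half
    cx = 0x1B   # mov cx,0x1b: 27 bytes per pass
    while cx:
        cells.append(memory[si])  # lodsb followed by stosw
        si += 1
        cx -= 1                   # loop 0x111
    bx -= 1                       # dec bx, then jnz 0x109

print(len(cells), bytes(cells) == program)
```

The model writes 54 characters in total, and they match the 54 bytes
of the program.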

Boot Program

The 32-byte program below writes itself to video memory when
executed from the boot sector:

ea 05 7c 00 00 fc b8 00 b8 8e c0 8c c8 8e d8 31
ff be 00 7c b9 20 00 b4 0a ac ab e2 fc f4 eb fd

We can create a boot image that contains these bytes, write it to
the boot sector of a drive and boot an IBM PC compatible computer
with it. On booting, this program prints its own bytes on the
screen.

On a Unix or Linux system, the following commands can be used to
create a boot image with the above program:

echo ea 05 7c 00 00 fc b8 00 b8 8e c0 8c c8 8e d8 31 | xxd -r -p > boot.img
echo ff be 00 7c b9 20 00 b4 0a ac ab e2 fc f4 eb fd | xxd -r -p >> boot.img
echo 55 aa | xxd -r -p | dd seek=510 bs=1 of=boot.img
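
For readers without xxd and dd at hand, an equivalent image can be
built with a short Python script. This is a sketch that mirrors the
three commands above. The function names are ours, and write_image
is defined but not called here.

```python
CODE = bytes.fromhex(
    "ea 05 7c 00 00 fc b8 00 b8 8e c0 8c c8 8e d8 31"
    "ff be 00 7c b9 20 00 b4 0a ac ab e2 fc f4 eb fd"
)
SECTOR_SIZE = 512


def build_boot_image(code: bytes) -> bytes:
    """Return a 512-byte boot sector: code, zero padding, then 0x55 0xaa."""
    if len(code) > SECTOR_SIZE - 2:
        raise ValueError("code does not fit in the boot sector")
    image = bytearray(SECTOR_SIZE)   # zero-filled sector
    image[:len(code)] = code         # program bytes at the start
    image[510:512] = b"\x55\xaa"     # boot sector magic bytes
    return bytes(image)


def write_image(path: str, code: bytes = CODE) -> None:
    """Write the boot image to path, for example boot.img."""
    with open(path, "wb") as file:
        file.write(build_boot_image(code))


image = build_boot_image(CODE)
print(len(image), image[510:].hex())
```

Calling write_image("boot.img") would produce a file with the same
layout as the one created by the shell commands.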

Now we can test this boot image using DOSBox with the following command:

dosbox -c cls -c 'boot boot.img'

We can also test this image using QEMU x86 system emulator as follows:

qemu-system-i386 -fda boot.img

We could also write this image to the boot sector of an actual
physical storage device, such as a USB flash drive, and then boot
the computer with it. Here is an example command that writes the
boot image to the drive represented by the device path /dev/sdx:

cp boot.img /dev/sdx


CAUTION: You need to be absolutely sure of the device path of the
device being written to. The device path /dev/sdx is
only an example here. If the boot image is written to the wrong
device, access to the data on that device would be lost.

On testing this boot image with an emulator or a real computer, the
output should look approximately like this:

Ω♣|  ⁿ╕ ╕Ä└î╚Ä╪1 ╛ |╣  ┤◙¼½Γⁿ⌠δ²

This looks like gibberish. However, every symbol in the above output
corresponds to a byte of the program mentioned earlier. For example,
the first symbol (omega) represents the byte value 0xea, the second
symbol (club) represents the byte value 0x05, and so on. The chart
at cp437.html can be used to
confirm that every symbol in the output indeed represents every byte
of the program.

Here is the disassembly of the program:

$ ndisasm -o 0x7c00 boot.img
00007C00  EA057C0000        jmp 0x0:0x7c05
00007C05  FC                cld
00007C06  B800B8            mov ax,0xb800
00007C09  8EC0              mov es,ax
00007C0B  8CC8              mov ax,cs
00007C0D  8ED8              mov ds,ax
00007C0F  31FF              xor di,di
00007C11  BE007C            mov si,0x7c00
00007C14  B92000            mov cx,0x20
00007C17  B40A              mov ah,0xa
00007C19  AC                lodsb
00007C1A  AB                stosw
00007C1B  E2FC              loop 0x7c19
00007C1D  F4                hlt
00007C1E  EBFD              jmp short 0x7c1d
00007C20  0000              add [bx+si],al
00007C22  0000              add [bx+si],al
...

The ellipsis at the end represents the remainder of the bytes, which
contain zeroes followed by the boot sector magic bytes 0x55 and 0xaa
at the end. They have been omitted here for the sake of brevity.

When a computer boots, the BIOS reads the boot sector code from the
first sector of the boot device into the memory at physical address
0x7c00 and jumps to this address. Most BIOS implementations jump to
0x0000:0x7c00 but there are some implementations that jump to
0x07c0:0x0000 instead. Both these jumps are jumps to the same
physical address 0x7c00 but this difference poses a problem for us
because the offsets in our program depend on which jump the BIOS
executed. In order to ensure that our program can run with both
types of BIOS implementations, we use a popular trick of having the
first instruction of our program execute a jump to address
0x0000:0x7c05 in order to reach the second instruction. This sets
the register CS to 0 and IP to 0x7c05 and we don’t have to worry
about the differences between BIOS implementations anymore. We can
now pretend as if a BIOS implementation that jumps to 0x0000:0x7c00
is going to load our program.
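
The claim that both jumps reach the same physical address follows
from how real mode combines a segment and an offset. Here is a
minimal Python sketch of that calculation. The function name is
ours, and the 20-bit mask assumes the classic 1 MiB address space.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x0000, 0x7C00)))
print(hex(physical_address(0x07C0, 0x0000)))
# The instruction after the far jump, seen from both segment choices:
print(hex(physical_address(0x0000, 0x7C05)))
print(hex(physical_address(0x07C0, 0x0005)))
```

The first two results are equal, and so are the last two. Only the
offset part differs between the two conventions, which is exactly
what the far jump at the start of the program normalises.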

The remainder of the program is similar to the one in the previous
section. However, there are some small but important differences.
While the DOS environment guarantees that AL and CH are initialized
to 0 when a .COM program starts, the BIOS offers no such guarantee
while loading and executing a boot program. This is why we use the
registers AX and CX (as opposed to only AH and CL) in the mov
instructions to initialize them. Similarly,
while DOS initializes SI to 0x100 when a .COM program starts, for a
boot program, we set the register SI ourselves.

If you feel uncomfortable about calling the above program a quine
because it reads its own bytes from the memory, we could have the
program read the bytes it needs to print from a separate place in
memory. We do not execute these bytes. We only read them and copy
them to video memory. The following 76-byte program does this:

ea 05 7c 00 00 fc bb 02 00 b8 00 b8 8e c0 8c c8
8e d8 31 ff be 26 7c b9 26 00 b4 0a ac ab e2 fc
4b 75 f1 f4 eb fd ea 05 7c 00 00 fc bb 02 00 b8
00 b8 8e c0 8c c8 8e d8 31 ff be 26 7c b9 26 00
b4 0a ac ab e2 fc 4b 75 f1 f4 eb fd

Here is how we can create a boot image with this:

echo ea 05 7c 00 00 fc bb 02 00 b8 00 b8 8e c0 8c c8 | xxd -r -p > boot.img
echo 8e d8 31 ff be 26 7c b9 26 00 b4 0a ac ab e2 fc | xxd -r -p >> boot.img
echo 4b 75 f1 f4 eb fd ea 05 7c 00 00 fc bb 02 00 b8 | xxd -r -p >> boot.img
echo 00 b8 8e c0 8c c8 8e d8 31 ff be 26 7c b9 26 00 | xxd -r -p >> boot.img
echo b4 0a ac ab e2 fc 4b 75 f1 f4 eb fd | xxd -r -p >> boot.img
echo 55 aa | xxd -r -p | dd seek=510 bs=1 of=boot.img

Here are the commands to test this boot image:

dosbox -c cls -c 'boot boot.img'
qemu-system-i386 -fda boot.img

The output should look like this:

Ω♣|  ⁿ╗☻ ╕ ╕Ä└î╚Ä╪1 ╛&|╣& ┤◙¼½ΓⁿKu±⌠δ²Ω♣|  ⁿ╗☻ ╕ ╕Ä└î╚Ä╪1 ╛&|╣& ┤◙¼½ΓⁿKu±⌠δ²

Here is the disassembly of this program:

$ ndisasm -o 0x7c00 boot.img
00007C00  EA057C0000        jmp 0x0:0x7c05
00007C05  FC                cld
00007C06  BB0200            mov bx,0x2
00007C09  B800B8            mov ax,0xb800
00007C0C  8EC0              mov es,ax
00007C0E  8CC8              mov ax,cs
00007C10  8ED8              mov ds,ax
00007C12  31FF              xor di,di
00007C14  BE267C            mov si,0x7c26
00007C17  B92600            mov cx,0x26
00007C1A  B40A              mov ah,0xa
00007C1C  AC                lodsb
00007C1D  AB                stosw
00007C1E  E2FC              loop 0x7c1c
00007C20  4B                dec bx
00007C21  75F1              jnz 0x7c14
00007C23  F4                hlt
00007C24  EBFD              jmp short 0x7c23
00007C26  EA057C0000        jmp 0x0:0x7c05
00007C2B  FC                cld
00007C2C  BB0200            mov bx,0x2
00007C2F  B800B8            mov ax,0xb800
00007C32  8EC0              mov es,ax
00007C34  8CC8              mov ax,cs
00007C36  8ED8              mov ds,ax
00007C38  31FF              xor di,di
00007C3A  BE267C            mov si,0x7c26
00007C3D  B92600            mov cx,0x26
00007C40  B40A              mov ah,0xa
00007C42  AC                lodsb
00007C43  AB                stosw
00007C44  E2FC              loop 0x7c42
00007C46  4B                dec bx
00007C47  75F1              jnz 0x7c3a
00007C49  F4                hlt
00007C4A  EBFD              jmp short 0x7c49
00007C4C  0000              add [bx+si],al
00007C4E  0000              add [bx+si],al
...

This program has two identical halves. The first half, from offset
0x7c00 to offset 0x7c25, contains the executable bytes. The second
half, from offset 0x7c26 to 0x7c4b, contains the data bytes read by
the executable bytes. The executable part of the code has an outer
loop that uses
the register BX as the counter variable. It sets BX to 2 so that the
outer loop iterates twice. In each iteration, it reads data bytes
from the second half of the program and prints them. The code to
read bytes and print them is very similar to our earlier program.
Since the data bytes in the second half are identical to the
executable bytes in the first half, printing the data bytes twice
amounts to printing all bytes of the program.

While this program does avoid reading the bytes that the CPU
executes, the data bytes look exactly like the executable bytes.
Although I do not see any point in trying to avoid reading
executable bytes in an exercise like this, this program serves as an
example of a self-printing boot program that does not execute the
bytes it reads.
