COM file
From Wikipedia, the free encyclopedia
| COM | |
|---|---|
| File name extension | .COM |
| Type of format | Executable |
The file name extension .com has been used on various computer systems for different purposes. Originally, it stood for "command file": a text file containing commands to be issued to the operating system. This was the practice on many Digital Equipment Corporation minicomputer and mainframe systems going back to the 1970s.[1]
With the introduction of microcomputers, the use of the .com extension changed. In MS-DOS and compatible DOSes, and in 8-bit CP/M, a COM file is a simple type of executable file. The name of the file format is derived from the file name extension .com (not to be confused with the .com top-level domain), which was originally the extension used for such files. However, only CP/M and very early versions of MS-DOS actually tie the file format to the file name extension.
Binary format
COM is a very old executable format, and the simplest of all. The variant used on the IBM PC has no header and contains no metadata, only code and data. The image is loaded at offset 0x0100 of a segment chosen by the operating system and executed from there. Because the program refers to everything by offsets within that segment, it runs unchanged wherever the segment is placed, so no relocation is needed. The flip side of carrying no relocation information is that the image must fit in, and run from, a single segment.
Its simplicity exacts a price, however: the binary has a maximum size of 65,280 (0xFF00) bytes, which is one 64 KiB segment minus the 256 bytes below the load address (the Program Segment Prefix under DOS), and it stores all its code and data in that one segment. This was not an issue on early 8-bit machines, but it is the main reason why the format fell into disuse soon after the introduction of 16- and then 32-bit processors with their much larger address spaces.
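The loading model described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular DOS implements it: the names `load_com_image` and `physical_address` are hypothetical, the 256 bytes below 0x0100 (the PSP) are simply left zeroed, and real-mode addressing is modelled as segment * 16 + offset, truncated to 20 bits.

```python
# Simplified sketch of loading a flat COM image (hypothetical helper names).
SEGMENT_SIZE = 0x10000                      # one 64 KiB real-mode segment
LOAD_OFFSET = 0x0100                        # image starts right after the 256-byte PSP
MAX_COM_SIZE = SEGMENT_SIZE - LOAD_OFFSET   # 0xFF00 = 65,280 bytes


def load_com_image(image: bytes) -> bytearray:
    """Copy a headerless COM image into a fresh segment at offset 0x0100."""
    if len(image) > MAX_COM_SIZE:
        raise ValueError(f"COM image is {len(image)} bytes; limit is {MAX_COM_SIZE}")
    segment = bytearray(SEGMENT_SIZE)       # PSP area is left zeroed in this sketch
    # No header to parse: the file bytes are copied verbatim.
    segment[LOAD_OFFSET:LOAD_OFFSET + len(image)] = image
    return segment


def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode translation: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    program = bytes([0xB4, 0x4C, 0xCD, 0x21])   # mov ah,4Ch / int 21h (exit)
    seg = load_com_image(program)
    print(hex(seg[LOAD_OFFSET]))                  # 0xb4
    # Offset 0x0100 is the entry point whichever segment base is chosen,
    # which is why the image needs no relocation fixups.
    for base in (0x1000, 0x2345):
        print(hex(base), hex(physical_address(base, LOAD_OFFSET)))
```

Because the image is copied verbatim, the same offsets remain valid for any segment base; only the physical address changes (0x10100 for segment 0x1000, 0x23550 for segment 0x2345).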
In the Intel 8080 CPU architecture, only 65,536 bytes of memory could be addressed (address range 0x0000 to 0xFFFF). Under CP/M, the first page of this memory, from 0x0000 to 0x00FF, was reserved for system use, and any user program had to be loaded at exactly 0x0100 to be executed. COM files fit this model perfectly. There was no possibility of running more than one program or command at a time: the program loaded at 0x0100 was run, and no other.
Although the file format is the same in MS-DOS and CP/M, this does not mean that CP/M programs can be directly executed under MS-DOS or vice versa; MS-DOS COM files contain x86 instructions, while CP/M COM files contain 8080, 8085 or Z80 instructions. Additionally, MS-DOS COM files often depend on operating system traps supplied exclusively by MS-DOS via interrupt 21h. It is possible to construct a fat COM file which both processor families can execute.
A file may have a name ending in .COM without being in the simple format described above; this is indicated by a magic number at the start of the file. For example, the COMMAND.COM file in DR-DOS 6 is actually in the DOS EXE format, indicated by the first two bytes being MZ (0x4D 0x5A), the initials of Mark Zbikowski. Under CP/M 3, a first byte of 0xC9 indicates the presence of a 256-byte header; since 0xC9 is the 8080 instruction RET, such a COM file terminates immediately if run on an earlier version of CP/M that does not support this extension.
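A minimal sketch of the magic-number check, assuming only the two signatures mentioned above. The function `classify_com_file`, its `platform` argument and its return labels are hypothetical, and real loaders apply further checks. The platform has to be known because 0xC9 is only meaningful as a header flag under CP/M 3, and MZ only under DOS.

```python
# Hypothetical classifier for files named *.COM, based only on leading bytes.
MZ_SIGNATURE = b"MZ"        # 0x4D 0x5A: DOS EXE header
CPM3_HEADER_FLAG = b"\xC9"  # 8080 RET; under CP/M 3 it flags a 256-byte header


def classify_com_file(data: bytes, platform: str) -> str:
    """Return 'mz-exe', 'cpm3-header' or 'flat' for the given platform."""
    if platform not in ("dos", "cpm3"):
        raise ValueError(f"unsupported platform: {platform!r}")
    if platform == "dos" and data[:2] == MZ_SIGNATURE:
        return "mz-exe"          # named .COM, but loaded as an EXE
    if platform == "cpm3" and data[:1] == CPM3_HEADER_FLAG:
        return "cpm3-header"     # header present; older CP/M would just RET
    return "flat"                # no recognised magic number: plain image


if __name__ == "__main__":
    print(classify_com_file(b"MZ\x90\x00", "dos"))           # mz-exe
    print(classify_com_file(b"\xC9" + bytes(255), "cpm3"))   # cpm3-header
    print(classify_com_file(b"\xB4\x4C\xCD\x21", "dos"))     # flat
```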
Execution preference
If a directory contains both a COM file and an EXE file with the same name (not including the extension), the COM file is preferred. For example, if a directory contains two files named foo.com and foo.exe, the following would execute foo.com:
C:\>foo
If the user wishes to run foo.exe, they can explicitly use the complete filename:
C:\>foo.exe
Taking advantage of this default behaviour, virus writers and other malicious programmers sometimes use names like notepad.com for their creations. Their hope is that, if it is placed in the directory of the corresponding EXE file, a run command or batch file may accidentally trigger their program instead of the ubiquitous notepad.exe text editor.
On Windows NT and its derivatives (Windows 2000, Windows XP, and Windows Vista), the PATHEXT environment variable determines the order of preference (and the acceptable extensions) when a file is called from the command line without an extension. Its default value, however, still places .com files before .exe files.
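The lookup order can be sketched as follows. This is a simplified model rather than cmd.exe's actual algorithm: `resolve_command` is hypothetical, it searches a single directory listing, matches names case-insensitively, and uses an assumed PATHEXT value of `.COM;.EXE;.BAT;.CMD` that keeps the default's .COM-before-.EXE order.

```python
import os
from typing import Iterable, Optional

# Assumed PATHEXT value for this sketch; .COM is tried before .EXE.
PATHEXT = ".COM;.EXE;.BAT;.CMD"


def resolve_command(name: str, files: Iterable[str], pathext: str = PATHEXT) -> Optional[str]:
    """Return the file a typed command name would run from one directory.

    Simplified: ignores PATH, the current-directory rules and other shell logic.
    """
    by_lower = {f.lower(): f for f in files}
    extensions = [e.lower() for e in pathext.split(";") if e]
    _, ext = os.path.splitext(name)
    if ext.lower() in extensions:
        # An executable extension was typed explicitly: use that exact file.
        return by_lower.get(name.lower())
    for candidate_ext in extensions:          # first match in PATHEXT order wins
        candidate = name.lower() + candidate_ext
        if candidate in by_lower:
            return by_lower[candidate]
    return None


if __name__ == "__main__":
    listing = ["foo.com", "foo.exe"]
    print(resolve_command("foo", listing))      # foo.com
    print(resolve_command("foo.exe", listing))  # foo.exe
```

The two calls mirror the command-line examples above: the bare name picks foo.com because .COM comes first in the assumed PATHEXT, while the full name bypasses the search.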
Platform support
The format can still be executed on many modern Windows-based platforms, but it runs in an MS-DOS-emulating subsystem (NTVDM), which is not included in the 64-bit variants. COM files can also be executed in DOS emulators such as DOSBox, on any platform those emulators support. Many commands, such as the MS-DOS version of the more command, used this format, as did small early applications.
Recent malicious usage of the .com extension
Some recent computer virus writers have hoped to capitalise on modern computer users' likely unfamiliarity with the COM file format, along with their greater familiarity with dot-com Internet domain names. E-mails have been sent with an attachment named along the lines of "www.example.com". Unwary Microsoft Windows users who clicked on the attachment would not be visiting a web site at http://www.example.com/, but would instead be running a carefully crafted, and probably malicious, binary named www.example.com, giving it full permission to do to their machine whatever its author had in mind.
Note that there is nothing malicious about the COM file format itself. This example highlights an unintended name collision between .com command files and, a decade or more later, .com commercial web sites.

