To: cnc AT netcom DOT com (Christopher Christensen)
Cc: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: Re: 32/16bit?
Date: Wed, 12 Oct 94 14:49:08 +0200
From: "Eli Zaretskii"

I wrote:

>> With all due respect to counting clock cycles, you forget the prefetch
>> and decode queues of the processor, which all but annihilate that extra
>> clock cycle (except for the very first instruction each time the queues
>> are flushed, which shouldn't happen too often).

You wrote:

> I am skeptical. Could you point to a source (preferably in the Intel
> literature) that confirms this? In the Intel Pentium Processor User's
> Manual, in the section on optimization (section 24-4 if memory serves),
> it specifically says to avoid instructions that require the operand size
> prefix and it strongly implies that the operand size prefix costs you an
> extra 1 clock on every instruction that uses it.
>
> My understanding has
> been that these prefix bytes (operand size, segment override, address size,
> etc) are essentially treated as separate instructions by the CPU.

I don't have Intel manuals, so I use the "PC Magazine Programmer's
Technical Reference: The Processor and Coprocessor", by Robert L. Hummel,
Ziff-Davis Press, 1992 (ISBN 1-56276-016-5). Good enough?

It says on pp. 345-346 that Address Size Prefix and Operand Size Prefix
both take 0 clock cycles on a 386 and 1 clock cycle on a 486. Pentium
clock counts aren't listed, but a 1-clock penalty, as on the 486, seems
reasonable.

On p. 151 the above book says that Segment Override Prefix takes 0 clock
cycles on both the 386 and the 486, because beginning with the 286 the
effective address calculation is performed by dedicated hardware. Again,
Pentium clock cycles aren't listed (it didn't exist in '92).

So I would say that for the 486 and Pentium you are right 2 times out of 3,
while for the 386 you are wrong.

So much for the theory. If and when you have the results of your
benchmark program, I'd be interested to hear how it *really* performs.

EZ