From: nxk3 AT b63526 DOT student DOT cwru DOT edu (Natarajan Krishnaswami)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Optimization
Date: 29 Nov 1996 18:41:43 GMT
Organization: Case Western Reserve University, Cleveland OH (USA)
Lines: 33
Message-ID: <slrn59ubiq.nb.nxk3@b63526.student.cwru.edu>
References: <57hg9b$or5 AT kannews DOT ca DOT newbridge DOT com> <329C95AD DOT C3E AT silo DOT csci DOT unt DOT edu> <57k531$5bu AT kannews DOT ca DOT newbridge DOT com>
NNTP-Posting-Host: b63526.student.cwru.edu
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

>Well, my logic is this: You have to move 2x as much data around; this
>means your L1 cache fills up 2x as fast. This is not good. 

You piqued my interest, so I looked it up:

  "To gain efficiency in the implementation of the internal cache, storage is
  allocated in chunks of 128 bits, called cache lines.  External caches are not
  likely to use cache lines smaller than those of the internal cache."
  [...]
  "To simplify the hardware implementation, cache lines can only be mapped to
  aligned 128-bit blocks of main memory."

(i486 Microprocessor Programmer's Reference Manual, 12-1, 12-2)
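
To put numbers on that, here is a rough sketch in C of the arithmetic implied by the excerpt. It assumes only what the manual states (16-byte lines mapped to 16-byte-aligned blocks); the names cache_line_base and cache_lines_touched are made up for illustration, and it says nothing about associativity or replacement policy.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 16u  /* 128 bits, per the manual excerpt above */

/* Start address of the aligned 16-byte block that contains addr. */
uint32_t cache_line_base(uint32_t addr)
{
    return addr & ~(CACHE_LINE_BYTES - 1u);
}

/* Number of distinct cache lines touched by `size` bytes starting at addr.
   Hypothetical helper; assumes size > 0 and no 32-bit address wraparound. */
uint32_t cache_lines_touched(uint32_t addr, uint32_t size)
{
    uint32_t first, last;

    assert(size > 0);
    first = cache_line_base(addr);
    last = cache_line_base(addr + size - 1u);
    return (last - first) / CACHE_LINE_BYTES + 1u;
}
```

For 1000 elements starting on a line boundary, 16-bit elements (2000 bytes) span 125 lines and 32-bit elements (4000 bytes) span 250, so under this model twice the data does occupy twice as many lines.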

Also, 

  "Because the i486 microprocessor has a 32-bit data bus, communications
  between the processor and memory take place as doubleword transfers
  aligned to addresses evenly divisible by 4; the processor converts
  doubleword transfers aligned to other addresses into multiple transfers;..."

(ibid., 2-4, 2-6)
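
As a rough illustration of that passage, the sketch below counts how many aligned doublewords an access spans. This is a simplified model I am assuming from the quoted text, not a statement about exact bus cycle counts on a real i486; is_dword_aligned and dword_transfers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define BUS_BYTES 4u  /* 32-bit data bus, per the manual excerpt above */

/* Nonzero if addr is evenly divisible by 4. */
int is_dword_aligned(uint32_t addr)
{
    return (addr % BUS_BYTES) == 0u;
}

/* Number of aligned doublewords spanned by `size` bytes starting at addr.
   Simplified model: one transfer per aligned doubleword touched.
   Assumes size > 0 and no 32-bit address wraparound. */
uint32_t dword_transfers(uint32_t addr, uint32_t size)
{
    assert(size > 0);
    return (addr + size - 1u) / BUS_BYTES - addr / BUS_BYTES + 1u;
}
```

Under this model a 4-byte access at 0x1000 needs one transfer, while the same access at 0x1001 spans two doublewords and needs two.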


Well, have fun optimizing your program.


Cheers,
Natarajan 
-==(UDIC)==-
"Time flies like an arrow; fruit flies like a banana."
			-Groucho Marx