Message-ID: <19990812174541.47519@atrey.karlin.mff.cuni.cz>
Date: Thu, 12 Aug 1999 17:45:41 +0200
From: Jan Hubicka
To: pgcc AT delorie DOT com
Subject: Re: optimizing for k6
References: <3 DOT 0 DOT 32 DOT 19990808144013 DOT 0119ad00 AT pop DOT xs4all DOT nl>
X-Mailer: Mutt 0.84
In-Reply-To: <3.0.32.19990808144013.0119ad00@pop.xs4all.nl>; from Vincent Diepeveen on Sun, Aug 08, 1999 at 02:40:16PM +0100
Reply-To: pgcc AT delorie DOT com

> There is a very easy way of optimizing for K6,
> just rewrite everything in 8 bits and you're 2 times faster.

Why? 8-bit arithmetic is issued to the X pipe only, so the second (Y)
integer pipe cannot be used and it ought to be 2 times slower... and many
instructions have longer decoding latencies in their 8-bit versions.

Honza

> Greetings,
> Vincent
>
> At 11:49 AM 8/7/99 +0200, you wrote:
> >Henrik Berglund SdU wrote:
> >>
> >> ftp://ftp.sinica.edu.tw/pub/doc/cpu/www.amd.com/K6/k6docs/pdf/21828a.pdf
> >>
> >> -----------------------------------------------------------------------------
> >> Henrik DOT Berglund AT mds DOT mdh DOT se
> >> http://www.mds.mdh.se/~adb94hbd/
> >
> >This is a long-known document, and it does help somewhat with optimizing.
> >But the information is just too incomplete to get really good
> >optimizations.
> >
> >There are also a lot of mistakes in that document. I had a little
> >discussion with AMD technical support, but they did not help :-(
> >AMD Technical Support wrote:
> >>
> >> >Return-Path:
> >> >Sender: wolfi AT neuss DOT netsurf DOT de
> >> >Date: Fri, 12 Mar 1999 19:10:15 +0100
> >> >From: Wolfgang Formann
> >> >To: AMD Technical Support
> >> >Subject: Re: Some question to your literature, maybe a typo?
> >> >References: <3 DOT 0 DOT 32 DOT 19990303153034 DOT 0074931c AT pedigree DOT amd DOT com>
> >> >
> >>
> >> Hi,
> >>
> >> it is the last update of the document. I think you must try it.
> >>
> >> Kind regards
> >>
> >> Bernard
> >>
> >> >AMD Technical Support wrote:
> >> >>
> >> >> >Return-Path:
> >> >> >X-Sender: support2 AT pedigree
> >> >> >Date: Thu, 25 Feb 1999 06:39:16 +0100
> >> >> >To: blikefet AT pedigree DOT amd DOT com
> >> >> >From: Wolfgang Formann (by way of CPA)
> >> >> >Subject: Some question to your literature, maybe a typo?
> >> >> >
> >> >> >I just downloaded the document http://www.amd.com/K6/k6docs/pdf/21828a.pdf.
> >> >> >The table in Chapter 4, pages 37 to 40, says that all the shift operations
> >> >> >like SHIFT mreg16/32,imm8; SHIFT mreg16/32, 1; SHIFT mreg16/32, CL; where
> >> >> >SHIFT can be replaced by SAR, SHL/SAL and SHR, are executed as RISC86(tm)
> >> >> >Opcode alu. This RISC86(tm) operation is explained on page 24 as
> >> >> >`alu - either of the integer execution units`.
> >> >> >
> >> >> >In chapter 3 on page 12, however, this document lists some (all?)
> >> >> >operations which can be performed in the Integer Y execution unit.
> >> >> >None of the SHIFTs is mentioned in that list of operations
> >> >> >'(ADD, AND, CMP, OR, SUB and XOR)'.
> >> >> >
> >> >> >By trying it out (I think) I found that chapter 3 is right and the table
> >> >> >in chapter 4 has typos.
> >> >> >
> >> >> >My question: Is there any updated version of this document available, or
> >> >> >do I have to try out all the other opcodes not listed in chapter 3 but
> >> >> >marked as 'alu' in the table in chapter 4 (like mov, movzx)?
> >> >> >
> >> >> >Thank you
> >> >>
> >> >> Hi,
> >> >>
> >> >> the latest version of the document is on our website.
> >> >
> >> >So it still seems to give different information on the same instruction :-(
> >> >
> >> >Is there any additional information available that is not shown on your web page?
> >> >
> >> >Thanks again!
> >> >
> >> >>
> >> >> Kind regards
> >> >> Bernard Likefett
> >> >> AMD Technical Support
> >> >
> >> >
> >> Bernard Likefett
> >> AMD Technical Support
> >>
> >> Please include all previous emails
> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> Advanced Micro Devices _______
> >> AMD House \____ | Advanced
> >> Frimley Business Park /| | | Micro
> >> Frimley, Camberley | |___| | Devices
> >> Surrey |____/ \|
> >> GU16 5SL
> >> United Kingdom
> >>
> >> EMail id euro DOT tech AT amd DOT com Our Web site is http://www.amd.com
> >> Phone +44 (0)1276 803299 Fax +44 (0)1276 803298
> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> >Another thing in that manual is the nice table labeled 'Instruction
> >Dispatch and Execution Timing' starting on page 35. Just a few
> >questions:
> >How many internal cycles do all these vector operations take?
> >Which internal execution units are used?
> >
> >Well, there is no answer, so you have to try them out. The only thing
> >you can be sure of is that you should always use opcodes that can be
> >decoded in parallel, i.e. the ones marked 'short', since the decoder
> >seems to be the bottleneck of that CPU.
> >
> >Next are the nice tables in the chapter labeled 'Code Sample
> >Analysis'. Did you really understand them? I tried to optimize some
> >real code and took these tables as input, but I failed :-( My processor
> >seems to behave very differently. I did not find out what was wrong.
> >So it seems to me that a lot of the information in this document is
> >only for marketing purposes; there are too few details and too many
> >errors to really help optimize code.
> >
> >Wolfgang
> >
>

-- 
OK. Let's make a signature file.
+-------------------------------------------------------------------------+
| Jan Hubicka (Jan Hubi\v{c}ka in TeX)           hubicka AT freesoft DOT cz |
| Czech free software foundation: http://www.freesoft.cz                  |
|AA project - the new way for computer graphics - http://www.ta.jcu.cz/aa |
| homepage: http://www.paru.cas.cz/~hubicka/, games koules, Xonix, fast   |
| fractal zoomer XaoS, index of Czech GNU/Linux/UN*X documentation etc.   |
+-------------------------------------------------------------------------+