Mail Archives: pgcc/1999/08/07/21:07:43

Sender: wolfi AT neuss DOT netsurf DOT de
Message-ID: <37AC0114.F3BC458A@neuss.netsurf.de>
Date: Sat, 07 Aug 1999 11:49:08 +0200
From: Wolfgang Formann <w DOT formann AT neuss DOT netsurf DOT de>
X-Mailer: Mozilla 4.6 [en] (X11; I; Linux 2.2.8 i586)
X-Accept-Language: German, de, en
MIME-Version: 1.0
To: pgcc AT delorie DOT com
CC: adb94hbd AT mds DOT mdh DOT se
Subject: Re: optimizing for k6
References: <Pine DOT GSO DOT 4 DOT 10 DOT 9908051303340 DOT 29067-100000 AT legolas DOT mdh DOT se>
Reply-To: pgcc AT delorie DOT com

Henrik Berglund SdU wrote:
> 
> ftp://ftp.sinica.edu.tw/pub/doc/cpu/www.amd.com/K6/k6docs/pdf/21828a.pdf
> 
> -----------------------------------------------------------------------------
> Henrik DOT Berglund AT mds DOT mdh DOT se
> http://www.mds.mdh.se/~adb94hbd/

This document has been known for a long time, and it does help somewhat
with optimizing. But the information is too incomplete to get really good
optimizations.

There are also a lot of mistakes in that document. I had a short
discussion with AMD technical support, but they did not help :-(
AMD Technical Support wrote:
> 
> >Return-Path: <w DOT formann AT neuss DOT netsurf DOT de>
> >Sender: wolfi AT neuss DOT netsurf DOT de
> >Date: Fri, 12 Mar 1999 19:10:15 +0100
> >From: Wolfgang Formann <w DOT formann AT neuss DOT netsurf DOT de>
> >To: AMD Technical Support <blikefet AT pedigree DOT amd DOT com>
> >Subject: Re: Some question to your literature, maybe a typo?
> >References: <3 DOT 0 DOT 32 DOT 19990303153034 DOT 0074931c AT pedigree DOT amd DOT com>
> >
> 
> Hi,
> 
> it is the last update of the document. I think you must try it.
> 
> Kind regards
> 
> Bernard
> 
> >AMD Technical Support wrote:
> >>
> >> >Return-Path: <euro DOT lit AT amd DOT com>
> >> >X-Sender: support2 AT pedigree
> >> >Date: Thu, 25 Feb 1999 06:39:16 +0100
> >> >To: blikefet AT pedigree DOT amd DOT com
> >> >From: Wolfgang Formann <w DOT formann AT neuss DOT netsurf DOT de> (by way of CPA <euro DOT lit AT amd DOT com>)
> >> >Subject: Some question to your literature, maybe a typo?
> >> >
> >> >I just downloaded the document http://www.amd.com/K6/k6docs/pdf/21828a.pdf.
> >> >The table in Chapter 4, Pages 37 to 40 says, that all the shift operations
> >> >like SHIFT mreg16/32,imm8; SHIFT mreg16/32, 1; SHIFT mreg16/32, CL; where
> >> >SHIFT can be replaced by SAR, SHL/SAL and SHR, are executed as RISC86(tm)
> >> >Opcode alu. This RISC86(tm) operation is explained on page 24 as
> >> >`alu - either of the integer execution units`.
> >> >
> >> >Whereas in chapter 3 on page 12, this document lists some (all?) operations
> >> >which can be performed in the Integer Y execution unit. In the list of
> >> >operations '(ADD, AND, CMP, OR, SUB and XOR)' there is none of the SHIFT's
> >> >mentioned.
> >> >
> >> >By trying it out (I think) I found that chapter 3 is right and the table
> >> >in chapter 4 has typos.
> >> >
> >> >My question: Is there any updated version of this document available or
> >> >do I have to try out all the other opcodes not listed in chapter 3, but
> >> >marked as 'alu' in the table in chapter 4 (like mov, movzx)?
> >> >
> >> >Thank you
> >>
> >> Hi,
> >>
> >> the latest version of the document is on the our webside.
> >
> >so, it still seems to have different information on the same instruction :-(
> >
> >Is there any additional information available, not shown on your web page?
> >
> >Thanks again!
> >
> >>
> >> Kind regards
> >> Bernard Likefett
> >> AMD Technical Support
> >
> >
> Bernard Likefett
> AMD Technical Support
> 
> Please included all previous emails
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Advanced Micro Devices _______
> AMD House \____ | Advanced
> Frimley Business Park /| | | Micro
> Frimley, Camberley | |___| | Devices
> Surrey |____/ \|
> GU16 5SL
> United Kingdom
> 
> EMail id euro DOT tech AT amd DOT com Our Web site is http://www.amd.com
> Phone +44 (0)1276 803299 Fax +44 (0)1276 803298
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Another thing in that manual is the table labeled 'Instruction
Dispatch and Execution Timing', starting at page 35. It leaves a few
questions open:
How many internal cycles do all these vector operations take?
Which internal execution units do they use?

Well, the document gives no answer, so you have to try them out. The only
thing you can be sure of is that you should always use opcodes which can
be decoded in parallel. These are the ones marked 'short', since the
decoder seems to be the bottleneck of that CPU.
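
"Trying it out" can be sketched roughly as below. This is only a minimal
sketch, not the test I actually ran: the function names (run_add, run_shift,
seconds, report), the OPAQUE macro and the iteration count are all made up
for illustration, and it assumes a GCC-compatible compiler. Each loop keeps
two independent operations in flight, so they could pair if both integer
units accept them. If the shift loop comes out clearly slower than the ADD
loop, that would be consistent with chapter 3 (no shifts in the Integer Y
unit), but loop overhead and the decoder also influence the result.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: forces v to live in a register and hides it from
 * the optimizer, so the loops are not folded away (GCC/Clang extension). */
#define OPAQUE(v) __asm__ volatile("" : "+r"(v))

/* Two independent ADD chains per iteration. */
uint32_t run_add(unsigned long n)
{
    uint32_t a = 1, b = 2;
    for (unsigned long i = 0; i < n; i++) {
        a += 3; OPAQUE(a);
        b += 5; OPAQUE(b);
    }
    return a ^ b;
}

/* Two independent logical right-shift chains per iteration.
 * The values reach zero quickly; that is fine here, since only the
 * time taken is of interest, not the data. */
uint32_t run_shift(unsigned long n)
{
    uint32_t a = 0xFFFFFFFFu, b = 0x12345678u;
    for (unsigned long i = 0; i < n; i++) {
        a >>= 3; OPAQUE(a);
        b >>= 5; OPAQUE(b);
    }
    return a ^ b;
}

/* CPU time in seconds for one run of fn(n); the result goes to *sink
 * so the call cannot be discarded. */
double seconds(uint32_t (*fn)(unsigned long), unsigned long n, uint32_t *sink)
{
    clock_t t0 = clock();
    *sink = fn(n);
    clock_t t1 = clock();
    assert(t0 != (clock_t)-1 && t1 != (clock_t)-1);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Print both timings side by side for the same iteration count. */
void report(unsigned long n)
{
    uint32_t sink;
    double t_add = seconds(run_add, n, &sink);
    double t_shift = seconds(run_shift, n, &sink);
    printf("add:   %.3f s\nshift: %.3f s\n", t_add, t_shift);
}
```

Call report() from main with a large count (for example 50000000) and
compile with optimization. Check the generated assembly (gcc -S) first to
make sure the loops really contain the opcodes you want to measure, since
the compiler is free to pick a different encoding.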

The next thing is the tables in the chapter labeled 'Code Sample
Analysis'. Did you really understand them? I tried to optimize some
real code using these tables as input, but I failed :-( My processor
seems to behave very differently, and I did not find out what was wrong.
So it seems to me that a lot of the information in this document is
there only for marketing purposes. There are too few details and too
many errors for it to really help in optimizing code.

Wolfgang
