From: "David Jonsson"
Subject: SSI/KNI support (was RE: Intel/Cygnus)
Date: Fri, 5 Mar 1999 12:57:34 +0100
Message-ID: <000001be66ff$5c17a660$3bd16482@ellemtel.se>
In-Reply-To: <19990304152121.42144@insula.local>
Reply-To: pgcc AT delorie DOT com

> > ----------
> > From: Philipp Rumpf[SMTP:PRUMPF AT JCSBS DOT LANOBIS DOT DE]
> > Sent: Thursday, March 04, 1999 4:21:21 PM
> > To: pgcc AT delorie DOT com
> > Subject: Re: Intel/Cygnus
> > Auto forwarded by a Rule
> >
> > This is far from trivial. The C syntax needs to be abandoned if the
> > optimization is to be transparent to the programmer, see SWAR
> > http://shay.ecn.purdue.edu/~swar/
>
> I cannot see what is so difficult about it[1] ... I think it is
> just a special case of loop unrolling.
>
> char *p;
> int i;
>
> for(i=0; i<4; i++)
>         p[i] |= 0x80;
>
> should become a 32-bit OR ... once we can do that, the rest of
> SIMD should be trivial[2]

What you describe is trivial if it is allowed. I am no compiler expert, but
I don't think a compiler is allowed to unroll that loop. It isn't obvious
that p and i are independent, or that p[0], p[1], etc. are.

> > Another approach is to use a macro-like addition to ordinary compilers.
> > This is what Apple has done with AltiVec, which is more promising than
> > MMX or KNI/SSI, http://developer.apple.com/hardware/altivec/model.html
>
> Intel is doing something very similar in their compilers, they even
> give the compiler intrinsics or whatever they call them in the
> instruction set reference ...

Like libmmx below?
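
To make the loop discussion above concrete, here is a minimal sketch in plain C of the two forms being compared: the quoted byte loop and a single 32-bit OR over the same four bytes. The function names are made up for illustration, and memcpy is used for the word access so that nothing is assumed about the alignment of p. Whether a compiler may do this rewrite on its own is exactly the open question in this thread; the sketch only shows that the two forms give the same bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time form, as in the quoted loop. */
void set_high_bits_bytewise(unsigned char *p)
{
    int i;

    assert(p != NULL);
    for (i = 0; i < 4; i++)
        p[i] |= 0x80;
}

/* Word-at-a-time form: one 32-bit OR covering the same four bytes.
 * memcpy avoids assuming that p is 4-byte aligned. The mask is the
 * same in every byte, so byte order does not matter here. */
void set_high_bits_wordwise(unsigned char *p)
{
    uint32_t w;

    assert(p != NULL);
    memcpy(&w, p, sizeof w);
    w |= UINT32_C(0x80808080);
    memcpy(p, &w, sizeof w);
}
```

Both functions modify exactly four bytes starting at p, so they can be swapped for each other wherever the caller guarantees that much storage.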
> The macro approach has additional advantages though, I really would
> not like to get 11 bits precision for a normal float though I probably
> would not mind sometimes.

That is often enough, for example for sound processing or simple geometry.

> [2] - Well, it could be a bit difficult to ensure a float * is
> 128-bit aligned ...

Just align all memory on 128-bit boundaries when compiling. Or what about a
new type like the one in Randy Fisher's libmmx
http://min.ecn.purdue.edu/~rfisher/Research/Libmmx/libmmx.html

/* C has no standard 128-bit integer type, so the full 128-bit value is
   covered by two quadwords instead of a single octalword member. */
typedef union {
        long long               q[2];   /* 2 Quadword (64-bit) values */
        unsigned long long      uq[2];  /* 2 Unsigned Quadword */
        int                     d[4];   /* 4 Doubleword (32-bit) values */
        unsigned int            ud[4];  /* 4 Unsigned Doubleword */
        short                   w[8];   /* 8 Word (16-bit) values */
        unsigned short          uw[8];  /* 8 Unsigned Word */
        char                    b[16];  /* 16 Byte (8-bit) values */
        unsigned char           ub[16]; /* 16 Unsigned Byte */
        float                   s[4];   /* 4 Single-precision (32-bit) values */
} __attribute__ ((aligned (16))) ssi_t; /* On a 16-byte (128-bit) boundary */

He also defines macros that make it possible to write things like

paddd_m2r(variable, mm0)

I asked him in December whether he would support SSI, but he said MMX kept
him busy full time. I hope all the extra instruction sets, like 3DNow!, MMX
and SSI, can go in the same .h file.

David
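
As a rough illustration of the aligned-type idea, the sketch below assumes GCC's aligned attribute, as used in the union above. The names vec128_t, is_aligned16 and vec128_add_floats are hypothetical, and the loop is only a scalar stand-in for a packed add, not a real SSI/KNI instruction. The point is that objects declared with such a type get their 16-byte alignment from the compiler, so the float * alignment worry in [2] goes away for them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit packed type, modeled on the ssi_t union. */
typedef union {
    float         s[4];   /* 4 single-precision (32-bit) values */
    int           d[4];   /* 4 doubleword (32-bit) values */
    unsigned char ub[16]; /* 16 unsigned bytes */
} __attribute__ ((aligned (16))) vec128_t;

/* Returns nonzero if ptr lies on a 16-byte (128-bit) boundary. */
int is_aligned16(const void *ptr)
{
    return ((uintptr_t)ptr & 15u) == 0;
}

/* Scalar stand-in for a packed single-precision add: dst += src,
 * element by element. Both operands must be 16-byte aligned. */
void vec128_add_floats(vec128_t *dst, const vec128_t *src)
{
    int i;

    assert(is_aligned16(dst) && is_aligned16(src));
    for (i = 0; i < 4; i++)
        dst->s[i] += src->s[i];
}
```

Memory obtained some other way, for example from malloc or from a plain float array, carries no such guarantee and would still need a check like is_aligned16 before being treated as a vec128_t.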