From: "Conrad Scott"
Subject: __stdcall and regparm
Date: Tue, 20 Aug 2002 18:55:51 +0100
Message-ID: <006d01c24872$d2f15c60$6132bc3e@BABEL>

As part of my fiddling about with a putative readv/writev implementation, I checked the improvements gained by the __stdcall and the regparm attributes.

In summary, __stdcall makes the DLL slower as does regparm (3); the fastest combination is to avoid __stdcall and to use regparm (2) (and this seems to be insensitive to the number of arguments passed to the function). (Nb. This is just tested on my m/c, a Pentium III, and with gcc 2.95.3-5.)

If this is true for other CPUs / compiler versions, it might be worthwhile changing these settings throughout the DLL, unless these declarations have been added for some other reason than speed.

I tested this with the cygwin DLL itself, changing the declarations of the fhandler::read (and fhandler::readv) methods, then testing the DLL with a program that reads 16Mb from /dev/zero one byte at a time and writes it to /dev/null (again, one byte at a time).

The combinations I tested are as follows (fastest first):

    regparm (2)                 0m37.354s
    __stdcall, regparm (2)      0m37.440s
    regparm (1)                 0m37.482s
    regparm (2), regparm (3)    0m38.364s  (*)
    regparm (3)                 0m38.566s
    neither                     0m38.654s
    __stdcall                   0m38.848s
    __stdcall, regparm (3)      0m39.409s  (**)

(*) This uses regparm (2) for fhandler::read and regparm (3) for fhandler::readv, which has 3 arguments in my current implementation.

(**) These are the current settings for the cygwin DLL.
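
For anyone wanting to repeat the measurement, here is a rough sketch of the kind of test program described above. It is not the program actually used; the names REGPARM, read_one and copy_bytewise are made up for illustration. read_one is only a hypothetical stand-in showing how a regparm (2) declaration looks; the real change was made to the fhandler methods inside the DLL. The macro expands to nothing except on 32-bit x86 with a GCC-compatible compiler.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* regparm only matters on 32-bit x86 with GCC-compatible compilers;
   elsewhere the macro expands to nothing. */
#if defined(__GNUC__) && defined(__i386__)
#define REGPARM(n) __attribute__((regparm(n)))
#else
#define REGPARM(n)
#endif

/* Hypothetical stand-in for a DLL-internal method such as fhandler::read.
   Under regparm (2) its two arguments are passed in registers. */
static long REGPARM(2) read_one(int fd, char *byte)
{
    return (long) read(fd, byte, 1);
}

/* Copy `count` bytes from src to dst, one byte per read/write call.
   Returns the number of bytes copied, or -1 if a file cannot be opened. */
long copy_bytewise(const char *src, const char *dst, long count)
{
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY);
    long copied = 0;
    char byte;

    assert(count >= 0);
    if (in < 0 || out < 0) {
        if (in >= 0)
            close(in);
        if (out >= 0)
            close(out);
        return -1;
    }
    /* Stop early on end of file or on any read/write error. */
    while (copied < count
           && read_one(in, &byte) == 1
           && write(out, &byte, 1) == 1)
        copied++;
    close(in);
    close(out);
    return copied;
}
```

Calling copy_bytewise ("/dev/zero", "/dev/null", 16L * 1024 * 1024) from main and running the result under time(1) should give figures comparable in kind to those in the table, though the absolute numbers will depend on the machine and compiler.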

My guess is that regparm (3) wrecks the optimization of the calling function, since all three x86 temporary registers have to be made available for the call. Given this, the new gcc (3.2) might do better here as it's got a different register allocator (as I understand it). If I can be bothered I'll do some tests on that tomorrow.

Nb, the difference in performance here of nearly 2 seconds between slowest and fastest results amounts to about an eighth of a microsecond per read(2) call; perhaps not immensely significant. Compare that to a difference between the stock DLL and my current readv/writev changes of something like half a microsecond per read(2) call (simply due to the increased number of function calls, since read(2) is forwarded to readv(2) and so forth).

HTH,

// Conrad
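
The per-call figure quoted above can be checked with a little arithmetic. This is only a sketch: per_call_microseconds is a made-up helper name, and it assumes "16Mb" means 16 * 1024 * 1024 bytes, i.e. that many read(2) calls.

```c
#include <assert.h>

/* Spread a wall-clock difference (in seconds) over `calls` calls and
   return the cost per call in microseconds. */
double per_call_microseconds(double slow_seconds, double fast_seconds,
                             long calls)
{
    assert(calls > 0);
    return (slow_seconds - fast_seconds) * 1e6 / (double) calls;
}
```

With the slowest (39.409s) and fastest (37.354s) timings and 16L * 1024 * 1024 calls, this comes out at roughly 0.12 microseconds, in line with the "about an eighth of a microsecond" estimate.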