Message-ID: <005b01bf1c99$90901940$0201a8c0@center1.mivlgu.ru>
From: "Sergey Vlasov"
To: "Eli Zaretskii"
Cc:
Subject: Re: [vsu AT au DOT ru: bugs in itimer.c]
Date: Fri, 22 Oct 1999 17:23:30 +0300
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 4.72.3110.5
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
Reply-To: djgpp-workers AT delorie DOT com

>> This means that setitimer() and getitimer() take 0.9 seconds under
>> Windows 95?  This is really slow...
> There's something that goes on here, which it would be nice to
> understand.  I don't think 1 second is what it takes to call getitimer
> on Windows.  If you can try to find out why does this happen, please
> do.

The real problem is that sometimes getitimer() returns tv_usec > 999999.
My test program uses printf("%ld.%06ld\n", tv_sec, tv_usec) to print the
result, which looks like a fractional number, and in this case the
fractional part overflows.  This happens sometimes (not on every run)
under Windows, and also under DOS when using PMODETSR.

It seems that I have found the source of this problem.  The conversion
from uclocks to microseconds in getitimer() calculates the tv_usec field
as uclk*3433/4096 (here uclk = expire % UCLOCKS_PER_SEC).  When
uclk == 1193179 (the maximum value, UCLOCKS_PER_SEC-1), this gives
1000044, because the coefficient values 3433 and 4096 are not precise.

The problem can be seen with my test program when setitimer() and
getitimer() execute fast enough.  It seems that under Windows some parts
of setitimer() execute faster than under DOS with CWSDPMI.  With
PMODETSR, the problem can be seen under clean DOS also.

The bug can be fixed by increasing the precision of the conversion from
uclocks to microseconds.
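
To make the arithmetic above concrete, here is a small sketch that
compares the two conversions.  It is not part of itimer.c: the helper
names usec_approx(), usec_exact() and check_conversion() are invented
for this illustration, and UCLOCKS_PER_SEC is assumed to be 1193180,
as implied by the maximum remainder of 1193179 mentioned above.

```c
#include <assert.h>

/* Assumption: the value implied by "UCLOCKS_PER_SEC-1 == 1193179". */
#define UCLOCKS_PER_SEC 1193180LL

/* Old conversion: approximates 1000000/1193180 by 3433/4096. */
long long usec_approx(long long uclk)
{
  return uclk * 3433 / 4096;
}

/* Precise conversion using a 64-bit intermediate product. */
long long usec_exact(long long uclk)
{
  return uclk * 1000000 / UCLOCKS_PER_SEC;
}

/* Hypothetical self-check, for illustration only. */
void check_conversion(void)
{
  long long max_uclk = UCLOCKS_PER_SEC - 1;

  assert(usec_approx(max_uclk) == 1000044);  /* overflows 999999 */
  assert(usec_exact(max_uclk) == 999999);    /* stays in range */

  /* The approximation first reaches 1000000 at uclk == 1193126. */
  assert(usec_approx(1193125) == 999999);
  assert(usec_approx(1193126) == 1000000);
}
```

By this arithmetic only the last 54 uclock values of each second
(1193126 through 1193179, roughly 45 microseconds) produce an
out-of-range tv_usec, which would be consistent with the problem
showing up only when the calls execute fast enough.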

The easy fix is just to do it as follows:

  tv_sec  = expire / UCLOCKS_PER_SEC;
  tv_usec = (expire % UCLOCKS_PER_SEC) * 1000000 / UCLOCKS_PER_SEC;

This will work correctly because `expire' is 64-bit.  However, it
involves many operations with long longs and therefore is slow.

I have performed some optimization using `asm' to increase the speed.
Calculating (uclk*1000000/UCLOCKS_PER_SEC) does not need to return a
64-bit result -- only the intermediate value needs to be 64-bit wide.
Using 32*32->64 expanding multiplication and 64/32->32 division saves
both time and space.

Maybe setitimer() should be changed to use the more precise version
also, so that getitimer() and setitimer() will have identical behavior?

The diff against the last version of itimer.c is at the end of this
letter.

---
Sergey Vlasov

=========================== itimer.diff ===========================
*** itimer0.c	Fri Oct 22 15:51:28 1999
--- itimer.c	Fri Oct 22 15:52:18 1999
*************** static uclock_t r_exp, r_rel, /* When R
*** 34,39 ****
--- 34,56 ----
  static uclock_t u_now;
+ /* Multiply a signed 32-bit integer (val) by a signed 32-bit integer (m)
+    and divide the 64-bit intermediate result by a signed 32-bit integer (d).
+    Returns 32-bit signed integer result of the division.
+
+    In other words, calculate (long)(((long long)val * m)/d), but
+    in a more efficient way, using 32*32->64 multiplication and
+    64/32->32 division commands of Intel processors.  */
+ static inline long muldiv(long val, long m, long d)
+ {
+   long rv;
+   asm( "imull %2\n\t"
+        "idivl %3"
+        : "=a" (rv) : "a" (val), "r" (m), "r" (d) : "dx" );
+   return rv;
+ }
+
+
  int
  getitimer(int which, struct itimerval *value)
  {
*************** getitimer(int which, struct itimerval *v
*** 61,70 ****
      errno = EINVAL;
      return -1;
    }
!   value->it_value.tv_sec    = expire / UCLOCKS_PER_SEC;
!   value->it_value.tv_usec   = (expire % UCLOCKS_PER_SEC)*3433/4096;
!   value->it_interval.tv_sec = reload / UCLOCKS_PER_SEC;
!   value->it_interval.tv_usec= (reload % UCLOCKS_PER_SEC)*3433/4096;
    return 0;
  }
--- 78,89 ----
      errno = EINVAL;
      return -1;
    }
!   value->it_value.tv_sec     = expire / UCLOCKS_PER_SEC;
!   value->it_value.tv_usec    = muldiv(expire % UCLOCKS_PER_SEC,
!                                       1000000, UCLOCKS_PER_SEC);
!   value->it_interval.tv_sec  = reload / UCLOCKS_PER_SEC;
!   value->it_interval.tv_usec = muldiv(reload % UCLOCKS_PER_SEC,
!                                       1000000, UCLOCKS_PER_SEC);
    return 0;
  }
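
For anyone who wants to cross-check the asm helper on a compiler that
does not accept this inline-asm syntax, a portable reference could look
like the sketch below.  It simply spells out the expression given in
the muldiv() comment; the names muldiv_ref() and check_muldiv_ref() are
invented for this illustration and are not part of the patch, and
UCLOCKS_PER_SEC is again assumed to be 1193180.

```c
#include <assert.h>
#include <stdint.h>

/* Portable reference: 32*32->64 multiply, then 64/32 divide.
   The caller must make sure the quotient fits in 32 bits; the idivl
   instruction raises a divide error when it does not. */
int32_t muldiv_ref(int32_t val, int32_t m, int32_t d)
{
  int64_t wide = (int64_t)val * m;   /* 64-bit intermediate product */
  return (int32_t)(wide / d);
}

/* Hypothetical self-check, assuming UCLOCKS_PER_SEC == 1193180. */
void check_muldiv_ref(void)
{
  assert(muldiv_ref(0, 1000000, 1193180) == 0);
  assert(muldiv_ref(596590, 1000000, 1193180) == 500000);
  assert(muldiv_ref(1193179, 1000000, 1193180) == 999999);
}
```

In getitimer() the first argument is always less than UCLOCKS_PER_SEC,
so the quotient is always less than 1000000 and the 32-bit result
cannot overflow.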