Message-ID:
In-Reply-To:
References: Conversation with last message
Priority: Normal
X-MSMail-Priority: Normal
X-Priority: 3
To: "Eli Zaretskii"
Cc: djgpp-workers AT delorie DOT com, sandmann AT clio DOT rice DOT edu, pavenis AT lanet DOT lv
MIME-Version: 1.0
From: "Erik Berglund"
Subject: Re: gcc-crash - and a possible solution
Date: Thu, 08 Jul 99 22:50:00 +0100 (DJG)
Content-Type: text/plain; charset="ISO-8859-1"; X-MAPIextension=".TXT"
Content-Transfer-Encoding: 7bit
Reply-To: djgpp-workers AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

Eli Zaretskii wrote:

> > Finally, I did one more observation: The error disappears when
> > I turn off "Internal Cache" in BIOS setup, but then the compilation
> > will take 5 minutes instead of 5 seconds...

> Last time I saw strange problems that disappeared when the cache was
> turned off, it was a case of a bad motherboard: the system clock was
> driving the memory chips too fast, and some of the SIMMs were
> sometimes not keeping up. Interestingly enough, the problem was
> detected by GCC (crashes during compilation), although I think it was
> on plain DOS, not in Windows.

Yes, I wouldn't rule out hardware problems completely yet, but my PC
(PP200 (SY013), ASUS P/I-P6RP4, 128 MB EDO, Orion 450KX PCIset,
16 KB L1 cache, 256 KB L2 cache, PC from 1996) usually works very
well, especially in plain DOS. I've also tried changing to some old
parity SIMMs, or "shifting" the EDOs one step down, but the error
didn't go away.

However, I made some interesting observations when I started to
examine the contents of the physical RAM. First I wrote a little
program that maps all physical memory and then searches for a
specified 32-bit pattern. The addresses below are all physical
addresses. Not all occurrences of the data are shown here, only the
most important ones.
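For illustration only, the search step of such a program could look
roughly like the sketch below. This is not the actual program: the
function name find_pattern32 and all of its parameters are
hypothetical, and the step that maps physical memory into the
program's address space is left out entirely. The buffer passed in
simply stands in for an already-mapped region, and phys_start is
assumed to be the physical address of its first byte. Only 4-byte
aligned positions are checked, which is an assumption that fits the
addresses listed below.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Scan len bytes starting at base for the 32-bit value pattern,
 * looking only at 4-byte-aligned offsets.
 *
 * phys_start is the (assumed) physical address of base[0]; it is
 * used only to report matches as physical addresses.
 *
 * At most max_hits addresses are stored in hits. The return value
 * is the total number of matches, which may exceed max_hits.
 */
size_t find_pattern32(const unsigned char *base, size_t len,
                      uint32_t phys_start, uint32_t pattern,
                      uint32_t *hits, size_t max_hits)
{
    size_t found = 0;
    size_t off;

    for (off = 0; off + 4 <= len; off += 4) {
        uint32_t word;

        /* memcpy reads the word in host byte order and avoids
           alignment and aliasing problems on the source buffer. */
        memcpy(&word, base + off, sizeof word);

        if (word == pattern) {
            if (found < max_hits)
                hits[found] = phys_start + (uint32_t)off;
            found++;
        }
    }
    return found;
}
```

Called with pattern 0x243d5450 on a little-endian machine, this
matches the byte sequence "PT=$" in memory, and each stored hit can
be printed as a physical address in the same form as the list below.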
1) Virtual Memory is enabled:

   a) After a successful run:
      address 0x127e004: 0x243d5450
      address 0x138a004: 0x00292fec (right data)

   b) After crash (triggered by 3-program previously run):
      address 0x127e004: 0x243d5450
      address 0x1388e10: 0x243d5450
      address 0x138b004: 0x243d5450 (wrong data)

2) Virtual Memory is enabled, but _CRT0_FLAG_LOCK_MEMORY is used
   in CC1:

   a) After a successful run:
      address 0x1342004: 0x243d5450
      address 0x138b004: 0x243d5450 (right data)
      address 0x14a2004: 0x00292fec (right data)

   b) Crash could not be triggered!

Here is my new theory:

For a successful run, what we would expect at physical address
0x138a004 (or 0x138b004) is a valid lp->limit value (0x00292fec),
which is the case in 1a) and 2a).

In case 1b), however, something has accidentally written 0x243d5450
("PT=$") to address 0x138b004. This happens inside morecore() in
malloc.c (see previous mail). The lp->limit value (0x00292fec) has
disappeared, so we get a crash.

Now look at case 2a)! Here we find both values (0x243d5450 and
0x00292fec), but at _different_ addresses, and all goes well!

What I think is: There is nothing wrong with malloc.c or morecore()
or sbrk() or the whole of CC1 for that matter. But the Windows 3.11
DPMI somehow mixes up pages in some cases when Virtual Memory is
enabled. Probably what we see is a "double-use" of a physical page
at 0x0138b004, which could give all sorts of really strange results.

This can certainly be investigated further; these are just my first
impressions. Meanwhile, I'm going to use _CRT0_FLAG_LOCK_MEMORY in
CC1 as a first work-around, and see if I get any crashes. So far,
I've done a hundred compilations with different input files and it
seems to work.

Also, there is probably nothing wrong with high addresses (>2 GB),
as I first thought.

-- Erik