Message-Id: <199907072333.TAA22512@delorie.com>
To: "Eli Zaretskii"
Cc: djgpp-workers AT delorie DOT com, sandmann AT clio DOT rice DOT edu,
    pavenis AT lanet DOT lv
MIME-Version: 1.0
From: "Erik Berglund"
Subject: Re: gcc-crash - and a possible solution
Date: Thu, 08 Jul 99 01:15:59 +0100 (DJG)
Content-Type: text/plain; charset="ISO-8859-1"; X-MAPIextension=".TXT"
Content-Transfer-Encoding: 7bit
Reply-To: djgpp-workers AT delorie DOT com

Eli Zaretskii wrote:

> On Mon, 5 Jul 1999, Erik Berglund wrote:
>
> > xmalloc-call number 609:
> > The malloc itself goes well, but the data in the previously
> > allocated block is changed during the malloc call.
> > xmalloc:
> >   dataprobe: @0x292004 = 0x292fec
> >   call malloc(), returns 0x294004
> >   dataprobe: @0x292004 = 0x243d5450
> >   return;
>
> This seems to imply some corruption of the memory pool maintained by
> malloc. I would suggest to try to find out why does this happen.

Today I looked into malloc, and it seems that the error happens
inside morecore, as shown below (my observations are marked
"dataprobe").

The last malloc-call before the crash is number 707. In this
malloc-call, malloc will allocate a new block at 0x294004, but
there is some sort of side-effect: The previous block at 0x292004
is affected, its first longword is either changed or becomes mapped
to the wrong physical memory.

The funny data that suddenly appears, for instance "PROMPT=$P$G",
comes from other malloc-blocks which may be either free or still
busy. I'm almost sure about that, because I do recognize
environ[]-data (which ought to be still busy), and also some data
fragments coming from files I just edited in my editor, which uses
malloc.

void *
malloc(size_t nbytes)
{
    union overhead *op;
    int bucket, n;
    unsigned amt;

    /*
     * First time malloc is called, setup page size and
     * align break pointer so all data will be page aligned.
     */
***dataprobe: malloc-call number 707, @0x292004=0x292fec (OK).
    if (pagesz == 0) {
        pagesz = n = getpagesize();
        op = (union overhead *)sbrk(0);
        n = n - sizeof (*op) - ((int)op & (n - 1));
        if (n < 0)
            n += pagesz;
        if (n) {
            if (sbrk(n) == (char *)-1)
                return (NULL);
        }
        bucket = 0;
        amt = 8;
        while (pagesz > amt) {
            amt <<= 1;
            bucket++;
        }
        pagebucket = bucket;
    }
    /*
     * Convert amount of memory requested into closest block size
     * stored in hash buckets which satisfies request.
     * Account for space used per block for accounting.
     */
    if (nbytes <= (n = pagesz - sizeof (*op) - RSLOP)) {
        amt = 8;    /* size of first bucket */
        bucket = 0;
        n = -(sizeof (*op) + RSLOP);
    } else {
        amt = pagesz;
        bucket = pagebucket;
    }
    while (nbytes > amt + n) {
        amt <<= 1;
        if (amt == 0)
            return (NULL);
        bucket++;
    }
    /*
     * If nothing in hash bucket right now,
     * request more memory from the system.
     */
***dataprobe: malloc-call number 707, @0x292004=0x292fec (OK).
    if ((op = nextf[bucket]) == NULL) {
        morecore(bucket);
        if ((op = nextf[bucket]) == NULL)
            return (NULL);
    }
***dataprobe: malloc-call number 707, @0x292004=0x243d5450 ("PT=$").
***This shouldn't happen, we are in the process of allocating a block at
***0x294004. Why is a foreign block at 0x292004 affected at this point??
    /* remove from linked list */
    nextf[bucket] = op->ov_next;
    op->ov_magic = MAGIC;
    op->ov_index = bucket;
    return ((char *)(op + 1));
}

> > There is another possibility though: The previously
> > allocated block may just have been accidently free'd
> > and now malloc thinks it's free for use. Maybe it's a
> > good idea to "dataprobe" all calls to free and realloc
> > as well. But even so it's hard to explain the sudden
> > appearance of the new interesting PROMPT=$P$G data.
> It's not hard at all. Using unallocated or uninitialized memory
> always shows some chunks of text previously processed by the program.
> As I wrote elsewhere, the startup code allocates memory for all
> environment variables, so at some time all VAR=VALUE pairs are copied
> to memory allocated by the program.

But in this particular case, the data at 0x292004 is first good
(0x292fec, which is the limit pointer), and then becomes bad
(0x243d5450) inside malloc, when malloc allocates the next block.

What seems strange to me is how a fixed point in the heap (0x292004)
can change this way, as a side-effect of malloc or morecore. Even if
free(0x292004) was accidentally called, malloc itself wouldn't copy
new data into it, would it? If I had to guess, could it be an
accidental remapping of a page to a new physical memory location,
taking place inside win3.11?

Next, I will try looking into morecore and sbrk. The problem is that
the error disappears when I place the "dataprobe" into one of these
modules. I think this is due to the virtual memory facility: pages
will be allocated in a different order whenever I patch in a "jump"
to the dataprobe somewhere else in the program image. Sort of
Heisenberg's theory :-).

Then I disabled virtual memory using _CRT0_FLAG_LOCK_MEMORY in CC1,
and again, the error disappeared... But when it shows up again,
I think it will be easier to find.

Finally, I made one more observation: The error disappears when I
turn off "Internal Cache" in BIOS setup, but then the compilation
takes 5 minutes instead of 5 seconds...

-- Erik
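
For reference, here is a minimal sketch of the kind of "dataprobe"
wrapper described above. It is not the actual probe patched into CC1:
the names probe_watch, probed_malloc, probe_calls and probe_changes
are hypothetical, and it assumes the watched longword stays readable
for the whole run. It only samples one address before and after each
malloc call and reports when the value changes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* All names here are hypothetical illustrations, not the CC1 patch. */
volatile unsigned long *probe_addr; /* longword being watched, or NULL */
unsigned long probe_calls;          /* number of probed_malloc calls */
unsigned long probe_changes;        /* times the watched word changed */

/* Start watching one longword. The caller must guarantee that the
   address stays readable for as long as the probe is active. */
void probe_watch(volatile unsigned long *addr)
{
    assert(addr != NULL);
    probe_addr = addr;
}

/* Call malloc, sampling the watched longword before and after. */
void *probed_malloc(size_t nbytes)
{
    unsigned long before = 0, after = 0;
    void *p;

    probe_calls++;
    if (probe_addr)
        before = *probe_addr;
    p = malloc(nbytes);
    if (probe_addr)
        after = *probe_addr;

    if (before != after) {
        /* The watched word changed as a side-effect of malloc. */
        probe_changes++;
        fprintf(stderr,
                "malloc call %lu: @%p changed 0x%lx -> 0x%lx, "
                "malloc returned %p\n",
                probe_calls, (void *)probe_addr, before, after, p);
    }
    return p;
}
```

With a healthy allocator, watching the first longword of a block that
is still busy should never report a change. A report like the one at
call 707 would then point at malloc, morecore or something below
them, rather than at the caller.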