Date: Thu, 13 Jan 2000 18:14:50 +0200 (IST)
From: Eli Zaretskii
X-Sender: eliz AT is
To: jazir
cc: djgpp AT delorie DOT com
Subject: Re: upgrade chaos
In-Reply-To: <387D90EF.1D5B0DCB@mpx.com.au>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Reply-To: djgpp AT delorie DOT com
Errors-To: dj-admin AT delorie DOT com
X-Mailing-List: djgpp AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

On Thu, 13 Jan 2000, jazir wrote:

> In fact, when I compiled my program with YAMD, everything worked as expected.
> There was no crash..that could just mean that my bug goes back to doing no
> damage, as in djgpp v2.02, but YAMD should still detect it... i'm puzzled.

It might mean that your bug is very subtle, and goes away when the
code is rearranged a bit (using YAMD adds some code).

Can you run the program under GDB or RHIDE and see if the problem
still happens?  If it does, you could look at some data structures
that seem to be involved in this particular call to malloc/free, and
perhaps find the corrupted buffer.

If the crashes happen inside the debugger, but you cannot find the
corrupted buffer, then the next question is: does it crash because of
some consistently wrong address?  Looking at the registers printed
when it crashes would tell: if the registers are identical from run to
run, then some variable gets consistently garbled.  I then suggest the
following procedure to continue debugging:

  - disassemble the code of malloc and free near the locus of their
    crash, and find out which register holds the garbled value
    (section 12.2 of the FAQ explains how to do this);

  - look at the disassembled code and find out where (from what
    address) the garbled value came from;

  - put a watchpoint at the offending address and run the program
    again.

Now when the offending address is garbled, the debugger will kick in
and show you the code which writes there.  The rest should be easy.
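If you want to rehearse the watchpoint step on something small first,
here is a minimal sketch of the kind of bug I have in mind.  Everything
in it (struct record, fill_name, garble_demo) is hypothetical and made
up for illustration; it is not taken from your program or from libc.
A bookkeeping field sits right after a small buffer, and an overlong
fill garbles it with the same value on every run:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: a small buffer followed by a bookkeeping field. */
struct record {
    char name[8];
    unsigned long next_size;    /* the value that ends up garbled */
};

/* The bug: fills len bytes starting at the buffer, with no check
   against sizeof r->name.  The write goes through a pointer to the
   whole struct (name is its first member) and is clamped to the
   struct size, so the demo itself never leaves the object. */
void fill_name(struct record *r, int ch, size_t len)
{
    if (len > sizeof *r)
        len = sizeof *r;
    memset((unsigned char *)r, ch, len);
}

/* Returns next_size after an overlong fill has run over it. */
unsigned long garble_demo(void)
{
    struct record r;

    memset(&r, 0, sizeof r);
    r.next_size = 64;
    fill_name(&r, 'A', sizeof r);   /* only 8 bytes were intended */
    return r.next_size;             /* every byte is now 'A' */
}
```

Call garble_demo from a small main, stop in it under GDB after
next_size has been set to 64, say "watch r.next_size", and continue.
The debugger should stop inside the fill that overwrites the field,
which is the same thing you are after in the real program.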

> Here is the symify'ed output for two different runs {compressing different
> files}. As you can see, one GPF occurs in malloc, the other in free.

These crashes indicate that the data structures used by malloc/free
are garbled, which usually happens when buffers are overwritten.

> I just ran the program on yet another file. It worked the first time, but
> crashed the second time, with a result the same as the second one above.
> It makes things even harder when the bug doesn't appear every time, even if
> the program input is the same.

One more sign that the bug is very subtle, sigh...

> Is there a possibility I had the old malloc+free in a v2.02 distribution?

You can verify this by looking at the date malloc.o was compiled:

  ar tv libc.a malloc.o

What do you see when you do this for libc.a from v2.02?
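
Since overwritten buffers are the usual suspect, one more thing you
could try is to put guard bytes around the allocations you distrust
and check them at strategic points.  Below is a minimal sketch of that
idea.  The names (guarded_alloc, guarded_check, guarded_free) and the
constants are hypothetical, and this is not a description of how YAMD
works; it also assumes that size_t fits in the 16-byte front area.

```c
#include <stdlib.h>
#include <string.h>

#define FRONT      16      /* size field followed by guard bytes */
#define REAR       8       /* guard bytes after the user area */
#define GUARD_BYTE 0xA5

/* Allocate size bytes with guard bytes on both sides. */
void *guarded_alloc(size_t size)
{
    unsigned char *base;

    if (size > (size_t)-1 - FRONT - REAR)
        return NULL;                        /* total would overflow */
    base = malloc(FRONT + size + REAR);
    if (base == NULL)
        return NULL;
    memset(base, GUARD_BYTE, FRONT);        /* front guard */
    memcpy(base, &size, sizeof size);       /* remember the size */
    memset(base + FRONT + size, GUARD_BYTE, REAR);   /* rear guard */
    return base + FRONT;
}

/* Return 1 if both guards are intact, 0 if something wrote on them. */
int guarded_check(const void *ptr)
{
    const unsigned char *user = ptr;
    const unsigned char *base = user - FRONT;
    size_t size, i;

    memcpy(&size, base, sizeof size);
    for (i = sizeof size; i < FRONT; i++)
        if (base[i] != GUARD_BYTE)
            return 0;
    for (i = 0; i < REAR; i++)
        if (user[size + i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Release a block obtained from guarded_alloc. */
void guarded_free(void *ptr)
{
    if (ptr != NULL)
        free((unsigned char *)ptr - FRONT);
}
```

Calling guarded_check on the suspect buffers before each malloc/free
narrows down which buffer is overrun and roughly when.  It will not
catch writes that land beyond the guard bytes, and it does not detect
a corrupted size field, so treat a clean check as a hint rather than
proof.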