Date: Thu, 10 Dec 1998 11:06:10 +0200 (IST)
From: Eli Zaretskii
X-Sender: eliz AT is
To: Leonid Pauzner
cc: djgpp AT delorie DOT com
Subject: Re: DJGPP 2.02 fails immediately on FPU-less machine !
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Reply-To: djgpp AT delorie DOT com

On Wed, 9 Dec 1998, Leonid Pauzner wrote:

> > Somebody needs to establish how to fix it.  Please consider debugging
> > this problem on your machine, since no one of the people who work on
> > DJGPP development have access to a machine with no FPU.
>
> Well, I have a very limited experience with debugging

It doesn't take too much experience to crack these problems, it only
takes some motivation and a bit of mundane work.  I even wrote a
section in the FAQ (section 12.2) that explains how to start a
debugging session using the crash traceback info.

Here's what I've done in this case, to find out the reason (see my
other mail for its description):

  - Compiled and linked with -g a simple program that used floating
    point.

  - Set 387=n (since that was on a machine with an FPU) and set
    emu387=c:/djgpp/bin/emu387.dxe.

  - Ran the program and got the crash, then ran `symify' on it.  The
    traceback pointed to the first FP instruction in the program.
    Since the crash says SIGNOFP, meaning the emulator is not present,
    the immediate guess was that the emulator is somehow not installed
    (as opposed to the case where it is installed but doesn't work
    correctly: that case would probably yield a SIGFPE).

  - Looked at library sources of the function that installs the
    emulator (file npxsetup.c).  There are several possible causes for
    the failure to load the emulator, so sources alone were not enough
    to find the reason.
  - Ran the program under a debugger, set a breakpoint inside the
    function `npxsetup' and stepped through its instructions (you need
    either an assembly-level debugger, such as FSDB, or to use
    assembly-level commands of GDB/RHIDE, like `stepi', `nexti', etc.;
    I used FSDB).  This clearly showed that the problem happens
    because a call to `_dxe_load' returns a NULL pointer, meaning that
    it failed to load the emulator.

  - Ran the program under a debugger again, this time set a breakpoint
    inside `_dxe_load', and stepped through it.  This clearly showed
    that the test of the magic signature "DXE1" in the emulator header
    fails.

  - Looked at the file emu387.dxe with Less (you can use any other
    program that displays a binary file, e.g. `od' from Textutils).
    This immediately made evident that the signature is wrong: it's
    "1EXD" in the version supplied with v2.02, whereas emu387.dxe from
    v2.01 has the correct signature "DXE1".

  - Edited emu387.dxe with a binary editor (I used the `hexl' feature
    of Emacs, but any other binary editor will do) and changed the
    signature to the right one.  Ran my test program again; it crashed
    again.

  - Ran the test program under the debugger yet again.  This time,
    `_dxe_load' passed the signature test, but failed later, when it
    used other fields in the DXE header.

  - Looked closer at the two versions of emu387.dxe (from v2.02 as
    opposed to v2.01).  This time, I saw that ALL the other fields of
    the DXE header, which are 4-byte integers, are byte-reversed,
    which also explained how the "DXE1" signature got reversed.

  - Concluded that `dxegen', the program that generates emu387.dxe,
    somehow didn't put the bytes in the correct order (since v2.02 was
    built on a Unix box with a big-endian byte order).

I don't think the above is a complicated procedure.  I think anybody
with enough motivation should be able to do it.
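
For illustration only, the signature check from the steps above can be sketched as a small script. This is not part of the original procedure (which used Less and the Emacs `hexl' mode), and it is written in Python rather than with the DJGPP tools. The function names `classify_magic', `swap_words' and `check_file' are hypothetical, and the sketch assumes nothing about the DXE header beyond what the message states: a 4-byte signature "DXE1", and other fields that are 4-byte integers.

```python
import struct

# Signature that the message says `_dxe_load' tests for.
GOOD_MAGIC = b"DXE1"
# The same four bytes in reverse order, as seen in the v2.02 file.
SWAPPED_MAGIC = GOOD_MAGIC[::-1]  # b"1EXD"


def classify_magic(header: bytes) -> str:
    """Classify the first four bytes of a DXE file (hypothetical helper)."""
    magic = header[:4]
    if magic == GOOD_MAGIC:
        return "ok"
    if magic == SWAPPED_MAGIC:
        return "byte-swapped"
    return "unknown"


def swap_words(data: bytes) -> bytes:
    """Reverse the byte order of every 4-byte word in `data'.

    Reading each word as little-endian and writing it back as
    big-endian mimics what happens when 4-byte integers are written
    on a big-endian machine and read on a little-endian one.
    """
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    count = len(data) // 4
    words = struct.unpack("<%dI" % count, data)
    return struct.pack(">%dI" % count, *words)


def check_file(path: str) -> str:
    """Read the first four bytes of the file at `path' and classify them."""
    with open(path, "rb") as f:
        return classify_magic(f.read(4))
```

Calling `check_file' on the path to emu387.dxe would report "byte-swapped" for a file whose signature reads "1EXD". `swap_words(b"DXE1")' returns `b"1EXD"', which shows how storing the signature as a 4-byte integer in the wrong byte order produces exactly the reversal seen in the v2.02 file.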