Mail Archives: djgpp/1998/12/10/05:01:53

Date: Thu, 10 Dec 1998 11:06:10 +0200 (IST)
From: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
X-Sender: eliz AT is
To: Leonid Pauzner <leonid AT pauzner DOT mccme DOT ru>
cc: djgpp AT delorie DOT com
Subject: Re: DJGPP 2.02 fails immediately on FPU-less machine !
In-Reply-To: <AB3PgRsqX4@pauzner.mccme.ru>
Message-ID: <Pine.SUN.3.91.981210110523.1269A-100000@is>
MIME-Version: 1.0
Reply-To: djgpp AT delorie DOT com

On Wed, 9 Dec 1998, Leonid Pauzner wrote:

> > Somebody needs to establish how to fix it.  Please consider debugging
> > this problem on your machine, since no one of the people who work on
> > DJGPP development have access to a machine with no FPU.
> 
> Well, I have a very limited experience with debugging

It doesn't take too much experience to crack these problems; it only
takes some motivation and a bit of mundane work.  I even wrote a
section in the FAQ (section 12.2) that explains how to start a
debugging session using the crash traceback info.

Here's what I've done in this case, to find out the reason (see my
other mail for its description):

  - Compiled and linked with -g a simple program that used floating
    point.

  - Set 387=n (to force use of the emulator, since I was working on a
    machine that has an FPU) and set
    emu387=c:/djgpp/bin/emu387.dxe.

  - Ran the program and got the crash, then ran `symify' on it.  The
    traceback pointed to the first FP instruction in the program.
    Since the crash says SIGNOFP, meaning the emulator is not present,
    the immediate guess was that the emulator is somehow not
    installed (as opposed to the case where it is installed but
    doesn't work correctly: that case would probably yield a SIGFPE).

  - Looked at library sources of the function that installs the
    emulator (file npxsetup.c).  There are several possible causes for
    the failure to load the emulator, so sources alone were not enough
    to find the reason.

  - Ran the program under a debugger, set a breakpoint inside the
    function `npxsetup' and stepped through its instructions (you need
    either an assembly-level debugger, such as FSDB, or to use
    assembly-level commands of GDB/RHIDE, like `stepi', `nexti', etc.;
    I used FSDB).  This clearly showed that the problem happens
    because a call to `_dxe_load' returns a NULL pointer, meaning that
    it failed to load the emulator.

  - Ran the program under a debugger again, this time set a breakpoint
    inside `_dxe_load', and stepped through it.  This clearly showed
    that the test of the magic signature "DXE1" in the emulator header
    fails.

  - Looked at the file emu387.dxe with Less (you can use any other
    program that displays a binary file, e.g. `od' from Textutils).
    This immediately made evident that the signature is wrong: it's
    "1EXD" in the version supplied with v2.02, whereas emu387.dxe from
    v2.01 has the correct signature "DXE1".

  - Edited emu387.dxe with a binary editor (I used the `hexl' feature
    of Emacs, but any other binary editor will do) and changed the
    signature to the right one.  Ran my test program again; it crashed
    again.

  - Ran the test program under the debugger yet again.  This time,
    `_dxe_load' passed the signature test, but failed later, when it
    used other fields in the DXE header.

  - Looked closer at the two versions of emu387.dxe (from v2.02 as
    opposed to v2.01).  This time, I saw that ALL the other fields of
    the DXE header, which are 4-byte integers, are byte-reversed,
    which also explained how the "DXE1" signature got reversed.

  - Concluded that `dxegen', the program that generates emu387.dxe,
    somehow didn't put the bytes in the correct order (since v2.02 was
    built on a Unix box with a big-endian byte order).
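
The signature check and the byte-order effect described in the steps
above can be sketched in a few lines of C.  This is only an
illustration, not the library's code: the names `classify_signature',
`store_le32', `store_be32', `swap32' and `check_dxe_file' are all
hypothetical, and the constant 0x31455844 is simply the 32-bit value
whose little-endian bytes spell "DXE1" (I have not checked it against
the definition in the library headers).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for inspecting a 4-byte signature. */
enum dxe_sig { DXE_SIG_OK, DXE_SIG_REVERSED, DXE_SIG_UNKNOWN };

/* Classify a signature: "DXE1" is what the loader expects,
   "1EXD" is the same 4 bytes in reversed order. */
enum dxe_sig classify_signature(const unsigned char sig[4])
{
    if (memcmp(sig, "DXE1", 4) == 0)
        return DXE_SIG_OK;
    if (memcmp(sig, "1EXD", 4) == 0)
        return DXE_SIG_REVERSED;
    return DXE_SIG_UNKNOWN;
}

/* Store a 32-bit value least significant byte first (x86 order). */
void store_le32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v & 0xff);
    out[1] = (unsigned char)((v >> 8) & 0xff);
    out[2] = (unsigned char)((v >> 16) & 0xff);
    out[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Store a 32-bit value most significant byte first (big-endian). */
void store_be32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)((v >> 24) & 0xff);
    out[1] = (unsigned char)((v >> 16) & 0xff);
    out[2] = (unsigned char)((v >> 8) & 0xff);
    out[3] = (unsigned char)(v & 0xff);
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Read the first four bytes of a file and classify them.
   Returns a dxe_sig value, or -1 if the file cannot be read. */
int check_dxe_file(const char *path)
{
    unsigned char sig[4];
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    n = fread(sig, 1, sizeof sig, f);
    fclose(f);
    if (n != sizeof sig)
        return -1;
    return (int)classify_signature(sig);
}
```

Writing 0x31455844 with `store_le32' yields the bytes "DXE1", while
`store_be32' yields "1EXD", which is the pattern seen in the v2.02
file.  A generator that writes every 4-byte header field in host
order on a big-endian machine would reverse the other fields in the
same way, so fixing the signature alone would not be enough.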

I don't think the above is a complicated procedure.  I think anybody
with enough motivation should be able to do it.
