Date: Thu, 14 Oct 1999 10:42:55 +0200 (IST)
From: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
X-Sender: eliz AT is
To: Johnny Chan <jchan AT paclink DOT net>
cc: djgpp AT delorie DOT com
Subject: RE: Q: Want to know the starting address and size of my program
In-Reply-To: <001401bf15b7$9a7635e0$ae3d7a86@phoenix.com>
Message-ID: <Pine.SUN.3.91.991014104235.26124H-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Reply-To: djgpp AT delorie DOT com
X-Mailing-List: djgpp AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk


On Wed, 13 Oct 1999, Johnny Chan wrote:

> >>If you want safety, your program shall not write to any address below
> >>what sbrk(0) returns, because some or all of the addresses in this
> >>range are used by the code or the data of your program.
> 
> Is it possible to find out exactly where my data, stack, and code
> areas are located?

Yes, there is a way, but why do you need it?  Writing over your data
or stack is no less dangerous than writing over your code, right?

Anyway, the memory layout is defined by the linker script
lib/djgpp.djl.  Take a look at it: it says that the symbol "_etext"
marks the end of code, "_edata" marks the end of data, and "_end"
marks the end of the bss section.  These sections follow one another
in the order mentioned, so, e.g., data begins at _etext.  The stack
begins at __djgpp_stack_limit+_stklen and expands downwards.  Above
__djgpp_stack_limit is the heap that goes all the way to sbrk(0).
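
To make the layout above concrete, here is a minimal sketch that reads
those symbols from C.  It assumes the C names are written without the
leading underscore shown in the linker script (the compiler adds it);
"struct layout", get_layout and print_layout are hypothetical names
for illustration, not part of libc.  The stack symbols are left out
on purpose, so only the four boundaries are covered.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE      /* exposes sbrk() on glibc; harmless elsewhere */
#endif
#include <assert.h>
#include <stdio.h>
#include <unistd.h>          /* sbrk */

/* Linker-defined symbols.  Assumption: the C names omit the leading
   underscore that the linker script shows.  */
extern char etext[], edata[], end[];

/* Hypothetical helper type holding the four boundaries.  */
struct layout {
  unsigned long text_end;    /* end of code, where data begins */
  unsigned long data_end;    /* end of data, where bss begins  */
  unsigned long bss_end;     /* end of the bss section         */
  unsigned long brk;         /* what sbrk(0) returns right now */
};

struct layout get_layout(void)
{
  struct layout l;
  void *brk = sbrk(0);       /* an increment of 0 only queries the break */

  assert(brk != (void *)-1);
  l.text_end = (unsigned long)etext;
  l.data_end = (unsigned long)edata;
  l.bss_end  = (unsigned long)end;
  l.brk      = (unsigned long)brk;
  return l;
}

void print_layout(void)
{
  struct layout l = get_layout();

  printf("end of code: %#lx\n", l.text_end);
  printf("end of data: %#lx\n", l.data_end);
  printf("end of bss:  %#lx\n", l.bss_end);
  printf("sbrk(0):     %#lx\n", l.brk);
}
```

Calling print_layout() early in main shows the boundaries; note that
the value of sbrk(0) moves up as the program allocates memory.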

The linker script also shows the section alignment, so you can compute
the holes between the sections that are unused.
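
The hole computation is plain alignment arithmetic.  The sketch below
is only an illustration: align_up and hole_size are hypothetical
helpers, and the alignment used in the usage note (0x200) is an
assumed example value, so read the real one from the linker script.

```c
/* Round addr up to the next multiple of align.
   align must be a power of two.  */
unsigned long align_up(unsigned long addr, unsigned long align)
{
  return (addr + align - 1) & ~(align - 1);
}

/* Number of unused bytes between the end of one section (prev_end)
   and the aligned start of the section that follows it.  */
unsigned long hole_size(unsigned long prev_end, unsigned long align)
{
  return align_up(prev_end, align) - prev_end;
}
```

For example, if a section ended at 0x1234 and the next one were
aligned to 0x200, hole_size(0x1234, 0x200) would give 0x1cc unused
bytes, and the next section would start at 0x1400.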

> The code and data will be very small

However small the code and data are, you should not write over them.

> no heap will be involved here.

You cannot say this, unless you rewrite many libc functions or don't
use library functions at all (which in practice means you need to
rewrite the entire startup code).  The reason is that the startup code
calls functions that allocate data via malloc.  So you *do* have a
heap.

In any case, memory up to sbrk(0) is IN USE by something that doesn't
react well to being overwritten.  If you want to do that anyway,
expect trouble.
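
If you want the test itself to enforce this rule, a sketch along the
following lines might do.  overlaps_program and first_writable are
hypothetical helpers, and the addresses are assumed to be in the same
address space as the value sbrk(0) returns (offsets within the
program's data segment), not physical addresses.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE      /* exposes sbrk() on glibc; harmless elsewhere */
#endif
#include <unistd.h>          /* sbrk */

/* Nonzero if any byte of [start, start+len) lies below the current
   break, i.e. in memory that must be treated as in use.  */
int overlaps_program(unsigned long start, unsigned long len)
{
  unsigned long brk = (unsigned long)sbrk(0);

  return len != 0 && start < brk;
}

/* Lowest address at or above start that is not below the break.  */
unsigned long first_writable(unsigned long start)
{
  unsigned long brk = (unsigned long)sbrk(0);

  return start < brk ? brk : start;
}
```

Since sbrk(0) moves up whenever the heap grows, call these right
before each write pass instead of caching the result.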

> I am trying to maximize the test coverage and limit the untouched
> area to the code and data areas.

Instead of writing to the used memory, you could test it by reading it
and comparing it with the program's image (for code and data only).

But I'd think that even this is overkill: if a program runs, it
already means that the memory occupied by it is working.

> The stub program should know exactly where this "program" is
> located.

No, the stub doesn't know everything; it only knows where the code and
data are, because the stub is the agent that loads the code and data
into extended memory.  But the stack and the heap are set up by the
startup code in crt0.S and the subroutines it calls, so the stub
doesn't know about that.

> my program uses djgpp_map_physical_address() to map the physical address
> I wanted, then uses DPMI functions to access the memory. It gives me the
> impression that this method can let me access all the available physical
> memory. (Correct me if I am wrong)

The null page is unreadable because the startup code unmaps it (for
catching NULL pointer dereferences).  If you try to access it, you
will crash, at least under CWSDPMI.

> I am now trying to find out where my program is located (even in linear 
> address) so that I can set up some overwrite protection by myself.

It is easier not to write there at all, than to invent some
write-protection machinery.

> Can I scan the memory downward from the address which is returned from 
> sbrk() to find out where my code area and data area are?

Use the symbols I listed above; it's much easier.