From: Shawn Hargreaves <Shawn AT talula DOT demon DOT co DOT uk>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Allegro & sprite stretching optimization
Date: Mon, 2 Jun 1997 22:38:35 +0100
Organization: None
Distribution: world
Message-ID: <nWvqRIAb1zkzEwAi@talula.demon.co.uk>
References: <01bc6d4e$720f1900$ec3e63c3 AT default> <EB25Mr DOT 1HG AT world DOT std DOT com>
NNTP-Posting-Host: talula.demon.co.uk
MIME-Version: 1.0
Lines: 55
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Precedence: bulk

Tom writes:
>Recent discussion made me think about Allegro's sprite-stretching. It
>seems to me that there is a major redundancy there in that multiple uses
>of the same stretch routine recompile the same thing over and over.

This is true. I did at one point think about trying to optimise this
case, but never got round to doing anything about it :-) But you are
right, it would be possible to get some dramatic speed improvements when
doing repeated stretches by identical amounts.

>Would it be possible to simply split do_stretch_blit to separate the
>stretch-compile functionality (make_stretcher and a lot of
>do_stretch_blit) and the functionality that uses it (_do_stretch and the
>rest of do_stretch_blit), so that a user desirous of speed can compile a
>stretcher into memory that they control and pass that to _do_stretch?

That would work, but I'm very wary of an API that exposes the internal
workings of the implementation like that. Designing an interface that is
dependent on this kind of implementation detail could cause no end of
problems in the long run, and would restrict the ways in which the
routine could be developed in the future. It makes me nervous :-)

IMHO a much better approach would be simply to make the stretch_blit()
code cache the last few (say 4) routines that it compiled, and reuse
them wherever it can. This could be added in do_stretch_blit() without
too much hassle (I think just after the clipping code but before the
first call to make_stretcher()), and would provide the speed improvement
without any API clutter. Of course there would still be a few obscure
situations where such a general implementation would fall down, but to
my way of thinking that is the price of writing generic library code. If
I can handle 99% of situations in an efficient way, I'm willing to
sacrifice the remaining 1% in exchange for a cleaner interface (and of
course the beauty of having source code available is that people with
really specialised requirements are able to customise the routines to
fit those needs...)
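
To make the caching idea a bit more concrete, here is a minimal sketch in
plain C. It is not Allegro code: the names (StretchEntry, get_stretcher,
compile_table, compile_count) are invented for illustration, the "compiled
routine" is just a lookup table rather than generated machine code, and the
cache key is assumed to be only the source and destination widths (a real
key would probably need more than that). Replacement is simple round-robin
over 4 slots.

```c
#include <assert.h>
#include <stdlib.h>

#define CACHE_SIZE 4   /* "say 4", as suggested above */

/* Hypothetical stand-in for a compiled stretcher: a table that maps each
   destination x to a source x. The real routine would be machine code. */
typedef struct {
    int src_w, dest_w;   /* assumed cache key */
    int *table;          /* NULL marks an empty slot */
} StretchEntry;

static StretchEntry cache[CACHE_SIZE];
static int next_slot = 0;       /* round-robin replacement position */
static int compile_count = 0;   /* counts real compiles, for demonstration */

/* The expensive step that the cache is meant to avoid repeating. */
static int *compile_table(int src_w, int dest_w)
{
    int *t;
    int x;

    assert(src_w > 0 && dest_w > 0);
    t = malloc((size_t)dest_w * sizeof(int));
    if (!t)
        return NULL;
    for (x = 0; x < dest_w; x++)
        t[x] = (int)((long)x * src_w / dest_w);
    compile_count++;
    return t;
}

/* Returns a cached table if one matches, otherwise compiles a new one and
   stores it in the next slot, discarding whatever was there before. */
const int *get_stretcher(int src_w, int dest_w)
{
    int i;
    int *t;

    if (src_w <= 0 || dest_w <= 0)
        return NULL;

    for (i = 0; i < CACHE_SIZE; i++) {
        if (cache[i].table && cache[i].src_w == src_w &&
            cache[i].dest_w == dest_w)
            return cache[i].table;          /* cache hit: no recompile */
    }

    t = compile_table(src_w, dest_w);       /* cache miss */
    if (!t)
        return NULL;

    free(cache[next_slot].table);           /* free(NULL) is harmless */
    cache[next_slot].src_w = src_w;
    cache[next_slot].dest_w = dest_w;
    cache[next_slot].table = t;
    next_slot = (next_slot + 1) % CACHE_SIZE;
    return t;
}
```

Calling get_stretcher() twice with the same widths returns the same table
and compiles only once. A fifth distinct size pushes out the oldest entry,
which is one of the "obscure situations" where this scheme loses: a program
cycling through five or more sizes would recompile every time.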

>But then I realized a much easier approach, that also doesn't require
>the user to guess, calculate, or overallocate the memory that's needed
>is to compile into _scratch_mem as now, and then copy that result into
>allocated memory.

It can be done even more simply than that, and there's no need for the
copy! At the start of the function, save the values of _scratch_mem and
_scratch_mem_size in some local variables, and reset _scratch_mem to
NULL and _scratch_mem_size to zero. Run the compiler function as normal,
and it will allocate some new space for the resulting routine. When it
is done, keep hold of the new _scratch_mem pointer (the buffer that was
allocated by the compiler), restore the saved _scratch_mem and
_scratch_mem_size into the global variables, and return the new buffer
to the caller. When the caller is done with it they can just free() the
memory, and all will be well...
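
Roughly, the trick looks like the sketch below. This is self-contained
illustration code, not the library source: scratch_mem and
scratch_mem_size are stand-ins for the _scratch_mem and _scratch_mem_size
globals, and compile_into_scratch() and compile_private() are made-up names
for "the compiler function" and the wrapper around it. The stand-in
compiler is assumed to grow the scratch buffer with realloc() when it is
too small, and it just fills the buffer with placeholder bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the _scratch_mem and _scratch_mem_size globals. */
static void *scratch_mem = NULL;
static int scratch_mem_size = 0;

/* Hypothetical compiler: grows the shared scratch buffer if it is too
   small, then writes its output there. Returns 0 on success. */
static int compile_into_scratch(int size)
{
    assert(size > 0);
    if (size > scratch_mem_size) {
        void *p = realloc(scratch_mem, (size_t)size);
        if (!p)
            return -1;
        scratch_mem = p;
        scratch_mem_size = size;
    }
    memset(scratch_mem, 0x90, (size_t)size);   /* placeholder "code" bytes */
    return 0;
}

/* Compiles into a freshly allocated buffer that the caller owns, leaving
   the shared scratch buffer exactly as it was. Release with free(). */
void *compile_private(int size)
{
    void *saved_mem = scratch_mem;        /* save the globals */
    int saved_size = scratch_mem_size;
    void *result = NULL;

    if (size <= 0)
        return NULL;

    scratch_mem = NULL;                   /* force a new allocation */
    scratch_mem_size = 0;

    if (compile_into_scratch(size) == 0)
        result = scratch_mem;             /* buffer made by the compiler */

    scratch_mem = saved_mem;              /* restore the globals */
    scratch_mem_size = saved_size;
    return result;
}
```

The caller ends up with a buffer of exactly the size the compiler needed,
with no guessing and no copy, and the shared scratch buffer is untouched
for whoever uses it next.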


--
Shawn Hargreaves - shawn AT talula DOT demon DOT co DOT uk - http://www.talula.demon.co.uk/
Beauty is a French phonetic corruption of a short cloth neck ornament.