From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Assembly code doesn't work properly.
Date: 3 May 2000 15:05:54 GMT
Message-ID: <8epma2.3vvq7at.0@buerssner-17104.user.cis.dfn.de>
References: <39103287 DOT 92B19ACF AT htsol DOT com>

Yoram Hofman wrote:

>Assembly code doesn't work properly.
>I need to transfer an image from frame memory controller (far physical
>address - "src" in my program) to DRAM.
>To optimize the transfer I wrote simple assembly code (my first)
>The problem is that it works well in little application. But when I put
>this module (.c file as is) to our project it stops to work.

This is not too surprising. Whether your assembly works or not depends
on many factors. It looks like you just edited the gcc -S output and
converted it into inline assembly. As a result, your inline code makes
assumptions that depend on, e.g., the optimization options.

>int Read_image_directly(unsigned long src, unsigned char * dest)
>{
>   unsigned long dest_index = 0;
>   unsigned long src_index = src;
>   src_index += IMAGE_START;
>   src_index += 1L;
>
>   _farsetsel(mem_sel);
>
>/* this is original code I want to optimize
>   while( dest_index < IMAGE_SIZE ) //IMAGE_SIZE = 154560 bytes
>   {
>      *(dest + dest_index) = _farnspeekb( src_index );
>      dest_index++;
>      src_index = src_index + 4;
>   }
>*/

You may be able to speed this up in C.

/* Untested code */
#include <sys/farptr.h> /* _farsetsel, _farnspeekb */

/* mem_sel, IMAGE_START and IMAGE_SIZE come from your program */
int Read_image_directly(unsigned long src, unsigned char * dest)
{
  unsigned long src_index, n;

  _farsetsel(mem_sel);
  src_index = src + IMAGE_START + 1L;
  n = IMAGE_SIZE; /* assume IMAGE_SIZE > 0 */
  do
  {
    *dest++ = _farnspeekb( src_index );
    src_index += 4;
  }
  while (--n != 0);
  return 1;
}

If this is still too slow, you may try the gcc switch -funroll-loops
or do manual loop unrolling like this (assuming IMAGE_SIZE%2==0):

  n = IMAGE_SIZE/2;
  do
  {
    dest[0] = _farnspeekb( src_index );
    src_index += 4;
    dest[1] = _farnspeekb( src_index );
    src_index += 4;
    dest += 2;
  }
  while (--n != 0);

>/* my assembly */
>   asm("m_loop:
>   cmpl $154559, -4(%ebp)
>   jle m_code
>   jmp m_end
>   m_code:
>   movl 12(%ebp),%eax
>   movl -4(%ebp),%edx

This tries to put dest_index into edx. When gcc decides that dest_index
can be kept in a register, it won't allocate space for it on the stack,
and this will fail. It also won't work when you use
-fomit-frame-pointer, because %ebp then no longer serves as the frame
pointer. And the whole loop looks quite inefficient, just as if it were
the output of gcc -S without -O. I think the C code should be faster
when compiled with -O.

If you really do need inline assembly, the FAQ has quite a few
pointers to documentation. But I doubt that you can get much faster
with inline assembly.

>One more question for what ".byte 0x64" do I need?

This is a segment override prefix.

  .byte 0x64
  movb (%edx),%cl

is the same as

  movb %fs:(%edx), %cl

At least some versions of gas would (sometimes) produce wrong opcodes
for the latter line, so people got used to hardcoding the segment
override with .byte.

One last suggestion (with some speculation). I guess that compiling
your posted code with gcc -Wall -O will produce some warnings about
unused variables. This would suggest that gcc didn't allocate space
for those variables at all, and may hint that accessing them via the
frame pointer will be wrong.

--
Regards, Dieter
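
To compare the two loop shapes above without a DJGPP setup, here is a minimal sketch in portable C. It is only an illustration under stated assumptions: fake_frame, fake_peekb, copy_simple, copy_unrolled2 and strided_copies_agree are hypothetical names, and fake_peekb reads from an ordinary array instead of going through %fs as _farnspeekb does. It says nothing about speed; it only checks that the simple loop and the loop unrolled by two copy the same bytes.

```c
#include <assert.h>

/* Hypothetical stand-in for the frame memory: an ordinary array. */
enum { FAKE_PIXELS = 16, FAKE_STRIDE = 4 };
static unsigned char fake_frame[FAKE_PIXELS * FAKE_STRIDE];

/* Hypothetical replacement for _farnspeekb(): reads from fake_frame. */
unsigned char fake_peekb(unsigned long offset)
{
  return fake_frame[offset];
}

/* Simple loop: one byte per iteration, source advances by the stride. */
void copy_simple(unsigned long src_index, unsigned char *dest,
                 unsigned long n)
{
  assert(n > 0);
  do
  {
    *dest++ = fake_peekb(src_index);
    src_index += FAKE_STRIDE;
  }
  while (--n != 0);
}

/* Same copy, manually unrolled by two; n must be even and non-zero. */
void copy_unrolled2(unsigned long src_index, unsigned char *dest,
                    unsigned long n)
{
  assert(n > 0 && n % 2 == 0);
  n /= 2;
  do
  {
    dest[0] = fake_peekb(src_index);
    src_index += FAKE_STRIDE;
    dest[1] = fake_peekb(src_index);
    src_index += FAKE_STRIDE;
    dest += 2;
  }
  while (--n != 0);
}

/* Fills fake_frame with a pattern, runs both loops from offset 1 and
   returns 1 when both produce the expected bytes, 0 otherwise. */
int strided_copies_agree(void)
{
  unsigned char a[FAKE_PIXELS], b[FAKE_PIXELS];
  unsigned long i;

  for (i = 0; i < sizeof fake_frame; i++)
    fake_frame[i] = (unsigned char)(i * 7 + 3);

  copy_simple(1, a, FAKE_PIXELS);
  copy_unrolled2(1, b, FAKE_PIXELS);

  for (i = 0; i < FAKE_PIXELS; i++)
    if (a[i] != fake_frame[1 + FAKE_STRIDE * i] || a[i] != b[i])
      return 0;
  return 1;
}
```

Calling strided_copies_agree() from a small main() is enough to try it. In the real program the reads would still go through _farsetsel and _farnspeekb.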