Date: Mon, 7 Nov 1994 10:18:42 +0100
From: Kim Jersin
To: djgpp AT sun DOT soe DOT clarkson DOT edu
Cc: u940422 AT daimi DOT aau DOT dk
Subject: Fast disk I/O in DJGPP programs.

To all in need of reading large amounts of data... fast!

Last week there was some mail about poor disk I/O performance under the
go32 DOS extender. The conclusion was that go32 has to switch to real
mode whenever it needs data from a DOS file and/or device, and that it
has to do so many times during a large transfer because its transfer
buffer is small. That is not only a problem for go32, but for any DOS
extender. The task switching process takes some time, because all
registers have to be saved and reloaded with real-mode values, a new
stack frame has to be set up, etc., and all of this has to be reversed
on return.

I have come up with a solution that minimizes the problem (please take a
look at the two benchmark tests below), at least on configurations with
plenty of RAM at hand (8Mb or more). I hope that some of you will take
the time to do some testing, especially the person who ran the benchmark
tests under different DOS extenders (sorry, I can't remember your name;
my mailbox with all my old mail has vanished - novice on the Unix
platform).

The solution that I have sketched out, in the source code following the
benchmark tests, uses a small TSR (gpphlp.asm) which must be installed
before running go32, and also before starting Windows if you prefer to
run your programs in a Windows DOS box (which actually gives better
performance to the go32 extended program, with 32-bit disk and file
access turned on, than running from DOS under the QEMM DPMI manager).
The TSR reads continuously, in 32Kb blocks, as much as the available DOS
memory allows (up to approx. 448Kb under Windows), and the processor
remains in real mode for the whole duration (all 448Kb).
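
The buffer sizing just described can be sketched in a few lines of
portable C++. This is only an illustration of the arithmetic, not part of
the TSR or of go32: the names kBlockParagraphs, UsableParagraphs and
PlanChunks are invented for this example. It assumes the free DOS memory
is rounded down to a whole number of 32Kb blocks (800h paragraphs of 16
bytes) and that a request is then served in buffer-sized pieces.

```cpp
#include <cassert>
#include <vector>

// Hypothetical names; only the arithmetic follows the message.
const unsigned long kBlockParagraphs = 0x800;  // 32Kb = 800h paragraphs of 16 bytes

// Round the free DOS paragraphs down to a whole number of 32Kb blocks.
unsigned long UsableParagraphs(unsigned long availableParagraphs)
{
    return (availableParagraphs / kBlockParagraphs) * kBlockParagraphs;
}

// Byte counts requested from the helper, one entry per real-mode call.
std::vector<unsigned long> PlanChunks(unsigned long request,
                                      unsigned long availableParagraphs)
{
    std::vector<unsigned long> chunks;
    unsigned long bufSize = UsableParagraphs(availableParagraphs) << 4;  // bytes
    assert(bufSize % 32768UL == 0);   // always a whole number of 32Kb blocks
    if (bufSize == 0)
        return chunks;                // not even one block fits
    while (request > 0) {
        unsigned long n = request < bufSize ? request : bufSize;
        chunks.push_back(n);
        request -= n;
    }
    return chunks;
}
```

With the figures from the Windows benchmark run (a 2097152-byte request
and 30611 free paragraphs), PlanChunks gives four pieces of 458752 bytes
followed by one of 262144 bytes, i.e. five switches to real mode for the
whole transfer.
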
Control then goes back to the extended program, which copies the DOS
memory contents into whatever buffer was requested by the calling
function. The TSR is then called again for the next chunk of data, and
so on, until the requested number of bytes has been read.

All of the control logic lies within the high-level function HugeRead(),
because error checking etc. is a troublesome thing to do in assembler.
The goal is also to reduce the memory requirements of the TSR to a
minimum. At the moment this is 448 bytes, which will increase by a few
hundred bytes when fully equipped with both read and write. On the other
hand, the TSR is currently held in an .EXE file. If coded as a .BIN or
.SYS file and loaded from config.sys, the size can be reduced by some
200 bytes.

There is a small catch: it only runs on platforms where DPMI services
are available (fortunately, most memory management utilities provide
them these days). That's because DOS memory is allocated and freed only
when needed, and from the extended program.

I use the term "sketch" when talking about the two programs. This means
that what is done is done and nothing more for the moment (e.g. you can
read data but not write). It should work as expected, but of course
there is no warranty and you use it at your own risk.

I also hope that it could lead to a discussion of perhaps incorporating
it into future releases of the DJGPP package or, more generally, of
having a small real-mode helper TSR to help minimize task switching when
using real-mode facilities.

It would be fun to finish it if I had the time. If you have the time to
wait till the end of the year (I have some really serious study coming
up at the University), and if there is interest in some of the ideas
mentioned here, then I would be happy to round off the corners; e.g. the
method of using a fixed interrupt is very fast to code but breaks if
someone else also uses it. Perhaps writing a specialized streambuf for
use in the iostream library etc.
you name it.

Comments please..

---
Kim Jersin.


Benchmarks:
-----------
The machine used is an i386/387 33MHz clone with 8Mb RAM and a 405Mb
Conner AT-bus hard disk. The reading was done on a single contiguous
file. Norton SysInfo reports approx. 1150 Kbytes/sec for continuous
reads, so the 90Kb (Windows) and 200Kb (QEMM DPMI) less per second that
the two examples achieve is probably due to the time it takes to copy
from DOS (real-mode) memory to go32 application memory.

Both benchmark tests were run on a freshly booted machine to prevent any
cached data (especially the Windows 32-bit disk and file access caching)
from influencing the results. Both tests were run on the same
configuration (same autoexec.bat and config.sys files), i.e. the QEMM
DPMI manager was also present when Windows was started (but probably
taken over by Windows). The two listings are redirected output as it
would appear on screen.

In WFW 3.11 dos box:
--------------------
Trying to read 2097152 bytes
Allocate 0xFFFF:
 return: 8
 Paragraphs available: 30611

Paragraphs to allocate: 28672
Reading 458752 bytes, copying to memory above 1Mb.
Reading 458752 bytes, copying to memory above 1Mb.
Reading 458752 bytes, copying to memory above 1Mb.
Reading 458752 bytes, copying to memory above 1Mb.
Reading 262144 bytes, copying to memory above 1Mb.
Free dos mem return: 0
Bytes read: 2097152
Time elapsed: 1.92258
KBytes pr. second: 1065.23

QEMM DPMI manager:
------------------
Trying to read 2146304 bytes
Allocate 0xFFFF:
 return: 8
 Paragraphs available: 24786

Paragraphs to allocate: 24576
Reading 393216 bytes, copying to memory above 1Mb.
Reading 393216 bytes, copying to memory above 1Mb.
Reading 393216 bytes, copying to memory above 1Mb.
Reading 393216 bytes, copying to memory above 1Mb.
Reading 393216 bytes, copying to memory above 1Mb.
Reading 180224 bytes, copying to memory above 1Mb.
Free dos mem return: 0
Bytes read: 2146304
Time elapsed: 2.19724
KBytes pr. second: 953.924


The C++ high level routines:
----------------------------
When compiling, please be sure to link with the iostream library
(libiostr.a). This code is where most of the usability improvement
remains to be done. I hope it is written in such a way that you can pick
out the essence of it if you need to use (rewrite) it.

// NOTE: the header names were lost from the archived copy of this
// message. The list below is reconstructed from what the code uses and
// is not necessarily the original list.
#include <iostream.h>     // cout, cin, cerr
#include <stdlib.h>       // malloc, free
#include <string.h>       // memset, strcmp
#include <time.h>         // clock, CLOCKS_PER_SEC
#include <fcntl.h>        // open, O_RDONLY
#include <unistd.h>       // close
#include <sys/types.h>    // u_short, u_long
#include <dpmi.h>         // _go32_dpmi_* functions and structures
#include <go32.h>         // dosmemget

int HugeRead( int fHandle, void *Buf, int Count )
{
  // Allocate dos memory to be used as transfer buffer.
  // The size is the largest contiguous buffer divisible by 32kb.
  // -- 32kb is the block size GppHlp uses for reading.
  _go32_dpmi_seginfo info;
  memset( &info, 0, sizeof(info) );
  info.size= 0xFFFF;    // Try to allocate more than is available;
                        // the call fails and reports the largest free block
  cout << "Allocate 0xFFFF:\n";
  cout << " return: " << _go32_dpmi_allocate_dos_memory(&info) << "\n";
  cout << " Paragraphs available: " << info.size << "\n";
  info.size= (info.size/0x800)*0x800;   // Paragraphs to allocate
  int BufSize= info.size<<4;            // Size in bytes is more convenient
  cout << "\nParagraphs to allocate: " << info.size << "\n";

  char *Dest= (char*) Buf;      // Byte pointer, so that it can be advanced
  int SizeRead= 0;
  if( info.size!=0 && _go32_dpmi_allocate_dos_memory(&info)==0 ) {
    _go32_dpmi_registers r;
    memset(&r, 0, sizeof(r));
    r.x.bx= (u_short) fHandle;
    r.x.dx= 0;
    r.x.ds= info.rm_segment;

    // Read as much from the file as DOS memory allows, and copy it to
    // the destination buffer, in each loop.
    r.x.ax= 1;          // Non-zero so that the loop is entered
    for(; ((int)r.d.eax)>0 && Count>0; Count-=r.d.eax ) {
      if( BufSize>Count )
        r.d.ecx= (u_long) Count;
      else
        r.d.ecx= (u_long) BufSize;
      cout << "Reading " << r.d.ecx << " bytes";
      r.x.ax= 0x213F;   // GppHlp service: huge read
      _go32_dpmi_simulate_int(0x65, &r);
      cout << ", copying to memory above 1Mb.\n";
      if( ((int)r.d.eax)> 0 ) {
        // Copy the read data from DOS mem to the buffer
        dosmemget(((u_long)info.rm_segment)<<4, r.d.eax, Dest);
        Dest+= r.d.eax;
        SizeRead+= r.d.eax;
      }
    }

    // Cleanup
    cout << "Free dos mem return: " << _go32_dpmi_free_dos_memory(&info)
         << "\n";
  }
  return SizeRead;
}

int TestHugeRead()
{
  int BytesRead= 0;
  int data;
  data= open( "data", O_RDONLY );
  if( data>= 0 ) {      // open() returns -1 on failure
    int BufSize;
    cout << "Number of Kbytes to read: ";
    cin >> BufSize;
    BufSize= BufSize<<10;
    cout << "\nTrying to read " << BufSize << " bytes\n";
    char *Buf= (char*) malloc(BufSize);
    if( Buf ) {
      clock_t EndClock, StartClock= clock();
      if( (BytesRead= HugeRead(data, Buf, BufSize))>=0 ) {
        // Some statistics
        EndClock= clock();
        double TimeElapsed= ((double)EndClock-
                             (double)StartClock)/CLOCKS_PER_SEC;
        cout << "Bytes read: " << BytesRead << "\n";
        cout << "Time elapsed: " << TimeElapsed << "\n";
        cout << "KBytes pr. second: " << BytesRead/1024/TimeElapsed << "\n";
      }
      else
        cerr << "Error reading \n";
      free(Buf);
    }
    else
      cerr << "Error allocation buffer\n";
    close(data);
  }
  else
    cerr << "Error opening \n";
  return BytesRead;
}

const char GppHlpStr[]= "GPPHLP";

int main()
{
  int Res= 0;

  // Make sure the GppHlp int vector has been set
  _go32_dpmi_seginfo iv;
  _go32_dpmi_get_real_mode_interrupt_vector(0x65, &iv);
  if( iv.rm_segment!= 0 || iv.rm_offset!= 0 ) {
    // Check for the existence of the GppHlp driver
    char CheckStr[7];
    _go32_dpmi_registers r;
    memset(&r, 0, sizeof(r));
    r.x.ax= 0x6500;
    _go32_dpmi_simulate_int(0x65, &r);
    if( r.x.ax== 0x0065 ) {
      // EBX holds the linear address of the signature string
      dosmemget( r.d.ebx, sizeof(CheckStr), CheckStr );
      if( strcmp(GppHlpStr, CheckStr)== 0 )
        TestHugeRead();
      else
        Res= 3;
    }
    else
      Res= 2;
    //
    if( Res!= 0 ) {
      cerr << "Error: GppHlp was not installed on INT 0x65.\n";
      cerr << " But something else was.....\n";
    }
  }
  else {
    cerr << "Error: The GppHlp interrupt vector was not set (0000:0000)\n";
    Res= 1;
  }
  return Res;
}


The assembler TSR:
------------------
You need Borland's TASM assembler to assemble it.
Please allow the assembler to do multipass assembly (the /M# switch) so
that it can resolve the conditional jumps going further than 127 bytes.
Remember to install (execute) the TSR before running the C++ program, or
before starting Windows.

        IDEAL
        P386N                   ; Allow the use of 386 instructions
        JUMPS                   ; Resolve conditional jumps going further than
                                ; 127 bytes

; Defines
INTHANDLE = 65h                 ; The interrupt used for HLP<=>GPP communication
WRONGINT  = 1                   ; Error code returned on error when int is used

SEGMENT DSEG WORD 'DATA'
        ; Important values
        MinMem  DW ?
        PSP     DW ?            ; Declaration added: the posted source stores
                                ; to [PSP] without declaring it
ENDS DSEG

SEGMENT INTSEG PARA 'CODE'
        ASSUME  cs:INTSEG
        ASSUME  ds:NOTHING,es:NOTHING

GppHlpStr DB "GPPHLP",0

;
; Entry point of the interrupt service
; ------------------------------------
; Available services:
;  AX= 6500h - Existence check
;    Call this function to check if the interrupt is installed by
;    this program. First check the contents of AX and if ok then
;    compare the string pointed to by EBX.
;    Return: AL= 65h
;            AH= 0
;            EBX= Linear address of "GPPHLP" string.
;  AX= 213Fh - Huge read from file or device
;    Like dos function 3Fh, except that it is able to read into a
;    huge buffer (larger than 64Kb-1).
;    BX= File handle
;    ECX= Number of bytes to read
;    DS:DX= Real-mode (segment:offset) address of buffer
;    Return: EAX= Number of bytes read or error code if the value
;                 is negative (take the ABS() and you have the
;                 dos error code as returned by int 21h AH=3Fh).
;    Destroyed: EDI ESI, the rest is preserved.
;
PROC RealModeHlp FAR
        ; Make primary function selection
        cmp     ah,21h
        je      @@Dos
        cmp     ah,65h
        je      @@GppHlp
        jmp     @@Chain

@@GppHlp:
        ;-- GppHlp specific functions --
        cmp     al,0
        jne     @@Chain
        ; Return information that says "alive and well".
        xchg    al,ah
        xor     ebx,ebx                 ; Clear
        mov     bx,cs
        shl     ebx,4                   ; Calculate the
        add     ebx,OFFSET GppHlpStr    ; linear address
        jmp     @@End

@@Dos:
        ;-- Dos functions extended by this TSR --
        push    OFFSET @@End    ; Enforce a near return frame to the
                                ; "one point out" exit.
        cmp     al,3Fh
        je      Dos3F
        ;-- Add additional dos extensions here --

        ; Not a valid dos extension -
        ; remove the unneeded return address and exit this TSR
        add     sp,2
        jmp     @@Chain

@@Chain:
        ; If coded correctly this would include a call (or jump) to the next
        ; int handler in the chain (the one installed prior to this TSR).
        iret

@@End:
        ; We enforce all returns from valid functions to go through this label,
        ; so that any general cleanup can be done at this point.
        iret
ENDP RealModeHlp

;
; Huge read from file or device, using a handle:
; ----------------------------------------------
PROC Dos3F NEAR
        ASSUME  ds:NOTHING,es:NOTHING
        push    DS ES
        xor     eax,eax         ; We haven't read any bytes yet
        or      ecx,ecx
        jz      @@End

@@Read:
        push    eax ecx
        push    bx dx ds        ; We don't rely on DOS preserving these
        cmp     ecx,8000h
        jb      @@Less32Kb
        mov     cx,8000h        ; Read at most 32Kb per DOS call
@@Less32Kb:
        mov     ah,3Fh
        int     21h
        movzx   edi,ax          ; Store the read result
        pop     ds dx bx
        pop     ecx eax
        jc      @@Error
        or      di,di
        jz      @@End           ; If nothing read then return
        add     eax,edi         ; The new read total
        sub     ecx,edi         ; Decrement the counter
        jz      @@End           ; ..and return if it reaches zero
        mov     di,ds
        add     di,800h
        mov     ds,di           ; Move the buffer pointer (DS:DX) 32Kb
        jmp     @@Read

@@Error:
        xor     eax,eax
        sub     eax,edi         ; The error return code (negated DOS error)

@@End:
        pop     es ds
        ret

@@ToEOF:
        add     ecx,8000h
        jmp     @@Read
ENDP Dos3F

ENDS INTSEG

; **************************************************************************
; The segments past this point won't be valid after the program goes resident.
; I.e. all the installation code and the stack are thrown away and only the
; parts needed to execute the GPP requests will remain in memory.
;
SEGMENT CSEG PARA 'CODE'
        ASSUME  cs:CSEG
        ASSUME  ds:DSEG,es:NOTHING

start:
        mov     ax,DSEG
        mov     ds,ax           ; Initialize the data segment
        mov     [PSP],es        ; The PSP address

        ; Calculate how much memory is used by this program
        mov     ax,es           ; Start of the program
        mov     bx,CSEG         ; End of last segment used
        sub     bx,ax
        inc     bx
        mov     [MinMem],bx     ; Memory needed, in paragraphs

        ; Release memory not used by this program
        mov     es,[es:2Ch]     ; Segment of environment string
        mov     ah,49h
        int     21h

        ; Make sure the interrupt handler we want to use isn't taken already
        mov     ah,35h
        mov     al,INTHANDLE
        int     21h
        mov     ax,es
        or      ax,bx
        jnz     @@IntError

        ; Install our interrupt handler
        ; -- Please notice that no chaining is done, this should be improved
        ; -- if you plan to use this facility in the future.
        ; -- This is NOT the right way. The services should be installed
        ; -- using the Multiplex interrupt (INT 2Fh).
        push    ds
        mov     dx,INTSEG
        mov     ds,dx
        mov     dx,OFFSET RealModeHlp
        mov     ah,25h
        mov     al,INTHANDLE
        int     21h
        pop     ds

        ; Terminate but stay resident
        mov     ax,3100h        ; TSR command and 0 as return code
        mov     dx,[MinMem]
        int     21h

@@IntErrorMsg:
        DB      "Error installing GPPHLP.EXE.",13,10
        DB      "The needed interrupt is used by someone else...",13,10,'$'

@@IntError:
        ; Display error message and return to dos
        push    ds
        mov     ax,cs
        mov     ds,ax
        mov     dx,OFFSET @@IntErrorMsg
        mov     ah,9
        int     21h
        pop     ds

@@Exit:
        mov     ah,4Ch
        mov     al,WRONGINT
        int     21h
ENDS CSEG

SEGMENT SSEG PARA STACK 'STACK'
        ; 160 bytes (50h words) of stack should be enough.
        ; Notice that the stack is only used during initialization. The stack
        ; that is used when called from a GPP program is supplied by the
        ; dos extender, so this stack becomes obsolete as soon as the program
        ; goes resident.
        DW      50h DUP (?)
ENDS SSEG

END start

---
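
For readers less used to real-mode addressing, the address arithmetic the
TSR relies on can be sketched in portable C++. This is only an
illustration; LinearAddress and AdvanceSegment are names invented here
and are not part of gpphlp.asm. A segment:offset pair maps to the linear
address segment*16+offset, which is how the existence check computes the
address of the "GPPHLP" string from CS, and why adding 800h to DS in
Dos3F moves the buffer pointer by exactly 32Kb.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only.

// Linear address of a real-mode segment:offset pair (segment*16 + offset).
unsigned long LinearAddress(unsigned short segment, unsigned short offset)
{
    return ((unsigned long)segment << 4) + offset;
}

// Segment value after stepping past `blocks` blocks of 32Kb
// (800h paragraphs each), as the read loop does with DS.
unsigned short AdvanceSegment(unsigned short segment, unsigned blocks)
{
    unsigned long next = (unsigned long)segment + 0x800UL * blocks;
    assert(next <= 0xFFFFUL);   // must still fit in a 16-bit segment register
    return (unsigned short)next;
}
```

Because only the segment is advanced and DX stays fixed, the offset never
wraps around, which is what lets a single call fill a buffer larger than
64Kb.
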