From: "Robert Collins"
Date: Fri, 5 Oct 2001 11:51:03 +1000
Subject: Re: File handling in setup.exe
Message-ID: <044401c14d40$8b5d2e70$01000001@lifelesswks>
Delivered-To: mailing list cygwin-apps AT sources DOT redhat DOT com

----- Original Message -----
From: "Christopher Faylor"
Sent: Friday, October 05, 2001 11:20 AM
Subject: Re: File handling in setup.exe

> FWIW, I really like what you've proposed. It feels right.

Ditto.

> Although, I guess we should wait for a little more input first.

I've got some, inline below.

> >This implies some kind of link between archive handling and the current
> >NetIO hierarchy. This would also require changes to geturl.cc and the
> >code that calls functions in geturl.cc. The foremost issue is, should I
> >be chasing this at all, or should I simply refactor the tar handling
> >mechanism as it exists right now?

I think that refactoring the tar handling on its own is really just bit
twiddling. IMO, bringing it all together and _then_ handling the magic
number issue can be done cleanly.

> >I assume that reading packages from the network would be useful for
> >allowing setup.exe to install directly from the network, without writing
> >the packages out to disk first as it does today. Yet, we need to keep
> >that "caching" mechanism somehow, because it's useful. Currently, file
> >handling logic exists in geturl.cc, nio-file.cc, tar.cc, and probably
> >other places.
> >To deal with all that, I have in mind something like
> >this:
> >
> >class Source {
> >public:
> >    Source(out_pathname);
> >    virtual int read(buffer, size);
> >    virtual int write(buffer, size);
> >
> >    ...
> >private:
> >    Source() { }    // can't create Source objects directly
> >
> >    FILE* fp_out;
> >};
> >
> >class HTTPSource : public Source {
> >public:
> >    HTTPSource(in_url, out_pathname = 0);
> >    ...
> >};

All good...

> >By default, Source reads data from a file and has the option to cache
> >the data it reads out to another file. (If out_pathname == 0, the data
> >isn't cached to a file as it's read.) Subclasses override the
> >constructor and read() to retrieve data from various network sources.
> >(HTTP, FTP, WinInet.dll, etc.) When reading straight from a file, you
> >would set the Source to non-cacheable, but when reading via HTTP, you
> >could elect to either cache the data to a file, or simply read the data
> >in without caching it.
> >
> >This implies a fairly major refactoring all by itself. As I stated
> >above, there's a lot of code that assumes that it can write data out to
> >disk and read it back. My proposal would mean that everything deals
> >with Source objects. Because the data may not be cached, you'd want to
> >keep the data pipeline simple: in the HTTP case, you'd read the data
> >from the network, pass it to the gz/bz unpacker, and pass that stream to
> >the tar file unpacker. That is, go from initial network connection open
> >to final unpacking, all in one operation.

Here's the bit I want to comment on. I think this got missed in the
earlier discussion (if it didn't, and my point is simply wrong or
illogical, feel free to say so!). Let me restate what you've said to be
sure I understand you correctly. You're proposing something like:

    read from Source, write to Decomp
    read from Decomp, write to Archive
    while nextfilename()
        read from Archive, write to filename
    wend

(Sure, this could be written as

    foo = new source (...)
    bar = new decomp (foo)
    new archive (bar)

but that's a presentation thing, not really important.)

I don't like this, because all three classes perform both read and write
(and Archive is the only one of them that is able to generate multiple
streams, as it should be :]). I propose the following modification to
your class hierarchy:

#include <sys/types.h>  /* ssize_t, size_t */

class Stream
{
public:
  /* virtual, so streams returned by factory () can be deleted via Stream * */
  virtual ~Stream () {}

  /* Create a new stream from an existing one; used to get decompressed
   * data or to open archives.
   * Returns NULL if there is no sub-stream available, i.e. peek () didn't
   * match any known magic number and next_file_name () returns NULL. */
  static Stream * factory (Stream *);

  /* read data (duh!) */
  virtual ssize_t read (void *buffer, size_t len);

  /* provide data to the stream (double duh!) */
  virtual ssize_t write (void *buffer, size_t len);

  /* read data without removing it from the class's internal buffer */
  virtual ssize_t peek (void *buffer, size_t len);

  /* Find out the next stream name, i.e.
   * for foo.tar.gz at offset 0, next_file_name () is foo.tar;
   * for foobar that is an archive, it is the next extractable file name. */
  virtual const char * next_file_name () = 0;
};

So Source becomes:

class Source : public Stream
{
public:
  Source (out_pathname);
  ...
};

and likewise for Archive and Decomp.

This minor change will immediately allow archives within archives,
double-compressed files, and what have you, without having to write code
to handle each of those cases.

Rob
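A minimal C++ sketch of the layering idea, under stated assumptions: only read() and peek() are modelled (write() and next_file_name() are left out), and the "RLE1" run-length format plus the MemorySource, RleStream, open_all(), read_all() and rle_encode() names are hypothetical stand-ins for real file/HTTP sources and gzip/bzip2/tar handlers. factory() here takes a std::unique_ptr by reference rather than the Stream * in the proposal, so it can leave the source untouched when no magic number matches. It illustrates the claim that a doubly-encoded buffer is unwrapped by the same loop that unwraps a singly-encoded one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <memory>
#include <string>
#include <sys/types.h>  // ssize_t (POSIX; available on Cygwin)
#include <utility>

// Cut-down Stream: only read() and peek(); write() and next_file_name() omitted.
class Stream {
public:
  virtual ~Stream() {}
  virtual ssize_t read(void *buffer, size_t len) = 0;
  virtual ssize_t peek(void *buffer, size_t len) = 0;
  // Wraps src in a decoding layer if its magic number is known; otherwise
  // returns nullptr and leaves src untouched.
  static std::unique_ptr<Stream> factory(std::unique_ptr<Stream> &src);
};

// Hypothetical bottom layer standing in for a file or HTTP source.
class MemorySource : public Stream {
public:
  explicit MemorySource(std::string data) : data_(std::move(data)) {}
  ssize_t peek(void *buffer, size_t len) override {
    size_t n = std::min(len, data_.size() - pos_);
    std::memcpy(buffer, data_.data() + pos_, n);
    return static_cast<ssize_t>(n);
  }
  ssize_t read(void *buffer, size_t len) override {
    ssize_t n = peek(buffer, len);
    pos_ += static_cast<size_t>(n);
    return n;
  }
private:
  std::string data_;
  size_t pos_ = 0;
};

// Hypothetical toy format: "RLE1" magic number, then (count, byte) pairs.
static const char kRleMagic[4] = {'R', 'L', 'E', '1'};

// Decoding layer: decodes lazily, buffering only what peek()/read() ask for.
class RleStream : public Stream {
public:
  explicit RleStream(std::unique_ptr<Stream> inner) : inner_(std::move(inner)) {
    char magic[4];
    inner_->read(magic, sizeof magic);  // consume the magic number factory() matched
  }
  ssize_t peek(void *buffer, size_t len) override {
    unsigned char pair[2];
    while (pending_.size() < len && inner_->read(pair, 2) == 2)
      pending_.append(static_cast<size_t>(pair[0]), static_cast<char>(pair[1]));
    size_t n = std::min(len, pending_.size());
    std::memcpy(buffer, pending_.data(), n);
    return static_cast<ssize_t>(n);
  }
  ssize_t read(void *buffer, size_t len) override {
    ssize_t n = peek(buffer, len);
    pending_.erase(0, static_cast<size_t>(n));
    return n;
  }
private:
  std::unique_ptr<Stream> inner_;
  std::string pending_;  // decoded bytes not yet consumed
};

std::unique_ptr<Stream> Stream::factory(std::unique_ptr<Stream> &src) {
  char magic[4];
  if (src->peek(magic, sizeof magic) == 4 && std::memcmp(magic, kRleMagic, 4) == 0)
    return std::unique_ptr<Stream>(new RleStream(std::move(src)));
  return nullptr;  // no known magic number: plain data
}

// Peel layers until no magic number matches; nesting depth needs no extra code.
std::unique_ptr<Stream> open_all(std::unique_ptr<Stream> s) {
  while (std::unique_ptr<Stream> inner = Stream::factory(s)) s = std::move(inner);
  return s;
}

std::string read_all(Stream &s) {
  std::string out;
  char buf[8];
  for (ssize_t n; (n = s.read(buf, sizeof buf)) > 0;) out.append(buf, static_cast<size_t>(n));
  return out;
}

// Test helper: encodes runs of up to 255 identical bytes.
std::string rle_encode(const std::string &in) {
  std::string out(kRleMagic, sizeof kRleMagic);
  for (size_t i = 0, run; i < in.size(); i += run) {
    for (run = 1; i + run < in.size() && in[i + run] == in[i] && run < 255; ++run) {}
    out += static_cast<char>(run);
    out += in[i];
  }
  return out;
}

// Usage: a doubly-encoded buffer comes back as the original text.
void example() {
  std::string twice = rle_encode(rle_encode("aaaabbbc"));
  assert(read_all(*open_all(std::unique_ptr<Stream>(new MemorySource(twice)))) == "aaaabbbc");
}
```

Because each RleStream pulls from the layer below only as far as peek() or read() requires, the chain never needs the whole file on disk, which is what the uncached HTTP case depends on. In a real setup.exe the decoding layer would be a gzip/bzip2 or tar handler, and an archive layer would additionally implement next_file_name().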