Date: Mon, 9 Jan 1995 20:18:08 +0900
From: Stephen Turnbull
To: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: Searchable DJGPP archive (FAQ? ;)

Two announcements concerning the Yaseppochi-gumi archive:

(1) I am making Eli Zaretskii's beta FAQ available.  I am adding a
    search capability for it.

(2) I now believe that the *.stripped.gz files have no loss of
    content.  From my announcement:

    One caveat: to speed up searches, I have stripped duplicate
    headers generated by RMail and nuisance headers (such as
    "Reply-to:" and "Received:") from the archives.  However, the
    reduction in size of the *.gz files is suspiciously large, [...]

I was right.  This has been fixed; the *.stripped.gz are now
substantially larger in several cases (you can look at the DU-sorted
file on my server---of course I used the "-s" option; before comes
first).  Some are smaller because I added "X400-[-A-Za-z]+:" to the
list of nuisance headers.

Some FAQs on the archive search (well, these are the *only* questions
I've got so far, so they're the most F-lyAQs ;)

    That is probably just the Received headers.  Your message came
    here as seen below, so you can see that you should expect some
    shrinking.

I deleted the appended "Received:" headers; we all know what they look
like---and if we don't, we don't want to.  That's why I filter them.
(Well, much more important, it substantially speeds up the greps.)

As Bob Babcock (I think it was) pointed out, if stuff like "Received:"
headers can make the *.gz files balloon (in one case, to 3 times the
size!), gzip ain't on the job.  In fact, when I filtered my own .sig,
I was stripping large amounts of content from a couple of files.  This
is due to the fact that the "last-line-of-my-sig" regexp didn't catch
some variant .sigs I use ;-)  I don't use my .sig all the time (that's
why whole files didn't disappear).

    Also, I just tried searching for "unsubscibe" and came up with way
                                             ^
                                             |
    typo ;-) --------------------------------+
    to little text.  I don't know why.

These are my personal received-mail files---my left middle finger sits
on the 'd' key just to filter "unsubscribe".  I will eventually use
the Clarkson archives, but for the moment I'm using my personal stuff
as it's more easily available to me (my Clarkson copy is about 4
months old and offline).  When I *do* get the Clarkson archives, I
will filter all messages with fewer than 4 lines of content containing
"subscribe", "add", or "delete" ;-)

I hope there's nothing too mortifyingly personal or insulting in there
;-)

--Steve
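
For anyone who wants to try the same kind of filtering, here is a
minimal Python sketch of the two filters described above: dropping
nuisance headers, and flagging short "subscribe"/"add"/"delete"
messages.  It is not the script used on the archive.  The function
names, the exact header list (only "Reply-to:", "Received:" and
"X400-[-A-Za-z]+:" are named in the post), the case-insensitive
matching, and the choice to count only non-blank lines are all
assumptions made for illustration.

```python
import re

# Assumed nuisance-header list, built from the three patterns named in the post.
NUISANCE = re.compile(r"^(Reply-to|Received|X400-[-A-Za-z]+):", re.IGNORECASE)

# Assumed keyword match: "subscribe" as a substring (so "unsubscribe" also
# hits), "add" and "delete" as whole words (so "address" does not hit).
ADMIN = re.compile(r"subscribe|\badd\b|\bdelete\b", re.IGNORECASE)


def strip_nuisance_headers(message: str) -> str:
    """Drop nuisance header fields, including folded continuation lines."""
    # Headers end at the first blank line; the body is left untouched.
    head, sep, body = message.partition("\n\n")
    kept = []
    skipping = False
    for line in head.split("\n"):
        if line[:1] in (" ", "\t"):
            # Continuation line: it shares the fate of the field it extends.
            if not skipping:
                kept.append(line)
            continue
        skipping = bool(NUISANCE.match(line))
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + sep + body


def is_admin_request(body: str, max_lines: int = 4) -> bool:
    """True for bodies with fewer than max_lines non-blank lines that
    mention subscribe/add/delete."""
    lines = [ln for ln in body.splitlines() if ln.strip()]
    return len(lines) < max_lines and bool(ADMIN.search(body))
```

Restricting the header pass to the text before the first blank line
keeps it from eating body lines that merely start with "Received:",
which is the kind of content loss the stripped files suffered from.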