From: "Steven Hartland"
Subject: Re: 1.7.7: rm -rf sometimes fails - race condition?
Date: Fri, 10 Dec 2010 22:30:36 -0000
Message-ID: <3D3D7FA2B44B477A8342F96F72AE1BE7@multiplay.co.uk>
References: <4D026815 DOT 4070606 AT gmx DOT de> <20101210182652 DOT GA27615 AT ednor DOT casa DOT cgf DOT cx>
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm

----- Original Message -----
From: "Christopher Faylor"

>>This looks like either a premature return from a syscall or libcall, or like a
>>genuine race in the system.
>>
>>Has anyone seen similar things?
>
> Yes and you seem to have nailed the problem - it happens when a virus checker
> hooks into a syscall and allows it to return before completion. I don't think
> we want to modify Cygwin to not trust success return values from system calls.

Is this the age-old delete on close raising its ugly head again?

So rm kicks in while a file is shared locked. rm uses the Cygwin unlink code, which "schedules" the file for deletion and returns success without actually succeeding. Hence, when it comes to deleting the parent dir, it fails because the file actually still exists.

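The failure sequence described above (unlink reports success, the name is still present, rmdir on the parent then fails) is what caller-side workaround code has to defend against. The following is only a minimal sketch of that idea in Python, not anything taken from Cygwin or from this thread: the helper names `unlink_and_confirm` and `rmdir_with_retry`, the timeouts, and the retry counts are all hypothetical, and polling like this narrows the window without removing the underlying race.

```python
import errno
import os
import time


def unlink_and_confirm(path, timeout=2.0, interval=0.05):
    """Unlink path, then poll until the name is really gone.

    Hypothetical helper: returns True once the name no longer exists,
    False if it is still visible when the timeout expires.
    """
    try:
        os.unlink(path)
    except FileNotFoundError:
        return True  # nothing to delete
    deadline = time.monotonic() + timeout
    # A "successful" unlink may only have scheduled the deletion,
    # so check that the directory entry has actually disappeared.
    while os.path.lexists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def rmdir_with_retry(path, attempts=5, interval=0.05):
    """Retry rmdir while the directory still reports as non-empty.

    Hypothetical helper: returns True on success, False if the directory
    is still non-empty after the last attempt. Other errors propagate.
    """
    for attempt in range(attempts):
        try:
            os.rmdir(path)
            return True
        except OSError as exc:
            # POSIX allows either ENOTEMPTY or EEXIST for a non-empty dir.
            if exc.errno not in (errno.ENOTEMPTY, errno.EEXIST):
                raise
            if attempt < attempts - 1:
                time.sleep(interval)
    return False
```

A caller would delete each file with `unlink_and_confirm`, then remove the parent with `rmdir_with_retry`, treating a `False` from either as "still locked by someone else" rather than trusting the original return value.
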
I finally figured out that this is the cause of unlink in Perl returning success when the file still existed; I was like WTF!! In our case it was a shared resource file locked by another process, and this behaviour led to many hours of head scratching and large amounts of workaround code.

Personally I think the only solution is to remove this delete on close code and fail hard for shared locked files, as it gives a much more predictable code flow. Having unlink return success when the file has not been deleted before it returns is confusing as hell :(

Regards
Steve