| Message-ID: | <52844B2E.5050902@coverity.com> |
| Date: | Wed, 13 Nov 2013 23:01:50 -0500 |
| From: | Tom Honermann <thonermann AT coverity DOT com> |
| To: | <cygwin AT cygwin DOT com> |
| Subject: | Re: Intermittent failures retrieving process exit codes |
| References: | <50C2498C DOT 2000003 AT coverity DOT com> <50C276AC DOT 9090301 AT mailme DOT ath DOT cx> <50D401EF DOT 9040705 AT coverity DOT com> |
| In-Reply-To: | <50D401EF.9040705@coverity.com> |
On 12/21/2012 01:30 AM, Tom Honermann wrote:
> I spent most of the week debugging this issue. This appears to be a
> defect in Windows. I can reproduce the issue without Cygwin. I can't
> rule out other third party kernel mode software possibly contributing to
> the issue. A simple change to Cygwin works around the problem for me.
>
> I don't know which Windows releases are affected by this. I've only
> reproduced the problem (outside of Cygwin) with Wow64 processes running
> on 64-bit Windows 7. I haven't yet tried elsewhere.
>
> The problem appears to be a race condition involving concurrent calls to
> TerminateProcess() and ExitThread(). The example code below minimally
> mimics the threads created and exit process/thread calls that are
> performed when running Cygwin's false.exe. The primary thread exits the
> process via TerminateProcess(), à la pinfo::exit() in
> winsup/cygwin/pinfo.cc. The secondary thread exits itself via
> ExitThread(), à la Cygwin's signal processing thread function, wait_sig(),
> in winsup/cygwin/sigproc.cc.
>
> When the race condition results in the undesirable outcome, the exit
> code for the process is set to the exit code for the secondary thread's
> call to ExitThread(). I can only speculate at this point, but my guess
> is that the TerminateProcess() code disassociates the calling thread
> from the process before other threads are stopped such that
> ExitThread(), concurrently running in another thread, may determine that
> the calling thread is the last thread of the process and overwrite the
> process exit code.
>
> The issue also reproduces if ExitProcess() is called in place of
> TerminateProcess(). The test case below only uses TerminateProcess()
> because that is what Cygwin does.
>
> Source code to reproduce the issue follows. Again, Cygwin is not
> required to reproduce the problem. For my own testing, I compiled the
> code using Microsoft's Visual Studio 2010 x86 compiler with the command
> 'cl /Fetest-exit-code.exe test-exit-code.cpp'
>
> test-exit-code.cpp:
>
> #include <windows.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> DWORD WINAPI SecondaryThread(
>     LPVOID lpParameter)
> {
>     Sleep(1);
>     ExitThread(2);
> }
>
> int main() {
>     HANDLE hSecondaryThread = CreateThread(
>         NULL,             // lpThreadAttributes
>         0,                // dwStackSize
>         SecondaryThread,  // lpStartAddress
>         (LPVOID)0,        // lpParameter
>         0,                // dwCreationFlags
>         NULL);            // lpThreadId
>     if (!hSecondaryThread) {
>         fprintf(stderr, "CreateThread failed. GLE=%lu\n",
>                 (unsigned long)GetLastError());
>         exit(127);
>     }
>
>     Sleep(1);
>
>     if (!TerminateProcess(GetCurrentProcess(), 1)) {
>         fprintf(stderr, "TerminateProcess failed. GLE=%lu\n",
>                 (unsigned long)GetLastError());
>         exit(127);
>     }
>
>     return 0;
> }
>
>
> To run the test, a simple .bat file is used:
>
> test.bat:
>
> @echo off
> setlocal
>
> :loop
> echo test...
> test-exit-code.exe
> if %ERRORLEVEL% NEQ 1 (
>     echo test-exit-code.exe returned %ERRORLEVEL%
>     exit /B 1
> )
> goto loop
>
>
> If the exit code were always reported correctly, test.bat would run
> indefinitely. The amount of time it takes to fail on my machine (64-bit
> Windows 7 running in a VMware Workstation 8 VM under Kubuntu 12.04 on a
> Lenovo T420 Intel i7-2640M 2 processor laptop) varies considerably. I
> had one run fail in less than 10 iterations, but most of the time it has
> taken upwards of 5 minutes to get a failure.
>
> The workaround I implemented within Cygwin was simple and sloppy. I
> added a call to Sleep(1000) immediately before the call to ExitThread()
> in wait_sig() in winsup/cygwin/sigproc.cc. Since this thread (probably)
> doesn't exit until the process is exiting anyway, the call to Sleep()
> does not adversely affect shutdown. The thread just gets terminated
> while in the call to Sleep() instead of exiting before the process is
> terminated or getting terminated while still in the call to
> ExitThread(). A better solution might be to avoid the thread exiting at
> all (so long as it can't get terminated while holding critical
> resources), or to have the process exiting thread wait on it. Neither
> of these is ideal. Orderly shutdown of multi-threaded processes is
> really hard to do correctly on Windows.
>
> Since the exit code for the signal processing thread is not used, having
> the wait_sig() thread (and any other threads that could potentially
> concurrently exit with another thread) exit with a special status value
> such as STATUS_THREAD_IS_TERMINATING (0xC000004BL) would enable
> diagnosis of this issue as any process exit code matching this would be
> a likely indicator that this issue was encountered.
>
> As is, when this race condition results in the undesirable outcome,
> since the signal processing thread exits with a status of 0, the exit
> status of the process is 0. This explains why false.exe works so well
> to reproduce the issue. It would be impossible to produce a negative
> test using true.exe.
>
> Tom.
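
The diagnostic idea in the quoted message (have helper threads exit with
a sentinel status so that a leaked thread exit code is recognizable) could
be checked on the parent side roughly as follows. This is only a sketch:
classify_exit_code() and ExitVerdict are hypothetical names, not anything
in Cygwin, and the NTSTATUS value is defined locally (copied from the
message) so the snippet builds without <windows.h>.

```cpp
#include <cassert>
#include <cstdint>

// Value of STATUS_THREAD_IS_TERMINATING as quoted in the message.
// Defined locally (assumption: no <windows.h>/<ntstatus.h> available).
constexpr std::uint32_t kStatusThreadIsTerminating = 0xC000004Bu;

// Hypothetical classification of an observed process exit code.
enum class ExitVerdict {
    Expected,       // exit code matches what the process asked for
    SuspectedRace,  // exit code equals the sentinel thread exit status
    Unexpected      // some other mismatch
};

// The sentinel is checked first: if a helper thread's exit status leaked
// into the process exit code, that is the more useful thing to report.
constexpr ExitVerdict classify_exit_code(std::uint32_t actual,
                                         std::uint32_t expected) {
    if (actual == kStatusThreadIsTerminating) {
        return ExitVerdict::SuspectedRace;
    }
    return actual == expected ? ExitVerdict::Expected
                              : ExitVerdict::Unexpected;
}

// Small self-check mirroring the false.exe scenario (expected code 1).
inline void check_examples() {
    assert(classify_exit_code(1u, 1u) == ExitVerdict::Expected);
    assert(classify_exit_code(0xC000004Bu, 1u) == ExitVerdict::SuspectedRace);
    assert(classify_exit_code(0u, 1u) == ExitVerdict::Unexpected);
}
```

With the helper thread exiting with status 0, as it does today, a failed
run of false.exe falls into the Unexpected bucket and is indistinguishable
from any other bad exit code; with the sentinel it would be flagged as
SuspectedRace.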

Time passes...

I worked with some former colleagues to report this issue to Microsoft.
Windows 8.1 and Windows Server 2012 R2 contain a fix that addresses
the test case above. A hotfix has been made available for Windows 7 SP1
and Windows Server 2008 R2. Should anyone desire a hotfix for other
versions of Windows, it will be necessary to open a case with Microsoft
to request it.

http://support.microsoft.com/kb/2875501

Tom.