Date: Thu, 20 Feb 2003 15:15:39 +0100
From: Corinna Vinschen
To: Cygwin-Developers
Subject: Re: Threaded socket hang in 1.3.20
Message-ID: <20030220141539.GE2467@cygbert.vinschen.de>
In-Reply-To: <20030218222746.GD2404@tishler.net>

On Tue, Feb 18, 2003 at 05:27:47PM -0500, Jason Tishler wrote:
> The attached C++ testcase demonstrates the problem.  In 1.3.20-1, the
> program hangs in the call to socket() in the second thread:
>
> Creating thread for fn1
> fn1 begin
> fn1: calling accept()...
> Creating thread for fn2
> fn2 begin
> fn2: calling socket()...
>
> I'm not sure why connect() fails, because a "telnet localhost 54321"
> works just fine.  I'm probably demonstrating my sockets ignorance.

I looked into this problem and it turns out not to be a socket-specific
problem but a deadlock in cygheap.

When accept is called, it creates a new file descriptor by calling

  cygheap_fdnew res_fd;

before calling winsock's accept().  This in turn takes an exclusive lock
in cygheap_fdnew():

  cygheap_fdnew (int seed_fd = -1, bool lockit = true)
  {
    if (lockit)
      SetResourceLock (LOCK_FD_LIST, WRITE_LOCK | READ_LOCK, "cygheap_fdnew");
    [...]

which is not released as long as the calling function isn't left.  Since
accept hangs until a connection is actually made (on blocking sockets),
the lock persists.

The next socket() call also creates a new file descriptor the same way.
Since the above lock is still held, this time the creation of the file
descriptor hangs in the call to SetResourceLock().
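The locking pattern described above can be reproduced outside Cygwin with
a small sketch.  This is not the Cygwin code: fd_list_lock, fake_accept,
fake_socket and socket_call_blocks are hypothetical names, a plain
std::mutex stands in for the LOCK_FD_LIST resource lock, and a future
stands in for the blocking winsock accept().  Instead of hanging forever,
the second "call" gives up after a timeout so the effect can be observed.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the fd-table lock (LOCK_FD_LIST in the mail).
static std::mutex fd_list_lock;

// Simulates accept().  With lock_before_blocking == true the lock is taken
// first and held while waiting for a "connection", which is the problematic
// ordering.  With false, the lock is taken only after the wait has finished.
static void fake_accept(bool lock_before_blocking,
                        std::promise<void>& entered,
                        std::shared_future<void> connection)
{
  if (lock_before_blocking) {
    std::lock_guard<std::mutex> guard(fd_list_lock);
    entered.set_value();   // tell the caller the lock is now held
    connection.wait();     // blocks with the lock still held
  } else {
    entered.set_value();
    connection.wait();     // blocks without holding the lock
    std::lock_guard<std::mutex> guard(fd_list_lock);  // brief critical section
  }
}

// Simulates socket(): it only needs the lock briefly.  Polls with try_lock()
// until the timeout expires and returns true if the lock was obtained.
static bool fake_socket(std::chrono::milliseconds timeout)
{
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  do {
    if (fd_list_lock.try_lock()) {
      fd_list_lock.unlock();
      return true;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
  } while (std::chrono::steady_clock::now() < deadline);
  return false;
}

// Returns true if the second thread's socket() could not get the lock
// while the first thread was still waiting inside accept().
bool socket_call_blocks(bool lock_before_blocking)
{
  std::promise<void> entered;
  std::promise<void> connect_signal;
  std::shared_future<void> connection = connect_signal.get_future().share();

  std::thread acceptor(fake_accept, lock_before_blocking,
                       std::ref(entered), connection);
  entered.get_future().wait();   // the acceptor is now about to wait

  const bool acquired = fake_socket(std::chrono::milliseconds(200));

  connect_signal.set_value();    // let the simulated accept() return
  acceptor.join();
  return !acquired;
}
```

Calling socket_call_blocks(true) models the ordering described above, and
socket_call_blocks(false) models taking the lock only after the blocking
call has returned.  In the real code there is no timeout, so the first
ordering shows up as a hang in SetResourceLock().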

Looking through our sources, I found some places where cygheap_fdnew
could possibly cause a hang, where the return value isn't tested, or
where the lock is held unnecessarily long because cygheap_fdnew is called
too early.  I've cleaned that up a bit and committed the changes.

Now back to the test case.  With these changes the socket() call doesn't
hang, but now connect() is in trouble.  It hangs for a while until it
returns with error 116, Connection timeout.  I must admit that I haven't
found the cause so far.  Help in debugging this is appreciated.

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Developer                  mailto:cygwin AT cygwin DOT com
Red Hat, Inc.