ADFS losing connection to DC

  • Last Post 20 June 2018
kool posted this 13 June 2018

Hey folks,

We just upgraded to ADFS 4.0 from 2.x. Now in the space of 3 weeks we've had two different ADFS servers lose their NetLogon secure channel session with a DC. The symptoms as found in the event log:
In the System log, a Tcpip event 4227 warning:
TCP/IP failed to establish an outgoing connection because the selected local endpoint was recently used to connect to the same remote endpoint. This error typically occurs when outgoing connections are opened and closed at a high rate, causing all available local ports to be used and forcing TCP/IP to reuse a local port for an outgoing connection. To minimize the risk of data corruption, the TCP/IP standard requires a minimum time period to elapse between successive connections from a given local endpoint to a given remote endpoint.

Then a minute later, also in the System log, a NETLOGON 5719 error:
This computer was not able to set up a secure session with a domain controller in domain NETID due to the following:
The RPC server is unavailable.

And immediately after, in the "AD FS/Admin" log event 342 errors of the form:
@uw.edu-There are currently no logon servers available to service the logon request

I am sending some of the perf counters to our Graphite system, and one is "\TCPv4\Connections Established". I see nothing unusual in this count around the time the Tcpip event 4227 was recorded. Maybe that event is a red herring.
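
One caveat on that counter: "Connections Established" only counts connections in the ESTABLISHED or CLOSE-WAIT state, so sockets lingering in TIME_WAIT would not show up in it. A minimal sketch of tallying states from saved `netstat -ano -p tcp` output follows. The function name, the sample addresses, and the sample rows are all made up for illustration; nothing here comes from the servers in question.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connection states from `netstat -ano -p tcp` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: TCP <local> <remote> <state> <pid>.
        # The header row starts with "Proto", so it is skipped here.
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states


# Hypothetical sample output, not captured from a real ADFS server.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    10.0.0.5:49721         10.0.0.10:389          TIME_WAIT       0
  TCP    10.0.0.5:49722         10.0.0.10:389          TIME_WAIT       0
  TCP    10.0.0.5:49723         10.0.0.10:389          ESTABLISHED     1234
  TCP    0.0.0.0:443            0.0.0.0:0              LISTENING       4
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(f"{state}: {count}")
```

A large and growing TIME_WAIT count toward a single DC address would make the 4227 warning look less like a red herring.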

NetLogon debug logging was not enabled so nothing to check there. I can turn up the debug log level so we have something the next time this happens, but I first wanted to ask if anyone else has seen this. My initial searching did not turn up much.

Thanks,

Eric


GuyTe posted this 18 June 2018

Was MaxConcurrentApi altered on the ADFS server?
"nltest /dbflag:0x2080ffff" should be enough to catch MaxConcurrentApi related issues.

And if it's at the TCP layer, you might want to reduce the default TcpTimedWaitDelay from 120 seconds to somewhere around 30-60 seconds:
https://docs.microsoft.com/en-us/biztalk/technical-guides/settings-that-can-be-modified-to-improve-network-performance
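
As a rough back-of-the-envelope check on why that delay matters, here is a small sketch. It assumes the default Windows dynamic port range (49152-65535), that every outgoing connection goes to the same remote address and port, and that each closed connection holds its local port for the full TcpTimedWaitDelay. Real traffic is messier, so treat the numbers as an upper bound for illustration only; the function name is made up.

```python
def max_sustained_connect_rate(port_count: int, timed_wait_delay_s: int) -> float:
    """Approximate connections/second to one remote endpoint before ports run out.

    Assumes every closed connection occupies its local port for the whole
    TIME_WAIT delay, so at most `port_count` connections fit in one delay window.
    """
    return port_count / timed_wait_delay_s


# Default dynamic (ephemeral) port range on current Windows Server versions.
DEFAULT_DYNAMIC_PORTS = 65535 - 49152 + 1  # 16384 ports

if __name__ == "__main__":
    for delay in (120, 60, 30):
        rate = max_sustained_connect_rate(DEFAULT_DYNAMIC_PORTS, delay)
        print(f"TcpTimedWaitDelay={delay:>3}s -> about {rate:.0f} connections/s")
```

Under those assumptions, dropping the delay from 120 to 30 seconds raises the ceiling roughly fourfold.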

Guy


kool posted this 19 June 2018

Hi Guy,

Thanks for your response. Yes, I've set the NetLogon debug flags on all of the servers in the farm. I haven't made any changes to MaxConcurrentApi though.

I did see articles on changing TcpTimedWaitDelay, but I don't want to mess with that unless the problem recurs and the NetLogon logs indicate it would help.

Thanks again,

Eric


bhopkins posted this 20 June 2018

Have you got Firepower or Palo Alto on the edge? Or some other filter? I haven’t seen this, but my gut is telling me this is a network device killing traffic because it’s classifying it as a DoS attack or something. I would make sure everything is whitelisted to the Office 365 IPs just to rule that out first.



GuyTe posted this 20 June 2018

Eric,
Just to be clear: TcpTimedWaitDelay is not related to MaxConcurrentApi. In the past I have witnessed some application servers (Exchange) running into socket exhaustion due to idle TCP sessions not being closed fast enough.

Guy

