Unable to recover after URLSession encounters the SSL error -9816

For years, our iOS apps have experienced a networking problem that blocks them from connecting to our servers via their API endpoint domains.

How can we recover after the scenario described below?

Using third-party error-logging services, which use different endpoint domains, we can record the error:

```
NSUnderlyingError": Error Domain=kCFErrorDomainCFNetwork Code=-1200 "(null)" UserInfo={_kCFStreamPropertySSLClientCertificateState=0, _kCFNetworkCFStreamSSLErrorOriginalValue=-9816, _kCFStreamErrorDomainKey=3, _kCFStreamErrorCodeKey=-9816, _NSURLErrorNWPathKey=satisfied (Path is satisfied), viable, interface: pdp_ip0[lte], ipv4, dns, expensive, uses cell}, "_NSURLErrorFailingURLSessionTaskErrorKey": LocalDataTask <DEDBFA4D-810D-4438-A6A0-95E3B9668B9E>.<308>, "_kCFStreamErrorDomainKey": 3, "_NSURLErrorRelatedURLSessionTaskErrorKey": <__NSSingleObjectArrayI 0x301f82e60>(
LocalDataTask <DEDBFA4D-810D-4438-A6A0-95E3B9668B9E>.<308>
)
"NSLocalizedDescription": An SSL error has occurred and a secure connection to the server cannot be made., "NSLocalizedRecoverySuggestion": Would you like to connect to the server anyway?
```

-9816 is the "server closed session with no notification" error based on comments in CoreFoundation source files. Subsequent API endpoint calls to the same domain return the same error.
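When logging, this case can be told apart from other -1200 failures programmatically. The sketch below walks the NSUnderlyingError chain looking for the stream error code. It assumes the undocumented "_kCFStreamErrorCodeKey" userInfo key seen in the log keeps that name and holds an integer, which Apple does not guarantee; -9816 is the value of errSSLClosedNoNotify in the Security framework, and the function names are hypothetical.

```swift
import Foundation

/// Value of errSSLClosedNoNotify in the Security framework
/// ("server closed session with no notification").
let errSSLClosedNoNotifyCode = -9816

/// Hypothetical helper: walks the NSUnderlyingError chain and returns the first
/// stream error code stored under the undocumented "_kCFStreamErrorCodeKey" key.
func streamErrorCode(in error: Error) -> Int? {
    var current: NSError? = error as NSError
    while let nsError = current {
        if let code = nsError.userInfo["_kCFStreamErrorCodeKey"] as? Int {
            return code
        }
        current = nsError.userInfo[NSUnderlyingErrorKey] as? NSError
    }
    return nil
}

/// True when the failure looks like the -9816 case from the log above.
func isClosedNoNotify(_ error: Error) -> Bool {
    streamErrorCode(in: error) == errSSLClosedNoNotifyCode
}
```

Tagging these reports separately makes it easier to see whether the -9816 failures cluster on particular networks or app sessions.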

The SSL error occurs most often after a server outage. However, despite our best efforts, we have been unable to reproduce the problem for development purposes by experimenting with our server.

When the error occurs, users report that:

  1. Fully closing (i.e. not just sending to the background) and reopening the app does NOT restore connectivity to our server.
  2. Problem seems more prevalent when using mobile/cell data.
  3. Switching from mobile/cell data to WIFI resolves the connection problem and then switching back to mobile/cell data shows the problem again, so the underlying problem is not cleared.
  4. All other apps on the same device and mobile/cell data or WIFI connection, like Safari, have no problems connecting to the Internet.
  5. Deleting and reinstalling, or updating (when an update is available) resolves the problem.
  6. Or after waiting a few days the problem seems to resolve itself.

The last two points above suggest that something is persisted or cached in the app, preventing it from connecting properly on subsequent network attempts.

Notes:

  • The app uses a single shared URLSession instance for all of its networking, because we are aware of the perils of multiple URLSession instances.
  • We recently added logic that calls URLSession's async reset() method when an SSL error is detected, before repeating the request. It is debatable whether this reduces the problem, as we still see logged cases where the subsequent request hits the same -9816 error.
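The reset-and-retry logic described in the notes could be factored into a small wrapper like the one below. This is a sketch, not the poster's actual code: `withTLSRecovery` and its parameters are hypothetical, `shouldRecover` decides which errors trigger recovery, `recover` would typically be `{ await session.reset() }`, and retries are capped so a persistent -9816 cannot loop forever.

```swift
import Foundation

/// Runs `operation`. If it throws an error that `shouldRecover` accepts, runs
/// `recover` (for example `await session.reset()`) and tries again, at most
/// `maxRetries` extra times. Any other error, or the final failure, is rethrown.
func withTLSRecovery<T>(
    maxRetries: Int = 1,
    shouldRecover: (Error) -> Bool,
    recover: () async -> Void,
    operation: () async throws -> T
) async throws -> T {
    var attempt = 0
    while true {
        do {
            return try await operation()
        } catch let error where attempt < maxRetries && shouldRecover(error) {
            attempt += 1
            await recover()  // e.g. drop cached connections before retrying
        }
    }
}
```

A call site might look like `try await withTLSRecovery(shouldRecover: { ($0 as? URLError)?.code == .secureConnectionFailed }, recover: { await session.reset() }, operation: { try await session.data(for: request) })`, where `session` and `request` are the app's own objects. Injecting the recovery step as a closure also keeps the wrapper testable without a network connection.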

URLSession configuration:

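```swift
import Foundation

// Configuration for the app's single shared URLSession.
let config = URLSessionConfiguration.default
config.timeoutIntervalForResource = 22  // seconds allowed for the whole resource
config.timeoutIntervalForRequest = 20   // seconds to wait for additional data
config.requestCachePolicy = .reloadIgnoringLocalCacheData
config.urlCache = nil                   // no URL cache for this session
```
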
Answered by DTS Engineer in 800637022

-9816 is the "server closed session with no notification" error based on comments in CoreFoundation source files.

Correct. More specifically, it looks like it occurs when BoringSSL returns "SSL_ERROR_ZERO_RETURN" (meaning the connection is "done") without having completed the TLS handshake. Unfortunately, the most likely explanation would be a problem at the network level, not necessarily in the "app" itself.

A few things that caught my eye in your message:

  2. Problem seems more prevalent when using mobile/cell data.

This always makes me think that IPv6/NAT64 could be involved, as that's the biggest difference in mobile networking.
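If IPv6/NAT64 is the suspect, one way to gather evidence is to record network path properties alongside each -9816 report. The sketch below uses Network framework's NWPathMonitor, which is available on Apple platforms only; the summary format and function names are hypothetical, and these flags describe the path rather than proving that NAT64 is in use.

```swift
import Foundation
#if canImport(Network)
import Network
#endif

/// Hypothetical one-line summary of path properties to attach to -9816 reports.
func pathSummary(isCellular: Bool, isExpensive: Bool,
                 supportsIPv4: Bool, supportsIPv6: Bool) -> String {
    "cellular=\(isCellular) expensive=\(isExpensive) ipv4=\(supportsIPv4) ipv6=\(supportsIPv6)"
}

#if canImport(Network)
/// Logs a summary on every path change. Keep a strong reference to the
/// returned monitor and call cancel() when it is no longer needed.
func startPathLogging(log: @escaping @Sendable (String) -> Void) -> NWPathMonitor {
    let monitor = NWPathMonitor()
    monitor.pathUpdateHandler = { path in
        log(pathSummary(isCellular: path.usesInterfaceType(.cellular),
                        isExpensive: path.isExpensive,
                        supportsIPv4: path.supportsIPv4,
                        supportsIPv6: path.supportsIPv6))
    }
    monitor.start(queue: DispatchQueue(label: "path-monitor"))
    return monitor
}
#endif
```

Comparing these summaries between failing cellular sessions and working Wi-Fi sessions could show whether the failures cluster on IPv6-capable cellular paths.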

Similarly this:

3. Switching from mobile/cell data to WIFI resolves the connection problem and then switching back to mobile/cell data shows the problem again.

...implies that the issue is with the cell network itself (or the data it's working off of), not the on-device data.

There might be an interesting hint in these two data points:

  • Deleting and reinstalling, or updating (when an update is available) resolves the problem.
  • Or after waiting a few days the problem seems to resolve itself.

... config.requestCachePolicy = .reloadIgnoringLocalCacheData

What if the problem is remote/intermediate network-layer caching, not the local cache? Have you tried "NSURLRequest.CachePolicy.reloadIgnoringLocalAndRemoteCacheData"?
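For reference, the suggested policy can be set on the whole session configuration or on individual requests; the URL below is a placeholder. One caveat worth checking: Foundation's NSURLRequest.h header annotates NSURLRequestReloadIgnoringLocalAndRemoteCacheData with an "// Unimplemented" comment, so this setting may not change behaviour in practice.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession and URLRequest live here on Linux
#endif

// Session-wide default for tasks created from this configuration.
let config = URLSessionConfiguration.default
config.requestCachePolicy = .reloadIgnoringLocalAndRemoteCacheData

// Per-request setting (placeholder URL).
var request = URLRequest(url: URL(string: "https://api.example.com/status")!)
request.cachePolicy = .reloadIgnoringLocalAndRemoteCacheData
```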

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hi Kevin,

Thank you for your response. Following your suggestion, we released a new version of our app with this change:

```swift
urlRequest.cachePolicy = .reloadIgnoringLocalAndRemoteCacheData
```

We are still seeing the -9816 error in our logs for the updated app.

It is difficult to draw definitive conclusions about whether this cache policy change, combined with urlSession.reset(), has improved the app's ability to recover. In some cases, the logged SSL errors stop abruptly for the app session after the reset, which suggests either a successful recovery (most likely) or the user terminating the app. However, we are also seeing instances where the -9816 error persists after the reset.

The only other detail we can add is that our server engineers have confirmed TLS 1.2 as the minimum supported version.

Although reinstalling or updating the app seems to restore the ability to connect to the domain, your response suggested that this may be a network-side rather than an app-side problem. Is there anything else we could try, either in the server-side configuration or within the app?
