For our iOS app, we're switching WebSocket client implementation from one using Poco project to one based on NSURLSessionWebSocketTask. We are observing one regression which prevents us from shipping this solution though:
5-7% of calls to -[NSURLSessionWebSocketTask cancelWithCloseCode:reason:] fail to deliver given close code to server and 1006 close code (AbnormalClosure) is used instead.
Our test consist of following steps:
Create WebSocket using NSURLSessionWebSocketTask
Send "Hello" message
Close WebSocket with 1011 close code
Destroy NSURLSessionWebSocketTask instance
Check whether server received 1011 close code
We do not perform those steps line by line, but rather use delays (~100-200ms) between each step.
We perform such test 1000 times and observe 5-7% failure rate consistently.
Distribution of failures is random.
We did perform test on both iOS simulator and iOS devices with no observable differences.
We did perform test on multiple server implementations (tornado- and node.js based ones) running on local machine (same as Xcode) and remote one
We always receive "Hello" message on server
1011 close code is arbitrarily chosen for testing, but results are same for different ones, too.
We did setup mitmproxy and for failed tests see following log:
Code Block Error in WebSocket connection to 20.16.12.131:8080: WebSocket connection closed unexpectedly by client: TcpDisconnect(None)
We did enable CFNetwork diagnostic using CFNETWORK_DIAGNOSTICS and found one difference between success and failure cases. For successful cases we see log which looks like this:
Code Block default 16:27:15.831193+0200 Playground-ObjC tcp_close [C7.1:2] TCP Packets: snd 0.000s seq 1092086878:1092086879 ack 0 win 65535 len 0 [SEC] rcv 0.004s seq 3101059099:3101059100 ack 1092086879 win 65535 len 0 [S.] snd 0.000s seq 1092086879:1092086879 ack 3101059100 win 4117 len 0 [.] snd 0.002s seq 1092086879:1092087279 ack 3101059100 win 4117 len 400 [P.] rcv 0.001s seq 3101059100:3101059100 ack 1092086879 win 2058 len 0 [.] rcv 0.000s seq 3101059100:3101059100 ack 1092087279 win 2052 len 0 [.] rcv 0.015s seq 3101059100:3101059229 ack 1092087279 win 2052 len 129 [P.] ECT0 snd 0.000s seq 1092087279:1092087279 ack 3101059229 win 4113 len 0 [.] snd 0.003s seq 1092087279:1092087289 ack 3101059229 win 4113 len 10 [P.] rcv 0.002s seq 3101059229:3101059229 ack 1092087289 win 2052 len 0 [.] snd 0.017s seq 1092087289:1092087297 ack 3101059229 win 4113 len 8 [P.] snd 0.001s seq 1092087297:1092087298 ack 3101059229 win 4113 len 0 [F.] rcv 0.000s seq 3101059229:3101059229 ack 1092087297 win 2052 len 0 [.] rcv 0.001s seq 3101059229:3101059233 ack 1092087297 win 2052 len 4 [P.] ECT0 Last packet 0ms ago.
So it's something related to "tcp_close" and seems to log whole TCP packets that were exchanged during connection. For failed cases such log is not present.
One additional log we see potentially interesting (but were not able to confirm its meaning or relation to failed cases) comes from runningboardd:
Code Block default 16:27:15.830447+0200 runningboardd Invalidating assertion 33-1364-23682 (target:[application<org.example.app>:1364]) from originator [application<org.example.app>:1364]
So that's where we are right now and we have no idea where to dig next. We wonder if anyone else have seen such problem or can point us to any direction we could try next.
Thanks much,
Damian & Maciek