Bad Access using `nw_connection_send`

Hi all,

I'm developing a TCP socket SDK in C. The SDK uses Apple's Network framework, and I occasionally hit a weird bad access in nw_connection_send.
Looking at the stack trace, the bad access occurs in nw_write_request_create while it is trying to release a reference. However, I could not find any documentation or source code for nw_write_request_create.

// on socket destroy, we will release the related nw_connection.
increase_ref_count(socket);
nw_connection_t nw_connection = socket->nw_connection;
dispatch_data_t data = dispatch_data_create(message_ptr->ptr, message_ptr->len, dispatch_event_loop, DISPATCH_DATA_DESTRUCTOR_FREE);

//  > Bad Access here <
// When I check `nw_connection` and `data`, both seem valid at the time of the call. I tried calling dispatch_retain on `data`, but it did not help.
nw_connection_send(nw_connection, data, NW_CONNECTION_DEFAULT_MESSAGE_CONTEXT, false, ^(nw_error_t error) {
    // process the message, we will release message_buf in this function.
    completed_fn(message_buf);

    reduce_ref_count(socket);
});

When I check nw_connection and data, both seem valid at the time the function is called. I tried calling dispatch_retain on data, but it did not help. Is there any way to narrow down which object is being released?

The issue is intermittent: it fails in roughly 9 out of 10 attempts when I run multiple unit tests at the same time, and I rarely see it when I run a single unit test.

I assume this is a race condition. Is there a way to track down which object is released?

I do understand it would be hard to track without knowing more design details of my SDK, but any related suggestions or ideas would be appreciated. Thanks in advance.

More related source code:

struct nw_socket {
    nw_connection_t nw_connection;
    nw_parameters_t socket_options_to_params;
    dispatch_queue_t event_loop;
    // ... bunch of other parameters...
    struct ref_count ref_count;
};


static int s_socket_connect_fn(
    const struct socket_endpoint *remote_endpoint,
    dispatch_queue_t event_loop)
{
    struct nw_socket *nw_socket = /* new socket memory allocation, increasing ref count */;
    nw_endpoint_t endpoint = nw_endpoint_create_address(/* process remote_endpoint */);
    nw_socket->nw_connection = nw_connection_create(endpoint, nw_socket->socket_options_to_params);
    nw_release(endpoint);

    nw_connection_set_queue(nw_socket->nw_connection, event_loop);
    nw_socket->event_loop = event_loop;

    nw_connection_set_state_changed_handler(nw_socket->nw_connection, ^(nw_connection_state_t state, nw_error_t error) {
        // setup connection handler
    });
    nw_connection_start(nw_socket->nw_connection);
    nw_retain(nw_socket->nw_connection);

}

// nw_socket is ref counted; the destroy function is called when ref_count drops to 0
static void s_socket_impl_destroy(void *sock_ptr) {
    struct nw_socket *nw_socket = sock_ptr;
    /* Network Framework cleanup */
    if (nw_socket->socket_options_to_params) {
        nw_release(nw_socket->socket_options_to_params);
        nw_socket->socket_options_to_params = NULL;
    }

    if (nw_socket->nw_connection) {
        nw_release(nw_socket->nw_connection);
        // Print here, to make sure the nw_connection was not released before nw_connection_send call. 
        nw_socket->nw_connection = NULL;
    }

    // releasing memory and other parameters
}


static int s_socket_write_fn(
    struct nw_socket *socket,
    const struct bytePtr* message_ptr,  // message_ptr is a pointer to allocated message_buf
    socket_on_write_completed_fn *completed_fn,
    void *message_buf) {
   
 
    // Ideally nw_connection would not be released, as the socket ref_count is retained here.
    increase_ref_count(socket->ref_count);
    nw_connection_t nw_connection = socket->nw_connection;
    dispatch_queue_t dispatch_event_loop = socket->event_loop;

    dispatch_data_t data = dispatch_data_create(message_ptr->ptr, message_ptr->len, dispatch_event_loop, DISPATCH_DATA_DESTRUCTOR_FREE);

    //  > Bad Access here <
    // When I check `nw_connection` and `data`, both seem valid at the time of the call. I tried calling dispatch_retain on `data`, but it did not help.
    nw_connection_send(nw_connection, data, NW_CONNECTION_DEFAULT_MESSAGE_CONTEXT, false, ^(nw_error_t error) {
        // process the message, we will release message_buf in this function.
        completed_fn(message_buf);
        reduce_ref_count(socket);
    });

}
Answered by DTS Engineer in 803894022

To start, I recommend you run your test under the standard memory debugging tools. I strongly suspect you have a memory management issue here.

Regarding this line:

dispatch_data_t data = dispatch_data_create(message_ptr->ptr, message_ptr->len, dispatch_event_loop, DISPATCH_DATA_DESTRUCTOR_FREE);

This means that:

  1. Dispatch will create a data object with a no-copy reference to the buffer described by message_ptr->ptr and message_ptr->len.

  2. When the data object is deallocated — that is, when its ref count hits zero — Dispatch will free that buffer by calling free.

Is that what you’re expecting to happen? Because it doesn’t really gel with the comment in this code:

// process the message, we will release message_buf in this function. 
completed_fn(message_buf);
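
To make the two ownership rules concrete, here is a minimal plain-C model of them. It is a sketch only: model_data, model_data_create, model_data_release, caller_must_free and the two mode names are hypothetical stand-ins invented for illustration, not libdispatch declarations, and the real dispatch data object is more involved than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model types; these are NOT libdispatch declarations. */
enum model_destructor {
    MODEL_COPY,          /* models the "copy the buffer" behavior */
    MODEL_TAKE_AND_FREE  /* models the "no copy, free() on dealloc" behavior */
};

struct model_data {
    void *bytes;
    size_t len;
    int refs;
};

/* Creates a data object with one reference. In MODEL_COPY mode the bytes are
 * duplicated; in MODEL_TAKE_AND_FREE mode the object adopts buf itself. */
static struct model_data *model_data_create(void *buf, size_t len,
                                            enum model_destructor mode) {
    struct model_data *d = malloc(sizeof *d);
    if (d == NULL) return NULL;
    if (mode == MODEL_COPY) {
        d->bytes = malloc(len);
        if (d->bytes == NULL) { free(d); return NULL; }
        memcpy(d->bytes, buf, len);
    } else {
        d->bytes = buf; /* no copy: the object now owns the caller's buffer */
    }
    d->len = len;
    d->refs = 1;
    return d;
}

/* Drops one reference. Returns true when this was the last reference, in
 * which case the bytes and the object have been freed. */
static bool model_data_release(struct model_data *d) {
    assert(d != NULL && d->refs > 0);
    if (--d->refs > 0) return false;
    free(d->bytes);
    free(d);
    return true;
}

/* Who must free the original buffer after handing it to model_data_create? */
static bool caller_must_free(enum model_destructor mode) {
    return mode == MODEL_COPY;
}
```

In this model, DISPATCH_DATA_DESTRUCTOR_FREE corresponds to MODEL_TAKE_AND_FREE. If message_buf and message_ptr->ptr refer to the same allocation (an assumption, the post doesn't say), then releasing it again in completed_fn would be a second free of memory the data object already owns.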

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

It’s better if you reply as a reply; if you reply in the comments I may not see it. See Quinn’s Top Ten DevForums Tips for this and other hints.

I tried DISPATCH_DATA_DESTRUCTOR_DEFAULT, but it was the same issue.

Fair enough. However, that doesn’t mean it’s the wrong thing to do, just that you have multiple issues (-:

DISPATCH_DATA_DESTRUCTOR_DEFAULT should create a copy of the message_ptr.

Hmmm. It definitely creates a copy of the buffer, but there’s a question about what it creates a copy of. In the code you posted message_ptr isn’t the data buffer, message_ptr->ptr is. In your example, the buffer described by message_ptr->ptr and message_ptr->len is what would get copied.
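
As a rough illustration of the descriptor-versus-payload distinction, here is a small plain-C sketch. The field names ptr and len come from the posted code; the field types and the copy_payload helper are assumptions made up for this example, and nothing here calls libdispatch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Field names come from the posted code (message_ptr->ptr, message_ptr->len);
 * the field types are an assumption. */
struct bytePtr {
    void *ptr;
    size_t len;
};

/* Hypothetical helper: duplicates the payload the descriptor points at,
 * that is, len bytes starting at ptr, not the descriptor struct itself. */
static void *copy_payload(const struct bytePtr *message_ptr) {
    assert(message_ptr != NULL && message_ptr->ptr != NULL);
    void *copy = malloc(message_ptr->len);
    if (copy != NULL) {
        memcpy(copy, message_ptr->ptr, message_ptr->len);
    }
    return copy;
}
```

After a copy like this, the copy and the original payload have separate lifetimes: whoever allocated message_ptr->ptr is still responsible for releasing it, and the struct bytePtr descriptor is untouched.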

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
