Using URLCache subclasses with URLSession

I have an app which uses URLSession-based networking and URLCache for storing network responses on disk. I noticed that when the storage size of URLCache reaches diskCapacity, the eviction strategy seems to be to remove all entries, which is a problem in my use case. So I decided to write a URLCache subclass which would provide custom storage for cached responses and implement an LRU eviction strategy with better control.

As URLCache's documentation states, subclassing for this purpose should be supported:

The URLCache class is meant to be used as-is, but you can subclass it when you have specific needs. For example, you might want to screen which responses are cached, or reimplement the storage mechanism for security or other reasons.

However, I ran into problems with trying to use this new URLCache subclass with URLSession networking.

I have a test resource which I fetch using HTTP GET. The response headers contain:

  • Cache-Control: public, max-age=30
  • Etag: <some-value>

When using the standard, non-subclassed URLCache, the first request loads the data from the network as expected (verified with an HTTP proxy). A second request made within the first 30 seconds doesn't go to the network, as expected. Subsequent requests after 30 seconds cause conditional GETs with the Etag, as expected.
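For readers less familiar with the HTTP caching rules involved, the behaviour described above (reuse within max-age, then revalidate with the Etag) can be sketched roughly as follows. This is a simplified illustration, not how URLSession or URLCache implement it: maxAge(fromCacheControl:), isFresh(cacheControl:storedAt:now:) and conditionalRequest(for:etag:) are hypothetical names, and the check ignores the Age and Date headers and directives such as no-cache.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Hypothetical helper: extracts max-age (in seconds) from a Cache-Control header value.
func maxAge(fromCacheControl value: String) -> TimeInterval? {
    for directive in value.split(separator: ",") {
        let parts = directive
            .trimmingCharacters(in: .whitespaces)
            .split(separator: "=", maxSplits: 1)
        if parts.count == 2,
           parts[0].lowercased() == "max-age",
           let seconds = TimeInterval(parts[1]) {
            return seconds
        }
    }
    return nil
}

// Hypothetical helper: true while a response stored at `storedAt` is younger than max-age.
// Simplification: the age is measured from the local store time only.
func isFresh(cacheControl: String, storedAt: Date, now: Date = Date()) -> Bool {
    guard let lifetime = maxAge(fromCacheControl: cacheControl) else { return false }
    return now.timeIntervalSince(storedAt) < lifetime
}

// Hypothetical helper: builds the conditional GET sent once the stored response is stale.
func conditionalRequest(for url: URL, etag: String) -> URLRequest {
    var request = URLRequest(url: url)
    request.setValue(etag, forHTTPHeaderField: "If-None-Match")
    return request
}
```

With max-age=30, a response stored 10 seconds ago counts as fresh and one stored 31 seconds ago does not, at which point the conditional request with If-None-Match would be sent.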

When using a URLCache subclass, all requests load the data from network - max-age doesn't seem to matter, and no conditional GETs are made.

It seems that the URLCache does something special to the CachedURLResponse instances after they're loaded from its internal storage, and this something affects how URLSession handles the HTTP caching logic. What am I missing here?

I've written a very minimal URLCache implementation to demonstrate this problem. This class stores and loads CachedURLResponse instances using NSKeyedArchiver / NSKeyedUnarchiver, and it supports only zero or one response. Note that there are no calls to super - this is by design, since I want to use my own storage.

Here's the implementation:


class CustomURLCache: URLCache {
    let cachedResponseFileURL = URL(filePath: NSTemporaryDirectory().appending("entry.data"))

    // MARK: Internal storage
    func read() -> CachedURLResponse? {
        guard let data = try? Data(contentsOf: cachedResponseFileURL) else { return nil }
        return try! NSKeyedUnarchiver.unarchiveTopLevelObjectWithData(data) as! CachedURLResponse
    }

    func store(_ cachedResponse: CachedURLResponse) {
        try! (try! NSKeyedArchiver.archivedData(withRootObject: cachedResponse, requiringSecureCoding: false)).write(to: cachedResponseFileURL)
    }

    // MARK: URLCache Overrides
    override func cachedResponse(for request: URLRequest) -> CachedURLResponse? {
        read()
    }

    override func getCachedResponse(for dataTask: URLSessionDataTask, completionHandler: @escaping (CachedURLResponse?) -> Void) {
        guard let response = read() else {
            completionHandler(nil)
            return
        }
        completionHandler(response)
    }

    override func storeCachedResponse(_ cachedResponse: CachedURLResponse, for request: URLRequest) {
        store(cachedResponse)
    }

    override func storeCachedResponse(_ cachedResponse: CachedURLResponse, for dataTask: URLSessionDataTask) {
        store(cachedResponse)
    }
}

My test case:


    func test() {
        let useEvictingCache = false
        let config = URLSessionConfiguration.default

        if useEvictingCache {
            config.urlCache = CustomURLCache()
        } else {
            config.urlCache = URLCache(memoryCapacity: 0, diskCapacity: 1024 * 1024 * 100)
        }

        self.urlSession = URLSession(configuration: config)

        let url = URL(string: "https://example.com/my-test-resource")!
        self.urlSession?.dataTask(with: URLRequest(url: url), completionHandler: { data, response, error in
            if let data {
                print("GOT DATA with \(data.count) bytes")
            } else if let error {
                print("GOT ERROR \(error)")
            }
        }).resume()
    }


My experience is that subclassing Foundation’s URL loading system classes puts you on a path of pain [1]. If I were in your shoes, I’d do your custom caching above the Foundation URL loading system layer.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] Way back in the day the Foundation URL loading system was implemented in Objective-C and existed within the Foundation framework. In that world, subclasses mostly worked. Shortly thereafter — and I’m talking before the introduction of NSURLSession here — the core implementation changed languages and moved to CFNetwork. Since then, the classes you see in Foundation are basically thin wrappers around (private) CFNetwork types. That’s generally OK, except for the impact on subclassing.
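As one possible reading of the suggestion to cache above the Foundation URL loading system layer, here is a minimal sketch of an app-level cache with LRU eviction. Everything in it is an assumption for illustration: the LRUResponseCache name, the byte-based capacity, and keeping entries in memory only. It is not thread-safe, it does not implement HTTP caching semantics, and the array-based recency list is O(n) per access.

```swift
import Foundation

// Hypothetical app-level cache keyed by URL, evicting least recently used entries first.
final class LRUResponseCache {
    private var entries: [URL: Data] = [:]
    private var order: [URL] = []          // least recently used first
    private var totalBytes = 0
    private let capacityBytes: Int

    init(capacityBytes: Int) {
        self.capacityBytes = capacityBytes
    }

    // Total size of all stored bodies, in bytes.
    var currentUsage: Int { totalBytes }

    // Returns the stored body and marks the entry as most recently used.
    func data(for url: URL) -> Data? {
        guard let data = entries[url] else { return nil }
        order.removeAll { $0 == url }
        order.append(url)
        return data
    }

    // Stores a body, replacing any previous entry for the same URL,
    // then evicts the oldest entries until the cache fits its capacity.
    func store(_ data: Data, for url: URL) {
        remove(url)
        guard data.count <= capacityBytes else { return }
        entries[url] = data
        order.append(url)
        totalBytes += data.count
        while totalBytes > capacityBytes, let oldest = order.first {
            remove(oldest)
        }
    }

    // Removes a single entry, if present.
    func remove(_ url: URL) {
        guard let old = entries.removeValue(forKey: url) else { return }
        totalBytes -= old.count
        order.removeAll { $0 == url }
    }
}
```

An app would consult data(for:) before starting a data task and call store(_:for:) from the task's completion handler. Because storing under an existing URL replaces the old entry, usage does not grow when the same resource is fetched repeatedly.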

@eskimo much as I expect you are right that sub-classing URLCache could result in a path of pain, implementing one's own cache could result in an equal path of pain. Some considerations:

URLCache uses an SQL database under the hood. This is useful because transactions can roll back a write if the data and metadata cannot both be written atomically, which the implementer of a custom cache would also need to handle. Additionally, the implementer would need to take care of eviction when the disk becomes full or when there's high memory pressure. If developers could focus on solving problems in their own domain instead of the domain that Apple ought to solve for us (providing an RFC 9111-compliant cache), we'd be more productive. It would be great if you could raise this with the relevant teams 🙏
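To make the atomicity concern concrete, one way a custom cache might sidestep transactions for a single entry is to bundle the body and its metadata into one value and write it with an atomic file write. This is only a sketch under stated assumptions: the CacheEntry type, its fields, and the writeEntry(_:to:) and readEntry(from:) helpers are hypothetical, and the approach does not cover multi-entry consistency or eviction.

```swift
import Foundation

// Hypothetical on-disk record: body and metadata travel together in one file.
struct CacheEntry: Codable, Equatable {
    var url: URL
    var etag: String?
    var storedAt: Date
    var body: Data
}

// Encodes the entry as a binary property list and writes it with `.atomic`,
// which writes to an auxiliary file first and then replaces the destination.
func writeEntry(_ entry: CacheEntry, to fileURL: URL) throws {
    let encoder = PropertyListEncoder()
    encoder.outputFormat = .binary
    let encoded = try encoder.encode(entry)
    try encoded.write(to: fileURL, options: .atomic)
}

// Returns nil when the file is missing or cannot be decoded,
// so a damaged entry is treated as a cache miss.
func readEntry(from fileURL: URL) -> CacheEntry? {
    guard let data = try? Data(contentsOf: fileURL) else { return nil }
    return try? PropertyListDecoder().decode(CacheEntry.self, from: data)
}
```

Since there is a single file per entry, a reader sees either the previous complete entry or the new complete entry, never a body paired with mismatched metadata.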

implementing one's own cache could result in an equal path of pain.

Yeah, it is, as I like to say, a balance of pain (-: And that’s not unique to the URL cache; it applies to most engineering endeavours.

Some considerations

One thing to keep in mind is that not all apps need to cache at the HTTP layer. That’s important because a generalised HTTP cache is tricky. If your app has specific requirements, you can create a cache that’s tailored to its needs, which may well result in code that’s easier to write and caches better.

it would be great if you could raise this with the relevant teams

If you have specific requirements here, I encourage you to file a bug describing them.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

I'm facing a similar issue to the one posted here. I have been monitoring currentDiskUsage on URLCache, and I've noticed that even with just a single URL being cached, the disk utilization continues to increase. Am I doing something wrong? I was under the impression that if you cache the same URL with the same headers, it should overwrite the last entry. I'm not seeing that. The disk usage just climbs and climbs. It is a slow ascent, but I wouldn't expect this. Is this expected behavior, that the URLCache continues to grow at all times? Can we not have 5 endpoints in the app, have those 5 endpoints store their responses, and not take up any more space, with future responses just overwriting past responses? What I'm witnessing is that the disk growth behaves almost like a memory leak. Any input would be appreciated.
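For anyone wanting to reproduce this kind of measurement, a minimal sketch of logging cache usage and clearing entries might look like the following. The usageSummary(of:) helper is a hypothetical name and the URL is a placeholder; the URLCache properties and removal methods are the documented ones. The sketch only reports the numbers and does not establish whether or when the reported disk usage shrinks after a removal.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Hypothetical helper: formats the cache's reported usage against its capacities.
func usageSummary(of cache: URLCache) -> String {
    "disk: \(cache.currentDiskUsage) of \(cache.diskCapacity) bytes, "
        + "memory: \(cache.currentMemoryUsage) of \(cache.memoryCapacity) bytes"
}

let sharedCache = URLCache.shared
print(usageSummary(of: sharedCache))

// Remove the entry for one request (placeholder URL), then log again.
let request = URLRequest(url: URL(string: "https://example.com/my-test-resource")!)
sharedCache.removeCachedResponse(for: request)
print(usageSummary(of: sharedCache))

// Remove everything, then log again.
sharedCache.removeAllCachedResponses()
print(usageSummary(of: sharedCache))
```

Logging these values after each request to the same endpoint gives a concrete series to attach to a bug report.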
