Strange behavior with 100k+ records in NSPersistentCloudKitContainer

I have been using a basic NSPersistentContainer with 100k+ records for a while now with no issues. The database size fluctuates a bit, but on average it takes up about 22 MB on device.

When I switch the container to NSPersistentCloudKitContainer, I see a massive increase in size, to ~150 MB initially. As the sync engine uploads records to iCloud, it balloons to over 600 MB on device. On top of that, the user's iCloud usage in Settings reports that it takes up 1.7 GB in the cloud. I understand new tables are added and history tracking is enabled, but the size increase seems drastic: I'm not sure how we got from 22 MB to 1.7 GB with the exact same data.
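For context, my container setup is roughly the following (the model name and container identifier are placeholders, not my real ones):

```swift
import CoreData

// Sketch of the setup in question; "MyModel" and the container ID are placeholders.
let container = NSPersistentCloudKitContainer(name: "MyModel")

guard let description = container.persistentStoreDescriptions.first else {
    fatalError("Missing persistent store description")
}

// CloudKit mirroring requires persistent history tracking, which is one
// source of on-device growth: every change is journaled until exported.
description.setOption(true as NSNumber, forKey: NSPersistentHistoryTrackingKey)
description.setOption(true as NSNumber,
                      forKey: NSPersistentStoreRemoteChangeNotificationPostOptionKey)
description.cloudKitContainerOptions =
    NSPersistentCloudKitContainerOptions(containerIdentifier: "iCloud.com.example.MyApp")

container.loadPersistentStores { _, error in
    if let error = error {
        fatalError("Store failed to load: \(error)")
    }
}
```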

A few other things that are important to note:

  • I import all 100k+ records at once when testing the different containers. At the time of the initial import there is only one relationship (an import-group record) that all the records are attached to.
  • I save the background context only once after all the records and the import group have been made and added to the context.
  • After the initial import, some of these records may have a few new relationships added to them over time. I suppose this could be causing some of the size increase, but only about 20,000 records are updated.
  • None of the records include files or large binary data.
  • Most of the attributes are encrypted.
  • I'm syncing to the dev iCloud environment.
  • When I change a single attribute in a record, CloudKit reports that every attribute has been modified (not sure if this is normal).
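For reference, the import described above looks roughly like this (entity and attribute names are placeholders, not my real schema):

```swift
import CoreData

// Hypothetical sketch of the import: all records are created against one
// background context and saved once at the end.
func importRecords(_ payloads: [[String: Any]],
                   into container: NSPersistentContainer) throws {
    let context = container.newBackgroundContext()
    try context.performAndWait {
        // The single relationship mentioned above: one import group for the batch.
        let group = NSEntityDescription.insertNewObject(forEntityName: "ImportGroup",
                                                        into: context)
        for payload in payloads {
            let record = NSEntityDescription.insertNewObject(forEntityName: "Record",
                                                             into: context)
            record.setValue(payload["title"], forKey: "title")
            record.setValue(group, forKey: "importGroup")
        }
        // Single save after all 100k+ inserts; with CloudKit mirroring this
        // produces one very large history transaction to export.
        try context.save()
    }
}
```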

Also, when syncing to a new device, the initial sync can take hours to days. I'm guessing it has to sync both the new records and the changes, but it gets progressively slower as more records are downloaded. The console shows syncing activity, yet new records arrive at a slower and slower rate. After about 50k records it grinds to a halt: the console still shows sync activity, but only about 100 records are added every hour.

All this to say, I'm very confused about where these issues are coming from. I'm sure it's a combination of how I've set up my code and the vast record count, record history, etc.

If anyone has any ideas it would be much appreciated.

There are a number of reasons for these different behaviors. I would start by examining your schema and how it's being translated to instances of CKRecord.

https://developer.apple.com/documentation/coredata/mirroring_a_core_data_store_with_cloudkit/reading_cloudkit_records_for_core_data?language=objc
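As a starting point, something like the following (a sketch; `object` is any managed object you know has synced) will show you exactly which fields each managed object contributes to its mirrored CKRecord:

```swift
import CoreData
import CloudKit

// Dump the CKRecord that NSPersistentCloudKitContainer maintains for a
// managed object, to see how the schema is being translated.
func dumpMirroredRecord(for object: NSManagedObject,
                        in container: NSPersistentCloudKitContainer) {
    guard let record = container.record(for: object.objectID) else {
        print("No mirrored record yet (object may not have exported)")
        return
    }
    print("Record type:", record.recordType)
    for key in record.allKeys() {
        print(key, "=", record[key] ?? "nil")
    }
}
```

Comparing this output against your expectations (field count, field types, encrypted values) is often the quickest way to see why records are larger than anticipated.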

You can also examine the structure of the store with sqlite3_analyzer, which may be informative: https://www.sqlite.org/download.html

Finally, profiling an import with Instruments may also show where the performance bottlenecks are in your code or the framework. But without a representative dataset and a reproducible test to analyze, it is difficult to make any specific recommendations.

I ended up hacking together some code to aggregate local records into files and then upload them as a CKAsset. This drastically reduces the CKRecord count and, in turn, the storage usage. While it works, it isn't ideal for all my data types, and it doesn't completely eliminate the storage increase. If you have any updates on why this is happening, please let me know.
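Roughly, the workaround looks like this (the type, record type, and field names are simplified placeholders for my actual code):

```swift
import CloudKit
import Foundation

// Hypothetical sketch of the workaround: serialize a batch of local records
// into one file and attach it to a single CKRecord as a CKAsset.
struct LocalRecord: Codable {
    let id: UUID
    let title: String
}

func makeAggregateRecord(from batch: [LocalRecord]) throws -> CKRecord {
    let data = try JSONEncoder().encode(batch)
    let fileURL = FileManager.default.temporaryDirectory
        .appendingPathComponent(UUID().uuidString)
    try data.write(to: fileURL)

    // One CKRecord now carries N local records, cutting the record count.
    let record = CKRecord(recordType: "RecordBatch")
    record["payload"] = CKAsset(fileURL: fileURL)
    record["count"] = NSNumber(value: batch.count)
    return record
}
```

The trade-off is that individual records inside a batch can no longer participate in queries or relationships on the CloudKit side, which is why it doesn't fit all my data types.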

Again, I'm happy to provide the sample project I mentioned in my direct replies; let me know where to send it.
