Oh, a very important addendum to my answer:
By default, AVAssetReaderTrackOutput will not give you the array of tagged buffers; it assumes you just want the primary view. Here is how to ask it to deliver both layers as an array of tagged buffers:
// AVFoundation provides AVAssetReaderTrackOutput and the settings keys;
// VideoToolbox provides the decompression property key for requesting MV-HEVC layers.
import AVFoundation
import VideoToolbox

// The outputSettings dictionary for the AVAssetReaderTrackOutput.
var outputSettings: [String: Any] = [:]
// The decompressionProperties dictionary for the outputSettings.
var decompressionProperties: [String: Any] = [:]
// Specify that you want to read both layers (layer IDs 0 and 1).
decompressionProperties[kVTDecompressionPropertyKey_RequestedMVHEVCVideoLayerIDs as String] = [0, 1]
// Set the decompressionProperties.
outputSettings[AVVideoDecompressionPropertiesKey] = decompressionProperties
// Create your output with the outputSettings and the video track. (You can inspect the
// format description of the video track first to make sure it contains multiple layers;
// one way to do that is sketched just below.)
let output = AVAssetReaderTrackOutput(track: videoTracks.first!, outputSettings: outputSettings)
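If you want that check, here is a minimal sketch of one way to do it: load the track's format descriptions and look for the left and right stereo eye view extensions. The extension key names here are my recollection of the CoreMedia stereo additions, so verify them against CMFormatDescription.h for your SDK (checking the track's media characteristics for containsStereoMultiviewVideo is another option).
import AVFoundation
import CoreMedia

// Returns true if any of the track's format descriptions advertises both a left and a
// right stereo eye view, which is what you'd expect from two-layer MV-HEVC content.
func trackContainsBothEyes(_ track: AVAssetTrack) async throws -> Bool {
    let formatDescriptions = try await track.load(.formatDescriptions)
    return formatDescriptions.contains { description in
        let hasLeftEye = CMFormatDescriptionGetExtension(
            description,
            extensionKey: kCMFormatDescriptionExtension_HasLeftStereoEyeView) as? Bool ?? false
        let hasRightEye = CMFormatDescriptionGetExtension(
            description,
            extensionKey: kCMFormatDescriptionExtension_HasRightStereoEyeView) as? Bool ?? false
        return hasLeftEye && hasRightEye
    }
}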
Now, when you call copyNextSampleBuffer(), the sample buffer you get back should have a non-nil taggedBuffers array containing both layers.
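In case it helps, here is a minimal sketch of the full read loop under those output settings. The AVAssetReader setup is standard and the taggedBuffers property is the one mentioned above, but the details of pulling the pixel buffer and layer ID back out of each CMTaggedBuffer (the buffer enum and the videoLayerID tag) are my best recollection of the tagged-buffer Swift API, so treat them as assumptions and check them against the CoreMedia headers.
import AVFoundation
import CoreMedia

// Reads every sample of the track and hands each decoded layer's pixel buffer to the
// handle closure. The handle closure is a placeholder for your own per-layer processing;
// outputSettings is the dictionary built above, and asset is the asset the track came from.
func readBothLayers(from asset: AVAsset,
                    track: AVAssetTrack,
                    outputSettings: [String: Any],
                    handle: (_ pixelBuffer: CVPixelBuffer, _ isLayerZero: Bool) -> Void) throws {
    let reader = try AVAssetReader(asset: asset)
    let output = AVAssetReaderTrackOutput(track: track, outputSettings: outputSettings)
    reader.add(output)
    reader.startReading()

    while let sampleBuffer = output.copyNextSampleBuffer() {
        // With the requested layer IDs set, this array should be non-nil for MV-HEVC
        // content; nil means the decoder handed back a plain single-view sample buffer.
        guard let taggedBuffers = sampleBuffer.taggedBuffers else { continue }
        for taggedBuffer in taggedBuffers {
            // Each tagged buffer's tags identify the layer it was decoded from.
            if case let .pixelBuffer(pixelBuffer) = taggedBuffer.buffer {
                let isLayerZero = taggedBuffer.tags.contains(.videoLayerID(0))
                handle(pixelBuffer, isLayerZero)
            }
        }
    }

    if reader.status == .failed, let error = reader.error {
        throw error
    }
}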