Need Assistance with Projecting 3D World Points to 2D Screen Coordinates Using ARKit

Steps to Reproduce:

  1. Create a SwiftUI view that initializes an ARKit session and a camera frame provider.

  2. Attempt to run the ARKit session and retrieve camera frames.

  3. Extract the intrinsics and extrinsics matrices from the camera frame’s sample data.

  4. Attempt to project a 3D point from the world space onto the 2D screen using the retrieved camera parameters.

  5. Encounter issues due to the lack of detailed documentation on the correct usage and structure of the intrinsics and extrinsics matrices.

import SwiftUI
import RealityKit
import ARKit

struct CodeLevelSupportView: View {
    @State
    private var vm = CodeLevelSupportViewModel()
    var body: some View {
        RealityView { realityViewContent in }
        .onAppear {
            vm.receiveCamera()
        }
    }
}


@MainActor
@Observable
class CodeLevelSupportViewModel {
    let cameraSession = CameraFrameProvider()
    let arSession = ARKitSession()
    init() {
        Task {
            await arSession.requestAuthorization(for: [.cameraAccess])
        }
    }
    func receiveCamera() {
        Task {
            do {
                try await arSession.run([cameraSession])
                guard let sequence = cameraSession.cameraFrameUpdates(for: .supportedVideoFormats(for: .main, cameraPositions: [.left])[0]) else {
                    print("failed to get cameraAccess authorization")
                    return
                }
                for await frame in sequence {
                    guard let sample = frame.sample(for: .left) else {
                        print("failed to get camera sample")
                        return
                    }
                    let leftEyeScreenImage: CVPixelBuffer = sample.pixelBuffer
                    let leftEyeViewportWidth: Int = CVPixelBufferGetWidth(leftEyeScreenImage)
                    let leftEyeViewportHeight: Int = CVPixelBufferGetHeight(leftEyeScreenImage)
                    let intrinsics = sample.parameters.intrinsics
                    let extrinsics = sample.parameters.extrinsics
                    let oneMeterInFront: SIMD3<Float> = .init(x: 0, y: 0, z: -1)
                    _ = projectWorldLocationToLeftEyeScreen(worldLocation: oneMeterInFront,
                                                            intrinsics: intrinsics,
                                                            extrinsics: extrinsics,
                                                            viewportSize: (leftEyeViewportWidth, leftEyeViewportHeight))
                }
            } catch {
                print("failed to run the ARKit session: \(error)")
            }
        }
    }
    
    // Once implemented, this should return a CGPoint? representing the position of worldLocation
    // in the left-eye viewport, or nil if the point is not visible there (out of bounds).
    func projectWorldLocationToLeftEyeScreen(worldLocation: SIMD3<Float>, intrinsics: simd_float3x3, extrinsics: simd_float4x4, viewportSize: (width: Int, height: Int)) -> CGPoint? {
        // The API documentation does not describe the structure of the intrinsics and extrinsics
        // matrices, which makes it hard to implement this function.
        return nil
    }
}

I need to obtain the 2D position of a 3D point on the left eye screen, which will be used for running a specific machine learning algorithm on the left eye's camera frame, not for rendering content. Therefore, Reality Composer Pro is not helpful for this use case.

Hello @Shengjiang,

I recommend that you take a look at the documentation for AVCameraCalibrationData's intrinsicMatrix and extrinsicMatrix properties:

https://developer.apple.com/documentation/avfoundation/avcameracalibrationdata/2881135-intrinsicmatrix

https://developer.apple.com/documentation/avfoundation/avcameracalibrationdata/2881130-extrinsicmatrix

That documentation may give you some conceptual insight on how you can make use of the same properties on a CameraFrame.Sample to convert a world point to an image point.
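To make that concrete, here is a minimal sketch of a pinhole projection built around the signature from your post. It assumes that extrinsics transforms points into the camera's coordinate space (if it instead describes the camera's pose, apply extrinsics.inverse first), that intrinsics has the usual fx/fy/cx/cy layout in pixels, and that the camera space follows the RealityKit convention (+Y up, camera looking down -Z) with the image origin at the top left. Treat it as a starting point to verify against your frames, not a definitive implementation.

func projectWorldLocationToLeftEyeScreen(worldLocation: SIMD3<Float>,
                                         intrinsics: simd_float3x3,
                                         extrinsics: simd_float4x4,
                                         viewportSize: (width: Int, height: Int)) -> CGPoint? {
    // Bring the point into camera space. Assumption: extrinsics maps world/device
    // space into camera space; use extrinsics.inverse if it is the camera's pose.
    let p = extrinsics * SIMD4<Float>(worldLocation.x, worldLocation.y, worldLocation.z, 1)

    // The classic pinhole model expects +Z in front of the camera and +Y pointing
    // down in the image, so flip those axes from the RealityKit-style camera space
    // (assumption; adjust if the result appears mirrored).
    let cameraSpace = SIMD3<Float>(p.x, -p.y, -p.z)

    // Points behind the camera are never visible.
    guard cameraSpace.z > 0 else { return nil }

    // Apply the intrinsics and perform the perspective divide:
    // [u, v, w] = K * [x, y, z], pixel = (u / w, v / w).
    let projected = intrinsics * cameraSpace
    let u = projected.x / projected.z
    let v = projected.y / projected.z

    // Discard points that fall outside the left-eye viewport.
    guard u >= 0, u < Float(viewportSize.width),
          v >= 0, v < Float(viewportSize.height) else { return nil }

    return CGPoint(x: CGFloat(u), y: CGFloat(v))
}

One more thing to check as you test this: if the extrinsics on a CameraFrame.Sample are expressed relative to the device origin rather than the world origin, you will first need the device's world pose (for example, a DeviceAnchor queried from a WorldTrackingProvider) to bring a world-space point into device space before applying them.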

Best regards,

Greg
