Almost all the functions in Accelerate are for single precision (Float) and double precision (Double) operations. However, I stumbled upon three integer arithmetic functions which operate on Int32 values. Are there any more functions in Accelerate that operate on integer values? If not, then why aren't there more functions that work with integers?
Accelerate
Make large-scale mathematical computations and image calculations with high-performance, energy-efficient computation using Accelerate.
Posts under Accelerate tag
I can use BLAS and LAPACK functions via the Accelerate framework to perform vector and matrix arithmetic and linear algebra calculations. But do these functions take advantage of Apple Silicon features?
So I get JPEG data in my app. Previously I was using the higher-level NSBitmapImageRep API and just feeding the JPEG data to it.
But now I've noticed that on Sonoma, if I get a JPEG in the CMYK color space, the NSBitmapImageRep renders mostly black and is corrupted. So I'm trying to drop down to the lower-level APIs. Specifically, I grab a CGImageRef and am trying to use the Accelerate API to convert it to another format (to hopefully work around the issue...
CGImageRef sourceCGImage = CGImageCreateWithJPEGDataProvider(jpegDataProvider,
    NULL,
    shouldInterpolate,
    kCGRenderingIntentDefault);
Now I use vImageConverter_CreateWithCGImageFormat... with the following values for source and destination formats:
Source format: (derived from sourceCGImage)
bitsPerComponent = 8
bitsPerPixel = 32
colorSpace = (kCGColorSpaceICCBased; kCGColorSpaceModelCMYK; Generic CMYK Profile)
bitmapInfo = kCGBitmapByteOrderDefault
version = 0
decode = 0x000060000147f780
renderingIntent = kCGRenderingIntentDefault
Destination format:
bitsPerComponent = 8
bitsPerPixel = 24
colorSpace = (DeviceRGB)
bitmapInfo = 8197
version = 0
decode = 0x0000000000000000
renderingIntent = kCGRenderingIntentDefault
But vImageConverter_CreateWithCGImageFormat fails with kvImageInvalidImageFormat. If I change the destination format to use 32 bitsPerPixel and include alpha in the bitmap info, vImageConverter_CreateWithCGImageFormat does not return an error, but I get a black image, just like with NSBitmapImageRep.
Hello! I’m making an app which will have a waveform of the frequency of what’s playing on a Mac. The question is whether it is possible to have access to the signal of the media and use it with the FFT?
func testMLTensor() {
    let t1 = MLTensor(shape: [2000, 1], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 2000), scalarType: Float.self)
    let t2 = MLTensor(shape: [1, 3000], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 3000), scalarType: Float.self)
    for _ in 0...50 {
        let t = Date()
        let x = (t1 * t2)
        print("MLTensor", t.timeIntervalSinceNow * 1000, "ms")
    }
}
testMLTensor()
The above code took more time than expected, especially in the early stage of iteration.
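One detail worth checking in the measurement itself: `timeIntervalSinceNow` is negative for a date in the past, so the printed values are negated durations. The following is a minimal sketch of a timing helper, not part of any framework; `measureMilliseconds` and its parameters are hypothetical names. It discards a few warm-up runs so that one-time setup cost does not dominate the first samples. If MLTensor evaluates lazily (an assumption to verify against the Core ML documentation), the result would also need to be materialized inside the timed closure.

```swift
import Foundation

/// Hypothetical helper: runs `body` `warmup` times untimed, then returns the
/// duration of each of `iterations` timed runs in milliseconds.
func measureMilliseconds(warmup: Int = 3, iterations: Int, _ body: () -> Void) -> [Double] {
    for _ in 0..<warmup { body() }
    var samples: [Double] = []
    samples.reserveCapacity(iterations)
    for _ in 0..<iterations {
        // Uptime is monotonic, so end >= start always holds.
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000)
    }
    return samples
}

// Stand-in workload; replace with the tensor multiplication under test.
let samples = measureMilliseconds(iterations: 5) {
    _ = (0..<10_000).reduce(0.0) { $0 + Double($1) }
}
print(samples)
```

Comparing the warm-up runs with the timed samples should show whether the slow early iterations are a one-time cost or a steady-state one.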
Hello everybody,
I am running into an error with BNNS.NormalizationLayer. It appears to work only with .vector; matrix shapes throw layerApplyFail during training. Inference doesn't throw, but the output stays the same.
How do I correctly use BNNS.NormalizationLayer with matrix shapes? How do I debug the layerApplyFail exception?
Thanks
let array: [Float32] = [
    01, 02, 03, 04, 05, 06,
    07, 08, 09, 10, 11, 12,
    13, 14, 15, 16, 17, 18,
]
// let inputShape: BNNS.Shape = .vector(6 * 3) // works
let inputShape: BNNS.Shape = .matrixColumnMajor(6, 3)
let input = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape)
let output = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape)
let beta = BNNSNDArrayDescriptor.allocate(repeating: Float32(0), shape: inputShape, batchSize: 1)
let gamma = BNNSNDArrayDescriptor.allocate(repeating: Float32(1), shape: inputShape, batchSize: 1)
let activation: BNNS.ActivationFunction = .identity
let layer = BNNS.NormalizationLayer(type: .layer(normalizationAxis: 0), input: input, output: output, beta: beta, gamma: gamma, epsilon: 1e-12, activation: activation)!
let layerInput = BNNSNDArrayDescriptor.allocate(initializingFrom: array, shape: inputShape)
let layerOutput = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape)
// try layer.apply(batchSize: 1, input: layerInput, output: layerOutput, for: .inference) // No throw
try layer.apply(batchSize: 1, input: layerInput, output: layerOutput, for: .training)
_ = layerOutput.makeArray(of: Float32.self) // All zeros when .inference
Hey, I’m building a camera app where I am applying real-time effects to the viewfinder. One of those effects is a variable blur, so to improve performance I am scaling down the input image using CIFilter.lanczosScaleTransform(). This works fine and runs at 30 FPS, but when running the Metal profiler I can see that the scaling transforms use a lot of GPU time, almost as much as the variable blur. Is there a more efficient way to do this?
The simplified chain is like this:
Scale down viewFinder CVPixelBuffer (CIFilter.lanczosScaleTransform)
Scale up depthMap CVPixelBuffer to match viewFinder size (CIFilter.lanczosScaleTransform)
Create CIImages from both CVPixelBuffers
Apply VariableDepthBlur (CIFilter.maskedVariableBlur)
Scale up final image to metal view size (CIFilter.lanczosScaleTransform)
Render CIImage to a MTKView using CIRenderDestination
From some research, I wonder if scaling the CVPixelBuffer using the Accelerate framework would be faster? Also, instead of scaling the final image, perhaps I could offload this to the Metal view?
Any pointers greatly appreciated!
Hello all,
Currently, I'm working on an iOS app that performs measurements and shows the results to the user in a graph. I use a Savitzky-Golay filter to filter out noise, so that the graph is nice and smooth. However, the code that calculates the Savitzky-Golay coefficients using sparse matrices crashes sometimes, throwing an EXC_BAD_ACCESS. I tried to find out what the problem is by turning on Address Sanitizer and Thread Sanitizer, but, for some reason, the bad access exception isn't thrown when either of these is on. What else could I try to trace back the problem?
Thanks in advance,
CaS
To reproduce the error, run the following:
import SwiftUI
import Accelerate

struct ContentView: View {
    var body: some View {
        VStack {
            Button("Try", action: test)
        }
        .padding()
    }

    func test() {
        for windowLength in 3...100 {
            let coeffs = SavitzkyGolay.coefficients(windowLength: windowLength, polynomialOrder: 2)
            print(coeffs)
        }
    }
}

class SavitzkyGolay {
    static func coefficients(windowLength: Int, polynomialOrder: Int, derivativeOrder: Int = 0, delta: Int = 1) -> [Double] {
        let (halfWindow, remainder) = windowLength.quotientAndRemainder(dividingBy: 2)
        var pos = Double(halfWindow)
        if remainder == 0 {
            pos -= 0.5
        }
        let X = [Double](stride(from: Double(windowLength) - pos - 1, through: -pos, by: -1))
        let P = [Double](stride(from: 0, through: Double(polynomialOrder), by: 1))
        let A = P.map { exponent in
            X.map {
                pow($0, exponent)
            }
        }
        var B = [Double](repeating: 0, count: polynomialOrder + 1)
        B[derivativeOrder] = Double(factorial(derivativeOrder)) / pow(Double(delta), Double(derivativeOrder))
        return leastSquaresSolution(A: A, B: B)
    }

    static func leastSquaresSolution(A: [[Double]], B: [Double]) -> [Double] {
        let sparseA = A.sparseMatrix()
        var sparseAValuesCopy = sparseA.values
        var xValues = [Double](repeating: 0, count: A.transpose().count)
        var bValues = B
        sparseAValuesCopy.withUnsafeMutableBufferPointer { valuesPtr in
            let a = SparseMatrix_Double(
                structure: sparseA.structure,
                data: valuesPtr.baseAddress!
            )
            bValues.withUnsafeMutableBufferPointer { bPtr in
                xValues.withUnsafeMutableBufferPointer { xPtr in
                    let b = DenseVector_Double(
                        count: Int32(B.count),
                        data: bPtr.baseAddress!
                    )
                    let x = DenseVector_Double(
                        count: Int32(A.transpose().count),
                        data: xPtr.baseAddress!
                    )
                    #warning("EXC_BAD_ACCESS is thrown below")
                    print("This code is executed...")
                    let status = SparseSolve(SparseLSMR(), a, b, x, SparsePreconditionerDiagScaling)
                    print("...but, if an EXC_BAD_ACCESS is thrown, this code isn't")
                    if status != SparseIterativeConverged {
                        fatalError("Failed to converge. Returned with error \(status).")
                    }
                }
            }
        }
        return xValues
    }
}

func factorial(_ n: Int) -> Int {
    n < 2 ? 1 : n * factorial(n - 1)
}

extension Array where Element == [Double] {
    func sparseMatrix() -> (structure: SparseMatrixStructure, values: [Double]) {
        let columns = self.transpose()
        var rowIndices: [Int32] = columns.map { column in
            column.indices.compactMap { indexInColumn in
                if column[indexInColumn] != 0 {
                    return Int32(indexInColumn)
                }
                return nil
            }
        }.reduce([], +)
        let sparseColumns = columns.map { column in
            column.compactMap {
                if $0 != 0 {
                    return $0
                }
                return nil
            }
        }
        var counter = 0
        var columnStarts = [Int]()
        for sparseColumn in sparseColumns {
            columnStarts.append(counter)
            counter += sparseColumn.count
        }
        let reducedSparseColumns = sparseColumns.reduce([], +)
        columnStarts.append(reducedSparseColumns.count)
        let structure: SparseMatrixStructure = rowIndices.withUnsafeMutableBufferPointer { rowIndicesPtr in
            columnStarts.withUnsafeMutableBufferPointer { columnStartsPtr in
                let attributes = SparseAttributes_t()
                return SparseMatrixStructure(
                    rowCount: Int32(self.count),
                    columnCount: Int32(columns.count),
                    columnStarts: columnStartsPtr.baseAddress!,
                    rowIndices: rowIndicesPtr.baseAddress!,
                    attributes: attributes,
                    blockSize: 1
                )
            }
        }
        return (structure, reducedSparseColumns)
    }

    func transpose() -> Self {
        let columns = self.count
        let rows = self.reduce(0) { Swift.max($0, $1.count) }
        return (0 ..< rows).reduce(into: []) { result, row in
            result.append((0 ..< columns).reduce(into: []) { result, column in
                result.append(row < self[column].count ? self[column][row] : 0)
            })
        }
    }
}
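One thing worth ruling out, offered as a possibility rather than a confirmed diagnosis: the pointers passed to a `withUnsafeMutableBufferPointer` closure are only valid while that closure runs, yet `sparseMatrix()` returns a SparseMatrixStructure that still holds `columnStartsPtr.baseAddress` and `rowIndicesPtr.baseAddress` after the closures have ended. The sketch below builds the same compressed sparse column (CSC) layout in plain Swift, with the arrays owned by one value so they can be kept alive for as long as a solver needs them. `CSCStorage` and `makeCSC` are hypothetical names, and the input is assumed to be an array of equal-length rows.

```swift
/// Hypothetical container for compressed sparse column (CSC) data.
/// Owning the arrays here keeps them alive as long as the value exists.
struct CSCStorage {
    var rowCount: Int
    var columnCount: Int
    var columnStarts: [Int]   // columnCount + 1 entries
    var rowIndices: [Int32]   // one entry per stored value
    var values: [Double]
}

/// Builds CSC storage from a dense matrix given as an array of rows.
/// Assumes every row has the same length.
func makeCSC(rows: [[Double]]) -> CSCStorage {
    let rowCount = rows.count
    let columnCount = rows.first?.count ?? 0
    var columnStarts: [Int] = []
    var rowIndices: [Int32] = []
    var values: [Double] = []
    for column in 0..<columnCount {
        // Each column starts where the previous one stopped.
        columnStarts.append(values.count)
        for row in 0..<rowCount where rows[row][column] != 0 {
            rowIndices.append(Int32(row))
            values.append(rows[row][column])
        }
    }
    // The final entry marks the end of the last column.
    columnStarts.append(values.count)
    return CSCStorage(rowCount: rowCount, columnCount: columnCount,
                      columnStarts: columnStarts, rowIndices: rowIndices, values: values)
}

let csc = makeCSC(rows: [[1, 0, 2],
                         [0, 3, 0]])
print(csc.columnStarts, csc.rowIndices, csc.values)
```

With storage like this, the SparseMatrixStructure could be created inside the same nested closures that call SparseSolve, so every pointer it holds stays valid for the whole solve.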
I have a Matrix structure as defined below for working with 2D numerical data in Accelerate. The underlying numerical data in this Matrix struct is stored as an Array.
struct Matrix<T> {
    let rows: Int
    let columns: Int
    var data: [T]

    init(rows: Int, columns: Int, fill: T) {
        self.rows = rows
        self.columns = columns
        self.data = Array(repeating: fill, count: rows * columns)
    }

    init(rows: Int, columns: Int, source: (inout UnsafeMutableBufferPointer<T>) -> Void) {
        self.rows = rows
        self.columns = columns
        self.data = Array(unsafeUninitializedCapacity: rows * columns) { buffer, initializedCount in
            source(&buffer)
            initializedCount = rows * columns
        }
    }

    subscript(row: Int, column: Int) -> T {
        get { return self.data[(row * self.columns) + column] }
        set { self.data[(row * self.columns) + column] = newValue }
    }
}
Multiplication is implemented by the functions shown below.
import Accelerate

infix operator .*

func .* (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> {
    precondition(lhs.rows == rhs.rows && lhs.columns == rhs.columns, "Matrices must have same dimensions")
    let result = Matrix<Double>(rows: lhs.rows, columns: rhs.columns) { buffer in
        vDSP.multiply(lhs.data, rhs.data, result: &buffer)
    }
    return result
}

func * (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> {
    precondition(lhs.columns == rhs.rows, "Number of columns in left matrix must equal number of rows in right matrix")
    var a = lhs.data
    var b = rhs.data
    let m = lhs.rows     // number of rows in matrices A and C
    let n = rhs.columns  // number of columns in matrices B and C
    let k = lhs.columns  // number of columns in matrix A; number of rows in matrix B
    let alpha = 1.0
    let beta = 0.0
    // matrix multiplication where C ← αAB + βC
    let c = Matrix<Double>(rows: lhs.rows, columns: rhs.columns) { buffer in
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, &a, k, &b, n, beta, buffer.baseAddress, n)
    }
    return c
}
I can also define a Matrix structure where the underlying data is an UnsafeMutableBufferPointer. The buffer is handled by the MatrixData class.
struct Matrix<T> {
    let rows: Int
    let columns: Int
    var data: MatrixData<T>

    init(rows: Int, columns: Int, fill: T) {
        self.rows = rows
        self.columns = columns
        self.data = MatrixData(count: rows * columns, fill: fill)
    }

    init(rows: Int, columns: Int) {
        self.rows = rows
        self.columns = columns
        self.data = MatrixData(count: rows * columns)
    }

    subscript(row: Int, column: Int) -> T {
        get { return self.data.buffer[(row * self.columns) + column] }
        set { self.data.buffer[(row * self.columns) + column] = newValue }
    }
}

class MatrixData<T> {
    var buffer: UnsafeMutableBufferPointer<T>
    var baseAddress: UnsafeMutablePointer<T> {
        get { self.buffer.baseAddress! }
    }

    init(count: Int, fill: T) {
        let start = UnsafeMutablePointer<T>.allocate(capacity: count)
        self.buffer = UnsafeMutableBufferPointer(start: start, count: count)
        self.buffer.initialize(repeating: fill)
    }

    init(count: Int) {
        let start = UnsafeMutablePointer<T>.allocate(capacity: count)
        self.buffer = UnsafeMutableBufferPointer(start: start, count: count)
    }

    deinit {
        self.buffer.deinitialize()
        self.buffer.deallocate()
    }
}
Multiplication for this approach is implemented by the functions shown here.
import Accelerate

infix operator .*

func .* (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> {
    precondition(lhs.rows == rhs.rows && lhs.columns == rhs.columns, "Matrices must have same dimensions")
    let result = Matrix<Double>(rows: lhs.rows, columns: lhs.columns)
    vDSP.multiply(lhs.data.buffer, rhs.data.buffer, result: &result.data.buffer)
    return result
}

func * (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> {
    precondition(lhs.columns == rhs.rows, "Number of columns in left matrix must equal number of rows in right matrix")
    let a = lhs.data.baseAddress
    let b = rhs.data.baseAddress
    let m = lhs.rows     // number of rows in matrices A and C
    let n = rhs.columns  // number of columns in matrices B and C
    let k = lhs.columns  // number of columns in matrix A; number of rows in matrix B
    let alpha = 1.0
    let beta = 0.0
    // matrix multiplication where C ← αAB + βC
    let c = Matrix<Double>(rows: lhs.rows, columns: rhs.columns)
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, a, k, b, n, beta, c.data.baseAddress, n)
    return c
}
Both of these approaches give me similar performance. The only difference that I have noticed is that the matrix buffer approach allows for reference semantics. For example, the code below uses half the memory with the matrix buffer approach compared to the matrix array approach. This is because b acts as a reference to a with the matrix buffer approach, whereas the matrix array approach makes a full copy of a.
let n = 10_000
let a = Matrix<Double>(rows: n, columns: n, fill: 0)
var b = a
b[0, 0] = 99
b[0, 1] = 22
Other than reference semantics, are there any reasons to use one of these approaches over the other?
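A point that may bear on the choice: with the buffer-backed Matrix, `b[0, 0] = 99` also changes `a`, because both values share one MatrixData instance. A middle ground is copy-on-write: keep the storage in a class, but copy it on the first mutation if it is shared. The sketch below uses hypothetical `Storage` and `CowMatrix` types and an Array inside the class for brevity; a raw buffer could be wrapped the same way.

```swift
/// Hypothetical reference-type backing store that owns its elements.
final class Storage {
    var elements: [Double]
    init(elements: [Double]) { self.elements = elements }
}

/// Hypothetical matrix with value semantics and copy-on-write storage.
struct CowMatrix {
    let rows: Int
    let columns: Int
    private var storage: Storage

    init(rows: Int, columns: Int, fill: Double) {
        self.rows = rows
        self.columns = columns
        self.storage = Storage(elements: Array(repeating: fill, count: rows * columns))
    }

    subscript(row: Int, column: Int) -> Double {
        get { storage.elements[row * columns + column] }
        set {
            // Copy the storage only if another matrix value still shares it.
            if !isKnownUniquelyReferenced(&storage) {
                storage = Storage(elements: storage.elements)
            }
            storage.elements[row * columns + column] = newValue
        }
    }
}

let a = CowMatrix(rows: 2, columns: 2, fill: 0)
var b = a          // shares storage with a; nothing is copied yet
b[0, 0] = 99       // first write copies the storage; a is unaffected
print(a[0, 0], b[0, 0])
```

Swift's Array already behaves this way, so in the array-backed Matrix the copy happens at the first write to `b`, not at `var b = a`.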
I am looking for code that computes the eigenvalues and eigenvectors using the Accelerate Sparse Matrix library.
In our app we use the following function for inverting a CGImageRef using vImage. The workflow is an Objective-C version of the code in the AdjustingTheBrightnessAndContrastOfAnImage sample from Apple:
CGImageRef InvertImage( CGImageRef frameImageRef )
{
    CGImageRef resultImage = nil;
    CGBitmapInfo imgBitmapInfo = CGImageGetBitmapInfo( frameImageRef );
    size_t img_bPC = CGImageGetBitsPerComponent( frameImageRef );
    size_t img_bPP = CGImageGetBitsPerPixel( frameImageRef );
    vImage_CGImageFormat invIFormat;
    invIFormat.bitsPerComponent = img_bPC;
    invIFormat.bitsPerPixel = img_bPP;
    invIFormat.colorSpace = (img_bPP == 8) ? gDeviceGrayColorSpaceRef : gDeviceRGBColorSpaceRef;
    invIFormat.bitmapInfo = imgBitmapInfo;
    invIFormat.version = 0;
    invIFormat.decode = 0;
    invIFormat.renderingIntent = kCGRenderingIntentDefault;
    vImage_Buffer sourceVImageBuffer;
    vImage_Error viErr = vImageBuffer_InitWithCGImage( &sourceVImageBuffer, &invIFormat, nil, frameImageRef, kvImageNoFlags );
    if (viErr == kvImageNoError)
    {
        vImage_Buffer destinationVImageBuffer;
        viErr = vImageBuffer_Init( &destinationVImageBuffer, sourceVImageBuffer.height, sourceVImageBuffer.width, img_bPP, kvImageNoFlags );
        if (viErr == kvImageNoError)
        {
            float linearCoeffs[2] = { -1.0, 1.0 };
            float expoCoeffs[3] = { 1.0, 0.0, 0.0 };
            float gamma = 0.0;
            Pixel_8 boundary = 255;
            viErr = vImagePiecewiseGamma_Planar8( &sourceVImageBuffer, &destinationVImageBuffer, expoCoeffs, gamma, linearCoeffs, boundary, kvImageNoFlags );
            if (viErr == kvImageNoError)
            {
                CGImageRef newImgRef = vImageCreateCGImageFromBuffer( &destinationVImageBuffer, &invIFormat, nil, nil, kvImageNoFlags, &viErr );
                if (viErr == kvImageNoError)
                    resultImage = newImgRef;
            }
            free( destinationVImageBuffer.data );
        }
        free( sourceVImageBuffer.data );
    }
    return resultImage;
}
The function works fine for 8-bit monochrome images. When I try it with 24-bit RGB images, although I get no errors from any of the calls, the output shows only one third of the image inverted as expected.
What am I missing? I suspect I might have to use a different function for 24-bit images (instead of the vImagePiecewiseGamma_Planar8) but I cannot find which one in the headers.
Thanks.
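The symptom is consistent with a planar routine treating each row as `width` one-byte pixels, while a 24-bit interleaved row holds `width * 3` bytes. The framework-free sketch below illustrates that effect; `invertPlanar8Rows` is a hypothetical stand-in for how a Planar8 operation walks a buffer, not a vImage call.

```swift
/// Hypothetical stand-in for a Planar8 operation: inverts `width` bytes
/// in each of `height` rows, where consecutive rows are `rowBytes` apart.
func invertPlanar8Rows(_ pixels: inout [UInt8], width: Int, height: Int, rowBytes: Int) {
    for row in 0..<height {
        for column in 0..<width {
            let index = row * rowBytes + column
            pixels[index] = 255 - pixels[index]
        }
    }
}

let pixelWidth = 4, pixelHeight = 2, bytesPerPixel = 3
let rowBytes = pixelWidth * bytesPerPixel

// Width given in pixels: only the first third of each RGB row is touched.
var partial = [UInt8](repeating: 0, count: rowBytes * pixelHeight)
invertPlanar8Rows(&partial, width: pixelWidth, height: pixelHeight, rowBytes: rowBytes)

// Width given in bytes: every channel of every pixel is touched.
var full = [UInt8](repeating: 0, count: rowBytes * pixelHeight)
invertPlanar8Rows(&full, width: pixelWidth * bytesPerPixel, height: pixelHeight, rowBytes: rowBytes)

print(partial.filter { $0 == 255 }.count, full.filter { $0 == 255 }.count)
```

In vImage terms this would suggest describing the interleaved buffers to the planar function with a width of `width * 3`, or using a function written for the interleaved format. Treat that as a direction to test rather than a confirmed fix; for formats with an alpha channel it would invert alpha as well.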
I very much love the performance of AppleArchive and how approachable it is, and believe it to be one of the most underrated frameworks in the SDK. In a scenario quite typical, I need to compress files and submit them to a back end, where the server handling the files is not an Apple platform. Obviously, individual files compressed with AA will not be compatible with other systems out of the box, but there are compatible compression algorithms.
ZLIB is recommended for cases where cross-platform compatibility is necessary. As I understand it, AA adds additional headers to files in order to support preservation of file attributes, ownership and other data. Following the steps outlined in the docs, I've written code to compress single files. I can easily compress and decompress using AA without issue.
To create a proof of concept, I've written some code in Python using its zlib module. In order to get to the compressed data, it's necessary to handle the AA header fields. The first 64 bytes of a compressed file appear as follows:
AA documentation states that ZLIB Level 5 compression is used, and comes in the form of raw DEFLATE data prefixed with two header bytes. In this case, these bytes are 78 5e, which begin at the 28th byte and appear as x^ above. My hope was that seeking to the start of the compressed data, then passing what remains to a decompressor object initialized with the correct WBITS would work.
It works fantastically for files 1MB or less in size. Files which are larger only decompress the first megabyte. The decompressor object is reaching EOF, and I've tried various ways of attempting to seek to and concatenate the other blocks, but to no avail.
Using the older Compression framework and the method specified here, with the same algorithm, yields different results. I can decompress files of any size using Python's zlib module. My assumption is that AppleArchive is doing something differently in order to support its multithreading capabilities, perhaps even with asymmetric encoding where the blocks are not ordered.
Is there a solution to this problem? If not, why would one ever use ZLIB versus the much more efficient LZFSE? I could use the older Compression API, but it is significantly slower compressing synchronously, and performance is critical with the application I am adding this feature to.
So, my app needs to find the dominant palette and the position in the image of the k most dominant colors. I followed the very useful sample project from the vImage documentation
https://developer.apple.com/documentation/accelerate/bnns/calculating_the_dominant_colors_in_an_image
and the algorithm works fine, although I can't wrap my head around how I should go about linking those colors with a point in the image. Since the algorithm works by filling storages first, I also tried filling an array of CGPoints called locationStorage and working with that:
//filling the array
for i in 0...width {
for j in 0...height {
locationStorage.append(
CGPoint(x: i, y: j))
}
.
.
.
//working with the array
let randomIndex = Int.random(in: 0 ..< width * height)
centroids.append(Centroid(red: redStorage[randomIndex],
green: greenStorage[randomIndex],
blue: blueStorage[randomIndex],
position: locationStorage[randomIndex]))
}
struct Centroid {
    /// The red channel value.
    var red: Float
    /// The green channel value.
    var green: Float
    /// The blue channel value.
    var blue: Float
    /// The number of pixels assigned to this cluster center.
    var pixelCount: Int = 0
    var position: CGPoint = CGPointZero

    init(red: Float, green: Float, blue: Float, position: CGPoint) {
        self.red = red
        self.green = green
        self.blue = blue
        self.position = position
    }
}
However, the result isn't accurate.
I also tried brute-force checking every pixel in the image to find the closest match to each color, but I think it's too slow.
What do you think my approach should be?
Let me know if you need additional info
Please be kind, I'm learning Swift.
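One possible source of the inaccuracy: `0...width` visits `width + 1` values, and looping over x in the outer loop appends points in column-major order, so indices into locationStorage may not line up with the color storages. If the color storages are filled in row-major order (an assumption about the sample's layout that should be checked), the position can be computed from the storage index directly, with no separate location array. `PixelPosition`, `position(ofIndex:width:)`, and `index(of:width:)` below are hypothetical names.

```swift
/// Position of a pixel, using plain integers to stay framework-free.
struct PixelPosition: Equatable {
    var x: Int
    var y: Int
}

/// Converts an index into row-major pixel storage to an (x, y) position.
func position(ofIndex index: Int, width: Int) -> PixelPosition {
    PixelPosition(x: index % width, y: index / width)
}

/// Inverse mapping: converts an (x, y) position back to a storage index.
func index(of position: PixelPosition, width: Int) -> Int {
    position.y * width + position.x
}

let width = 640
let p = position(ofIndex: 1_925, width: width)   // 1_925 = 3 * 640 + 5
print(p.x, p.y)
```

The same mapping applies to `randomIndex` when seeding centroids. Since a centroid is an average color rather than a single pixel, its location might be better represented by averaging the positions of the pixels assigned to it.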
Hi!
We're trying to calculate the delay between two audio inputs, represented by float arrays, by getting their maximum correlation, using vDSP_conv. Our solution is very similar to the one in the first answer here, only we are looking at a 0..5000 radius to find the delay in ms:
https://stackoverflow.com/questions/65571299/swift-read-two-audio-files-and-calculate-their-cross-correlation
The problem is that we had mixed results: sometimes the calculated delay is OK, but other times it isn't. Our best guess is that there is some overflow error happening, since the arrays we're working with can be pretty large (they can have around 4-5 million values). If we use a simple foreach to calculate these correlations we get good results, but obviously this is quite slow. Did anyone have similar problems?
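For comparing against the vDSP_conv output on small inputs, a naive reference can help separate indexing problems from numerical ones. The sketch below assumes a hypothetical `bestLag` function that searches non-negative lags only and accumulates in Double, since summing millions of Float products can lose precision.

```swift
/// Hypothetical reference: returns the lag in 0...maxLag that maximizes
/// sum(reference[i] * delayed[i + lag]), accumulating in Double.
func bestLag(reference: [Float], delayed: [Float], maxLag: Int) -> Int {
    precondition(maxLag >= 0, "maxLag must be non-negative")
    var best = (lag: 0, score: -Double.infinity)
    for lag in 0...maxLag {
        // Number of samples that overlap at this lag.
        let overlap = min(reference.count, delayed.count - lag)
        if overlap <= 0 { break }
        var score = 0.0
        for i in 0..<overlap {
            score += Double(reference[i]) * Double(delayed[i + lag])
        }
        if score > best.score { best = (lag, score) }
    }
    return best.lag
}

let pulse: [Float] = [0, 1, 3, 1, 0, 0, 0, 0]
let shifted: [Float] = [0, 0, 0, 1, 3, 1, 0, 0]   // pulse delayed by 2 samples
print(bestLag(reference: pulse, delayed: shifted, maxLag: 4))   // prints 2
```

If the two implementations agree on short signals but diverge on long ones, accumulation precision or the buffer lengths passed to vDSP_conv would be reasonable places to look next.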
Hi.
I want to implement the code below using vDSP.
for i in a.indices {
    a[i] = n[i] == 0.0 ? 0.0 : b[i] / n[i]
}
This code is slow.
Is there a good implementation using the Accelerate framework?
I am asking this more in hope than expectation, but would greatly appreciate any help or suggestions (with apologies for a rather lengthy post). The problem I have with my existing OpenCL code is, quite simply, that I am unable to get it to build in Xcode (I have always used Xcode without problems in the past). So my question, quite simply, is:
Can anyone advise how to configure and use Xcode in order to successfully build OpenCL code for Apple Silicon?
Background:
Having just received a shiny new M3 MacBook Pro, I would really like to try out one or two of my GPU programs. They were all written several years ago using OpenCL, before Apple decided to give up on it in favour of Metal. (In fact, I have since converted one of them to use CUDA, but that is not useful here.) Now, I completely understand that the right thing to do is to convert them to use Metal directly, and will do this when I have time, but I suspect that it will take me several days, if not weeks (I have never had reason to use Metal until now, so I will also have to learn how to convert my code; there are quite a few kernels). I don’t have time to do that at the moment. Meanwhile, I would very much like to try the programs right now, using OpenCL, simply to find out how they run on Apple Silicon (I have previously only used them on older, Intel Macs with AMD GPUs). It would be great to see my code running on the M3’s GPU!
The reasons I think this must still be possible are (a) there are plenty of Geekbench OpenCL results for the M3 chips; and (b) I have managed to compile and run a really trivial OpenCL program (but only using clang from the command line; I have been unable to work out how to compile individual .cl files containing OpenCL kernels).
The problem I am getting is that, having cloned one of my sets of programs into Xcode on my new M3 Mac, I am unable to get any of the kernels even to build. The failure I’m getting is that Xcode is trying to run a version of openclc in the directory /System/Library/Frameworks/OpenCL.framework/Libraries/, which gives the error condition Bad CPU type in executable when Xcode tries to use it. It seems that this is an x86_64 version of openclc. There is a universal binary version in /System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/Libraries/, but I have been unable to find a way to configure (or force ….) Xcode to use that one.
It may well be, of course, that if I manage to get past this problem, another one will present itself. Nonetheless, if any of you can suggest anything that I might try, I would be most grateful.
One secondary question, if I may:
Using openclc to compile a .cl file (containing a kernel) from the command line, is there a parameter (e.g. a value to specify with -arch) or combination of parameters that will cause it to produce a .bc file for an Apple Silicon GPU and also the .cl.h header file that has to be #included in the C or C++ code that will dispatch the kernel?
Thanks ….
Andrew
PS. I’ve also posted this question on MacRumors, because there seem to be quite a number of people there who understand Apple Silicon, but I rather suspect there’s a better chance of getting the help I need here ….
In Instruments' CPU profiling tool I've noticed that a significant portion (22.5%) of the CPU-side overhead related to MPS matrix multiplication (GEMM) is in a call to getenv(). Please see attached screenshot.
It seems unnecessary to perform this same check over and over, as whatever hack that needs this should be able to perform the getenv() only once and cache the result for future use.
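The read-once pattern described here can be sketched in a few lines. This illustrates the general idea only, not how MPS is implemented; `DebugFlags` and the variable name `EXAMPLE_VERBOSE_GEMM` are made up for the example.

```swift
import Foundation

/// Hypothetical example of reading an environment variable once and
/// reusing the result; the variable name is made up for illustration.
enum DebugFlags {
    /// A static `let` is initialized lazily on first access and then
    /// cached for the lifetime of the process.
    static let verboseGEMM: Bool =
        ProcessInfo.processInfo.environment["EXAMPLE_VERBOSE_GEMM"] == "1"
}

let first = DebugFlags.verboseGEMM
// Changing the environment afterwards is not observed by the cached flag.
setenv("EXAMPLE_VERBOSE_GEMM", first ? "0" : "1", 1)
let second = DebugFlags.verboseGEMM
print(first == second)   // prints true
```

The trade-off is that a value cached this way cannot react to the environment changing while the process runs, which may or may not matter for the check in question.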
Hello
I'm using functions from the Accelerate framework in my app as mentioned in this developer documentation:
https://developer.apple.com/documentation/accelerate/solving_systems_of_linear_equations_with_lapack
I've built the app and tested it and I get no errors, but when I try to upload to the App Store Connect I get the error:
The app references non-public symbols in Payload/***.app/Frameworks/Matrix.framework/Matrix: _dgeev$NEWLAPACK$ILP64, _dposv$NEWLAPACK$ILP64, _dsyev$NEWLAPACK$ILP64, _dsysv$NEWLAPACK$ILP64
Please advise on how to resolve this issue.
Thank you.