I am facing a performance issue when using Core ML on iOS 16+ devices to run a simple grid_sample model. When profiling the model with Xcode's profiler, I noticed that before each NPU computation there is a significant delay caused by the "input copy" and "neural engine-data copy" operations. I have specified that both the input and the output of the model are float16, so there shouldn't be any data type conversion.
I would appreciate any insights into the reasons behind this delay, and any possible solutions.
My simple model is:
import os

import coremltools
import numpy as np
import torch
import torch.nn.functional as F


class GridSample(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, input: torch.Tensor, grid: torch.Tensor) -> torch.Tensor:
        # Nearest-neighbour sampling of `input` at the locations given by `grid`.
        output = F.grid_sample(
            input, grid.to(input), mode='nearest', padding_mode='zeros', align_corners=True,
        )
        return output


tr_input = torch.randn((8, 64, 512, 512))
tr_grid = torch.randn((8, 256, 256, 2))
simple_model = GridSample()
simple_model.eval()
traced_model = torch.jit.trace(simple_model, [tr_input, tr_grid])

# Both inputs and the output are declared as float16.
coreml_input = [
    coremltools.TensorType(name="image_input", shape=tr_input.shape, dtype=np.float16),
    coremltools.TensorType(name="warp_grid", shape=tr_grid.shape, dtype=np.float16),
]
mlmodel = coremltools.converters.convert(
    traced_model,
    inputs=coreml_input,
    convert_to="mlprogram",
    minimum_deployment_target=coremltools.target.iOS16,
    compute_units=coremltools.ComputeUnit.ALL,
    compute_precision=coremltools.precision.FLOAT16,
    outputs=[coremltools.TensorType(name="x0", dtype=np.float16)],
    debug=False,
)
mlmodel.save("./grid_sample.mlpackage")
os.system("xcrun coremlcompiler compile './grid_sample.mlpackage' './'")
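For a sense of scale, the amount of data involved in each prediction can be worked out from the shapes in the script above. This is only back-of-the-envelope arithmetic: it assumes 2 bytes per float16 element and the usual grid_sample output shape of (N, C, H_out, W_out). It says nothing about where Core ML actually spends the time.

```python
from math import prod

BYTES_PER_FLOAT16 = 2


def tensor_mib(shape):
    """Size of a dense float16 tensor with the given shape, in MiB."""
    return prod(shape) * BYTES_PER_FLOAT16 / (1024 ** 2)


shapes = {
    "image_input": (8, 64, 512, 512),
    "warp_grid": (8, 256, 256, 2),
    # grid_sample output: (N, C, H_out, W_out), with H_out and W_out taken from the grid
    "x0": (8, 64, 256, 256),
}
sizes = {name: tensor_mib(shape) for name, shape in shapes.items()}
for name, mib in sizes.items():
    print(f"{name}: {mib:.0f} MiB")
```

With these shapes the image input alone is about 256 MiB per call, which may be part of why the copy steps are so visible in the trace.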
On an Apple M1 with Ventura 13.6.
I followed the steps on the Get started with tensorflow-metal page here:
https://developer.apple.com/metal/tensorflow-plugin/
python3 -m venv ~/venv-metal
source ~/venv-metal/bin/activate
python -m pip install -U pip
python -m pip install tensorflow
python -m pip install tensorflow-metal
With a clean start, I also tried pinning the TensorFlow version:
python -m pip install tensorflow==2.13.0
after which pip reported: Successfully installed tensorflow-metal-1.0.0
The table here suggested this should work.
https://pypi.org/project/tensorflow-metal/
But I got the same error...
Running Python code without the tensorflow import was not a problem. I found forum threads about similar errors on M1 Macs, but none of the proposed solutions worked.
Are there suggested steps to get the "get started" tutorial working?
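When comparing an environment against the compatibility table, it can help to list what is actually installed without importing tensorflow at all (since the import is what fails). The sketch below uses only the standard library; installed_versions is a helper name made up for this example.

```python
import platform
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["tensorflow", "tensorflow-macos", "tensorflow-metal"])
# A native Apple silicon Python reports 'arm64'; 'x86_64' would indicate Rosetta.
print("machine:", platform.machine())
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Posting this output together with the full error text usually makes it easier for others to spot a version mismatch.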
Is there a way to extract the list of words recognized by the Speech framework?
I'm trying to filter out words that won't appear in the transcription output, but to do that I need a list of the words that can appear. SFSpeechLanguageModel.Configuration can be initialized with a vocabulary, but there doesn't seem to be a way to read it back, and while there are ways to create custom vocabularies, I have yet to find a way to retrieve them.
I added the Natural Language tag in case that framework might contribute to a solution.
Hello! I'm implementing a mechanism for cropping an object out of an image.
@MainActor static func detectObjectOnImage(image: UIImage) async throws -> UIImage {
    let analyser = ImageAnalyzer()
    let interaction = ImageAnalysisInteraction()
    let configuration = ImageAnalyzer.Configuration([.visualLookUp])
    let analysis = try await analyser.analyze(image, configuration: configuration)
    interaction.analysis = analysis
    return try await interaction.image(for: interaction.subjects)
}
My app supports iOS 16 and the compiler doesn't complain about the code. However, when I run it on a simulator with iOS 16, I get a "symbol not found" error at app launch. Does anybody know what the issue could be?
After training on my dataset, the training, validation, and testing sets all show 0% detection accuracy, and all my test photos come back as false negatives. The dataset has 1032 photos and 2 classes, and I used Roboflow for the image annotation. For the network, I chose full network. Is there any way to fix this?
Kia ora,
I've been having heaps of trouble recently trying to get TensorFlow working. It just suddenly stopped, and now the kernel crashes every time I try to import tf.
I've tried just about everything, e.g. a fresh install of Python and reinstalling the Xcode dev tools.
Below are the relevant lines of pip freeze (using Python 3.10.13, btw):
tensorboard==2.15.1
tensorboard-data-server==0.7.2
tensorboard-plugin-wit==1.8.1
tensorflow==2.15.0
tensorflow-estimator==2.15.0
tensorflow-io-gcs-filesystem==0.34.0
tensorflow-macos==2.15.0
tensorflow-metal==0.5.0
Below is the cell in question that is killing the kernel:
import tensorflow as tf
import matplotlib.pyplot as plt
import tensorflow_datasets as tfds
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense, Flatten, InputLayer, BatchNormalization, Dropout
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.optimizers.legacy import Adam
I'll be around all day so if you have anything that can help, I'll be sure to give it a go as soon as you post it and get back to you!
Looking forward to your replies.
Nga mihi,
Kane
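Because the crash takes the whole notebook kernel down, one way to capture more detail is to run the import in a separate interpreter and look at how that child process ended. The following is a minimal sketch using only the standard library; probe_import and describe are names invented for this example, and "json" is just a harmless stand-in for the module under test.

```python
import subprocess
import sys


def probe_import(module, timeout=300):
    """Import `module` in a child interpreter and return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a child return code into a short human-readable summary."""
    if returncode == 0:
        return "import succeeded"
    if returncode < 0:
        # On macOS and Linux a negative code means the child was killed by that signal.
        return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


code, err = probe_import("json")  # stand-in; use "tensorflow" to test the failing import
print(describe(code))
print(err)
```

If the import is being killed by a signal, the signal number and anything printed to stderr are useful details to include in a follow-up.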
The WWDC22 video "Explore the machine learning development experience" provides Python code for an interesting application (real-time ML image colorization), but it doesn't provide the complete Xcode project, and it assumes the viewer knows how to run Python in Xcode (I haven't heard of such a thing in 10 years of iOS development!).
Any pointers to either the video's example Xcode project, or how to create a suitable Xcode project capable of running Python code?
I did a clean install of Python (v. 3.10), then TensorFlow and tensorflow-metal, following exactly the process stated on Apple's plugin support page. Now, every time I run ANY Python code with TensorFlow, it crashes at the model.fit call. It does not matter what I feed into it, even code that used to run perfectly on my previous (Intel) MacBook... I've searched ad nauseam for answers, but Apple washes its hands of it, stating that it is a TensorFlow problem, and TensorFlow does the same. The fact is that exactly the same code runs flawlessly on my Windows NVIDIA PC setup.
I purchased the M3 laptop hoping to be able to train my neural networks "on the go"... now I have lost $5,000 USD, I can't make it work, and it is a total disaster.
I am extremely competent in Python development and have been developing neural networks for years. So if you are going to comment, please avoid suggestions like "check your Python version" etc. This is DEFINITELY due to the M3 Mac. The exact same setup is working OK on an M1 Ultra Mac Studio. It is just not portable...
Does anyone have any specific advice on how to set up TensorFlow properly on an M3 Mac?
I have a neural network that should run on my device with 3 different input shapes. When I convert it to an mlmodel or mlpackage file with a fixed input size, it runs on the ANE.
But when I convert it with EnumeratedShapes, it runs only on the CPU.
Why?
I think the problematic layer is the slice (which is converted to SliceStatic in the flexible model), but I don't understand why, or whether there is any way to solve it and run the enumerated model on the ANE.
Here is my code:
import coremltools as ct
import torch


class TestModel(torch.nn.Module):
    def __init__(self):
        super(TestModel, self).__init__()
        self.dw1 = torch.nn.Conv2d(in_channels=641, out_channels=641, kernel_size=(5,4), groups=641)
        self.pw1 = torch.nn.Conv2d(in_channels=641, out_channels=512, kernel_size=(1,1))
        self.relu = torch.nn.ReLU()
        self.pw2 = torch.nn.Conv2d(in_channels=512, out_channels=641, kernel_size=(1,1))
        self.dw2 = torch.nn.Conv2d(in_channels=641, out_channels=641, kernel_size=(5,1), groups=641)
        self.pw3 = torch.nn.Conv2d(in_channels=641, out_channels=512, kernel_size=(1,1))
        self.block1_dw = torch.nn.Conv2d(in_channels=512, out_channels=512, kernel_size=(5,1), groups=512)
        self.block1_pw = torch.nn.Conv2d(in_channels=512, out_channels=512, kernel_size=(1,1))

    def forward(self, inputs):
        x = self.dw1(inputs)
        x = self.pw1(x)
        x = self.relu(x)
        x = self.pw2(x)
        x = self.dw2(x)
        x = self.pw3(x)
        x = self.relu(x)
        y = self.block1_dw(x)
        y = self.block1_pw(y)
        y = self.relu(y)
        # Crop x along the height axis so it matches y before the residual add.
        z = x[:,:,4:,:] + y
        return z


ex_input = torch.rand(1, 641, 44, 4)
traced_model = torch.jit.trace(TestModel().eval(), [ex_input,])
# enum_shape holds the enumerated input shapes; its definition is not shown in this post.
ct_enum_inputs = [ct.TensorType(name='inputs', shape=enum_shape)]
ct_outputs = [ct.TensorType(name='out')]
mlmodel_enum = ct.convert(traced_model, inputs=ct_enum_inputs, outputs=ct_outputs, convert_to="neuralnetwork")
mlmodel_enum.save(...)
Thanks.
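To make the role of the slice concrete, the spatial sizes in the model above can be traced by hand. The sketch below is plain arithmetic for the example input of (1, 641, 44, 4), assuming the Conv2d defaults used in the post (stride 1, no padding); conv_out is a helper written for this illustration and does not touch coremltools.

```python
def conv_out(size, kernel):
    """Output length of an unpadded, stride-1 convolution along one axis."""
    return size - kernel + 1


h, w = 44, 4                             # spatial size of ex_input
h, w = conv_out(h, 5), conv_out(w, 4)    # dw1, kernel (5, 4); the 1x1 convs keep the size
h, w = conv_out(h, 5), conv_out(w, 1)    # dw2, kernel (5, 1)
x_hw = (h, w)
y_hw = (conv_out(h, 5), conv_out(w, 1))  # block1_dw, kernel (5, 1)
sliced_hw = (x_hw[0] - 4, x_hw[1])       # x[:, :, 4:, :] drops the first 4 rows

print("x:", x_hw, "y:", y_hw, "x[:, :, 4:, :]:", sliced_hw)
```

The slice exists only to crop x down to the height of y before the add. Its start offset of 4 is the same for every input height, while the end of the slice depends on the input shape.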
Running grouped convolutions on an M2 with the Metal plugin, I get an error. Example code:
Using TF 2.11 and no Metal plugin, I get:
import tensorflow as tf
tf.keras.layers.Conv1D(5,1,padding="same", kernel_initializer="ones", groups=5)(tf.ones((1,1,5)))
# displays
<tf.Tensor: shape=(1, 1, 5), dtype=float32, numpy=array([[[1., 1., 1., 1., 1.]]], dtype=float32)>
On TF 2.14 with the plugin, I get:
import tensorflow as tf
tf.keras.layers.Conv1D(5,1,padding="same", kernel_initializer="ones", groups=5)(tf.ones((1,1,5)))
# displays
...
NotFoundError: Exception encountered when calling layer 'conv1d_3' (type Conv1D).
could not find registered platform with id: 0x104d8f6f0 [Op:__inference__jit_compiled_convolution_op_78]
Call arguments received by layer 'conv1d_3' (type Conv1D):
• inputs=tf.Tensor(shape=(1, 1, 5), dtype=float32)
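For reference, the result expected from this layer can be reproduced without TensorFlow. The sketch below is a pure-Python grouped convolution with kernel size 1 and no bias, assuming the Keras kernel layout of (in_channels // groups, filters) once the kernel-size axis is dropped; grouped_pointwise_conv1d is a name made up for this example.

```python
def grouped_pointwise_conv1d(x, weights, groups):
    """Grouped 1D convolution with kernel size 1 and no bias.

    x:       list of time steps, each a list of in_channels values
    weights: [in_channels // groups][filters]
    """
    in_channels = len(x[0])
    filters = len(weights[0])
    in_per_group = in_channels // groups
    out_per_group = filters // groups
    out = []
    for step in x:
        row = []
        for f in range(filters):
            g = f // out_per_group  # the group this filter belongs to
            group_inputs = step[g * in_per_group:(g + 1) * in_per_group]
            row.append(sum(v * weights[i][f] for i, v in enumerate(group_inputs)))
        out.append(row)
    return out


x = [[1.0] * 5]        # one time step, five channels, like tf.ones((1, 1, 5))
weights = [[1.0] * 5]  # kernel_initializer="ones", with 5 // 5 = 1 input channel per group
result = grouped_pointwise_conv1d(x, weights, groups=5)
print(result)
```

With groups=5 each output channel sees exactly one input channel, so an all-ones kernel passes the all-ones input through unchanged, which matches the output shown for TF 2.11.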
Hi,
I'm a French developer and I downloaded the "Recognizing Speech in Live Audio" sample code from the Apple Developer website. I tried to run the data generator command after changing the locale identifier from 'en_US' to 'fr' in the data generator's main file, but when I ran the command in Xcode, I got this error message: "Identifier 'fr' does not parse into two elements."
I checked the XML files associated with the bin archive file and the identifiers are not correct (they keep the 'en-US' value).
Thanks for your help!
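The wording of the error suggests that the tool expects an identifier made of two parts, as in the sample's default 'en_US'. The tool's real parsing logic is not shown in the post, so the following is only a guess at the kind of check involved; split_locale_identifier is a hypothetical helper, and 'fr_FR' is used purely as an example of a two-part identifier.

```python
import re


def split_locale_identifier(identifier):
    """Split e.g. 'en_US' or 'en-US' into (language, region).

    Returns None when the identifier does not have exactly two non-empty parts.
    """
    parts = re.split(r"[-_]", identifier)
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]


for candidate in ("en_US", "fr", "fr_FR"):
    print(candidate, "->", split_locale_identifier(candidate))
```

Under that assumption, a bare 'fr' fails the check while an identifier with both a language and a region passes.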
I created a new environment on Conda and then installed TensorFlow using the command "pip install TensorFlow" on my Mac M1 Pro machine.
But TensorFlow is not working.
I'm currently building an iOS app that needs to detect a person's height from a live video stream. The new VNDetectHumanBodyPose3DRequest is exactly what I need, but the observations I'm getting back are very inconsistent and unreliable. By inconsistent, I mean the values never seem to settle, and they can fluctuate anywhere from 5'4" to 10'1" (I'm about 6'0"). By unreliable, I mean that I have once seen a value that closely matches my height, but I rarely see values that are close enough (within an inch) to the ground truth.
In terms of my code, I'm not doing anything fancy. I first open a LiDAR stream on my iPhone 14 Pro:
guard let videoDevice = AVCaptureDevice.default(.builtInLiDARDepthCamera, for: .video, position: .back) else { return }
guard let videoDeviceInput = try? AVCaptureDeviceInput(device: videoDevice) else { return }
guard captureSession.canAddInput(videoDeviceInput) else { return }
captureSession.addInput(videoDeviceInput)
I'm then creating an output synchronizer so I can get both image and depth data at the same time:
videoDataOutput = AVCaptureVideoDataOutput()
captureSession.addOutput(videoDataOutput)
depthDataOutput = AVCaptureDepthDataOutput()
depthDataOutput.isFilteringEnabled = true
captureSession.addOutput(depthDataOutput)
outputVideoSync = AVCaptureDataOutputSynchronizer(dataOutputs: [depthDataOutput, videoDataOutput])
Finally, my delegate function that handles the synchronizer is roughly:
fileprivate func perform3DPoseRequest(cmSampleBuffer: CMSampleBuffer, depthData: AVDepthData) {
    let imageRequestHandler = VNImageRequestHandler(cmSampleBuffer: cmSampleBuffer, depthData: depthData, orientation: .up)
    let request = VNDetectHumanBodyPose3DRequest()
    do {
        // Perform the body pose request.
        try imageRequestHandler.perform([request])
        if let observation = request.results?.first {
            if (observation.heightEstimation == .measured) {
                print("Body height (ft) \(formatter.string(fromMeters: Double(observation.bodyHeight))) (m): \(observation.bodyHeight)")
                ...
I'd appreciate any help determining how to get accurate results from the observation's bodyHeight. Thanks!
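Independently of Vision itself, a jittery per-frame estimate is often stabilised with a rolling median, which ignores occasional outliers. The sketch below illustrates that generic technique in Python rather than Swift. It is not part of the Vision API; RollingMedian is a name chosen for this example, and the readings are made-up values in metres that loosely mirror the spread described above.

```python
from collections import deque
from statistics import median


class RollingMedian:
    """Median of the most recent `window` samples."""

    def __init__(self, window=15):
        self.samples = deque(maxlen=window)

    def update(self, value):
        self.samples.append(value)
        return median(self.samples)


smoother = RollingMedian(window=5)
readings_m = [1.83, 1.62, 3.07, 1.84, 1.82, 1.85, 1.81]  # made-up per-frame heights
smoothed = [smoother.update(r) for r in readings_m]
print(smoothed)
```

A filter like this only hides outliers. It cannot correct a systematic bias in the underlying measurements.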
I am using NLTagger to tag lexical classes of words, but it suddenly just stopped working. I boiled my code down to the most basic version, but it's never executing the closure of the enumerateTags() function. What do I have to change or what should I try?
for e in sentenceArray {
    let cupcake = "I like you, have a cupcake"
    tagger.string = cupcake
    tagger.enumerateTags(in: cupcake.startIndex..<cupcake.endIndex, unit: .word, scheme: .nameTypeOrLexicalClass) { tag, range in
        print("TAG")
        return true
    }
}
Training a CNN with tensorflow-metal gives an increasing loss.
But the same code runs correctly after pip uninstall tensorflow-metal.
I see a lot of crashes on the iOS 17 beta related to some problem in "Text To Speech". Does anybody have a clue why TTS crashes? Is anybody else seeing the same problem?
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x000000037f729380
Exception Codes: 0x0000000000000001, 0x000000037f729380
VM Region Info: 0x37f729380 is not in any region. Bytes after previous region: 3748828033 Bytes before following region: 52622617728
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
MALLOC_NANO 280000000-2a0000000 [512.0M] rw-/rwx SM=PRV
---> GAP OF 0xd20000000 BYTES
commpage (reserved) fc0000000-1000000000 [ 1.0G] ---/--- SM=NUL ...(unallocated)
Termination Reason: SIGNAL 11 Segmentation fault: 11
Terminating Process: exc handler [36389]
Triggered by Thread: 9
.....
Thread 9 name:
Thread 9 Crashed:
0 libobjc.A.dylib 0x000000019eeff248 objc_retain_x8 + 16
1 AudioToolboxCore 0x00000001b2da9d80 auoop::RenderPipeUser::~RenderPipeUser() + 112 (AUOOPRenderPipePool.mm:400)
2 AudioToolboxCore 0x00000001b2e110b4 -[AUAudioUnit_XPC internalDeallocateRenderResources] + 92 (AUAudioUnit_XPC.mm:904)
3 AVFAudio 0x00000001bfa4cc04 AUInterfaceBaseV3::Uninitialize() + 60 (AUInterface.mm:524)
4 AVFAudio 0x00000001bfa894bc AVAudioEngineGraph::PerformCommand(AUGraphNodeBaseV3&, AVAudioEngineGraph::ENodeCommand, void*, unsigned int) const + 772 (AVAudioEngineGraph.mm:3317)
5 AVFAudio 0x00000001bfa93550 AVAudioEngineGraph::_Uninitialize(NSError**) + 132 (AVAudioEngineGraph.mm:1469)
6 AVFAudio 0x00000001bfa4b50c AVAudioEngineImpl::Stop(NSError**) + 396 (AVAudioEngine.mm:1081)
7 AVFAudio 0x00000001bfa4b094 -[AVAudioEngine stop] + 48 (AVAudioEngine.mm:193)
8 TextToSpeech 0x00000001c70b3c5c __55-[TTSSynthesisProviderAudioEngine renderSpeechRequest:]_block_invoke + 1756 (TTSSynthesisProviderAudioEngine.m:613)
9 libdispatch.dylib 0x00000001ae4b0740 _dispatch_call_block_and_release + 32 (init.c:1519)
10 libdispatch.dylib 0x00000001ae4b2378 _dispatch_client_callout + 20 (object.m:560)
11 libdispatch.dylib 0x00000001ae4b990c _dispatch_lane_serial_drain + 748 (queue.c:3885)
12 libdispatch.dylib 0x00000001ae4ba470 _dispatch_lane_invoke + 432 (queue.c:3976)
13 libdispatch.dylib 0x00000001ae4c5074 _dispatch_root_queue_drain_deferred_wlh + 288 (queue.c:6913)
14 libdispatch.dylib 0x00000001ae4c48e8 _dispatch_workloop_worker_thread + 404 (queue.c:6507)
...
Thread 9 crashed with ARM Thread State (64-bit):
x0: 0x0000000283309360 x1: 0x0000000000000000 x2: 0x0000000000000000 x3: 0x00000002833093c0
x4: 0x00000002833093c0 x5: 0x0000000101737740 x6: 0x0000000000000013 x7: 0x00000000ffffffff
x8: 0x0000000283309360 x9: 0x3c788942d067009a x10: 0x0000000101547000 x11: 0x0000000000000000
x12: 0x00000000000007fb x13: 0x00000000000007fd x14: 0x000000001ee24020 x15: 0x0000000000000020
x16: 0x0000b1037f729360 x17: 0x000000037f729360 x18: 0x0000000000000000 x19: 0x0000000000000000
x20: 0x00000001016a8de8 x21: 0x0000000283e21d00 x22: 0x0000000283b3f1f8 x23: 0x0000000283098000
x24: 0x00000001bfb4fc35 x25: 0x00000001bfb4fc43 x26: 0x000000028033a688 x27: 0x0000000280c93090
x28: 0x0000000000000000 fp: 0x000000016fc86490 lr: 0x00000001b2da9d80
sp: 0x000000016fc863e0 pc: 0x000000019eeff248 cpsr: 0x1000
esr: 0x92000006 (Data Abort) byte read Translation fault
I'm going to the U.S. to buy a Vision Pro. Does anyone have any information about where it is sold? Will it be sold in Hawaii by any chance? For now, I'm thinking about New York.
Hi,
there seems to be a difference in behavior when running inference on a trained Keras model using the model's __call__ method vs. using the predict or predict_on_batch methods. This only happens when using the GPU for inference. It seems that for certain sequences of operations and float types, the 'relu' activation doesn't work as expected and does nothing.
I can replicate the problem with the following code (it would only fail with 'relu' activation and tf.float16 and tf.float32 types, while it works fine with tf.float64).
import tensorflow as tf
import numpy as np
DATA_LENGTH = 16
DENSE_WIDTH = 16
BATCH_SIZE = 8
DTYPE = tf.float32
ACTIVATION = 'relu'
def TestModel():
    inputs = tf.keras.Input(DATA_LENGTH, dtype=DTYPE)
    u = tf.keras.layers.Dense(DENSE_WIDTH, activation=ACTIVATION, dtype=DTYPE)(inputs)
    # u = tf.maximum(u, 0.0)
    output = u*tf.constant(1.0, dtype=DTYPE)
    model = tf.keras.Model(inputs, output, name="TestModel")
    return model

model = TestModel()
model.compile()
x = np.random.uniform(size=(BATCH_SIZE, DATA_LENGTH)).astype(DTYPE.as_numpy_dtype)
with tf.device('/GPU:0'):
    out_gpu_call = model(x, training=False)
    out_gpu_predict = model.predict_on_batch(x)
with tf.device('/CPU:0'):
    out_cpu_call = model(x, training=False)
    out_cpu_predict = model.predict_on_batch(x)
print(f'\nDTYPE {DTYPE}, ACTIVATION: {ACTIVATION}')
print("\tMean Abs. Difference GPU (__call__ vs. predict):", np.mean(np.abs(out_gpu_call - out_gpu_predict)))
print("\tMean Abs. Difference CPU (__call__ vs. predict):", np.mean(np.abs(out_cpu_call - out_cpu_predict)))
print("\tMean Abs. Difference GPU-CPU __call__:", np.mean(np.abs(out_gpu_call - out_cpu_call)))
print("\tMean Abs. Difference GPU-CPU predict():", np.mean(np.abs(out_gpu_predict - out_cpu_predict)))
The code above produces for example the following output:
DTYPE <dtype: 'float32'>, ACTIVATION: relu
Mean Abs. Difference GPU (__call__ vs. predict): 0.1955472
Mean Abs. Difference CPU (__call__ vs. predict): 0.0
Mean Abs. Difference GPU-CPU __call__: 1.3573299e-08
Mean Abs. Difference GPU-CPU predict(): 0.1955472
And the results for the GPU are:
out_gpu_call
<tf.Tensor: shape=(8, 16), dtype=float32, numpy=
array([[0.1496982 , 0. , 0. , 0.73772687, 0.26131183,
0.27757105, 0. , 0. , 0. , 0. ,
0. , 0.4164225 , 1.0367445 , 0. , 0.5860609 ,
0. ], ...
out_gpu_predict
array([[ 1.49698198e-01, -3.48425686e-01, -2.44667321e-01,
7.37726867e-01, 2.61311829e-01, 2.77571052e-01,
-2.26729304e-01, -1.06500387e-01, -3.66294265e-01,
-2.93850392e-01, -4.51043218e-01, 4.16422486e-01,
1.03674448e+00, -1.39347658e-01, 5.86060882e-01,
-2.05334812e-01], ...
Upon inspection of the results it seems that the problem is that the 'relu' activation is not setting the values < 0 to 0 when calling predict_on_batch.
When uncommenting the # u = tf.maximum(u, 0.0) line after the Dense layer there is no difference between the two calls (as should be expected).
Removing the multiplication by a constant after the Dense layer (output = u*tf.constant(1.0, dtype=DTYPE)) also makes the problem disappear, even when the # u = tf.maximum(u, 0.0) line is left commented out.
This is running with the following setup:
MacBook Pro, Apple M2 Max chip, macOS Sonoma 14.2
tf version 2.15.0
tensorflow-metal 1.1.0
Python 3.10.13
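The observation about 'relu' can be checked directly against the numbers quoted above: applying max(v, 0) to the first row returned by predict_on_batch should reproduce the first row returned by __call__. The sketch below does this in plain Python with the values copied from the post; the 1e-6 tolerance is an arbitrary choice to absorb the different print precision.

```python
import math

# First row of out_gpu_predict, as printed in the post.
predict_row = [
    1.49698198e-01, -3.48425686e-01, -2.44667321e-01,
    7.37726867e-01, 2.61311829e-01, 2.77571052e-01,
    -2.26729304e-01, -1.06500387e-01, -3.66294265e-01,
    -2.93850392e-01, -4.51043218e-01, 4.16422486e-01,
    1.03674448e+00, -1.39347658e-01, 5.86060882e-01,
    -2.05334812e-01,
]
# First row of out_gpu_call, as printed in the post.
call_row = [
    0.1496982, 0.0, 0.0, 0.73772687, 0.26131183,
    0.27757105, 0.0, 0.0, 0.0, 0.0,
    0.0, 0.4164225, 1.0367445, 0.0, 0.5860609,
    0.0,
]

relu_of_predict = [max(v, 0.0) for v in predict_row]
matches = all(
    math.isclose(a, b, abs_tol=1e-6) for a, b in zip(relu_of_predict, call_row)
)
print("relu(predict row) matches call row:", matches)
```

For this row the two outputs differ only by the missing clamp at zero, which is consistent with the activation being skipped in the predict_on_batch path.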
I am working on a design that requires connecting an iOS device to two audio output devices, specifically headphones and a speaker. I want the audio driver to switch the output device without user action. Is this manageable via the iOS SDK?
Hello
I use Mac Pro M2 16GB
This is my code. It is very basic code.
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential

# X_train, y_train, X_test, y_test and scaler are prepared earlier (not shown).
model = Sequential()
model.add(LSTM(units=50, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=50, batch_size=16)
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)
# Map predictions and targets back to the original scale.
train_predict = scaler.inverse_transform(train_predict)
y_train = scaler.inverse_transform(y_train)
test_predict = scaler.inverse_transform(test_predict)
y_test = scaler.inverse_transform(y_test)
When I try to execute this code, Anaconda gives the following output:
I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M2
I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB
I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 5.33 GB
I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )
I can't find any solution. Could you help me?
Thank you
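When reading output like the above, note that TensorFlow's native log lines start with a one-letter severity (I, W, E or F for info, warning, error and fatal), followed by the source file and line number. The sketch below parses three of the lines quoted in the post with a regular expression; the pattern and the parse helper are written for this example and are not part of TensorFlow.

```python
import re

SEVERITY = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}
LOG_LINE = re.compile(r"\b([IWEF]) (\S+):(\d+)\] (.*)")


def parse(line):
    """Return (severity, source_file, line_number, message), or None if no match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    letter, source, lineno, message = match.groups()
    return SEVERITY[letter], source, int(lineno), message


lines = [
    "I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M2",
    "I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB",
    ": I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] "
    "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) "
    "-> physical PluggableDevice (device: 0, name: METAL, pci bus id: )",
]
parsed = [parse(line) for line in lines]
severities = [entry[0] for entry in parsed]
print(severities)
```

All of the quoted lines carry the I (info) prefix, so if something is actually failing, the relevant message is probably elsewhere in the output.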