Apple Developer Forums

why there's nerual engine-data copy in coreml npu prediction

I am currently facing a performance issue while using CoreML on iOS 16+ devices to run a simple grid_sample model. When profiling the model using xcode Profiler, I noticed that before each NPU computation, there is a significant delay caused by the "input copy" and "neural engine-data copy" operations.I have specified that both the input and output of the model are of type float16, there shouldn't be any data type convert. I would appreciate any insights or suggestions regarding the reasons behind this delay and possible solutions My simple model is class GridSample(torch.nn.Module): def __init__( self, ): super().__init__() def forward(self, input: torch.Tensor, grid: torch.Tensor) -> torch.Tensor: output = F.grid_sample( input, grid.to(input), mode='nearest', padding_mode='zeros', align_corners=True, ) return output tr_input = torch.randn((8, 64, 512, 512) tr_grid = torch.randn((8, 256, 256, 2) simple_model = GridSample() simple_model.eval() traced_model = torch.jit.trace(simple_model, [tr_input, tr_grid]) coreml_input = [coremltools.TensorType(name="image_input", shape=tr_input.shape, dtype=np.float16), coremltools.TensorType(name="warp_grid", shape=tr_grid.shape, dtype=np.float16)] mlmodel = coremltools.converters.convert(traced_model, inputs=coreml_input, convert_to="mlprogram", minimum_deployment_target=coremltools.target.iOS16, compute_units=coremltools.ComputeUnit.ALL, compute_precision = coremltools.precision.FLOAT16, outputs=[ct.TensorType(name="x0", dtype=np.float16)], debug=False) mlmodel.save("./grid_sample.mlpackage") os.system(f"xcrun coremlcompiler compile './grid_sample.mlpackage' './')

Machine Learning & AI General ML Compute Core ML

0

731

Feb ’24

zsh: illegal hardware instruction on following tensorflow-metal tutorial

On an Apple M1 with Ventura 13.6. I followed the steps on the Get started with tensorflow-metal page here: https://developer.apple.com/metal/tensorflow-plugin/ python3 -m venv ~/venv-metal source ~/venv-metal/bin/activate python -m pip install -U pip python -m pip install tensorflow python -m pip install tensorflow-metal With a clean start I also tried a pinning python -m pip install tensorflow==2.13.0 Where Successfully installed tensorflow-metal-1.0.0 The table here suggested this should work. https://pypi.org/project/tensorflow-metal/ But I got the same error... Running Python code without the tensorflow import was not a problem. I found forums with similar error on Mac 1 but none of the proposed solution worked. Is there suggested steps to get the `get started tutorial working?

Machine Learning & AI General tensorflow-metal

0

1.6k

Feb ’24

Getting a list of words recognized by Speech

Is there a way to extract the list of words recognized by the Speech framework? I'm trying to filter out words that won't appear in the transcription output, but to do that I'll need a list of words that can appear. SFSpeechLanguageModel.Configuration can be initialized with a vocabulary, but there doesn't seem to be a way to read it, and while there are ways to create custom vocabularies, I have yet to find a way to retrieve it. I added the Natural Language tag in case the framework might contribute to a solution

Machine Learning & AI General Speech Natural Language

0

704

Feb ’24

Symbol Not found - VisionKit

Hello! I'm implementing cropping an object from an image mechanism. @MainActor static func detectObjectOnImage(image: UIImage) async throws -> UIImage { let analyser = ImageAnalyzer() let interaction = ImageAnalysisInteraction() let configuration = ImageAnalyzer.Configuration([.visualLookUp]) let analysis = try await analyser.analyze(image, configuration: configuration) interaction.analysis = analysis return try await interaction.image(for: interaction.subjects) } My app supports iOS 16 and a compiler doesn't complain about the code.However when I run it on simulator with iOS 16, I'm getting "symbol not found" error on the app launch. Does anybody know what can be the issue?

Machine Learning & AI General VisionKit

2

0

579

Jan ’24

CreateML producing 0% accuracy in testing, Help needed!!

After training my dataset, the training, validation, and testing sets all show 0% in detection accuracy and all my test photos show false negative. The dataset has 1032 photos and 2 classes, and I used Roboflow for the image annotation. For network, I choose full network. If there is any way to fix this?

Machine Learning & AI General Machine Learning Create ML

0

1

608

Feb ’24

TensorFlow on M1, kernel crashing in notebooks

Kia ora, Been having heaps of trouble recently trying to get TensorFlow working, it just suddenly stopped and the kernel would just crash every time I try to import tf. I've tried just about everything eg. fresh install of python, reinstalling Xcode dev tools Below is the relevant lines of pip freeze, using python 1.10.13 btw tensorboard==2.15.1 tensorboard-data-server==0.7.2 tensorboard-plugin-wit==1.8.1 tensorflow==2.15.0 tensorflow-estimator==2.15.0 tensorflow-io-gcs-filesystem==0.34.0 tensorflow-macos==2.15.0 tensorflow-metal==0.5.0 Below is the cell in question that is killing the kernal import tensorflow as tf import matplotlib.pyplot as plt import tensorflow_datasets as tfds from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense, Flatten, InputLayer, BatchNormalization, Dropout from tensorflow.keras.losses import BinaryCrossentropy from tensorflow.keras.optimizers.legacy import Adam I'll be around all day so if you have anything that can help, I'll be sure to give it a go as soon as you post it and get back to you! Looking forward to your replies. Nga mihi, Kane

Machine Learning & AI General tensorflow-metal

0

1

597

Jan ’24

Python Xcode project package for ML demo video wwdc2022-10017?

WWDC22 video "Explore the machine learning development experience" provides Python code for an interesting application (real-time ML image colorization), but doesn't provide the complete Xcode project, and assumes viewer knows how to do Python in Xcode (haven't heard of such in 10 years of iOS development!). Any pointers to either the video's example Xcode project, or how to create a suitable Xcode project capable of running Python code?

Machine Learning & AI General Machine Learning

0

519

Jan ’24

Tensorflow crashes every time I fit a model

I did a clean install of Python (v. 3.10), then Tensorflow & Tensorflow-Metal following exactly the process stated in Apple's plugin support page. Now, every time I run ANY python code with Tensorflow it crashes in the model.fit instruction. It does not matter what I feed into it, even code that used to run perfectly on my previous MacBook (Intel)... I've researched ad-vomitum for answers but Apple washes it's hands stating that is Tensorflow and Tensorflow does the same. Fact is that exactly the same code runs flawlessly on my Windows NVIDIA PC setup. I purchased the m3 laptop with the hope of having the possibility to train my neural networks "on the go"... now I lost $5,000 usd, I can't make it work, and is a total disaster. I am extremely competent in Python development and have been developing neural networks for years. So if you are going to comment, please avoid suggestions like "check your Python version" etc. - This is DEFINITIVELY due to the m3 Mac. Exact same setup is working OK on an M1-Ultra Mac Studio. It is just not portable... Does anyone have any specific advice on how to make a proper setup of Tensorflow for the Mac M3??

Machine Learning & AI General tensorflow-metal

1

0

655

Jan ’24

Cannot run CoreML model with Enumerated shape on ANE

I have a neural network that should run on my device with 3 different input shapes. When converting it to mlmodel or mlpackage files with fixed input size it runs on ANE. But when converted it with EnumeratedShape it runs only on CPU. Why? I think that the problematic layer is the slice (which converted in the flexible model to SliceStatic), but don't understand why and if there is any way to solve it and run the Enumerated model on ANE. Here is my code class TestModel(torch.nn.Module): def __init__(self): super(TestModel, self).__init__() self.dw1 = torch.nn.Conv2d(in_channels=641, out_channels=641, kernel_size=(5,4), groups=641) self.pw1 = torch.nn.Conv2d(in_channels=641, out_channels=512, kernel_size=(1,1)) self.relu = torch.nn.ReLU() self.pw2 = torch.nn.Conv2d(in_channels=512, out_channels=641, kernel_size=(1,1)) self.dw2 = torch.nn.Conv2d(in_channels=641, out_channels=641, kernel_size=(5,1), groups=641) self.pw3 = torch.nn.Conv2d(in_channels=641, out_channels=512, kernel_size=(1,1)) self.block1_dw = torch.nn.Conv2d(in_channels=512, out_channels=512, kernel_size=(5,1), groups=512) self.block1_pw = torch.nn.Conv2d(in_channels=512, out_channels=512, kernel_size=(1,1)) def forward(self, inputs): x = self.dw1(inputs) x = self.pw1(x) x = self.relu(x) x = self.pw2(x) x = self.dw2(x) x = self.pw3(x) x = self.relu(x) y = self.block1_dw(x) y = self.block1_pw(y) y = self.relu(y) z = x[:,:,4:,:] + y return z ex_input = torch.rand(1, 641, 44, 4) traced_model = torch.jit.trace(TestModel().eval(), [ex_input,]) ct_enum_inputs = [ct.TensorType(name='inputs', shape=enum_shape)] ct_outputs = [ct.TensorType(name='out')] mlmodel_enum = ct.convert(traced_model, inputs=ct_enum_inputs, outputs=ct_outputs, convert_to="neuralnetwork") mlmodel.save(...) Thanks.

Machine Learning & AI General ML Compute

0

619

Jan ’24

tensorflow-metal plugin problem with grouped convolutions

Running grouped convolutions on an M2 with the metal plugin I get an error. Example code: Using TF2.11 and no metal plugin I get import tensorflow as tf tf.keras.layers.Conv1D(5,1,padding="same", kernel_initializer="ones", groups=5)(tf.ones((1,1,5))) # displays <tf.Tensor: shape=(1, 1, 5), dtype=float32, numpy=array([[[1., 1., 1., 1., 1.]]], dtype=float32)> On TF2.14 with the plugin I received import tensorflow as tf tf.keras.layers.Conv1D(5,1,padding="same", kernel_initializer="ones", groups=5)(tf.ones((1,1,5))) # displays ... NotFoundError: Exception encountered when calling layer 'conv1d_3' (type Conv1D). could not find registered platform with id: 0x104d8f6f0 [Op:__inference__jit_compiled_convolution_op_78] Call arguments received by layer 'conv1d_3' (type Conv1D): • inputs=tf.Tensor(shape=(1, 1, 5), dtype=float32) could not find registered platform with id

Machine Learning & AI General tensorflow-metal

0

538

Jan ’24

Recognizing Speech in live Audio ; impossible to run data generator with 'fr' local identifier

Hy, I'm French developer and I downloaded the Recognizing Speech in live Audio sample code from Developer Apple website. I tried to execute data generator command after changing the local identifier from 'en_US' to 'fr' in data generator main file , but when I ran the command in Xcode, I had this error message : " Identifier 'fr' does not parse into two elements." I checked the xml files associated to the bin archive file and the identifiers are no correct (they keep 'en-US' value). Thanks for your help !

Machine Learning & AI General Speech

1

0

567

Jan ’24

Tensorflow not working on Mac M1 pro

I created a new environment on Conda and then installed TensorFlow using the command "pip install TensorFlow" on my Mac M1 Pro machine. But TensorFlow is not working.

Machine Learning & AI General ML Compute

0

575

Jan ’24

VNDetectHumanBodyPose3DRequest Gives Inconsistent Results

I'm currently building an iOS app that requires the ability to detect a person's height with a live video stream. The new VNDetectHumanBodyPose3DRequest is exactly what I need but the observations I'm getting back are very inconsistent and unreliable. When I say inconsistent, I mean the values never seem to settle and they can fluctuate anywhere from 5 '4" to 10'1" (I'm about 6'0"). In terms of unreliable, I have once seen a value that closely matches my height but I rarely see any values that are close enough (within an inch) of the ground truth. In terms of my code, I'm not doing any fancy. I'm first opening a LiDAR stream on my iPhone Pro 14: guard let videoDevice = AVCaptureDevice.default(.builtInLiDARDepthCamera, for: .video, position: .back) else { return } guard let videoDeviceInput = try? AVCaptureDeviceInput(device: videoDevice) else { return } guard captureSession.canAddInput(videoDeviceInput) else { return } captureSession.addInput(videoDeviceInput) I'm then creating an output synchronizer so I can get both image and depth data at the same time: videoDataOutput = AVCaptureVideoDataOutput() captureSession.addOutput(videoDataOutput) depthDataOutput = AVCaptureDepthDataOutput() depthDataOutput.isFilteringEnabled = true captureSession.addOutput(depthDataOutput) outputVideoSync = AVCaptureDataOutputSynchronizer(dataOutputs: [depthDataOutput, videoDataOutput]) Finally, my delegate function that handles the synchronizer is roughly: fileprivate func perform3DPoseRequest(cmSampleBuffer: CMSampleBuffer, depthData: AVDepthData) { let imageRequestHandler = VNImageRequestHandler(cmSampleBuffer: cmSampleBuffer, depthData: depthData, orientation: .up) let request = VNDetectHumanBodyPose3DRequest() do { // Perform the body pose request. try imageRequestHandler.perform([request]) if let observation = request.results?.first { if (observation.heightEstimation == .measured) { print("Body height (ft) \(formatter.string(fromMeters: Double(observation.bodyHeight))) (m): \(observation.bodyHeight)") ... I'd appreciate any help determining how to get accurate results from the observation's bodyHeight. Thanks!

Machine Learning & AI General Vision

0

350

Jan ’24

NLTagger does not enumerate anymore?

I am using NLTagger to tag lexical classes of words, but it suddenly just stopped working. I boiled my code down to the most basic version, but it's never executing the closure of the enumerateTags() function. What do I have to change or what should I try? for e in sentenceArray { let cupcake = "I like you, have a cupcake" tagger.string = cupcake tagger.enumerateTags(in: cupcake.startIndex..<cupcake.endIndex, unit: .word, scheme: .nameTypeOrLexicalClass) { tag, range in print("TAG") return true }

Machine Learning & AI General Natural Language

5

1

2.1k

Nov ’21

Tensorflow-Metal Training got an increasing loss in CNN

Tensorflow-Metal training got an increasing loss in CNN. But same codes run correctly after pip uninstall tensorflow-metal

Machine Learning & AI General tensorflow-metal

0

1

564

Jan ’24

TTS problem iOS 17 beta

I see a lot of crashes on iOS 17 beta regarding some problem of "Text To Speech". Does anybody has a clue why TTS crashes? Anybody else seeing the same problem? Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Subtype: KERN_INVALID_ADDRESS at 0x000000037f729380 Exception Codes: 0x0000000000000001, 0x000000037f729380 VM Region Info: 0x37f729380 is not in any region. Bytes after previous region: 3748828033 Bytes before following region: 52622617728 REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL MALLOC_NANO 280000000-2a0000000 [512.0M] rw-/rwx SM=PRV ---> GAP OF 0xd20000000 BYTES commpage (reserved) fc0000000-1000000000 [ 1.0G] ---/--- SM=NUL ...(unallocated) Termination Reason: SIGNAL 11 Segmentation fault: 11 Terminating Process: exc handler [36389] Triggered by Thread: 9 ..... Thread 9 name: Thread 9 Crashed: 0 libobjc.A.dylib 0x000000019eeff248 objc_retain_x8 + 16 1 AudioToolboxCore 0x00000001b2da9d80 auoop::RenderPipeUser::~RenderPipeUser() + 112 (AUOOPRenderPipePool.mm:400) 2 AudioToolboxCore 0x00000001b2e110b4 -[AUAudioUnit_XPC internalDeallocateRenderResources] + 92 (AUAudioUnit_XPC.mm:904) 3 AVFAudio 0x00000001bfa4cc04 AUInterfaceBaseV3::Uninitialize() + 60 (AUInterface.mm:524) 4 AVFAudio 0x00000001bfa894bc AVAudioEngineGraph::PerformCommand(AUGraphNodeBaseV3&, AVAudioEngineGraph::ENodeCommand, void*, unsigned int) const + 772 (AVAudioEngineGraph.mm:3317) 5 AVFAudio 0x00000001bfa93550 AVAudioEngineGraph::_Uninitialize(NSError**) + 132 (AVAudioEngineGraph.mm:1469) 6 AVFAudio 0x00000001bfa4b50c AVAudioEngineImpl::Stop(NSError**) + 396 (AVAudioEngine.mm:1081) 7 AVFAudio 0x00000001bfa4b094 -[AVAudioEngine stop] + 48 (AVAudioEngine.mm:193) 8 TextToSpeech 0x00000001c70b3c5c __55-[TTSSynthesisProviderAudioEngine renderSpeechRequest:]_block_invoke + 1756 (TTSSynthesisProviderAudioEngine.m:613) 9 libdispatch.dylib 0x00000001ae4b0740 _dispatch_call_block_and_release + 32 (init.c:1519) 10 libdispatch.dylib 0x00000001ae4b2378 _dispatch_client_callout + 20 (object.m:560) 11 libdispatch.dylib 0x00000001ae4b990c _dispatch_lane_serial_drain + 748 (queue.c:3885) 12 libdispatch.dylib 0x00000001ae4ba470 _dispatch_lane_invoke + 432 (queue.c:3976) 13 libdispatch.dylib 0x00000001ae4c5074 _dispatch_root_queue_drain_deferred_wlh + 288 (queue.c:6913) 14 libdispatch.dylib 0x00000001ae4c48e8 _dispatch_workloop_worker_thread + 404 (queue.c:6507) ... Thread 9 crashed with ARM Thread State (64-bit): x0: 0x0000000283309360 x1: 0x0000000000000000 x2: 0x0000000000000000 x3: 0x00000002833093c0 x4: 0x00000002833093c0 x5: 0x0000000101737740 x6: 0x0000000000000013 x7: 0x00000000ffffffff x8: 0x0000000283309360 x9: 0x3c788942d067009a x10: 0x0000000101547000 x11: 0x0000000000000000 x12: 0x00000000000007fb x13: 0x00000000000007fd x14: 0x000000001ee24020 x15: 0x0000000000000020 x16: 0x0000b1037f729360 x17: 0x000000037f729360 x18: 0x0000000000000000 x19: 0x0000000000000000 x20: 0x00000001016a8de8 x21: 0x0000000283e21d00 x22: 0x0000000283b3f1f8 x23: 0x0000000283098000 x24: 0x00000001bfb4fc35 x25: 0x00000001bfb4fc43 x26: 0x000000028033a688 x27: 0x0000000280c93090 x28: 0x0000000000000000 fp: 0x000000016fc86490 lr: 0x00000001b2da9d80 sp: 0x000000016fc863e0 pc: 0x000000019eeff248 cpsr: 0x1000 esr: 0x92000006 (Data Abort) byte read Translation fault

Machine Learning & AI General Speech Accessibility wwdc2023-10033

21

2

6.8k

Jun ’23

A place that sells vision pro

I'm going to the U.S. to buy a vision pro, does anyone have any information about where they sell it? Will it be sold in Hawaii by any chance? For now, I'm thinking about New York.

Machine Learning & AI General Vision

0

426

Jan ’24

'ReLU' activation problem when running inference on GPU

Hi, there seems to be a difference in behavior when running inference on a trained Keras model using the model __call__ method vs. using the predict or predict_on_batch methods. This only happens when using the GPU for inference and it seems that for certain sequence of operations and float types the 'relu' activation doesn't work as expected and seems to do nothing. I can replicate the problem with the following code (it would only fail with 'relu' activation and tf.float16 and tf.float32 types, while it works fine with tf.float64). import tensorflow as tf import numpy as np DATA_LENGTH = 16 DENSE_WIDTH = 16 BATCH_SIZE = 8 DTYPE = tf.float32 ACTIVATION = 'relu' def TestModel(): inputs = tf.keras.Input(DATA_LENGTH, dtype=DTYPE) u = tf.keras.layers.Dense(DENSE_WIDTH, activation=ACTIVATION, dtype=DTYPE)(inputs) # u = tf.maximum(u, 0.0) output = u*tf.constant(1.0, dtype=DTYPE) model = tf.keras.Model(inputs, output, name="TestModel") return model model = TestModel() model.compile() x = np.random.uniform(size=(BATCH_SIZE, DATA_LENGTH)).astype(DTYPE.as_numpy_dtype) with tf.device('/GPU:0'): out_gpu_call = model(x, training=False) out_gpu_predict = model.predict_on_batch(x) with tf.device('/CPU:0'): out_cpu_call = model(x, training=False) out_cpu_predict= model.predict_on_batch(x) print(f'\nDTYPE {DTYPE}, ACTIVATION: {ACTIVATION}') print("\tMean Abs. Difference GPU (__call__ vs. predict):", np.mean(np.abs(out_gpu_call - out_gpu_predict))) print("\tMean Abs. Difference CPU (__call__ vs. predict):", np.mean(np.abs(out_cpu_call - out_cpu_predict))) print("\tMean Abs. Difference GPU-CPU __call__:", np.mean(np.abs(out_gpu_call - out_cpu_call))) print("\tMean Abs. Difference GPU-CPU predict():", np.mean(np.abs(out_gpu_predict - out_cpu_predict))) The code above produces for example the following output: DTYPE <dtype: 'float32'>, ACTIVATION: relu Mean Abs. Difference GPU (__call__ vs. predict): 0.1955472 Mean Abs. Difference CPU (__call__ vs. predict): 0.0 Mean Abs. Difference GPU-CPU __call__: 1.3573299e-08 Mean Abs. Difference GPU-CPU predict(): 0.1955472 And the results for the GPU are: out_gpu_call <tf.Tensor: shape=(8, 16), dtype=float32, numpy= array([[0.1496982 , 0. , 0. , 0.73772687, 0.26131183, 0.27757105, 0. , 0. , 0. , 0. , 0. , 0.4164225 , 1.0367445 , 0. , 0.5860609 , 0. ], ... out_gpu_predict array([[ 1.49698198e-01, -3.48425686e-01, -2.44667321e-01, 7.37726867e-01, 2.61311829e-01, 2.77571052e-01, -2.26729304e-01, -1.06500387e-01, -3.66294265e-01, -2.93850392e-01, -4.51043218e-01, 4.16422486e-01, 1.03674448e+00, -1.39347658e-01, 5.86060882e-01, -2.05334812e-01], ... Upon inspection of the results it seems that the problem is that the 'relu' activation is not setting the values < 0 to 0 when calling predict_on_batch. When uncommenting the # u = tf.maximum(u, 0.0) line after the Dense layer there is no difference between the two calls (as should be expected). It also happens that removing the multiplication by a constant after the Dense layer, output = u*tf.constant(1.0, dtype=DTYPE) makes the problem dissappear (even when leaving the # u = tf.maximum(u, 0.0) line commented). This is running with the following setup: MacBook Pro, Apple M2 Max chip, macOS Sonoma 14.2 tf version 2.15.0 tensorflow-metal 1.1.0 Python 3.10.13

Machine Learning & AI General tensorflow-metal

1

2

554

Jan ’24

Dual audio device

I am working on a design that requires connecting an ios device to two audio output devices specifically headphones and a speaker. I want the audio driver to switch output device without user action. Is this manageable via ios SDK?

Machine Learning & AI General Sound Analysis Audio

0

1

664

Jan ’24

Tensorflow metal problem

Hello I use Mac Pro M2 16GB This is my code. It is very basic code. `model = Sequential() model.add(LSTM(units=50, input_shape=(X_train.shape[1], X_train.shape[2]))) model.add(Dense(units=1)) model.compile(optimizer='adam', loss='mse') model.fit(X_train, y_train, epochs=50, batch_size=16) train_predict = model.predict(X_train) test_predict = model.predict(X_test) train_predict = scaler.inverse_transform(train_predict) y_train = scaler.inverse_transform(y_train) test_predict = scaler.inverse_transform(test_predict) y_test = scaler.inverse_transform(y_test) When I try to execute this code, anaconda gives the following error I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M2 I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 5.33 GB I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support. : I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: ) I can't find any solution, could you help me Thank you

Machine Learning & AI General tensorflow-metal

0

696

Jan ’24

General

Post

Replies

Boosts

Views

Activity