tensorflow-metal


TensorFlow accelerates machine learning model training with Metal on Mac GPUs.

Posts under tensorflow-metal tag

104 Posts

Post · Replies · Boosts · Views · Activity

Crash when trying to use tensorflow in a macOS VM
I created a macOS 14 VM using https://github.com/s-u/macosvm, which uses the Virtualization framework. I want to check whether I can use paravirtualized graphics for TensorFlow workloads. I followed the steps from https://developer.apple.com/metal/tensorflow-plugin/, but when I run the script from step 4 (Verify), I get a segmentation fault (see below). Has anyone tried to get this kind of GPU compute working in a VM, and succeeded?

/Users/teuf/venv-metal/lib/python3.9/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
2023-11-20 07:41:11.723578: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple Paravirtual device
2023-11-20 07:41:11.723620: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 10.00 GB
2023-11-20 07:41:11.723626: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 0.50 GB
2023-11-20 07:41:11.723700: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-11-20 07:41:11.723968: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
zsh: segmentation fault python3 ./tensorflow-test.py

Thread 0 Crashed:: Dispatch queue: metal gpu stream
0 MPSCore 0x1999598f8 MPSDevice::GetMPSLibrary_DoNotUse(MPSLibraryInfo const*) + 92
1 MPSCore 0x19995c544 0x199927000 + 218436
2 MPSCore 0x19995c908 0x199927000 + 219400
3 MetalPerformanceShadersGraph 0x1fb696a58 0x1fb583000 + 1129048
4 MetalPerformanceShadersGraph 0x1fb6f0cc8 0x1fb583000 + 1498312
5 MetalPerformanceShadersGraph 0x1fb6ef2dc 0x1fb583000 + 1491676
6 MetalPerformanceShadersGraph 0x1fb717ea0 0x1fb583000 + 1658528
7 MetalPerformanceShadersGraph 0x1fb717ce4 0x1fb583000 + 1658084
8 MetalPerformanceShadersGraph 0x1fb6edaac 0x1fb583000 + 1485484
9 MetalPerformanceShadersGraph 0x1fb7a85e0 0x1fb583000 + 2250208
10 MetalPerformanceShadersGraph 0x1fb7a79f0 0x1fb583000 + 2247152
11 MetalPerformanceShadersGraph 0x1fb6602b4 0x1fb583000 + 905908
12 MetalPerformanceShadersGraph 0x1fb65f7b0 0x1fb583000 + 903088
13 libmetal_plugin.dylib 0x1156dfdcc invocation function for block in metal_plugin::runMPSGraph(MetalStream*, MPSGraph*, NSDictionary*, NSDictionary*) + 164
14 libdispatch.dylib 0x18e79b910 _dispatch_client_callout + 20
15 libdispatch.dylib 0x18e7aacc4 _dispatch_lane_barrier_sync_invoke_and_complete + 56
16 libmetal_plugin.dylib 0x1156dfd14 metal_plugin::runMPSGraph(MetalStream*, MPSGraph*, NSDictionary*, NSDictionary*) + 108
17 libmetal_plugin.dylib 0x115606634 metal_plugin::MPSStatelessRandomUniformOp<float>::ProduceOutput(metal_plugin::OpKernelContext*, metal_plugin::Tensor*) + 876
18 libmetal_plugin.dylib 0x115607620 metal_plugin::MPSStatelessRandomOpBase::Compute(metal_plugin::OpKernelContext*) + 620
19 libmetal_plugin.dylib 0x1156061f8 void metal_plugin::ComputeOpKernel<metal_plugin::MPSStatelessRandomUniformOp<float>>(void*, TF_OpKernelContext*) + 44
20 libtensorflow_framework.2.dylib 0x10b807354 tensorflow::PluggableDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*) + 148
21 libtensorflow_framework.2.dylib 0x10b7413e0 tensorflow::(anonymous namespace)::SingleThreadedExecutorImpl::Run(tensorflow::Executor::Args const&) + 2100
22 libtensorflow_framework.2.dylib 0x10b70b820 tensorflow::FunctionLibraryRuntimeImpl::RunSync(tensorflow::FunctionLibraryRuntime::Options, unsigned long long, absl::lts_20230125::Span<tensorflow::Tensor const>, std::__1::vector<tensorflow::Tensor, std::__1::allocator<tensorflow::Tensor>>*) + 420
23 libtensorflow_framework.2.dylib 0x10b715668 tensorflow::ProcessFunctionLibraryRuntime::RunMultiDeviceSync(tensorflow::FunctionLibraryRuntime::Options const&, unsigned long long, std::__1::vector<std::__1::variant<tensorflow::Tensor, tensorflow::TensorShape>, std::__1::allocator<std::__1::variant<tensorflow::Tensor, tensorflow::TensorShape>>>*, std::__1::function<absl::lts_20230125::Status (tensorflow::ProcessFunctionLibraryRuntime::ComponentFunctionData const&, tensorflow::ProcessFunctionLibraryRuntime::InternalArgs*)>) const + 1336
24 libtensorflow_framework.2.dylib 0x10b71a8a4 tensorflow::ProcessFunctionLibraryRuntime::RunSync(tensorflow::FunctionLibraryRuntime::Options const&, unsigned long long, absl::lts_20230125::Span<tensorflow::Tensor const>, std::__1::vector<tensorflow::Tensor, std::__1::allocator<tensorflow::Tensor>>*) const + 848
25 libtensorflow_cc.2.dylib 0x2801b5008 tensorflow::KernelAndDeviceFunc::Run(tensorflow::ScopedStepContainer*, tensorflow::EagerKernelArgs const&, std::__1::vector<std::__1::variant<tensorflow::Tensor, tensorflow::TensorShape>, std::__1::allocator<std::__1::variant<tensorflow::Tensor, tensorflow::TensorShape>>>*, tsl::CancellationManager*, std::__1::optional<tensorflow::EagerFunctionParams> const&, std::__1::optional<tensorflow::ManagedStackTrace> const&, tsl::CoordinationServiceAgent*) + 572
26 libtensorflow_cc.2.dylib 0x28016613c tensorflow::EagerKernelExecute(tensorflow::EagerContext*, absl::lts_20230125::InlinedVector<tensorflow::TensorHandle*, 4ul, std::__1::allocator<tensorflow::TensorHandle*>> const&, std::__1::optional<tensorflow::EagerFunctionParams> const&, tsl::core::RefCountPtr<tensorflow::KernelAndDevice> const&, tensorflow::GraphCollector*, tsl::CancellationManager*, absl::lts_20230125::Span<tensorflow::TensorHandle*>, std::__1::optional<tensorflow::ManagedStackTrace> const&) + 452
27 libtensorflow_cc.2.dylib 0x2801708ec tensorflow::ExecuteNode::Run() + 396
28 libtensorflow_cc.2.dylib 0x2801b0118 tensorflow::EagerExecutor::SyncExecute(tensorflow::EagerNode*) + 244
29 libtensorflow_cc.2.dylib 0x280165ac8 tensorflow::(anonymous namespace)::EagerLocalExecute(tensorflow::EagerOperation*, tensorflow::TensorHandle**, int*) + 2580
30 libtensorflow_cc.2.dylib 0x2801637a8 tensorflow::DoEagerExecute(tensorflow::EagerOperation*, tensorflow::TensorHandle**, int*) + 416
31 libtensorflow_cc.2.dylib 0x2801631e8 tensorflow::EagerOperation::Execute(absl::lts_20230125::Span<tensorflow::AbstractTensorHandle*>, int*) + 132
Replies: 2 · Boosts: 0 · Views: 666 · Activity: Nov ’23
InvalidArgumentError: Cannot assign a device for operation
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation tf_bert_for_sequence_classification/bert/embeddings/Gather: Could not satisfy explicit device specification '' because the node {{colocation_node tf_bert_for_sequence_classification/bert/embeddings/Gather}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. Colocation Debug Info: Colocation group had the following types and supported devices: Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Replies: 0 · Boosts: 0 · Views: 378 · Activity: Nov ’23
Issue with Tensorflow 2.14 on MacOS: No registered 'ExpandDims' OpKernel for 'GPU' devices compatible with node {{node StringSplit/stack}}
Working environment: MacBook Pro 14-inch with M2 Pro chip, macOS Sonoma 14.0, Python 3.11.4, tensorflow 2.14.0, tensorflow-macos 2.14.0, tensorflow-metal 1.1.0.

Issue description: Hi there! I ran into an issue when working with Keras' TextVectorization preprocessing layer:

text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
text_vectorization.adapt(ds.map(lambda x: x['title']))

The inputs are string contents. Here is the traceback:

---------------------------------------------------------------------------
NotFoundError Traceback (most recent call last)
/Users/ken/Workspaces/MLE101/tfrs101/preprocess.ipynb Cell 13 line 3
1 # with tf.device('/CPU:0'):
2 text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
----> 3 text_vectorization.adapt(ds.map(lambda x: x['title']))

File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/keras/src/layers/preprocessing/text_vectorization.py:473, in TextVectorization.adapt(self, data, batch_size, steps)
423 def adapt(self, data, batch_size=None, steps=None):
424 """Computes a vocabulary of string terms from tokens in a dataset.
425
426 Calling `adapt()` on a `TextVectorization` layer is an alternative to
(...)
471 argument is not supported with array inputs.
472 """
--> 473 super().adapt(data, batch_size=batch_size, steps=steps)

File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/keras/src/engine/base_preprocessing_layer.py:258, in PreprocessingLayer.adapt(self, data, batch_size, steps)
256 with data_handler.catch_stop_iteration():
257 for _ in data_handler.steps():
--> 258 self._adapt_function(iterator)
259 if data_handler.should_sync:
260 context.async_wait()

File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
151 except Exception as e:
152 filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153 raise e.with_traceback(filtered_tb) from None
154 finally:
155 del filtered_tb

File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/tensorflow/python/eager/execute.py:60, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
53 # Convert any objects of type core_types.Tensor to Tensor.
54 inputs = [
55 tensor_conversion_registry.convert(t)
56 if isinstance(t, core_types.Tensor)
57 else t
58 for t in inputs
59 ]
---> 60 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
61 inputs, attrs, num_outputs)
62 except core._NotOkStatusException as e:
63 if name is not None:

NotFoundError: Graph execution error:

Detected at node StringSplit/stack defined at (most recent call last):
...
No registered 'ExpandDims' OpKernel for 'GPU' devices compatible with node {{node StringSplit/stack}}
(OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_STRING, Tdim=DT_INT32, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"
. Registered:
device='XLA_CPU_JIT'; Tdim in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_FLOAT8_E5M2, DT_FLOAT8_E4M3FN]
device='DEFAULT'; T in [DT_HALF]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_HALF]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_BFLOAT16]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_BFLOAT16]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_DOUBLE]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_DOUBLE]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_UINT64]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_UINT64]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_INT64]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_INT64]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_UINT32]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_UINT32]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_UINT16]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_UINT16]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_INT16]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_INT16]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_UINT8]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_UINT8]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_INT8]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_INT8]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_COMPLEX64]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_COMPLEX64]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_COMPLEX128]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_COMPLEX128]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_INT32]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_INT32]; Tdim in [DT_INT64]
device='CPU'; Tdim in [DT_INT32]
device='CPU'; Tdim in [DT_INT64]
[[StringSplit/stack]] [Op:__inference_adapt_step_71204]

I have to explicitly specify the CPU to make it work:

with tf.device('/CPU:0'):
    text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
    text_vectorization.adapt(ds.map(lambda x: x['title']))

I have referred to this post: https://developer.apple.com/forums/thread/700108
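The error text itself suggests why pinning adapt() to the CPU helps: ExpandDims is requested with T=DT_STRING, the 'DEFAULT' registrations (which appear to be the kernels available to the Metal GPU here) list only numeric T values, and the 'CPU' registrations constrain only Tdim. The helper below is a hypothetical, stdlib-only sketch, not a TensorFlow API: it parses a "Registered:" list copied from such an error and reports which device types accept a given dtype. The sample string is an abbreviated excerpt of the list in the post.

```python
import re

# Matches a "T in [...]" constraint but not "Tdim in [...]".
_T_CONSTRAINT = re.compile(r"(?<![A-Za-z_])T in \[([^\]]*)\]")


def devices_accepting(registered: str, dtype: str) -> set:
    """Return the device types whose registrations accept `dtype` for attribute T.

    A registration without any T constraint is treated as accepting every dtype.
    """
    accepted = set()
    # Each registration in the error text starts with device='NAME'.
    for chunk in registered.split("device='")[1:]:
        name, _, constraints = chunk.partition("'")
        match = _T_CONSTRAINT.search(constraints)
        if match is None:
            accepted.add(name)
        elif dtype in (t.strip() for t in match.group(1).split(",")):
            accepted.add(name)
    return accepted


# Abbreviated excerpt of the "Registered:" list from the ExpandDims error.
REGISTERED = (
    "device='XLA_CPU_JIT'; Tdim in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_INT32, DT_BOOL] "
    "device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT32] "
    "device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT64] "
    "device='CPU'; Tdim in [DT_INT32] "
    "device='CPU'; Tdim in [DT_INT64]"
)

print(devices_accepting(REGISTERED, "DT_STRING"))  # {'CPU'}
```

Only the CPU registrations accept string tensors, which is consistent with the `with tf.device('/CPU:0'):` workaround.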
Replies: 0 · Boosts: 0 · Views: 548 · Activity: Nov ’23
TensorFlow autoencoder gives different results locally (M2 Pro Max) than on Colab / Kaggle
Hi, I've been working through the autoencoder tutorial at https://www.tensorflow.org/tutorials/generative/autoencoder#third_example_anomaly_detection (notebook: https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/generative/autoencoder.ipynb). When I downloaded the notebook and ran it locally on my M2 Pro Max, the results were dramatically different and the plots were way off.

This is the plot in the working notebook: [image not included]
This is the local plot: [image not included]

I checked every moving piece, and the difference seems to be in the output of the autoencoder, in these lines:

encoded_data = autoencoder.encoder(normal_test_data).numpy()
decoded_data = autoencoder.decoder(encoded_data).numpy()

The working notebook output is: [image not included]
The local output: [image not included]

The overall results are:
notebook: Accuracy = 0.944, Precision = 0.9941176470588236, Recall = 0.9053571428571429
local: Accuracy = 0.44, Precision = 0.0, Recall = 0.0

I'm using a Mac M2 Pro Max with Python 3.10.12 and TensorFlow 2.14.0. Can anyone help? Thanks a lot in advance.
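As I understand the tutorial, its anomaly detector labels a sample normal when its reconstruction loss is below a threshold taken from the training losses (mean plus one standard deviation), then scores the predictions with accuracy, precision, and recall. The stdlib-only sketch below mirrors that logic on made-up loss values; it is an illustration, not the tutorial code, and the function names are hypothetical. It shows one way to reach the local numbers: if every test loss lands above the threshold, nothing is predicted normal, precision and recall both drop to 0.0, and accuracy falls to the share of anomalous samples.

```python
import statistics


def threshold_from_train_losses(train_losses):
    """Mean plus one population standard deviation (np.std's default)."""
    return statistics.fmean(train_losses) + statistics.pstdev(train_losses)


def predict_normal(losses, threshold):
    """True = classified as normal (reconstruction loss below the threshold)."""
    return [loss < threshold for loss in losses]


def accuracy_precision_recall(labels, preds):
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    accuracy = sum(1 for y, p in pairs if y == p) / len(pairs)
    # Report 0.0 on a zero denominator, as scikit-learn does by default (with a warning).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


threshold = threshold_from_train_losses([0.02, 0.03, 0.025, 0.03, 0.02])  # about 0.0295
labels = [True, True, False, False]  # True = normal sample

# Reconstructions that separate normal from anomalous samples.
print(accuracy_precision_recall(labels, predict_normal([0.02, 0.025, 0.06, 0.07], threshold)))
# (1.0, 1.0, 1.0)

# Reconstructions so poor that every loss exceeds the threshold.
print(accuracy_precision_recall(labels, predict_normal([0.5, 0.5, 0.6, 0.7], threshold)))
# (0.5, 0.0, 0.0)
```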
Replies: 2 · Boosts: 1 · Views: 504 · Activity: Nov ’23
An error when installing TensorFlow
print("Hello")
import tensorflow as tf

After installing TensorFlow, running this fails with "Process finished with exit code 132 (interrupted by signal 4: SIGILL)". My setup: MacBook Air 2022 (M2), macOS 14.1, the latest TensorFlow version, Python 3.11.5. Can anyone help, please? I have tried different variants of TensorFlow (for Mac, for CPU, and other versions), and I have also tried Anaconda and Miniconda, but nothing works.
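Exit code 132 follows the shell convention of 128 + signal number, so the process was killed by signal 4, SIGILL: the CPU hit an instruction it cannot execute. A frequently reported cause on Apple silicon (an assumption here; this thread does not confirm it) is an x86_64 Python running under Rosetta 2 and loading an x86 TensorFlow build that uses AVX instructions, which Rosetta 2 on macOS 14 does not support. The stdlib-only sketch below decodes the exit code and prints the interpreter's architecture so that case can be ruled in or out; `describe_exit_code` is a hypothetical helper.

```python
import platform
import signal


def describe_exit_code(code):
    """Exit statuses above 128 conventionally mean 'terminated by signal (code - 128)'."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # not a signal number known on this platform


print(describe_exit_code(132))  # SIGILL

# 'arm64' for a native Apple silicon interpreter, 'x86_64' for an Intel build
# (which, on an M2, runs under Rosetta 2).
print(platform.machine())
```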
Replies: 0 · Boosts: 0 · Views: 451 · Activity: Nov ’23
M1 GPU python process stopped?
I've been running TensorFlow with Python 3.9 to train a CNN model, and the process is accelerated by the GPU. After 80 epochs, the process went to sleep (status S) and its GPU usage dropped to 0 percent. I'm wondering whether this training process crashed the GPU, or whether the OS is forcing the process to sleep because it takes up too much GPU time. Thanks a lot!
Replies: 1 · Boosts: 0 · Views: 570 · Activity: Nov ’23
Issues with installing Tensorflow on M1 MacBook Pro
I have been following the instructions here: https://developer.apple.com/metal/tensorflow-plugin/

I managed to complete step 1 (set up the environment) and step 2 (install base TensorFlow), but when I try step 3 (install the tensorflow-metal plug-in) with "python -m pip install tensorflow-metal", I get the following messages:

ERROR: Could not find a version that satisfies the requirement tensorflow-metal (from versions: none)
ERROR: No matching distribution found for tensorflow-metal

What am I missing here? These are the commands I used:

Step 1
python3 -m venv ~/venv-metal
source ~/venv-metal/bin/activate
python -m pip install -U pip

Step 2
python -m pip install tensorflow

Step 3
python -m pip install tensorflow-metal
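"(from versions: none)" means pip found no tensorflow-metal wheel whose tags match the interpreter it runs under, which usually points at the Python version, the interpreter's CPU architecture, or the OS rather than at the install command itself. The stdlib-only sketch below collects those details so they can be compared with the wheel filenames listed for tensorflow-metal on PyPI; `python -m pip debug --verbose` also prints the full list of tags pip accepts. Which Python versions each plug-in release supports is not stated in this thread, so check the PyPI file list rather than assuming. `wheel_environment` is a hypothetical helper.

```python
import platform
import sys
import sysconfig


def wheel_environment():
    """Interpreter details that decide which wheels pip will consider."""
    return {
        "system": platform.system(),  # 'Darwin' on macOS
        "machine": platform.machine(),  # 'arm64' for a native Apple silicon interpreter
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "platform": sysconfig.get_platform(),  # e.g. 'macosx-11.0-arm64'
    }


for key, value in wheel_environment().items():
    print(f"{key}: {value}")
```

Run it with the same `python` that step 3 uses (inside the activated ~/venv-metal), since a different interpreter can report different tags.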
Replies: 2 · Boosts: 0 · Views: 557 · Activity: Nov ’23
Tensorflow-metal training with l2 regularizer much slower than without regularizer
Hi, when I train ResNet-50 with tensorflow-metal, I found that the L2 regularizer makes each epoch take almost 4x as long (~220 ms instead of ~60 ms). I'm on an M1 Max 16" MBP. It seems like regularization shouldn't add that much time; is there anything I can do to make it faster? Here's some sample code that reproduces the issue:

import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, ZeroPadding2D, \
    Flatten, BatchNormalization, AveragePooling2D, Dense, Activation, Add
from tensorflow.keras.regularizers import l2
from tensorflow.keras.models import Model
from tensorflow.keras import activations
import random
import numpy as np

random.seed(1234)
np.random.seed(1234)
tf.random.set_seed(1234)

batch_size = 64
(train_im, train_lab), (test_im, test_lab) = tf.keras.datasets.cifar10.load_data()
train_im, test_im = train_im / 255.0, test_im / 255.0
train_lab_categorical = tf.keras.utils.to_categorical(
    train_lab, num_classes=10, dtype='uint8')
train_DataGen = tf.keras.preprocessing.image.ImageDataGenerator()
train_set_data = train_DataGen.flow(train_im, train_lab, batch_size=batch_size, shuffle=False)

# Change this to l2 for it to train much slower
regularizer = None  # l2(0.001)


def res_identity(x, filters):
    x_skip = x
    f1, f2 = filters

    x = Conv2D(f1, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False,
               kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)

    x = Conv2D(f1, kernel_size=(3, 3), strides=(1, 1), padding='same', use_bias=False,
               kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)

    x = Conv2D(f2, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False,
               kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)

    x = Add()([x, x_skip])
    x = Activation(activations.relu)(x)
    return x


def res_conv(x, s, filters):
    x_skip = x
    f1, f2 = filters

    x = Conv2D(f1, kernel_size=(1, 1), strides=(s, s), padding='valid', use_bias=False,
               kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)

    x = Conv2D(f1, kernel_size=(3, 3), strides=(1, 1), padding='same', use_bias=False,
               kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)

    x = Conv2D(f2, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False,
               kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)

    # Projection shortcut so the skip path matches the main path's shape
    x_skip = Conv2D(f2, kernel_size=(1, 1), strides=(s, s), padding='valid', use_bias=False,
                    kernel_regularizer=regularizer)(x_skip)
    x_skip = BatchNormalization()(x_skip)

    x = Add()([x, x_skip])
    x = Activation(activations.relu)(x)
    return x


inputs = Input(shape=(train_im.shape[1], train_im.shape[2], train_im.shape[3]), batch_size=batch_size)
x = ZeroPadding2D(padding=(3, 3))(inputs)
x = Conv2D(64, kernel_size=(7, 7), strides=(2, 2), use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation(activations.relu)(x)
x = MaxPooling2D((3, 3), strides=(2, 2))(x)

x = res_conv(x, s=1, filters=(64, 256))
x = res_identity(x, filters=(64, 256))
x = res_identity(x, filters=(64, 256))

x = res_conv(x, s=2, filters=(128, 512))
x = res_identity(x, filters=(128, 512))
x = res_identity(x, filters=(128, 512))
x = res_identity(x, filters=(128, 512))

x = res_conv(x, s=2, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))

x = res_conv(x, s=2, filters=(512, 2048))
x = res_identity(x, filters=(512, 2048))
x = res_identity(x, filters=(512, 2048))

x = AveragePooling2D((2, 2), padding='same')(x)
x = Flatten()(x)
x = Dense(10, activation='softmax', kernel_initializer='he_normal')(x)

model = Model(inputs=inputs, outputs=x, name='Resnet50')
opt = tf.keras.optimizers.legacy.SGD(learning_rate=0.01)
model.compile(loss=tf.keras.losses.CategoricalCrossentropy(reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE),
              optimizer=opt)
# steps_per_epoch expects an integer: 50000 // 64 = 781
model.fit(x=train_im, y=train_lab_categorical, batch_size=batch_size, epochs=150,
          steps_per_epoch=train_im.shape[0] // batch_size)
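For reference, Keras's `l2(0.001)` adds 0.001 * sum(w**2) per regularized kernel to the loss, so each training step computes one extra reduction per kernel plus its gradient. The stdlib-only sketch below mirrors that arithmetic on plain lists (it is not the Keras implementation; both helper names are hypothetical) and counts the Conv2D kernels the posted model regularizes: 4 per res_conv block and 3 per res_identity block. Whether these 52 small extra ops explain the slowdown on tensorflow-metal is a hypothesis, not something established in this thread.

```python
def l2_penalty(kernels, factor=0.001):
    """Mirror of l2(factor): factor * sum(w**2), summed over every regularized kernel."""
    return sum(factor * sum(w * w for w in kernel) for kernel in kernels)


def regularized_conv_count(res_conv_blocks, res_identity_blocks):
    # In the posted model, res_conv regularizes 4 Conv2D kernels and res_identity 3;
    # the stem Conv2D and the final Dense layer are not regularized.
    return 4 * res_conv_blocks + 3 * res_identity_blocks


# The posted ResNet-50 stacks 4 res_conv blocks and 2 + 3 + 5 + 2 res_identity blocks.
n_kernels = regularized_conv_count(4, 2 + 3 + 5 + 2)
print(n_kernels)  # 52 penalty terms added to the loss on every step

print(l2_penalty([[1.0, 2.0], [3.0]]))  # 0.001 * (1 + 4) + 0.001 * 9, about 0.014
```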
Replies: 0 · Boosts: 0 · Views: 524 · Activity: Nov ’23
Unsuccessful import of TensorFlow
Hi. I have followed the instructions here to install TensorFlow with GPU support on my 16-inch 2019 Intel MacBook Pro (with AMD graphics). The installation seems to succeed (I get no errors), but when I test it by running

import tensorflow as tf

I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/__init__.py", line 445, in <module>
    _ll.load_library(_plugin_dir)
  File "/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/python/framework/load_library.py", line 151, in load_library
    py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Symbol not found: __ZN10tensorflow16TensorShapeProtoC1ERKS0_
  Referenced from: <C62E0AB4-567E-3E14-8F96-9F07A746C4DC> /Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib
  Expected in: <0B1F231A-6766-3F61-81D9-6782129807A9> /Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so

My env's packages:
...
numpy 1.26.1
tensorboard 2.14.1
tensorboard-data-server 0.7.1
tensorflow 2.14.0
tensorflow-estimator 2.14.0
tensorflow-io-gcs-filesystem 0.34.0
tensorflow-metal 1.0.0
...
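The missing symbol `__ZN10tensorflow16TensorShapeProtoC1ERKS0_` demangles to the copy constructor `tensorflow::TensorShapeProto::TensorShapeProto(tensorflow::TensorShapeProto const&)`, which suggests libmetal_plugin.dylib was built against a different TensorFlow release than the one installed (tensorflow 2.14.0 with tensorflow-metal 1.0.0 here). The sketch below checks a version pair against a small lookup table. The table is an assumption based on Apple's tensorflow-metal release notes (plug-in 1.0 for TensorFlow 2.13, 1.1 for 2.14) and should be verified on https://developer.apple.com/metal/tensorflow-plugin/ before relying on it; `plugin_matches` and the table are hypothetical helpers, not part of either package.

```python
import importlib.metadata

# ASSUMPTION: pairs taken from Apple's tensorflow-metal release notes; verify before use.
ASSUMED_TF_FOR_METAL = {"1.0": "2.13", "1.1": "2.14"}


def major_minor(version):
    return ".".join(version.split(".")[:2])


def plugin_matches(tf_version, metal_version, table=ASSUMED_TF_FOR_METAL):
    """True/False if the plug-in release is in the table, None if it is unknown."""
    expected_tf = table.get(major_minor(metal_version))
    if expected_tf is None:
        return None
    return major_minor(tf_version) == expected_tf


def installed_version(dist_name):
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


print(plugin_matches("2.14.0", "1.0.0"))  # False: the combination from the post

tf_v, metal_v = installed_version("tensorflow"), installed_version("tensorflow-metal")
if tf_v and metal_v:
    print(tf_v, metal_v, plugin_matches(tf_v, metal_v))
```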
Replies: 4 · Boosts: 1 · Views: 711 · Activity: Oct ’23
After upgrading an M2 Mac to macOS Sonoma 14.0, I can no longer train a model with TensorFlow
I could train YOLOv3 on an M2 Mac running macOS Ventura with tensorflow-macos 2.9.0 and tensorflow-metal 0.5. After upgrading the system to Sonoma 14.0, I can no longer train the model; I get the errors below. On an M1 Mac, training still works after upgrading to Sonoma 14.0, although it reports `error: 'anec.gain_offset_control' op`; unlike the M2, the M1 does not hit the last error, `MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14. Compute function exceeds available temporary registers: (null)`.

When I change my optimizer from Adam to SGD, the `'anec.gain_offset_control' op` error disappears, so that error seems to be caused by something in Adam. However, I cannot resolve the `MTLComputePipelineStateCache` error.

ERROR info:

loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
(the same error is printed five times)
/AppleInternal/Library/BuildRoots/90c9c1ae-37b6-11ee-a991-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Utility/MPSLibrary.mm:550: failed assertion `MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14. Compute function exceeds available temporary registers: (null)
Replies: 0 · Boosts: 1 · Views: 622 · Activity: Oct ’23
Question: Will TensorFlow-Metal and JAX-Metal code be open sourced?
Will TensorFlow-Metal and JAX-Metal code be open sourced? Reasons why I ask: If it is open sourced on GitHub or something it might make it easier for people to find issues and create new ones if necessary, also the open source community might be able to help ;) I'd love to learn about how you guys implement some of these operations :P (I know you guys made an Apple tutorial on how to implement TensorFlow custom op for Metal which was fire https://developer.apple.com/documentation/metal/metal_sample_code_library/customizing_a_tensorflow_operation)
Replies: 0 · Boosts: 1 · Views: 327 · Activity: Oct ’23
jax.lax.conv_transpose not correctly implemented
Good evening! I tried to use Flax's nn.ConvTranspose, which calls jax.lax.conv_transpose, but it looks like it isn't implemented correctly for the METAL backend; it works fine on CPU.

  File "/Users/cemlyn/Documents/VCLless/mnist_vae/venv/lib/python3.11/site-packages/flax/linen/linear.py", line 768, in __call__
    y = lax.conv_transpose(
        ^^^^^^^^^^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: UNKNOWN: <unknown>:0: error: type of return operand 0 ('tensor<1x8x8x64xf32>') doesn't match function result type ('tensor<1x14x14x64xf32>') in function @main
<unknown>:0: note: see current operation: "func.return"(%0) : (tensor<1x8x8x64xf32>) -> ()

Versions:
pip list | grep jax
jax 0.4.11
jax-metal 0.0.4
jaxlib 0.4.11
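For a transposed convolution with stride s and kernel size k (no dilation), lax.conv_transpose's padding rules, as I understand them, give in_size * s outputs per spatial dimension with 'SAME' padding and in_size * s + max(k - s, 0) with 'VALID' padding; verify against the jax.lax.conv_transpose documentation. The sketch below applies that arithmetic. With a hypothetical 7x7 input, 3x3 kernel, and stride 2 (the post does not show the layer arguments), 'SAME' padding yields the 14x14 result type that the compiled function expects, which is consistent with the post's conclusion that the Metal lowering, not the model, produces the 8x8 tensor. `conv_transpose_out_size` is a hypothetical helper.

```python
def conv_transpose_out_size(in_size, kernel, stride, padding="SAME"):
    """Per-dimension output size of lax.conv_transpose-style upsampling (no dilation)."""
    if padding == "SAME":
        return in_size * stride
    if padding == "VALID":
        return in_size * stride + max(kernel - stride, 0)
    raise ValueError(f"unsupported padding: {padding!r}")


# Hypothetical MNIST-VAE decoder step: 7x7 feature map, 3x3 kernel, stride 2.
print(conv_transpose_out_size(7, kernel=3, stride=2))  # 14, matching tensor<1x14x14x64xf32>
print(conv_transpose_out_size(7, kernel=3, stride=2, padding="VALID"))  # 15
```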
Replies: 2 · Boosts: 0 · Views: 436 · Activity: Oct ’23
jax-metal issues with IDEs
Hi, following the instructions at https://developer.apple.com/metal/jax/, JAX works fine on my M1 Pro, but only in Terminal. If I run Jupyter Notebook or PyCharm, the following always reports the CPU:

from jax.lib import xla_bridge
print(xla_bridge.get_backend().platform)

I also noticed that if I restart Terminal, JAX defaults to the CPU only. I always have to activate the jax-metal virtual environment first to get the Apple silicon GPU to work:

python3 -m venv ~/jax-metal
source ~/jax-metal/bin/activate

Is there any way to make Jupyter Notebook and other IDEs default to jax-metal? Currently I can only use it in Terminal, after manually activating the jax-metal virtual environment each time, which is annoying.
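jax-metal is installed only inside the ~/jax-metal virtual environment, so any interpreter other than that environment's Python falls back to the CPU; activating the environment in Terminal does not change which Python Jupyter or PyCharm launches. The stdlib-only sketch below, run from a notebook cell or a PyCharm console, reports which interpreter is in use; `interpreter_report` is a hypothetical helper. If it shows a different interpreter, one common fix (not confirmed in this thread) is to select ~/jax-metal/bin/python as PyCharm's project interpreter, and to register the environment as a Jupyter kernel by running `python -m pip install ipykernel` and then `python -m ipykernel install --user --name jax-metal` inside the activated environment.

```python
import sys
from pathlib import Path


def interpreter_report(expected_env="~/jax-metal"):
    """Show which Python is running and whether it is the expected virtual environment."""
    prefix = Path(sys.prefix).resolve()
    return {
        "executable": sys.executable,
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "is_expected_env": prefix == Path(expected_env).expanduser().resolve(),
    }


for key, value in interpreter_report().items():
    print(f"{key}: {value}")
```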
Replies: 0 · Boosts: 2 · Views: 406 · Activity: Oct ’23
Not able to set up TensorFlow on my Mac M1: conda install -c apple tensorflow-deps fails with UnsatisfiableError
I'm trying to set up TensorFlow on a Mac M1. conda install -c apple tensorflow-deps throws the following error:

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

following specifications were found to be incompatible with your system:

  - feature:/osx-arm64::__osx==13.6=0
  - tensorflow-deps -> grpcio[version='>=1.37.0,<2.0'] -> __osx[version='>=10.10|>=10.9']

Your installed version is: 13.6

The .condarc is as follows:

channels:
  - defaults
subdirs:
  - osx-arm64
  - osx-64
  - noarch
ssl_verify: false
subdir: osx-arm64

And conda info:

active environment : base
active env location : /Users/mdrahman/miniconda3
shell level : 1
user config file : /Users/mdrahman/.condarc
populated config files : /Users/mdrahman/.condarc
conda version : 23.5.2
conda-build version : not installed
python version : 3.11.4.final.0
virtual packages : __archspec=1=arm64
                   __osx=13.6=0
                   __unix=0=0
base environment : /Users/mdrahman/miniconda3 (writable)
conda av data dir : /Users/mdrahman/miniconda3/etc/conda
conda av metadata url : None
channel URLs : https://repo.anaconda.com/pkgs/main/osx-arm64
               https://repo.anaconda.com/pkgs/main/osx-64
               https://repo.anaconda.com/pkgs/main/noarch
               https://repo.anaconda.com/pkgs/r/osx-arm64
               https://repo.anaconda.com/pkgs/r/osx-64
               https://repo.anaconda.com/pkgs/r/noarch
package cache : /Users/mdrahman/miniconda3/pkgs
                /Users/mdrahman/.conda/pkgs
envs directories : /Users/mdrahman/miniconda3/envs
                   /Users/mdrahman/.conda/envs
platform : osx-arm64
user-agent : conda/23.5.2 requests/2.29.0 CPython/3.11.4 Darwin/22.6.0 OSX/13.6
UID:GID : 501:20
netrc file : None
offline mode : False

Looking forward to your support.
Replies: 1 · Boosts: 0 · Views: 491 · Activity: Oct ’23
JAX Metal error: failed to legalize operation 'mhlo.scatter'
I only get this error when using the JAX Metal device (the CPU is fine). It seems to happen whenever I update values of an array with .at[...].set(...):

note: see current operation: %2903 = "mhlo.scatter"(%arg3, %2902, %2893) ({
^bb0(%arg4: tensor<f32>, %arg5: tensor<f32>):
  "mhlo.return"(%arg5) : (tensor<f32>) -> ()
}) {indices_are_sorted = true, scatter_dimension_numbers = #mhlo.scatter<update_window_dims = [0, 1], inserted_window_dims = [1], scatter_dims_to_operand_dims = [1]>, unique_indices = true} : (tensor<10x100x4xf32>, tensor<1xsi32>, tensor<10x4xf32>) -> tensor<10x100x4xf32>

blocks = blocks.at[i].set( ...
Replies: 6 · Boosts: 5 · Views: 1k · Activity: Nov ’23