I created a macOS 14 VM using https://github.com/s-u/macosvm which uses the Virtualization Framework. I want to check if I can use paravirtualized graphics for tensorflow workloads.
I followed the steps from https://developer.apple.com/metal/tensorflow-plugin/, but when I run the script from step 4 (Verify), I get a segmentation fault (see below).
Did anyone try to get this kind of GPU compute in a VM and succeed?
/Users/teuf/venv-metal/lib/python3.9/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
warnings.warn(
2023-11-20 07:41:11.723578: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple Paravirtual device
2023-11-20 07:41:11.723620: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 10.00 GB
2023-11-20 07:41:11.723626: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 0.50 GB
2023-11-20 07:41:11.723700: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-11-20 07:41:11.723968: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
zsh: segmentation fault python3 ./tensorflow-test.py
Thread 0 Crashed:: Dispatch queue: metal gpu stream
0 MPSCore 0x1999598f8 MPSDevice::GetMPSLibrary_DoNotUse(MPSLibraryInfo const*) + 92
1 MPSCore 0x19995c544 0x199927000 + 218436
2 MPSCore 0x19995c908 0x199927000 + 219400
3 MetalPerformanceShadersGraph 0x1fb696a58 0x1fb583000 + 1129048
4 MetalPerformanceShadersGraph 0x1fb6f0cc8 0x1fb583000 + 1498312
5 MetalPerformanceShadersGraph 0x1fb6ef2dc 0x1fb583000 + 1491676
6 MetalPerformanceShadersGraph 0x1fb717ea0 0x1fb583000 + 1658528
7 MetalPerformanceShadersGraph 0x1fb717ce4 0x1fb583000 + 1658084
8 MetalPerformanceShadersGraph 0x1fb6edaac 0x1fb583000 + 1485484
9 MetalPerformanceShadersGraph 0x1fb7a85e0 0x1fb583000 + 2250208
10 MetalPerformanceShadersGraph 0x1fb7a79f0 0x1fb583000 + 2247152
11 MetalPerformanceShadersGraph 0x1fb6602b4 0x1fb583000 + 905908
12 MetalPerformanceShadersGraph 0x1fb65f7b0 0x1fb583000 + 903088
13 libmetal_plugin.dylib 0x1156dfdcc invocation function for block in metal_plugin::runMPSGraph(MetalStream*, MPSGraph*, NSDictionary*, NSDictionary*) + 164
14 libdispatch.dylib 0x18e79b910 _dispatch_client_callout + 20
15 libdispatch.dylib 0x18e7aacc4 _dispatch_lane_barrier_sync_invoke_and_complete + 56
16 libmetal_plugin.dylib 0x1156dfd14 metal_plugin::runMPSGraph(MetalStream*, MPSGraph*, NSDictionary*, NSDictionary*) + 108
17 libmetal_plugin.dylib 0x115606634 metal_plugin::MPSStatelessRandomUniformOp<float>::ProduceOutput(metal_plugin::OpKernelContext*, metal_plugin::Tensor*) + 876
18 libmetal_plugin.dylib 0x115607620 metal_plugin::MPSStatelessRandomOpBase::Compute(metal_plugin::OpKernelContext*) + 620
19 libmetal_plugin.dylib 0x1156061f8 void metal_plugin::ComputeOpKernel<metal_plugin::MPSStatelessRandomUniformOp<float>>(void*, TF_OpKernelContext*) + 44
20 libtensorflow_framework.2.dylib 0x10b807354 tensorflow::PluggableDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*) + 148
21 libtensorflow_framework.2.dylib 0x10b7413e0 tensorflow::(anonymous namespace)::SingleThreadedExecutorImpl::Run(tensorflow::Executor::Args const&) + 2100
22 libtensorflow_framework.2.dylib 0x10b70b820 tensorflow::FunctionLibraryRuntimeImpl::RunSync(tensorflow::FunctionLibraryRuntime::Options, unsigned long long, absl::lts_20230125::Span<tensorflow::Tensor const>, std::__1::vector<tensorflow::Tensor, std::__1::allocator<tensorflow::Tensor>>*) + 420
23 libtensorflow_framework.2.dylib 0x10b715668 tensorflow::ProcessFunctionLibraryRuntime::RunMultiDeviceSync(tensorflow::FunctionLibraryRuntime::Options const&, unsigned long long, std::__1::vector<std::__1::variant<tensorflow::Tensor, tensorflow::TensorShape>, std::__1::allocator<std::__1::variant<tensorflow::Tensor, tensorflow::TensorShape>>>*, std::__1::function<absl::lts_20230125::Status (tensorflow::ProcessFunctionLibraryRuntime::ComponentFunctionData const&, tensorflow::ProcessFunctionLibraryRuntime::InternalArgs*)>) const + 1336
24 libtensorflow_framework.2.dylib 0x10b71a8a4 tensorflow::ProcessFunctionLibraryRuntime::RunSync(tensorflow::FunctionLibraryRuntime::Options const&, unsigned long long, absl::lts_20230125::Span<tensorflow::Tensor const>, std::__1::vector<tensorflow::Tensor, std::__1::allocator<tensorflow::Tensor>>*) const + 848
25 libtensorflow_cc.2.dylib 0x2801b5008 tensorflow::KernelAndDeviceFunc::Run(tensorflow::ScopedStepContainer*, tensorflow::EagerKernelArgs const&, std::__1::vector<std::__1::variant<tensorflow::Tensor, tensorflow::TensorShape>, std::__1::allocator<std::__1::variant<tensorflow::Tensor, tensorflow::TensorShape>>>*, tsl::CancellationManager*, std::__1::optional<tensorflow::EagerFunctionParams> const&, std::__1::optional<tensorflow::ManagedStackTrace> const&, tsl::CoordinationServiceAgent*) + 572
26 libtensorflow_cc.2.dylib 0x28016613c tensorflow::EagerKernelExecute(tensorflow::EagerContext*, absl::lts_20230125::InlinedVector<tensorflow::TensorHandle*, 4ul, std::__1::allocator<tensorflow::TensorHandle*>> const&, std::__1::optional<tensorflow::EagerFunctionParams> const&, tsl::core::RefCountPtr<tensorflow::KernelAndDevice> const&, tensorflow::GraphCollector*, tsl::CancellationManager*, absl::lts_20230125::Span<tensorflow::TensorHandle*>, std::__1::optional<tensorflow::ManagedStackTrace> const&) + 452
27 libtensorflow_cc.2.dylib 0x2801708ec tensorflow::ExecuteNode::Run() + 396
28 libtensorflow_cc.2.dylib 0x2801b0118 tensorflow::EagerExecutor::SyncExecute(tensorflow::EagerNode*) + 244
29 libtensorflow_cc.2.dylib 0x280165ac8 tensorflow::(anonymous namespace)::EagerLocalExecute(tensorflow::EagerOperation*, tensorflow::TensorHandle**, int*) + 2580
30 libtensorflow_cc.2.dylib 0x2801637a8 tensorflow::DoEagerExecute(tensorflow::EagerOperation*, tensorflow::TensorHandle**, int*) + 416
31 libtensorflow_cc.2.dylib 0x2801631e8 tensorflow::EagerOperation::Execute(absl::lts_20230125::Span<tensorflow::AbstractTensorHandle*>, int*) + 132
tensorflow-metal
TensorFlow accelerates machine learning model training with Metal on Mac GPUs.
https://developer.apple.com/metal/tensorflow-plugin/
In the Verify step, I pasted the code into the terminal, but it gives the following error:
zsh: parse error near`,'
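A `zsh: parse error` like this usually means the Python source was pasted at the shell prompt, so zsh tried to interpret it as shell commands. The code has to go through the Python interpreter instead, either by starting `python3` first or by saving it to a file and running that file. The sketch below illustrates the second route with a stand-in snippet; the file name `verify_snippet.py` and the snippet contents are placeholders, not the actual Verify script.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for the code from the Verify step (placeholder content).
SNIPPET = 'shape = (32, 32, 3)\nprint("parsed by Python:", shape)\n'


def run_as_script(source: str) -> str:
    """Write `source` to a temporary .py file and run it with this interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "verify_snippet.py"  # hypothetical file name
        script.write_text(source)
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout


print(run_as_script(SNIPPET), end="")
```

From the shell, the equivalent is saving the code to a file and running `python3` on that file inside the activated virtual environment.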
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation tf_bert_for_sequence_classification/bert/embeddings/Gather: Could not satisfy explicit device specification '' because the node {{colocation_node tf_bert_for_sequence_classification/bert/embeddings/Gather}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Working Environment
MacBook Pro 14' with M2-Pro chip
macOS Sonoma 14.0
Python 3.11.4
tensorflow 2.14.0, tensorflow-macos 2.14.0, tensorflow-metal 1.1.0
Issue Description
Hi there! I ran into an issue when working with Keras' TextVectorization preprocessing layer.
text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
text_vectorization.adapt(ds.map(lambda x: x['title']))
The inputs are strings. Here is the traceback:
---------------------------------------------------------------------------
NotFoundError Traceback (most recent call last)
/Users/ken/Workspaces/MLE101/tfrs101/preprocess.ipynb Cell 13 line 3
1 # with tf.device('/CPU:0'):
2 text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
----> 3 text_vectorization.adapt(ds.map(lambda x: x['title']))
File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/keras/src/layers/preprocessing/text_vectorization.py:473, in TextVectorization.adapt(self, data, batch_size, steps)
423 def adapt(self, data, batch_size=None, steps=None):
424 """Computes a vocabulary of string terms from tokens in a dataset.
425
426 Calling `adapt()` on a `TextVectorization` layer is an alternative to
(...)
471 argument is not supported with array inputs.
472 """
--> 473 super().adapt(data, batch_size=batch_size, steps=steps)
File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/keras/src/engine/base_preprocessing_layer.py:258, in PreprocessingLayer.adapt(self, data, batch_size, steps)
256 with data_handler.catch_stop_iteration():
257 for _ in data_handler.steps():
--> 258 self._adapt_function(iterator)
259 if data_handler.should_sync:
260 context.async_wait()
File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
151 except Exception as e:
152 filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153 raise e.with_traceback(filtered_tb) from None
154 finally:
155 del filtered_tb
File ~/miniconda3/envs/ds-101/lib/python3.11/site-packages/tensorflow/python/eager/execute.py:60, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
53 # Convert any objects of type core_types.Tensor to Tensor.
54 inputs = [
55 tensor_conversion_registry.convert(t)
56 if isinstance(t, core_types.Tensor)
57 else t
58 for t in inputs
59 ]
---> 60 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
61 inputs, attrs, num_outputs)
62 except core._NotOkStatusException as e:
63 if name is not None:
NotFoundError: Graph execution error:
Detected at node StringSplit/stack defined at (most recent call last):
...
No registered 'ExpandDims' OpKernel for 'GPU' devices compatible with node {{node StringSplit/stack}}
(OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_STRING, Tdim=DT_INT32, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"
. Registered: device='XLA_CPU_JIT'; Tdim in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_FLOAT8_E5M2, DT_FLOAT8_E4M3FN]
device='DEFAULT'; T in [DT_HALF]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_HALF]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_BFLOAT16]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_BFLOAT16]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_FLOAT]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_DOUBLE]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_DOUBLE]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_UINT64]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_UINT64]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_INT64]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_INT64]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_UINT32]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_UINT32]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_UINT16]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_UINT16]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_INT16]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_INT16]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_UINT8]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_UINT8]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_INT8]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_INT8]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_COMPLEX64]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_COMPLEX64]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_COMPLEX128]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_COMPLEX128]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_BOOL]; Tdim in [DT_INT64]
device='DEFAULT'; T in [DT_INT32]; Tdim in [DT_INT32]
device='DEFAULT'; T in [DT_INT32]; Tdim in [DT_INT64]
device='CPU'; Tdim in [DT_INT32]
device='CPU'; Tdim in [DT_INT64]
[[StringSplit/stack]] [Op:__inference_adapt_step_71204]
I have to explicitly specify to use CPU to make it work -
with tf.device('/CPU:0'):
    text_vectorization = keras.layers.TextVectorization(output_mode="tf_idf")
    text_vectorization.adapt(ds.map(lambda x: x['title']))
I have referred to this post: https://developer.apple.com/forums/thread/700108
Hi,
I've been going over this tutorial of autoencoders
https://www.tensorflow.org/tutorials/generative/autoencoder#third_example_anomaly_detection
Notebook link
https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/generative/autoencoder.ipynb
And when I downloaded and ran the notebook locally on my M2 Pro Max - the results were dramatically different and the plots were way off.
This is the plot in the working notebook:
This is the local plot:
I checked every moving piece and the difference seems to be in the output of the autoencoder, these lines:
encoded_data = autoencoder.encoder(normal_test_data).numpy()
decoded_data = autoencoder.decoder(encoded_data).numpy()
The working notebook output is:
The local output:
And the overall result is
notebook:
Accuracy = 0.944
Precision = 0.9941176470588236
Recall = 0.9053571428571429
local:
Accuracy = 0.44
Precision = 0.0
Recall = 0.0
I'm using Mac M2 Pro Max
Python 3.10.12
Tensorflow 2.14.0
Can anyone help?
Thanks a lot in advance.
`print("Hello")
import tensorflow as tf`
After installing TensorFlow, I get this error: "Process finished with exit code 132 (interrupted by signal 4: SIGILL)"
Mac air 2022 M2 14.1 | Tensorflow latest version | Python version 3.11.5
Who can help me please?
I have tried different variants of TensorFlow (for Mac, for CPU, and other versions). I have also tried Anaconda and Miniconda, but it still doesn't work. Process finished with exit code 132 (interrupted by signal 4: SIGILL)
I have tried many different variants, including every version of the tensorflow module (for Mac, for CPU...).
I have tried Anaconda and Miniconda. In the end I still can't get it to work. Please help me.
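For reference, exit code 132 follows the usual shell convention of 128 plus the signal number, so it corresponds to signal 4 (SIGILL, illegal instruction), which matches the message above. The sketch below decodes such a code and prints the architecture the interpreter is running as. An x86_64 Python running on an Apple silicon Mac is one thing worth ruling out, but that is an assumption about this case, not a diagnosis.

```python
import platform
import signal


def signal_from_exit_code(code: int) -> str:
    """Map a shell-style exit code (128 + N) to the name of signal N."""
    if code <= 128:
        raise ValueError("exit codes up to 128 do not encode a signal")
    return signal.Signals(code - 128).name


print(signal_from_exit_code(132))
# 'arm64' is what a native Apple silicon Python reports;
# 'x86_64' indicates an Intel build of the interpreter.
print("interpreter architecture:", platform.machine())
```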
I've been running TensorFlow with Python 3.9 to train a CNN model, and the process is accelerated by the GPU.
After 80 epochs the process went to sleep (status S) and its GPU usage dropped to 0 percent. I am wondering whether this training process crashed the GPU, or whether the OS is forcing the process to sleep because it takes up too much GPU time.
Thanks a lot!
I have been following the instructions here: https://developer.apple.com/metal/tensorflow-plugin/
I managed to complete step 1 (set up the environment) and step 2 (install base TensorFlow), but when I try step 3 (install the tensorflow-metal plug-in) with the line "python -m pip install tensorflow-metal", I get the following messages:
"ERROR: Could not find a version that satisfies the requirement tensorflow-metal (from versions: none)
ERROR: No matching distribution found for tensorflow-metal"
What am I missing here?
The commands I used are as follows:
Step 1
python3 -m venv ~/venv-metal
source ~/venv-metal/bin/activate
python -m pip install -U pip
Step 2
python -m pip install tensorflow
Step 3
python -m pip install tensorflow-metal
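pip typically prints `from versions: none` when it finds no file for the package that is compatible with the running interpreter and platform. A small sketch for collecting the details that matter (Python version, CPU architecture, macOS version) before retrying. Which combinations tensorflow-metal actually ships wheels for is not stated in this thread and should be checked on the package's PyPI page.

```python
import platform
import sys


def environment_summary() -> dict:
    """Collect the interpreter and platform details pip uses to match wheels."""
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        "machine": platform.machine(),           # e.g. arm64 or x86_64
        "macos": platform.mac_ver()[0] or None,  # None when not on macOS
        "executable": sys.executable,            # shows which venv is active
    }


for key, value in environment_summary().items():
    print(f"{key}: {value}")
```

Running this inside the activated `~/venv-metal` environment shows whether `python` there is the interpreter you expect.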
Hi,
When I train ResNet-50 with tensorflow-metal, I find that the l2 regularizer makes each epoch take almost 4x as long (~220ms instead of 60ms). I'm on an M1 Max 16" MBP. It seems like regularization shouldn't add that much time. Is there anything I can do to make it faster?
Here's some sample code that reproduces the issue:
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, ZeroPadding2D,\
    Flatten, BatchNormalization, AveragePooling2D, Dense, Activation, Add
from tensorflow.keras.regularizers import l2
from tensorflow.keras.models import Model
from tensorflow.keras import activations
import random
import numpy as np
random.seed(1234)
np.random.seed(1234)
tf.random.set_seed(1234)
batch_size = 64
(train_im, train_lab), (test_im, test_lab) = tf.keras.datasets.cifar10.load_data()
train_im, test_im = train_im/255.0 , test_im/255.0
train_lab_categorical = tf.keras.utils.to_categorical(
    train_lab, num_classes=10, dtype='uint8')
train_DataGen = tf.keras.preprocessing.image.ImageDataGenerator()
train_set_data = train_DataGen.flow(train_im, train_lab, batch_size=batch_size, shuffle=False)
# Change this to l2 for it to train much slower
regularizer = None # l2(0.001)
def res_identity(x, filters):
    x_skip = x
    f1, f2 = filters
    x = Conv2D(f1, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)
    x = Conv2D(f1, kernel_size=(3, 3), strides=(1, 1), padding='same', use_bias=False, kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)
    x = Conv2D(f2, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    x = Add()([x, x_skip])
    x = Activation(activations.relu)(x)
    return x
def res_conv(x, s, filters):
    x_skip = x
    f1, f2 = filters
    x = Conv2D(f1, kernel_size=(1, 1), strides=(s, s), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)
    x = Conv2D(f1, kernel_size=(3, 3), strides=(1, 1), padding='same', use_bias=False, kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activations.relu)(x)
    x = Conv2D(f2, kernel_size=(1, 1), strides=(1, 1), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    x_skip = Conv2D(f2, kernel_size=(1, 1), strides=(s, s), padding='valid', use_bias=False, kernel_regularizer=regularizer)(x_skip)
    x_skip = BatchNormalization()(x_skip)
    x = Add()([x, x_skip])
    x = Activation(activations.relu)(x)
    return x
input = Input(shape=(train_im.shape[1], train_im.shape[2], train_im.shape[3]), batch_size=batch_size)
x = ZeroPadding2D(padding=(3, 3))(input)
x = Conv2D(64, kernel_size=(7, 7), strides=(2, 2), use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation(activations.relu)(x)
x = MaxPooling2D((3, 3), strides=(2, 2))(x)
x = res_conv(x, s=1, filters=(64, 256))
x = res_identity(x, filters=(64, 256))
x = res_identity(x, filters=(64, 256))
x = res_conv(x, s=2, filters=(128, 512))
x = res_identity(x, filters=(128, 512))
x = res_identity(x, filters=(128, 512))
x = res_identity(x, filters=(128, 512))
x = res_conv(x, s=2, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))
x = res_identity(x, filters=(256, 1024))
x = res_conv(x, s=2, filters=(512, 2048))
x = res_identity(x, filters=(512, 2048))
x = res_identity(x, filters=(512, 2048))
x = AveragePooling2D((2, 2), padding='same')(x)
x = Flatten()(x)
x = Dense(10, activation='softmax', kernel_initializer='he_normal')(x)
model = Model(inputs=input, outputs=x, name='Resnet50')
opt = tf.keras.optimizers.legacy.SGD(learning_rate = 0.01)
model.compile(loss=tf.keras.losses.CategoricalCrossentropy(reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE), optimizer=opt)
model.fit(x=train_im, y=train_lab_categorical, batch_size=batch_size, epochs=150, steps_per_epoch=train_im.shape[0]/batch_size)
Hi. I have followed the instructions here to install TensorFlow with GPU support on my 16-inch 2019 Intel MacBook Pro (with AMD graphics). The installation seems to succeed (I get no errors), but when I test it by running import tensorflow as tf, I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/__init__.py", line 445, in <module>
_ll.load_library(_plugin_dir)
File "/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/python/framework/load_library.py", line 151, in load_library
py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Symbol not found: __ZN10tensorflow16TensorShapeProtoC1ERKS0_
Referenced from: <C62E0AB4-567E-3E14-8F96-9F07A746C4DC> /Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib
Expected in: <0B1F231A-6766-3F61-81D9-6782129807A9> /Users/mahonik/.virtualenvs/tf-metal-new/lib/python3.11/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
My env's packages
...
numpy 1.26.1
tensorboard 2.14.1
tensorboard-data-server 0.7.1
tensorflow 2.14.0
tensorflow-estimator 2.14.0
tensorflow-io-gcs-filesystem 0.34.0
tensorflow-metal 1.0.0
...
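A `Symbol not found` error at `dlopen` time generally indicates that the plug-in binary was built against a different TensorFlow build than the one installed. A minimal sketch for printing the installed versions side by side, so they can be compared against the pairings given in the install instructions. The package list is just the names mentioned in these posts, and nothing here decides which pairing is correct.

```python
from importlib import metadata

# Package names taken from the posts above.
PACKAGES = ("tensorflow", "tensorflow-macos", "tensorflow-metal")


def installed_versions(names=PACKAGES) -> dict:
    """Return {package: version}, with None for packages that are not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```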
I can train a YOLOv3 model on an M2 Mac running macOS Ventura with tensorflow-macos 2.9.0 and tensorflow-metal 0.5. But after upgrading the system to Sonoma 14.0, I can no longer train the model; it fails with the errors below.
On an M1 Mac I can still train after upgrading to Sonoma 14.0, although it reports "error: 'anec.gain_offset_control' op". The M1 does not hit the last error, though: "MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14. Compute function exceeds available temporary registers: (null)".
When I change my optimizer from Adam to SGD, the "error: 'anec.gain_offset_control' op" message disappears, so that error seems to be caused by something in Adam. But I cannot resolve the "MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14. Compute function exceeds available temporary registers: (null)" error.
ERROR Info
MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x1xi1>'
/AppleInternal/Library/BuildRoots/90c9c1ae-37b6-11ee-a991-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Utility/MPSLibrary.mm:550: failed assertion `MPSKernel MTLComputePipelineStateCache unable to load function ndArrayConvolution2DGradientWithWeightsA14.
Compute function exceeds available temporary registers: (null)
I am trying hard to get some Whisper software running on Mac under JAX. However, this requires jaxlib>=0.4.14, while the current jax-metal requires jaxlib==0.4.11.
Does anyone know if there is any planned upgrade?
Will TensorFlow-Metal and JAX-Metal code be open sourced?
Reasons why I ask:
If it is open sourced on GitHub or something it might make it easier for people to find issues and create new ones if necessary, also the open source community might be able to help ;)
I'd love to learn about how you guys implement some of these operations :P (I know you guys made an Apple tutorial on how to implement TensorFlow custom op for Metal which was fire https://developer.apple.com/documentation/metal/metal_sample_code_library/customizing_a_tensorflow_operation)
Good evening!
I tried to use Flax nn.ConvTranspose, which calls jax.lax.conv_transpose, but it looks like it isn't implemented correctly for the METAL backend. It works fine on CPU.
File "/Users/cemlyn/Documents/VCLless/mnist_vae/venv/lib/python3.11/site-packages/flax/linen/linear.py", line 768, in __call__
y = lax.conv_transpose(
^^^^^^^^^^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: UNKNOWN: <unknown>:0: error: type of return operand 0 ('tensor<1x8x8x64xf32>') doesn't match function result type ('tensor<1x14x14x64xf32>') in function @main
<unknown>:0: note: see current operation: "func.return"(%0) : (tensor<1x8x8x64xf32>) -> ()
Versions:
pip list | grep jax
jax 0.4.11
jax-metal 0.0.4
jaxlib 0.4.11
Hi,
Are there plans to support complex numbers?
Something simple like this:
import jax.numpy as jnp
def return_complex(x):
    return x*1+1.0j
x = jnp.ones((10))
print(return_complex(x))
results in an error.
Hi,
Following the instructions at https://developer.apple.com/metal/jax/, JAX works fine on an M1 Pro, but only in Terminal. If you run Jupyter Notebook or PyCharm, the following always defaults to CPU.
from jax.lib import xla_bridge
print(xla_bridge.get_backend().platform)
I also notice that if you restart Terminal, JAX defaults to CPU only. You always need to activate the jax-metal virtual environment first to get the Apple silicon GPU to work:
python3 -m venv ~/jax-metal
source ~/jax-metal/bin/activate
Is there any way to make sure that Jupyter Notebook and other IDEs default to jax-metal? I'm currently only able to use it in Terminal after each time manually setting the virtual environment to jax-metal, which is annoying.
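Whether JAX sees the Metal device depends on which Python environment the process runs in, and a notebook kernel or IDE run configuration may point at a different interpreter than the activated shell. A minimal sketch to run inside the notebook or IDE to see which environment is in use. `~/jax-metal` is the venv path from this post. Registering that venv as a kernel with `python -m ipykernel install --user --name jax-metal` from inside the activated venv is one common approach, assuming ipykernel is installed there.

```python
import os
import sys
from pathlib import Path


def running_in_env(env_dir: str) -> bool:
    """True if the current interpreter's prefix is the given environment directory."""
    expected = Path(os.path.expanduser(env_dir)).resolve()
    return Path(sys.prefix).resolve() == expected


print("interpreter:", sys.executable)
print("in ~/jax-metal:", running_in_env("~/jax-metal"))
```

If the second line prints False inside Jupyter or PyCharm, the tool is using another interpreter, and selecting the venv's interpreter in its settings is the thing to try.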
I am trying to set up TensorFlow on an M1 Mac.
conda install -c apple tensorflow-deps throws the following error:
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Output in format: Requested package -> Available versions following specifications were found to be incompatible with your system:
- feature:/osx-arm64::__osx==13.6=0
- tensorflow-deps -> grpcio[version='>=1.37.0,<2.0'] -> __osx[version='>=10.10|>=10.9']
Your installed version is: 13.6
The .condarc as follows:
channels:
- defaults
subdirs:
- osx-arm64
- osx-64
- noarch
ssl_verify: false
subdir: osx-arm64
And conda info:
active environment : base
active env location : /Users/mdrahman/miniconda3
shell level : 1
user config file : /Users/mdrahman/.condarc
populated config files : /Users/mdrahman/.condarc
conda version : 23.5.2
conda-build version : not installed
python version : 3.11.4.final.0
virtual packages : __archspec=1=arm64
__osx=13.6=0
__unix=0=0
base environment : /Users/mdrahman/miniconda3 (writable)
conda av data dir : /Users/mdrahman/miniconda3/etc/conda
conda av metadata url : None
channel URLs : https://repo.anaconda.com/pkgs/main/osx-arm64
https://repo.anaconda.com/pkgs/main/osx-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/osx-arm64
https://repo.anaconda.com/pkgs/r/osx-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /Users/mdrahman/miniconda3/pkgs
/Users/mdrahman/.conda/pkgs
envs directories : /Users/mdrahman/miniconda3/envs
/Users/mdrahman/.conda/envs
platform : osx-arm64
user-agent : conda/23.5.2 requests/2.29.0 CPython/3.11.4 Darwin/22.6.0 OSX/13.6
UID:GID : 501:20
netrc file : None
offline mode : False
Looking forward to your support.
I only get this error when using the JAX Metal device (CPU is fine). It seems to be a problem whenever I want to modify values of an array in-place using at and set.
note: see current operation:
%2903 = "mhlo.scatter"(%arg3, %2902, %2893) ({
^bb0(%arg4: tensor<f32>, %arg5: tensor<f32>):
"mhlo.return"(%arg5) : (tensor<f32>) -> ()
}) {indices_are_sorted = true, scatter_dimension_numbers = #mhlo.scatter<update_window_dims = [0, 1], inserted_window_dims = [1], scatter_dims_to_operand_dims = [1]>, unique_indices = true} : (tensor<10x100x4xf32>, tensor<1xsi32>, tensor<10x4xf32>) -> tensor<10x100x4xf32>
blocks = blocks.at[i].set(
...