My environment: Tensorflow: 2.14, tf-metal: 1.1, M3 Max
I am working on an GAN full of residual sum and concatenation. It is trained correctly if using CPU only. However, if I enable GPU, it would cause:
oc("mps_slice_1"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/d615290d-668b-11ee-9734-0697ca55970a/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":359:0)): error: 'mps.slice' op failed: length value 32 does not fit within the dimension size (33) with start value (32) /AppleInternal/Library/BuildRoots/d615290d-668b-11ee-9734-0697ca55970a/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:2133: failed assertion `Error: MLIR pass manager failed'
Some customization I guess might be related to the error:
- tf.bitwise.bitwise_xor, tf.concat, tf.pad in custom layers
- numpy.random in train steps.
Another debug hint I found is that the "32" is the number of channel of my models' conv layer, and change as I change the number of channel.
Is there anyone know what is wrong? Thank you so much