Every second epoch takes zero seconds using ImageDataGenerator and TensorFlow

Question

Created Apr ’24

When fitting a CNN model, every second epoch takes zero seconds and is accompanied by OUT_OF_RANGE warnings. I'm using structured folders of categorical images for training and validation. The fitting output, including the warning messages that appear after every second epoch, looks like this:

37/37 ━━━━━━━━━━━━━━━━━━━━ 14s 337ms/step - accuracy: 0.5255 - loss: 1.0819 - val_accuracy: 0.2578 - val_loss: 2.4472
Epoch 4/20
37/37 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5312 - loss: 1.1106 - val_accuracy: 0.1250 - val_loss: 3.0711
Epoch 5/20
2024-04-19 09:22:51.673909: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
     [[{{node IteratorGetNext}}]]
2024-04-19 09:22:51.673928: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
     [[{{node IteratorGetNext}}]]
     [[IteratorGetNext/_59]]
2024-04-19 09:22:51.673940: I tensorflow/core/framework/local_rendezvous.cc:422] Local rendezvous recv item cancelled. Key hash: 10431687783238222105
2024-04-19 09:22:51.673944: I tensorflow/core/framework/local_rendezvous.cc:422] Local rendezvous recv item cancelled. Key hash: 17360824274615977051
2024-04-19 09:22:51.673955: I tensorflow/core/framework/local_rendezvous.cc:422] Local rendezvous recv item cancelled. Key hash: 10732905483452597729

My setup is:

TensorFlow version: 2.16.1

Python 3.9.19 (main, Mar 21 2024, 12:07:41) [Clang 14.0.6 ]

Pandas 2.2.2

Scikit-Learn 1.4.2

GPU is available

My generators are:

train_generator = datagen.flow_from_directory(
    scalp_dir_train,         # training directory, one subfolder per class
    target_size=(256, 256),  # all images found will be resized to 256x256
    batch_size=32,
    class_mode='categorical'
    # subset='training'  # specify the subset as training
)

n_samples = train_generator.samples  # number of training samples found

validation_generator = datagen.flow_from_directory(
    scalp_dir_test,          # validation directory, one subfolder per class
    target_size=(256, 256),
    batch_size=32,
    class_mode='categorical'
    # subset='validation'  # specify the subset as validation
)

Here is my model.

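from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import (BatchNormalization, Conv2D, Dense,
                                     Dropout, Flatten, MaxPooling2D)
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

# Stop when the monitored metric (val_loss by default) has not improved
# for 10 epochs, then restore the best weights seen so far
early_stopping_monitor = EarlyStopping(patience=10, restore_best_weights=True)

optimizer = Adam(learning_rate=0.01)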

model = Sequential()

model.add(Conv2D(128, (3, 3), activation='relu',padding='same', input_shape=(256, 256, 3)))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.3))

model.add(Conv2D(64, (3, 3),padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.3))

model.add(Flatten())

model.add(Dense(512, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.4))

model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.3))


model.add(Dense(4, activation='softmax'))  # Defined by the number of classes

model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

Here is the fit call:

history=model.fit(
    train_generator,
    steps_per_epoch=37,
    epochs=20,
    validation_data=validation_generator,
    validation_steps=12,
    callbacks=[early_stopping_monitor]
    #verbose=2
)
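
For reference, the hard-coded 37 and 12 in the fit call are assumed to be ceil(samples / batch_size) for the two generators. Below is a minimal sketch of that arithmetic. The helper name `steps_for` and the sample counts 1170 and 380 are hypothetical, chosen only because they give 37 and 12 steps at a batch size of 32; the real counts would come from `train_generator.samples` and `validation_generator.samples`.

```python
import math


def steps_for(n_samples: int, batch_size: int) -> int:
    """Batches needed to see every sample once (the last batch may be partial)."""
    if n_samples <= 0 or batch_size <= 0:
        raise ValueError("n_samples and batch_size must be positive")
    return math.ceil(n_samples / batch_size)


# Hypothetical sample counts, NOT taken from the actual dataset.
train_steps = steps_for(1170, 32)
val_steps = steps_for(380, 32)
print(train_steps, val_steps)
```

With the real generators, the same helper could feed `steps_per_epoch` and `validation_steps`, or both arguments could be left out so that Keras takes the epoch length from the generator itself. Whether either change removes the zero-second epochs and the OUT_OF_RANGE warnings has not been tested here.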
