This problem seems common enough that there are tons of unsolved posts when you google for it. Unfortunately, there doesn't seem to be a common solution, and my guess is that the problem is caused by a multitude of factors that vary from case to case, which is why it is so hard to find a magical fix for it. In this post, I'll describe the solution to my particular issue, with the hope that it can be useful to somebody else who is in the same boat as me.
The Problem
I have a Keras/TensorFlow LSTM model that was trained with version 2.4 of the Keras library, and it works really well on all the machines where that exact same version is installed. This model uses a couple of Lambda layers, so it can't be exported with the usual save_model() utilities; instead, I save only the weights of the model, then on the target machine I reconstruct the exact same model from source and load the saved weights before making a prediction. This works really well as long as the versions of all the packages involved (Keras, TensorFlow, Python, etc.) are exactly the same. The problem appears when trying to use this flow on a newer machine with more recent versions of the packages. I migrated the code to a newer server with more up-to-date Keras, TensorFlow, and Python versions, and as soon as I tried the LSTM model I got completely wrong predictions for a set of known inputs. I double-checked that the layers were exactly the same, I inspected all of the weights (all of them) and they were exactly the same too, and yet the predictions were different.
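For context, here is a minimal sketch of that save/reload flow in R; build_model() is a hypothetical helper that rebuilds the architecture from the same source code used for training, and the file path is just a placeholder:

library(keras)

# Training machine: build the model, train it, and persist only the weights
model <- build_model()
# ... training happens here ...
save_model_weights_hdf5(model, "lstm_weights.h5")

# Target machine: rebuild the exact same architecture, then load the weights
model <- build_model()
load_model_weights_hdf5(model, "lstm_weights.h5")
predictions <- predict(model, known_inputs)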
The Cause
The big break in finding the solution to this problem was the fact that it only happened in my LSTM models. When using CNNs or MLPs the predictions on the new machine were exactly as before, so it had to be some kind of hidden "state" in the LSTM that wasn't being preserved by the new version of the libraries on the new machine (a state that wasn't saved into the weights file). And sure enough, in this particular case the hidden "state" was the default values of the Keras LSTM layer: one of them had changed between version 2.4 and version 2.8, and it was a major breaking change (I'm not sure exactly in which version it happened, or whether it was documented). The problem was the recurrent_activation parameter: in version 2.4 of the Keras library it defaulted to "hard_sigmoid", while in version 2.8 the default is "sigmoid". Of course, this has a massive effect, as the newly constructed model has a different functional behavior than the originally trained one, even though the topology of the net is exactly the same, as are all the weights used. No wonder the results are different.
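If you suspect a similar silent change in defaults, one way to check is to print the configuration of the layer that Keras actually built on each machine and compare them; a minimal sketch using get_config() from the R keras package (it assumes the LSTM is the first layer of the model):

# Print the effective LSTM configuration and compare the output on both machines.
# In my case, recurrent_activation was "hard_sigmoid" on Keras 2.4 and "sigmoid" on 2.8.
str(get_config(model$layers[[1]]))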
The Solution
I modified the LSTM model to explicitly set recurrent_activation = "hard_sigmoid" and that solved the issue: I got exactly the same predictions as before. Here is an example in R:
model <- keras_model_sequential()
model %>%
  layer_lstm(units = 64, activation = "relu",
             input_shape = c(size, channels),
             # explicitly pin the old default so the rebuilt model matches the trained one
             recurrent_activation = "hard_sigmoid",
             return_sequences = FALSE) %>%
  …
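As a quick sanity check after a migration like this, it helps to run a handful of known inputs through the rebuilt model and compare against predictions saved from the original environment; a minimal sketch (known_inputs and reference_predictions are hypothetical objects carried over from the old machine):

# Predictions on the new machine should match the reference values up to
# floating-point noise; a large difference points to a configuration mismatch
new_predictions <- predict(model, known_inputs)
max(abs(new_predictions - reference_predictions))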
I know this is a very particular solution to the problem, but hopefully it can be useful to anyone who is migrating models to a new version of Keras and hitting this same issue. It also highlights the danger of hidden state in default parameters: we should really be as explicit as possible when specifying layers, to avoid potential breaking changes in future versions.