Data Mining Term Project (Fall 2019)

Movie Search and Classifier

Phase 3: Image Search

Demo Link
GitHub Link

Dataset
https://www.kaggle.com/hsankesara/flickr-image-dataset

Pre-processing
The dataset contains 30,000 images. For this project, I selected 2,000 images to generate captions.
I use `PorterStemmer` and the stopwords list from the `nltk` package to decrease the number of unique tokens in the dictionary.
The dictionary is of the format:
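A minimal sketch of this preprocessing step, assuming the dictionary maps each stemmed token to the IDs of the images whose captions contain it (the exact dictionary structure, file names, and stopword list are project-specific; `tokenize` and `build_dictionary` are illustrative helpers, not code from the project):

```python
from nltk.stem import PorterStemmer

# Small illustrative stopword set; the project uses nltk.corpus.stopwords,
# which requires running nltk.download('stopwords') first.
STOPWORDS = {'a', 'an', 'the', 'is', 'in', 'on', 'of', 'and'}
stemmer = PorterStemmer()

def tokenize(caption):
    """Lowercase the caption, drop stopwords, and stem the remaining tokens."""
    return [stemmer.stem(w) for w in caption.lower().split()
            if w.isalpha() and w not in STOPWORDS]

def build_dictionary(captions):
    """Map each stemmed token to the set of image IDs whose captions use it."""
    index = {}
    for image_id, caption in captions.items():
        for token in tokenize(caption):
            index.setdefault(token, set()).add(image_id)
    return index

captions = {
    '1000092795.jpg': 'Two dogs are running on the grass',
    '10002456.jpg': 'A dog jumps over a fence',
}
index = build_dictionary(captions)
# 'dogs' and 'dog' both stem to 'dog', so the number of unique tokens shrinks.
```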

Image Captioning
Image captioning is the task of generating a textual description, or caption, for an image. The model architecture used here is similar to the one in Neural Image Caption Generation with Visual Attention. It uses tf.keras and eager execution.

Captions are generated using TensorFlow in Google Colab, using this code.
The notebook downloads the MS-COCO dataset, preprocesses and caches a subset of the images using Inception V3, trains an encoder-decoder model, and uses it to generate captions for new images.

Running the Image Captioning Model on CPU-only Hosting Platforms
Training the captioning model takes around 3 to 4 hours in Google Colab. We need to persist the trained model by saving all of its weights, using TensorFlow's `save_weights(path_to_save, save_format='tf')` method.

We need to save the `encoder`, `decoder` and `decoder.attention` weights.
			
decoder.save_weights('/image_caption_model/decoder.gru', save_format='tf')
encoder.save_weights('/image_caption_model/encoder.gru', save_format='tf')
decoder.attention.save_weights('/image_caption_model/attention.gru', save_format='tf')
			
		
Then we also need to save the `tokenizer` and other metadata such as `max_length`, `attention_features_shape`, `embedding_dim`, `units`, and `vocab_size` that were set during preprocessing and training.
We use the `pickle` library in Python to save this metadata.
			
import pickle

with open('/image_caption_model/tokenizer.gru.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

meta_dict = {}
meta_dict['max_length'] = max_length
meta_dict['attention_features_shape'] = attention_features_shape
meta_dict['embedding_dim'] = embedding_dim
meta_dict['units'] = units
meta_dict['vocab_size'] = vocab_size

with open('/image_caption_model/meta.gru.pickle', 'wb') as handle:
    pickle.dump(meta_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
			
		
When generating a caption for a new image, we recreate the encoder and decoder defined in the image captioning model and load the saved weights into them. We also load the metadata using pickle.
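A minimal sketch of the restore step. The metadata values below are illustrative placeholders (the real ones come from your own training run), and `tempfile` is used so the sketch runs anywhere; the project saves to `/image_caption_model/` instead. The encoder and decoder classes are defined in the captioning notebook, so the weight-loading lines are shown as comments:

```python
import os
import pickle
import tempfile

# Round-trip the training metadata with pickle, mirroring the save step above.
# These values are illustrative; use the ones recorded during training.
meta_dict = {'max_length': 47, 'attention_features_shape': 64,
             'embedding_dim': 256, 'units': 512, 'vocab_size': 5001}

path = os.path.join(tempfile.mkdtemp(), 'meta.gru.pickle')
with open(path, 'wb') as handle:
    pickle.dump(meta_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, 'rb') as handle:
    loaded = pickle.load(handle)

# With the metadata restored, rebuild the encoder and decoder defined in the
# captioning notebook and reload their weights (sketch only; the classes
# live in the notebook, not here):
# encoder = CNN_Encoder(loaded['embedding_dim'])
# decoder = RNN_Decoder(loaded['embedding_dim'], loaded['units'], loaded['vocab_size'])
# encoder.load_weights('/image_caption_model/encoder.gru')
# decoder.load_weights('/image_caption_model/decoder.gru')
```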

When we train the model in Google Colab or on other machines with a GPU, TensorFlow uses the CuDNNGRU layer. But when the model is loaded on a CPU-only machine (or one with a CUDA-incompatible GPU), it uses the GRU layer instead. Our saved weights were therefore based on the CuDNNGRU model but loaded into a GRU model, so caption generation will not work.

This can be mitigated by using only the GRU layer when training the model in Colab:
			
def gru(units):
    return tf.keras.layers.GRU(units,
                               return_sequences=True,
                               return_state=True,
                               recurrent_activation='sigmoid',
                               recurrent_initializer='glorot_uniform')
			
		

Contributions
Challenges Faced
References
Links to other sections of the project