Data scientist Prakash Jay introduces the principles of transfer learning, a Keras-based implementation of transfer learning, and the common scenarios in which it is used.


What is transfer learning?

Transfer learning is a research problem in machine learning that focuses on storing the knowledge gained while solving one problem and applying it to a different but related problem.

Why transfer learning?

In practice, few people train a convolutional network from scratch, because it is difficult to obtain a large enough data set. Using a pre-trained network helps solve most of the problems at hand.

Training deep networks is costly. Even with hundreds of machines equipped with expensive GPUs, training the most complex models takes many weeks.

Choosing the topology, features, training method, and hyperparameters of a deep network is black magic with little theoretical guidance.

My experience

Don't try to be a hero.

- Andrej Karpathy

Most computer vision problems I have faced do not come with very large data sets (5,000 to 40,000 images). Even with extreme data augmentation strategies, it is difficult to achieve decent accuracy, and training a network with millions of parameters on such a small data set usually leads to overfitting. So transfer learning is my savior.

Why is transfer learning effective?

Let's look at what a deep network learns. The early layers try to detect edges, the middle layers try to detect shapes, and the later layers try to detect high-level, task-specific features. Networks trained this way are usually helpful for solving other computer vision problems.

Below, let's look at how to implement transfer learning with Keras and at the common scenarios for transfer learning.

Simple implementation based on Keras

from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model
from keras.layers import Dropout, Flatten, Dense, GlobalAveragePooling2D
from keras import backend as k
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TensorBoard, EarlyStopping

# Input size, data locations, and training settings
img_width, img_height = 256, 256
train_data_dir = "data/train"
validation_data_dir = "data/val"
nb_train_samples = 4125
nb_validation_samples = 466
batch_size = 16
epochs = 50

# Load VGG19 pre-trained on ImageNet, without its fully connected top layers
model = applications.VGG19(weights = "imagenet", include_top=False, input_shape = (img_width, img_height, 3))


Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 256, 256, 3)       0
block1_conv1 (Conv2D)        (None, 256, 256, 64)      1792
block1_conv2 (Conv2D)        (None, 256, 256, 64)      36928
block1_pool (MaxPooling2D)   (None, 128, 128, 64)      0
block2_conv1 (Conv2D)        (None, 128, 128, 128)     73856
block2_conv2 (Conv2D)        (None, 128, 128, 128)     147584
block2_pool (MaxPooling2D)   (None, 64, 64, 128)       0
block3_conv1 (Conv2D)        (None, 64, 64, 256)       295168
block3_conv2 (Conv2D)        (None, 64, 64, 256)       590080
block3_conv3 (Conv2D)        (None, 64, 64, 256)       590080
block3_conv4 (Conv2D)        (None, 64, 64, 256)       590080
block3_pool (MaxPooling2D)   (None, 32, 32, 256)       0
block4_conv1 (Conv2D)        (None, 32, 32, 512)       1180160
block4_conv2 (Conv2D)        (None, 32, 32, 512)       2359808
block4_conv3 (Conv2D)        (None, 32, 32, 512)       2359808
block4_conv4 (Conv2D)        (None, 32, 32, 512)       2359808
block4_pool (MaxPooling2D)   (None, 16, 16, 512)       0
block5_conv1 (Conv2D)        (None, 16, 16, 512)       2359808
block5_conv2 (Conv2D)        (None, 16, 16, 512)       2359808
block5_conv3 (Conv2D)        (None, 16, 16, 512)       2359808
block5_conv4 (Conv2D)        (None, 16, 16, 512)       2359808
block5_pool (MaxPooling2D)   (None, 8, 8, 512)         0
=================================================================
Total params: 20,024,384
Trainable params: 20,024,384
Non-trainable params: 0
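The parameter counts in the summary can be checked by hand. A minimal sketch, assuming every convolution uses a 3 x 3 kernel with a bias term (as in VGG19) and that pooling layers have no parameters; the names `conv_params`, `total_conv_params`, and `VGG19_CONV_CHANNELS` are illustrative and not part of Keras:

```python
# Number of output channels of each conv layer, grouped by block (from the summary above)
VGG19_CONV_CHANNELS = [
    [64, 64],
    [128, 128],
    [256, 256, 256, 256],
    [512, 512, 512, 512],
    [512, 512, 512, 512],
]


def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of a single kernel x kernel convolution layer."""
    return (kernel * kernel * in_channels + 1) * out_channels


def total_conv_params(blocks, input_channels=3):
    """Sum the parameters of all conv layers, chaining each layer's output into the next."""
    total = 0
    in_ch = input_channels
    for block in blocks:
        for out_ch in block:
            total += conv_params(in_ch, out_ch)
            in_ch = out_ch
    return total


print(conv_params(3, 64))                      # block1_conv1: 1792
print(total_conv_params(VGG19_CONV_CHANNELS))  # whole convolutional base: 20024384
```

Both numbers match the summary, which is a quick way to confirm that `include_top=False` really left only the convolutional base.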


# Freeze the layers that should not be trained. Here the first 5 layers are frozen.
for layer in model.layers[:5]:
    layer.trainable = False

# Add custom layers
x = model.output
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
x = Dropout(0.5)(x)
x = Dense(1024, activation="relu")(x)
predictions = Dense(16, activation="softmax")(x)

# Create the final model (Keras 2 uses the keyword names inputs/outputs)
model_final = Model(inputs = model.input, outputs = predictions)

# Compile the final model
model_final.compile(loss = "categorical_crossentropy", optimizer = optimizers.SGD(lr=0.0001, momentum=0.9), metrics=["accuracy"])

# Data augmentation
train_datagen = ImageDataGenerator(
    rescale = 1./255,
    horizontal_flip = True,
    fill_mode = "nearest",
    zoom_range = 0.3,
    width_shift_range = 0.3)

test_datagen = ImageDataGenerator(
    rescale = 1./255,
    horizontal_flip = True,
    fill_mode = "nearest",
    zoom_range = 0.3,
    width_shift_range = 0.3)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size = (img_height, img_width),
    batch_size = batch_size,
    class_mode = "categorical")

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size = (img_height, img_width),
    batch_size = batch_size,  # same batch size as training, so validation_steps below is consistent
    class_mode = "categorical")

# Save the best model and stop early when validation accuracy stops improving
# (newer Keras versions name this metric 'val_accuracy')
checkpoint = ModelCheckpoint("vgg16_1.h5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
early = EarlyStopping(monitor='val_acc', min_delta=0, patience=10, verbose=1, mode='auto')

# Train the model. Keras 2 counts batches per epoch, not samples.
model_final.fit_generator(
    train_generator,
    steps_per_epoch = nb_train_samples // batch_size,
    epochs = epochs,
    validation_data = validation_generator,
    validation_steps = nb_validation_samples // batch_size,
    callbacks = [checkpoint, early])
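In Keras 2, `fit_generator` measures an epoch in batches (the `steps_per_epoch` and `validation_steps` arguments) rather than in samples. A minimal sketch of the conversion, using the sample counts and batch size from the configuration above; the helper name `steps_for` is an assumption made for illustration:

```python
import math

nb_train_samples = 4125
nb_validation_samples = 466
batch_size = 16


def steps_for(num_samples, batch_size, drop_remainder=False):
    """Number of batches needed to go through num_samples once."""
    if drop_remainder:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


print(steps_for(nb_train_samples, batch_size))        # 258
print(steps_for(nb_validation_samples, batch_size))   # 30
print(steps_for(nb_train_samples, batch_size, True))  # 257
```

Rounding up covers every image once per epoch at the cost of one smaller final batch; rounding down keeps all batches full and skips a few images in each pass.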

Common scenarios for transfer learning

Keep in mind that the convolutional features in the early layers are more general, while those in the later layers are more specific to the original data set. There are four main scenarios for transfer learning:

1. The new data set is small and similar to the original data set

Training the entire network on a small data set can easily lead to overfitting. Because the new data is similar to the original data, we expect the high-level features in the convolutional network to be relevant to the new data set as well. It is therefore recommended to freeze all the convolutional layers and train only the classifier (e.g., a linear classifier):

for layer in model.layers:
    layer.trainable = False

2. The new data set is large and similar to the original data set

Since we have more data, we can be more confident that fine-tuning the entire network will not lead to overfitting.

for layer in model.layers:
    layer.trainable = True

The default value is already True; the code above sets it explicitly only to emphasize that all layers can be trained.

Since the first few layers detect edges, you can also choose to freeze them. For example, the following code freezes the first 5 layers of VGG19:

for layer in model.layers[:5]:
    layer.trainable = False
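To get a feel for how much is actually frozen by this choice, the sketch below counts the parameters in the first 5 layers of the VGG19 base, using the per-layer counts from the model summary earlier. The list `FIRST_LAYERS` and the helper `split_params` are illustrative names, not Keras API:

```python
# (layer name, parameter count) for the first layers of VGG19, taken from the model summary
FIRST_LAYERS = [
    ("input_1", 0),
    ("block1_conv1", 1792),
    ("block1_conv2", 36928),
    ("block1_pool", 0),
    ("block2_conv1", 73856),
    ("block2_conv2", 147584),
]
TOTAL_PARAMS = 20024384  # total for the convolutional base, from the summary


def split_params(layers, total, n_frozen):
    """Return (frozen, trainable) parameter counts when the first n_frozen layers are frozen."""
    frozen = sum(count for _, count in layers[:n_frozen])
    return frozen, total - frozen


frozen, trainable = split_params(FIRST_LAYERS, TOTAL_PARAMS, 5)
print(frozen, trainable)  # 112576 19911808
```

Because the slice `model.layers[:5]` includes the input layer and one pooling layer, only three convolutional layers are frozen, which is well under 1% of the base's parameters.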

3. The new data set is small and very different from the original data set

Since the data set is small, we probably want to extract features from the early layers and then train a classifier on top of them. (This assumes some familiarity with h5py.)

from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model
from keras.layers import Dropout, Flatten, Dense, GlobalAveragePooling2D
from keras.layers import Input, Conv2D, MaxPooling2D  # needed for the network built below
from keras import backend as k
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TensorBoard, EarlyStopping

img_width, img_height = 256, 256

### Create the network: the first two convolutional blocks of VGG19
img_input = Input(shape=(256, 256, 3))

# Block 1
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

# Block 2
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

model = Model(inputs = img_input, outputs = x)




Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 256, 256, 3)       0
block1_conv1 (Conv2D)        (None, 256, 256, 64)      1792
block1_conv2 (Conv2D)        (None, 256, 256, 64)      36928
block1_pool (MaxPooling2D)   (None, 128, 128, 64)      0
block2_conv1 (Conv2D)        (None, 128, 128, 128)     73856
block2_conv2 (Conv2D)        (None, 128, 128, 128)     147584
block2_pool (MaxPooling2D)   (None, 64, 64, 128)       0
=================================================================
Total params: 260,160
Trainable params: 260,160
Non-trainable params: 0
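The output shapes in this summary follow from two rules: a convolution with `padding='same'` and stride 1 keeps the height and width and sets the channel count to its number of filters, and a 2 x 2 max pooling with stride 2 halves the height and width. A small sketch that traces the shapes by hand; the layer description format and the `trace_shapes` helper are assumptions made for this example:

```python
def trace_shapes(input_shape, layers):
    """Yield (name, output_shape) for a stack of 'same' convolutions and 2x2/stride-2 poolings."""
    height, width, channels = input_shape
    for name, kind, filters in layers:
        if kind == "conv":
            channels = filters   # 'same' padding keeps height and width unchanged
        elif kind == "pool":
            height //= 2         # 2x2 pooling with stride 2 halves each spatial dimension
            width //= 2
        else:
            raise ValueError("unknown layer kind: " + kind)
        yield name, (height, width, channels)


# The truncated network from the text; filters is None for pooling layers
LAYERS = [
    ("block1_conv1", "conv", 64),
    ("block1_conv2", "conv", 64),
    ("block1_pool", "pool", None),
    ("block2_conv1", "conv", 128),
    ("block2_conv2", "conv", 128),
    ("block2_pool", "pool", None),
]

shapes = list(trace_shapes((256, 256, 3), LAYERS))
for name, shape in shapes:
    print(name, shape)
```

The last entry is `block2_pool` with shape `(64, 64, 128)`, the same as in the summary once the leading batch dimension (`None`) is dropped.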


# Map each layer name to its layer object
layer_dict = dict([(layer.name, layer) for layer in model.layers])
[layer.name for layer in model.layers]

import h5py
weights_path = 'vgg19_weights.h5' # ('
f = h5py.File(weights_path, 'r')

# List the names of all layers in the model
layer_names = [layer.name for layer in model.layers]


# Extract the model weights for each layer from the `.h5` file
>>> f["model_weights"]["block1_conv1"].attrs["weight_names"]
array([b'block1_conv1/kernel:0', b'block1_conv1/bias:0'], ...)

# This array is assigned to weight_names below
>>> f["model_weights"]["block1_conv1"]["block1_conv1/kernel:0"]

# The list weights stores the layer's kernel weights and biases

>>> model.layers[1].set_weights(weights)

# This sets the weights for that specific layer.

Using a for loop, we can set the weights for the entire network.

for i in layer_dict.keys():
    weight_names = f["model_weights"][i].attrs["weight_names"]
    weights = [f["model_weights"][i][j] for j in weight_names]
    index = layer_names.index(i)
    model.layers[index].set_weights(weights)
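import cv2
import numpy as np
import pandas as pd
from tqdm import tqdm
import itertools
import glob

# files_location is assumed to be a list of image file paths; it is not defined in the original
features = []
for i in tqdm(files_location):
    im = cv2.imread(i)
    # Convert BGR to RGB, resize to the network's input size, and scale to [0, 1]
    im = cv2.resize(cv2.cvtColor(im, cv2.COLOR_BGR2RGB), (256, 256)).astype(np.float32) / 255.0
    im = np.expand_dims(im, axis=0)
    # model is the truncated network defined above, so this returns the block2_pool output
    outcome = model.predict(im)
    features.append(outcome)

## Collect these features, create a dataframe and train a classifier on it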

The code above extracts the block2_pool features. In general, because the output of this layer has 64 x 64 x 128 values per image, training a classifier directly on it may not help. Instead, we can add a few fully connected layers and train a neural network on top of them:

Add a small number of fully connected layers and one output layer.

Set the weights for the early layers and freeze them.

Train the network.
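To see why a classifier trained directly on the block2_pool output is unwieldy, it helps to count the values involved. A rough sketch, assuming the features are flattened and fed to a 1024-unit fully connected layer like the ones used in the first example; `dense_params` is an illustrative helper, not a Keras function:

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


block2_pool_features = 64 * 64 * 128   # output of the truncated network
block5_pool_features = 8 * 8 * 512     # output of the full VGG19 convolutional base

print(block2_pool_features)                      # 524288
print(dense_params(block2_pool_features, 1024))  # 536871936
print(dense_params(block5_pool_features, 1024))  # 33555456
```

Under these assumptions, the first fully connected layer on top of block2_pool alone would hold over 500 million parameters, far more than the roughly 20 million of the entire VGG19 convolutional base.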

4. The new data set is large and very different from the original data set

Since you have a large data set, you can design your own network or use an existing one.

You can train the network starting from randomly initialized weights or from the weights of a pre-trained network. The latter is generally chosen.

You can use a different network or make changes to an existing network.
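The four scenarios can be summarized as a small lookup. This is only a sketch of the rules of thumb described in this article; the function name `suggest_strategy` and the wording of the returned strings are made up for illustration:

```python
def suggest_strategy(dataset_is_large, similar_to_original):
    """Map the two questions from the article onto the suggested transfer learning strategy."""
    if not dataset_is_large and similar_to_original:
        return "freeze all convolutional layers and train only the classifier"
    if dataset_is_large and similar_to_original:
        return "fine-tune the entire network, optionally freezing the first few layers"
    if not dataset_is_large and not similar_to_original:
        return "extract features from the early layers and train a classifier on them"
    return "design your own network or start an existing one from pre-trained weights"


for large in (False, True):
    for similar in (True, False):
        print(large, similar, "->", suggest_strategy(large, similar))
```

What counts as "large" or "similar" is a judgment call for each project; the function only encodes the decision once those two answers are known.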
