How to train Deeplab on Custom Dataset
Training deeplabv3+ in tensorflow on your own custom dataset for semantic segmentation.
Deeplab is one of the state-of-the-art deep learning models for semantic segmentation. I used deeplab to train a cloth segmentation model for CMate, and the results were very satisfying. In this post, I will show you how to train deeplab for your own semantic segmentation problem by fine-tuning from pretrained deeplab models.
Note: Deeplab is pretrained on various datasets, including (1) PASCAL VOC 2012 and (2) Cityscapes. I’m taking PASCAL VOC as the reference for this guide. Please note that you may need to structure your dataset differently and use the correct pre-trained checkpoint from the model zoo as per your need.
Let’s get started…
Installation
First, clone the models repo:
git clone https://github.com/tensorflow/models
Then, install deeplab requirements following instructions here. You need to explicitly install tensorflow 1.x (e.g. 1.15.2). Tip: Use virtualenv for isolation.
apt-get install python-pil python-numpy
pip install --user jupyter matplotlib PrettyTable > /dev/null
pip install tf_slim
#From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
Prepare DataSet
The next step is to prepare your dataset as required by deeplab. Create a new folder (e.g. pascal_voc_seg) inside tensorflow/models/research/deeplab/datasets and place your dataset in it. The final directory structure should look like this:
+ datasets
  + pascal_voc_seg
    + VOC2012
      + JPEGImages
      + Segmentation
      + ImageSets
    + tfrecord
    + exp
      + train_on_train_set
        + train
        + eval
        + vis
In the above directory structure:
- VOC2012 is your dataset directory.
- JPEGImages contains .jpg or .png RGB images.
- Segmentation contains segmentation labels with the same names as the RGB images. Annotation images should have only one channel (w*h*1) and should be in .png format. If your label images contain RGB values, you should convert them to a single channel (a different integer value for each class) using a script like this.
- ImageSets contains the files for the data splits: train.txt & val.txt. Each contains image names without extensions. For example, train.txt might look like:
image001
image002
image003
...
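If you do not have split files yet, they can be generated from the image folder. The following is a minimal sketch, not part of deeplab: the function name `write_splits`, the 10% validation fraction, and the fixed random seed are my own assumptions, so adjust them to your data.

```python
import random
from pathlib import Path


def write_splits(image_dir, out_dir, val_fraction=0.1, seed=0):
    """Write train.txt and val.txt listing image names without extensions."""
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    # Collect the base names of all .jpg/.png files (extension stripped).
    names = sorted(p.stem for p in image_dir.iterdir()
                   if p.suffix.lower() in {".jpg", ".png"})
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(names)
    n_val = int(len(names) * val_fraction)
    splits = {"val": sorted(names[:n_val]), "train": sorted(names[n_val:])}
    out_dir.mkdir(parents=True, exist_ok=True)
    for split, split_names in splits.items():
        text = "".join(name + "\n" for name in split_names)
        (out_dir / f"{split}.txt").write_text(text)
    # Return the number of samples per split.
    return {split: len(split_names) for split, split_names in splits.items()}
```

For example, `write_splits("VOC2012/JPEGImages", "VOC2012/ImageSets")` writes both files and returns the number of samples in each split.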
The tfrecord and exp folders will be created in later steps.
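For the RGB-to-single-channel conversion mentioned above, here is a minimal sketch using NumPy and Pillow. The palette is a hypothetical example (replace the colours and class indices with your own), and mapping unlisted colours to 255 is my assumption, chosen to match an ignore label of 255.

```python
import numpy as np

# Hypothetical palette: RGB colour -> class index. Replace with your own classes.
PALETTE = {
    (0, 0, 0): 0,      # background
    (128, 0, 0): 1,    # example class 1
    (0, 128, 0): 2,    # example class 2
}


def rgb_to_index(rgb, palette=PALETTE, ignore_label=255):
    """Map an (H, W, 3) colour mask to an (H, W) uint8 class-index mask."""
    rgb = np.asarray(rgb)[..., :3]
    # Pixels whose colour is not in the palette keep the ignore label.
    index = np.full(rgb.shape[:2], ignore_label, dtype=np.uint8)
    for colour, class_id in palette.items():
        index[np.all(rgb == colour, axis=-1)] = class_id
    return index


def convert_file(src_path, dst_path):
    """Read an RGB label image and save it as a single-channel PNG."""
    from PIL import Image  # imported here so rgb_to_index needs only NumPy
    rgb = np.array(Image.open(src_path).convert("RGB"))
    Image.fromarray(rgb_to_index(rgb)).save(dst_path)
```

Calling `convert_file` on every label image and saving the results into the Segmentation folder gives labels in the single-channel form described above.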
Generate TFRecords
You need to convert above images dataset into tfrecords format in order to train deeplab.
Make a copy of build_voc2012_dataset.py and modify it if required. The directory structure should now look like this:
+ datasets
  + pascal_voc_seg
  - build_data.py
  - build_voc2012_dataset.py
  - build_voc2012_dataset_copy.py
  ...
Then run the following code (taken from download_and_convert_voc2012.sh). The converted dataset will be saved at ../datasets/pascal_voc_seg/tfrecord/.
# Run from the datasets directory.
# Note: no spaces around "=" in shell assignments.
export DATASET_ROOT=../datasets/pascal_voc_seg/VOC2012
export OUTPUT_DIR=../datasets/pascal_voc_seg/tfrecord
mkdir -p "${OUTPUT_DIR}"
echo "Converting dataset..."
# Set --image_format to "jpg" if your RGB images are JPEGs.
python ./build_voc2012_dataset_copy.py \
  --image_folder="${DATASET_ROOT}/JPEGImages/" \
  --semantic_segmentation_folder="${DATASET_ROOT}/Segmentation/" \
  --list_folder="${DATASET_ROOT}/ImageSets/" \
  --image_format="png" \
  --output_dir="${OUTPUT_DIR}"
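Before running the conversion, it can help to check that every name listed in a split file has both an RGB image and a label image. The sketch below is my own helper, not part of deeplab: the function name `find_missing` is hypothetical, and it assumes the folder names used in this guide (JPEGImages, Segmentation, ImageSets).

```python
from pathlib import Path


def find_missing(dataset_root, split="train", image_ext=".png"):
    """Return the split entries that lack an RGB image or a label image."""
    root = Path(dataset_root)
    lines = (root / "ImageSets" / f"{split}.txt").read_text().splitlines()
    names = [line.strip() for line in lines if line.strip()]
    missing = []
    for name in names:
        has_image = (root / "JPEGImages" / f"{name}{image_ext}").is_file()
        has_label = (root / "Segmentation" / f"{name}.png").is_file()
        if not (has_image and has_label):
            missing.append(name)
    return missing
```

An empty list from `find_missing("pascal_voc_seg/VOC2012", "train")` means the train split is complete; repeat for "val".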
Register Dataset
You need to register your custom dataset before training. Make the following changes to data_generator.py:
# Add a new dataset description.
_CUSTOM_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 20210,  # num of samples in train.txt
        'val': 2000,  # num of samples in val.txt
    },
    num_classes=15,  # classes + background (the ignore label is not counted)
    ignore_label=255,
)

# Add a new line to register the dataset.
_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'custom': _CUSTOM_INFORMATION,  # custom dataset
}
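The values in splits_to_sizes should match the number of names in your split files. A small sketch for counting them follows; the helper name `split_sizes` is my own and not part of deeplab.

```python
from pathlib import Path


def split_sizes(image_sets_dir, splits=("train", "val")):
    """Count the non-empty lines of each <split>.txt file."""
    sizes = {}
    for split in splits:
        lines = (Path(image_sets_dir) / f"{split}.txt").read_text().splitlines()
        sizes[split] = sum(1 for line in lines if line.strip())
    return sizes
```

The returned dictionary can be copied directly into splits_to_sizes.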
Setup Working Directory
First, set up the working directories for train/eval/vis/export. Run the following script from tensorflow/models/research/deeplab/ (its first step moves up to tensorflow/models/research/):
# Move one level up to the tensorflow/models/research directory.
cd ..

# Set up the working environment.
CURRENT_DIR=$(pwd)
WORK_DIR="${CURRENT_DIR}/deeplab"
DATASET_DIR="datasets"

# Set up the working directories.
# PASCAL_FOLDER must be the folder that holds your dataset and its tfrecord directory.
PASCAL_FOLDER="pascal_voc_seg"
EXP_FOLDER="exp/train_on_train_set"
INIT_FOLDER="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/init_models"
TRAIN_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/train"
EVAL_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/eval"
VIS_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/vis"
EXPORT_DIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/export"

mkdir -p "${INIT_FOLDER}"
mkdir -p "${TRAIN_LOGDIR}"
mkdir -p "${EVAL_LOGDIR}"
mkdir -p "${VIS_LOGDIR}"
mkdir -p "${EXPORT_DIR}"

PASCAL_DATASET="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/tfrecord"
Download Checkpoint
Download the desired checkpoint (this only needs to be done once):
TF_INIT_ROOT="http://download.tensorflow.org/models"
TF_INIT_CKPT="deeplabv3_pascal_train_aug_2018_01_04.tar.gz"
cd "${INIT_FOLDER}"
wget -nd -c "${TF_INIT_ROOT}/${TF_INIT_CKPT}"
tar -xf "${TF_INIT_CKPT}"
Train
Run the following script to train.
cd "${CURRENT_DIR}"
NUM_ITERATIONS=1000
python3 "${WORK_DIR}"/train.py \
--logtostderr \
--train_split="train" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size="600,600" \
--train_batch_size=4 \
--training_number_of_steps="${NUM_ITERATIONS}" \
--initialize_last_layer=False \
--last_layers_contain_logits_only=True \
--fine_tune_batch_norm=False \
--tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_pascal_train_aug/model.ckpt" \
--train_logdir="${TRAIN_LOGDIR}" \
--dataset="custom" \
--dataset_dir="${PASCAL_DATASET}"
Parameters:
- train_logdir: where the checkpoints and logs are stored.
- dataset_dir: the path of the dataset TFRecord files.
- dataset: the name of the dataset description in data_generator.py.
- train_batch_size: number of training images used for one iteration.
- training_number_of_steps: number of training iterations with the given batch size.
- train_crop_size: image crop size [height, width] during training. crop_size ≤ max_size of train images.
- model_variant: model architecture.
- train_split: one of the splits mentioned in the dataset description.
- atrous_rates: atrous rates for atrous spatial pyramid pooling.
- output_stride: the ratio of the input image resolution to the resolution of the encoder output.
Notes:
- For `xception_65`, use atrous_rates = [12, 24, 36] if output_stride = 8, or rates = [6, 12, 18] if output_stride = 16. For `mobilenet_v2`, use None.
- There is a trade-off between crop_size and batch_size. Choose a small crop size and a large batch size, or vice versa.
- Training is effective if you can train with a large batch size. Depending on your available memory and use case, you can choose the best value for both params.
- If you have a class imbalance problem, assign a higher loss weight to classes that have fewer samples. See the "Sample imbalance" link in the references for more.
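As a starting point for choosing loss weights, here is a minimal sketch that derives per-class weights from pixel counts in your label images. The inverse-frequency formula is one common heuristic and my own choice, not something deeplab prescribes, and the function name `class_weights` is hypothetical. How the weights are wired into the loss is covered in the linked issue.

```python
import numpy as np


def class_weights(label_maps, num_classes, ignore_label=255):
    """Return per-class weights inversely proportional to pixel frequency."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for labels in label_maps:
        labels = np.asarray(labels)
        # Drop ignored pixels, then count the pixels of each class.
        valid = labels[labels != ignore_label]
        counts += np.bincount(valid, minlength=num_classes)[:num_classes]
    weights = np.zeros(num_classes, dtype=np.float64)
    present = counts > 0
    # weight_c = total_pixels / (num_present_classes * pixels_of_class_c);
    # classes that never occur keep a weight of 0.
    weights[present] = counts.sum() / (present.sum() * counts[present])
    return weights
```

With this formula, a class that covers few pixels receives a weight above 1 and a dominant class receives a weight below 1.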
Parameters specific to transfer learning:
There are model checkpoints pre-trained on various datasets. Download the most relevant model checkpoint for your task.
tf_initial_checkpoint: the path of the pre-trained weights. Set it to a previous checkpoint to continue training.
- If you want to re-use all the trained weights, set initialize_last_layer=True.
- If you want to re-use only the network backbone, set initialize_last_layer=False and last_layers_contain_logits_only=False.
- If you want to re-use all the trained weights except the logits (since the num_classes may be different), set initialize_last_layer=False and last_layers_contain_logits_only=True.
Recommended:
initialize_last_layer=False
last_layers_contain_logits_only=True
fine_tune_batch_norm=False
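The three re-use modes described above can be summarised as a small lookup that builds the matching train.py flags. This is only an illustrative sketch: the mode names ("all", "backbone", "all_but_logits") and the function name `transfer_flags` are my own labels, while the flag names and values are the ones listed above.

```python
def transfer_flags(reuse):
    """Return the train.py flags for one of the weight re-use modes."""
    modes = {
        # Re-use all the trained weights.
        "all": {"initialize_last_layer": True},
        # Re-use only the network backbone.
        "backbone": {"initialize_last_layer": False,
                     "last_layers_contain_logits_only": False},
        # Re-use everything except the logits (num_classes may differ).
        "all_but_logits": {"initialize_last_layer": False,
                           "last_layers_contain_logits_only": True},
    }
    return [f"--{name}={value}" for name, value in modes[reuse].items()]
```

`transfer_flags("all_but_logits")` yields the recommended pair of flags; fine_tune_batch_norm is set separately.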
Evaluate
Run the following script to evaluate on the val set.
# From tensorflow/models/research/
python "${WORK_DIR}"/eval.py \
--logtostderr \
--eval_split="val" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--eval_crop_size="600,600" \
--checkpoint_dir="${TRAIN_LOGDIR}" \
--eval_logdir="${EVAL_LOGDIR}" \
--dataset_dir="${PASCAL_DATASET}" \
--dataset="custom" \
--max_number_of_evaluations=1
Visualize
Output segmentations will be written to VIS_LOGDIR.
# Visualize the results.
# From tensorflow/models/research/
python "${WORK_DIR}"/vis.py \
--logtostderr \
--vis_split="val" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--vis_crop_size="600,600" \
--checkpoint_dir="${TRAIN_LOGDIR}" \
--vis_logdir="${VIS_LOGDIR}" \
--dataset_dir="${PASCAL_DATASET}" \
--dataset="custom" \
--max_number_of_iterations=1
Export
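Run the following to export the trained model for inference. Set --num_classes to the value registered in data_generator.py.
# Export the trained checkpoint.
# From tensorflow/models/research/
# Replace 4437 with the step number of your own saved checkpoint.
CKPT_PATH="${TRAIN_LOGDIR}/model.ckpt-4437"
EXPORT_PATH="${EXPORT_DIR}/frozen_inference_graph.pb"
python "${WORK_DIR}"/export_model.py \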
--logtostderr \
--checkpoint_path="${CKPT_PATH}" \
--export_path="${EXPORT_PATH}" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--num_classes=15 \
--crop_size=600 \
--crop_size=600 \
--inference_scales=1.0
To run inference with the exported model, refer to the provided deeplab_demo.ipynb for an example.
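For a rough idea of what the demo notebook does, here is a minimal sketch, assuming TensorFlow 1.x, NumPy and Pillow are installed. The tensor names "ImageTensor:0" and "SemanticPredictions:0" are the ones used in deeplab_demo.ipynb; the function names `fit_within` and `segment` are hypothetical, and the 600-pixel limit is my assumption, chosen to match the crop size used at export.

```python
import numpy as np


def fit_within(width, height, max_side=600):
    """Scale (width, height) so the longer side equals max_side."""
    ratio = max_side / max(width, height)
    return int(ratio * width), int(ratio * height)


def segment(frozen_graph_path, image_path, max_side=600):
    """Run the exported graph on one image; return an (H, W) class-index map."""
    import tensorflow.compat.v1 as tf  # TensorFlow 1.x API
    from PIL import Image

    # Load the frozen graph produced by export_model.py.
    graph_def = tf.GraphDef()
    with tf.io.gfile.GFile(frozen_graph_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(graph_def, name="")

    # Resize so the image fits inside the crop size used at export.
    image = Image.open(image_path).convert("RGB")
    image = image.resize(fit_within(*image.size, max_side))

    with tf.Session(graph=graph) as sess:
        batch = sess.run("SemanticPredictions:0",
                         feed_dict={"ImageTensor:0": [np.asarray(image)]})
    return batch[0]
```

The returned array holds one class index per pixel of the resized image; resize it back to the original size with nearest-neighbour interpolation if you need a full-resolution mask.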
Hope it worked for you. Happy Learning !!!
References:
Deeplab installation: https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/installation.md
Tensorflow deeplab FAQ: https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/faq.md
Sample imbalance: https://github.com/tensorflow/models/issues/3730#issuecomment-387100419
Training parameters for reusing pre-trained weights: https://github.com/tensorflow/models/issues/3730#issuecomment-380168917