How to train Deeplab on Custom Dataset

Jul 23, 2021

Training deeplabv3+ in tensorflow on your own custom dataset for semantic segmentation.

Deeplab is one of the state-of-the-art deep learning models for semantic segmentation. I used deeplab to train a cloth segmentation model for CMate, and the results were very satisfying. In this post, I will show you how to train deeplab for your own semantic segmentation problem by fine-tuning from pretrained deeplab models.

Note: Pretrained deeplab checkpoints are available for several datasets, including (1) PASCAL VOC 2012 and (2) Cityscapes. I’m using PASCAL VOC as the reference for this guide. You may need to structure your dataset differently and pick the matching pre-trained checkpoint from the model zoo for your use case.

Let’s get started…

Installation

First, clone the models repo:

git clone https://github.com/tensorflow/models

Then install the deeplab requirements by following the official installation instructions (linked under References). You need to explicitly install TensorFlow 1.x (e.g. 1.15.2). Tip: use virtualenv for isolation.

apt-get install python-pil python-numpy
pip install --user jupyter matplotlib PrettyTable
pip install tf_slim
# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

Prepare DataSet

The next step is to prepare your dataset in the layout deeplab expects. Create a new folder (e.g. pascal_voc_seg) inside tensorflow/models/research/deeplab/datasets and place your dataset in it. The final directory structure should look like this:

+ datasets
  + pascal_voc_seg
    + VOC2012
      + JPEGImages
      + Segmentation
      + ImageSets
    + tfrecord
    + exp
      + train_on_train_set
        + train
        + eval
        + vis

In the above directory structure:

VOC2012 is your dataset directory.
JPEGImages contains the RGB images (.jpg or .png).
Segmentation contains the segmentation labels, named the same as the corresponding RGB images. Annotation images must be single-channel (w*h*1) .png files. If your label images contain RGB values, convert them to a single channel (a different integer value for each class) with a small conversion script.
ImageSets contains the data split files: train.txt and val.txt. Each lists image names without extensions, one per line. For example, train.txt might look like:

image001
image002
image003
...
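If you do not have split files yet, a short script can generate them. The following is a minimal sketch, assuming all RGB images sit in one folder and a random split is acceptable for your problem. The function name `write_splits`, the `val_fraction` parameter and the 10% default are illustrative choices, not part of deeplab.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def write_splits(image_dir, list_dir, val_fraction=0.1, seed=0):
    """Write train.txt and val.txt listing image names without extensions."""
    names = sorted(
        p.stem for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    random.Random(seed).shuffle(names)  # fixed seed keeps the split reproducible
    n_val = int(len(names) * val_fraction)
    splits = {"train": sorted(names[n_val:]), "val": sorted(names[:n_val])}

    out_dir = Path(list_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for split, split_names in splits.items():
        text = "".join(name + "\n" for name in split_names)
        (out_dir / f"{split}.txt").write_text(text)
    return {split: len(split_names) for split, split_names in splits.items()}
```

For the layout above, a call such as `write_splits("VOC2012/JPEGImages", "VOC2012/ImageSets")` would write both files and return the number of names in each split.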

The tfrecord and exp folders will be created in the steps below.
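For label images stored as RGB, the conversion to single-channel class indices can be sketched as below. This is only an illustration, assuming each class is painted in one exact RGB color and that numpy and Pillow are installed. The `PALETTE` colors and the function names are hypothetical and must be replaced with the colors used in your own annotations.

```python
import numpy as np

# Hypothetical color-to-class mapping; replace with the colors in your labels.
PALETTE = {
    (0, 0, 0): 0,      # background
    (128, 0, 0): 1,    # class 1
    (0, 128, 0): 2,    # class 2
}
IGNORE_LABEL = 255  # pixels with an unlisted color are marked as "ignore"


def rgb_to_index(rgb, palette=PALETTE, ignore_label=IGNORE_LABEL):
    """Map an (H, W, 3) uint8 RGB label to an (H, W) uint8 class-index map."""
    rgb = np.asarray(rgb)[..., :3]
    index = np.full(rgb.shape[:2], ignore_label, dtype=np.uint8)
    for color, class_id in palette.items():
        mask = np.all(rgb == np.array(color, dtype=rgb.dtype), axis=-1)
        index[mask] = class_id
    return index


def convert_file(src_path, dst_path):
    """Read an RGB label image and save it as a single-channel PNG."""
    from PIL import Image  # imported lazily so rgb_to_index works without Pillow

    rgb = np.array(Image.open(src_path).convert("RGB"))
    Image.fromarray(rgb_to_index(rgb)).save(dst_path)
```

Mapping unknown colors to 255 matches the ignore_label value used when registering the dataset later in this post, so stray pixels do not contribute to the loss.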

Generate TFRecords

You need to convert the image dataset above into TFRecord format in order to train deeplab.

Make a copy of build_voc2012_dataset.py and modify it if required. The directory structure should now look like this:

+ datasets
  + pascal_voc_seg
  - build_data.py
  - build_voc2012_dataset.py
  - build_voc2012_dataset_copy.py
  ...

Then run the following commands (taken from download_and_convert_voc2012.sh).
The converted dataset will be saved at ../datasets/pascal_voc_seg/tfrecord/.

# Run from the datasets directory:
export DATASET_ROOT="../datasets/pascal_voc_seg/VOC2012"
export OUTPUT_DIR="../datasets/pascal_voc_seg/tfrecord"
mkdir -p "${OUTPUT_DIR}"
echo "Converting dataset..."
# Set --image_format to "jpg" if your RGB images are .jpg files.
python ./build_voc2012_dataset_copy.py \
  --image_folder="${DATASET_ROOT}/JPEGImages" \
  --semantic_segmentation_folder="${DATASET_ROOT}/Segmentation" \
  --list_folder="${DATASET_ROOT}/ImageSets" \
  --image_format="png" \
  --output_dir="${OUTPUT_DIR}"
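Before running the conversion, it can save time to confirm that every name in a split file has both an image and a label. The check below is a rough sketch, assuming the directory layout described earlier and names without whitespace. The helper `find_missing_files` and its parameters are made up for this example.

```python
from pathlib import Path


def find_missing_files(dataset_root, split="train",
                       image_ext=".png", label_ext=".png"):
    """Return (name, kind) pairs for split entries lacking an image or a label."""
    root = Path(dataset_root)
    names = (root / "ImageSets" / f"{split}.txt").read_text().split()
    missing = []
    for name in names:
        if not (root / "JPEGImages" / f"{name}{image_ext}").is_file():
            missing.append((name, "image"))
        if not (root / "Segmentation" / f"{name}{label_ext}").is_file():
            missing.append((name, "label"))
    return missing
```

An empty list from `find_missing_files("pascal_voc_seg/VOC2012")` suggests the train split is complete; repeat with `split="val"` for the validation list.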

Register Dataset

You need to register your custom dataset before training. Make the following changes to data_generator.py:

# Add a new dataset description.
_CUSTOM_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 20210,  # number of samples in train.txt
        'val': 2000,  # number of samples in val.txt
    },
    num_classes=15,  # classes + background + ignore_label
    ignore_label=255,
)

# Add a new entry to register the dataset.
_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'custom': _CUSTOM_INFORMATION,  # custom dataset
}
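The values in splits_to_sizes should match the number of names in your split files. A small sketch for counting them, assuming one image name per line; `count_split_sizes` is an illustrative helper, not a deeplab function:

```python
from pathlib import Path


def count_split_sizes(list_dir, splits=("train", "val")):
    """Count the non-empty lines in each <split>.txt file."""
    sizes = {}
    for split in splits:
        lines = (Path(list_dir) / f"{split}.txt").read_text().splitlines()
        sizes[split] = sum(1 for line in lines if line.strip())
    return sizes
```

The returned dictionary has the same shape as splits_to_sizes, so its values can be copied into the dataset description.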

Setup Working Directory

First, set up the working directories for train/eval/vis/export. Run the following script from tensorflow/models/research/deeplab/ (its first step moves one level up to tensorflow/models/research/):

# Move one level up to the tensorflow/models/research directory.
cd ..

# Set up the working environment.
CURRENT_DIR=$(pwd)
WORK_DIR="${CURRENT_DIR}/deeplab"
DATASET_DIR="datasets"

# Set up the working directories.
# PASCAL_FOLDER must name the folder that holds your dataset and its tfrecord directory.
PASCAL_FOLDER="pascal_voc_seg"
EXP_FOLDER="exp/train_on_train_set"
INIT_FOLDER="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/init_models"
TRAIN_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/train"
EVAL_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/eval"
VIS_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/vis"
EXPORT_DIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/export"
mkdir -p "${INIT_FOLDER}"
mkdir -p "${TRAIN_LOGDIR}"
mkdir -p "${EVAL_LOGDIR}"
mkdir -p "${VIS_LOGDIR}"
mkdir -p "${EXPORT_DIR}"

PASCAL_DATASET="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/tfrecord"

Download Checkpoint

Download the desired checkpoint (a one-time step):

TF_INIT_ROOT="http://download.tensorflow.org/models"
TF_INIT_CKPT="deeplabv3_pascal_train_aug_2018_01_04.tar.gz"
cd "${INIT_FOLDER}"
wget -nd -c "${TF_INIT_ROOT}/${TF_INIT_CKPT}"
tar -xf "${TF_INIT_CKPT}"

Train

Run the following script to train.

cd "${CURRENT_DIR}"
NUM_ITERATIONS=1000
python3 "${WORK_DIR}"/train.py \
--logtostderr \
--train_split="train" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size="600,600" \
--train_batch_size=4 \
--training_number_of_steps="${NUM_ITERATIONS}" \
--initialize_last_layer=False \
--last_layers_contain_logits_only=True \
--fine_tune_batch_norm=False \
--tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_pascal_train_aug/model.ckpt" \
--train_logdir="${TRAIN_LOGDIR}" \
--dataset="custom" \
--dataset_dir="${PASCAL_DATASET}"

Parameters:

train_logdir: where checkpoints and logs are stored.
dataset_dir: the path to the dataset TFRecord files.
dataset: the name of the dataset description registered in data_generator.py.
train_batch_size: the number of training images used in one iteration.
training_number_of_steps: the number of training iterations at the given batch size.
train_crop_size: the image crop size [height, width] used during training. The crop size should not exceed the size of the largest training image.
model_variant: the model architecture (network backbone).
train_split: one of the splits listed in the dataset description.
atrous_rates: the atrous rates for atrous spatial pyramid pooling.
output_stride: the ratio of the input image's spatial resolution to the encoder output's resolution.
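To relate training_number_of_steps to passes over the data, a bit of arithmetic helps. The sketch below uses the example numbers from this post (20210 training samples, batch size 4, 1000 iterations); the function names are illustrative and the result is approximate, since it ignores how the input pipeline shuffles and repeats data.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Iterations needed to see every training sample about once."""
    return math.ceil(num_samples / batch_size)


def epochs_covered(num_steps, num_samples, batch_size):
    """Approximate number of passes over the data for a given step count."""
    return num_steps * batch_size / num_samples


print(steps_per_epoch(20210, 4))                  # 5053
print(round(epochs_covered(1000, 20210, 4), 2))   # 0.2
```

With these numbers, 1000 iterations cover only about a fifth of the training set, so NUM_ITERATIONS would need to be considerably larger for a full pass.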

Notes:

For `xception_65`, use atrous_rates = [12, 24, 36] if output_stride = 8, or atrous_rates = [6, 12, 18] if output_stride = 16. For `mobilenet_v2`, use None.
There is a trade-off between crop size and batch size for a given amount of memory: choose a small crop size with a large batch size, or vice versa.
Training tends to be more effective with a large batch size. Depending on your available memory and use case, choose the best values for both parameters.
If you have a class imbalance problem, assign a higher loss weight to classes with fewer samples. See the sample imbalance link under References for more.
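One common way to pick such loss weights is inverse class frequency. The following is a sketch of that idea only, assuming you have already counted the labelled pixels per class; it is not taken from deeplab, and `inverse_frequency_weights` is a hypothetical helper. How the weights are passed to training depends on your deeplab version, so check the sample imbalance link under References.

```python
def inverse_frequency_weights(pixel_counts):
    """Return one weight per class, inversely proportional to its pixel count.

    pixel_counts[i] is the number of labelled pixels of class i. The most
    frequent class gets weight 1.0; classes that never occur get weight 0.0.
    """
    present = [count for count in pixel_counts if count > 0]
    if not present:
        raise ValueError("pixel_counts contains no labelled pixels")
    largest = max(present)
    return [largest / count if count > 0 else 0.0 for count in pixel_counts]


print(inverse_frequency_weights([1000, 100, 250, 0]))  # [1.0, 10.0, 4.0, 0.0]
```

Very rare classes can end up with extreme weights under this scheme, so capping the result is a reasonable variation.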

Parameters specific to transfer learning:

There are model checkpoints pre-trained on various datasets. Download the checkpoint most relevant to your task.

tf_initial_checkpoint: the path to the pre-trained weights. Set it to a previous checkpoint to continue training.
If you want to re-use all the trained weights, set initialize_last_layer=True.
If you want to re-use only the network backbone, set initialize_last_layer=False and last_layers_contain_logits_only=False.
If you want to re-use all the trained weights except the logits (since num_classes may be different), set initialize_last_layer=False and last_layers_contain_logits_only=True.

Recommended:
initialize_last_layer=False
last_layers_contain_logits_only=True
fine_tune_batch_norm=False
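The three weight re-use cases above can be summarized as a small lookup that produces the matching command-line flags. This is just a convenience sketch; the case names ("all", "backbone", "all_but_logits") and the `transfer_flags` helper are invented for this example, while the flag names are the ones used in the training command.

```python
# Flag settings for each weight re-use case described above.
REUSE_OPTIONS = {
    "all": {"initialize_last_layer": True},
    "backbone": {
        "initialize_last_layer": False,
        "last_layers_contain_logits_only": False,
    },
    "all_but_logits": {
        "initialize_last_layer": False,
        "last_layers_contain_logits_only": True,
    },
}


def transfer_flags(reuse):
    """Return train.py flag strings for the chosen weight re-use case."""
    if reuse not in REUSE_OPTIONS:
        raise ValueError(f"unknown re-use case: {reuse!r}")
    return [f"--{name}={value}" for name, value in REUSE_OPTIONS[reuse].items()]


print(transfer_flags("all_but_logits"))
# ['--initialize_last_layer=False', '--last_layers_contain_logits_only=True']
```

The "all_but_logits" case corresponds to the recommended settings, which suit a custom dataset whose number of classes differs from the checkpoint's.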

Evaluate

Run the following script to evaluate on the val set.

# From tensorflow/models/research/
python "${WORK_DIR}"/eval.py \
--logtostderr \
--eval_split="val" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--eval_crop_size="600,600" \
--checkpoint_dir="${TRAIN_LOGDIR}" \
--eval_logdir="${EVAL_LOGDIR}" \
--dataset_dir="${PASCAL_DATASET}" \
--dataset="custom" \
--max_number_of_evaluations=1
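The evaluation script reports mean intersection-over-union (mIoU). To make the number easier to interpret, here is a plain-Python illustration of how the metric is defined, assuming flat lists of class ids and skipping pixels marked with the ignore label. It is a sketch of the metric, not deeplab's implementation, and `mean_iou` is a name chosen for this example.

```python
def mean_iou(labels, predictions, num_classes, ignore_label=255):
    """Average per-class IoU over the classes that occur at least once."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for label, pred in zip(labels, predictions):
        if label == ignore_label:
            continue  # ignored pixels do not count towards the metric
        if label == pred:
            intersection[label] += 1
            union[label] += 1
        else:
            union[label] += 1
            union[pred] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

For labels `[0, 0, 1, 1, 255]` and predictions `[0, 1, 1, 1, 0]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.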

Visualize

Output segmentations will be written to VIS_LOGDIR.

# Visualize the results.
# From tensorflow/models/research/
python "${WORK_DIR}"/vis.py \
--logtostderr \
--vis_split="val" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--vis_crop_size="600,600" \
--checkpoint_dir="${TRAIN_LOGDIR}" \
--vis_logdir="${VIS_LOGDIR}" \
--dataset_dir="${PASCAL_DATASET}" \
--dataset="custom" \
--max_number_of_iterations=1

Export

Run the following to export the trained model for inference.

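# Export the trained checkpoint.
# From tensorflow/models/research/
# Replace 4437 with the step number of the checkpoint you want to export.
# --num_classes below must match num_classes registered in data_generator.py.
CKPT_PATH="${TRAIN_LOGDIR}/model.ckpt-4437"
EXPORT_PATH="${EXPORT_DIR}/frozen_inference_graph.pb"
python "${WORK_DIR}"/export_model.py \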
--logtostderr \
--checkpoint_path="${CKPT_PATH}" \
--export_path="${EXPORT_PATH}" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--num_classes=3 \
--crop_size=600 \
--crop_size=600 \
--inference_scales=1.0

To run inference with the exported model, refer to the provided deeplab_demo.ipynb for an example.
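For orientation, the core of that notebook can be condensed into a short script. The sketch below follows the same approach under a few assumptions: TensorFlow 1.15, numpy and Pillow are installed, the graph was exported with crop size 600 as above, and the exported graph uses the tensor names `ImageTensor:0` and `SemanticPredictions:0` as in the demo notebook. The function names are illustrative.

```python
def target_size(width, height, input_size=600):
    """Scale (width, height) so the longer side equals input_size."""
    ratio = input_size / max(width, height)
    return int(ratio * width), int(ratio * height)


def segment(frozen_graph_path, image_path, input_size=600):
    """Run the frozen graph on one image and return an (H, W) class-id array."""
    # Imported lazily so target_size() can be used without these packages.
    import numpy as np
    import tensorflow as tf
    from PIL import Image

    # Load the frozen graph exported by export_model.py.
    graph = tf.Graph()
    with tf.io.gfile.GFile(frozen_graph_path, "rb") as f:
        graph_def = tf.compat.v1.GraphDef.FromString(f.read())
    with graph.as_default():
        tf.import_graph_def(graph_def, name="")

    # Resize so the image fits inside the crop size used at export time.
    image = Image.open(image_path).convert("RGB")
    resized = image.resize(target_size(*image.size, input_size))

    with tf.compat.v1.Session(graph=graph) as sess:
        result = sess.run(
            "SemanticPredictions:0",
            feed_dict={"ImageTensor:0": [np.asarray(resized)]},
        )
    return result[0]
```

A call such as `segment("frozen_inference_graph.pb", "image001.png")` would return one class id per pixel of the resized image.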

Hope it worked for you. Happy Learning !!!

References:

Deeplab installation: https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/installation.md
Tensorflow deeplab FAQ: https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/faq.md
Sample imbalance: https://github.com/tensorflow/models/issues/3730#issuecomment-387100419
Training parameters for reusing pre-trained weights: https://github.com/tensorflow/models/issues/3730#issuecomment-380168917