Classification and Segmentation of 4 Vietnamese Foods Using Deep Learning
🍜


Tags
Deep learning
Classification
Segmentation
Projects
Published
December 23, 2022
Author

[Code]

1. Summary

With the aid of deep learning, this project classifies and segments 4 classic Vietnamese foods: Cơm Tấm, Bánh Mì, Phở, and Bánh Tráng Nướng. These foods were selected from the 30VNFoods dataset, described in the article "30VNFoods: A Dataset for Vietnamese Foods Recognition".
Why choose only 4 foods?
  • We have to label and annotate the data ourselves for the segmentation problem; since time and labeling manpower are limited, we selected only 4 dishes.
  • Additionally, since Colab is the only training environment available, model training throughout the project is constrained.
Model?
  • For the classification and efficiency-comparison problem, we build models ranging from a basic MLP up to modern networks such as VGG and ResNet.
  • We use U-Net for the segmentation problem.

2. Processing Datasets

The dataset is divided into 3 parts: train, val and test
Table 1. Number of images per split

|                   | Train | Val | Test |
|-------------------|-------|-----|------|
| Bánh mì           | 935   | 133 | 268  |
| Bánh tráng nướng  | 556   | 80  | 159  |
| Phở               | 564   | 81  | 162  |
| Cơm tấm           | 659   | 94  | 189  |
The images were collected by the author from many different sources on the internet, so the image sizes vary. For convenience during training, the dish images are resized to 224×224×3 and pixel values are normalized to the range 0 to 1.
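The exact preprocessing code is not shown in the post; the snippet below is a minimal sketch of this step, assuming PyTorch/torchvision (our assumption, not necessarily the library used by the authors).

```python
from torchvision import transforms

# Resize every image to 224x224 and scale pixel values to [0, 1].
# ToTensor converts an HxWxC uint8 image in [0, 255] to a CxHxW float tensor in [0, 1].
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```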

Data Annotations

  • We label and annotate the data on the web platform Segments.ai. This tool provides plenty of features for tracing the edges of each dish, and it has a built-in library that lets us export and reconstruct the dataset needed for the segmentation problem.
  • From the dataset used to train the classification model, we randomly select 2,824 samples for labeling and annotation.
Figure 1. Statistical chart of the number of samples per class
  Figure 2: Data display and associated mask.

3. Experiments and Results

Model for classification

  • For the classification problem, we use MLP, CNN, and miniVGG networks, as well as networks pre-trained on the ImageNet dataset. For the MLP, we experiment by gradually increasing the number of nodes per hidden layer and the number of hidden layers: if adding nodes improves the results, we keep increasing the hidden layers, and vice versa.
  • For the CNN, we build a simple model and a model based on the VGG architecture but shallower (miniVGG); a rough sketch of such a network is given after this list. In addition, we also evaluate models pre-trained on ImageNet.
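The post does not give the exact miniVGG architecture, so the following is only an illustrative PyTorch sketch of a shallow VGG-style classifier; layer counts and channel widths are assumptions.

```python
import torch.nn as nn

class MiniVGG(nn.Module):
    """Hypothetical shallow VGG-style network for 4-class food classification."""

    def __init__(self, num_classes: int = 4):
        super().__init__()

        def block(c_in: int, c_out: int) -> nn.Sequential:
            # Two 3x3 convolutions followed by 2x2 max pooling, as in VGG.
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(c_out, c_out, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(block(3, 32), block(32, 64), block(64, 128))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 28 * 28, 256), nn.ReLU(inplace=True),  # 224 -> 112 -> 56 -> 28
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):  # x: (N, 3, 224, 224)
        return self.classifier(self.features(x))
```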

Model for segmentation

  • Next, to perform image segmentation, we use the U-Net architecture. The encoder reuses models pre-trained on ImageNet to obtain better results. We experiment with different pre-trained encoders: VGG16, ResNet18, and ResNet34 (a sketch appears after Table 2).
Table 2: Segmentation models

|         | VGG16                   | ResNet18                   | ResNet34                   |
|---------|-------------------------|----------------------------|----------------------------|
| Encoder | VGG16                   | ResNet18                   | ResNet34                   |
| Map     | Copy and concatenate    | Copy and concatenate       | Copy and concatenate       |
| Decoder | Reverted VGG16 + Conv1  | Reverted ResNet18 + Conv1  | Reverted ResNet34 + Conv1  |
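One common way to build such U-Nets with ImageNet-pretrained encoders is the segmentation_models_pytorch library; the post does not say which library was used, so this is only an assumed sketch.

```python
import segmentation_models_pytorch as smp

def build_unet(encoder_name: str = "resnet18") -> smp.Unet:
    # U-Net whose encoder is initialized with ImageNet weights.
    return smp.Unet(
        encoder_name=encoder_name,   # "vgg16", "resnet18", or "resnet34"
        encoder_weights="imagenet",  # reuse ImageNet features in the encoder
        in_channels=3,
        classes=5,                   # 4 dishes + clutter/background
    )

models = {name: build_unet(name) for name in ("vgg16", "resnet18", "resnet34")}
```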

Metrics

Classification problem: the models are evaluated with accuracy,

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where:
  • True Positive (TP): the number of samples of the Positive class that are correctly classified as Positive.
  • True Negative (TN): the number of samples of the Negative class that are correctly classified as Negative.
  • FP and FN count the misclassified samples of each class.
Segmentation problem: the models are evaluated with Intersection over Union (IoU),

$$\text{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

where:
  • A is the predicted segment
  • B is the ground truth
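For reference, a minimal per-class IoU computation might look like the helper below (an illustrative sketch, not the authors' evaluation code):

```python
import numpy as np

def iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """IoU between a predicted boolean mask A and a ground-truth boolean mask B."""
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(intersection) / union if union > 0 else 1.0
```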

Results

Classification
Table 3: Classification results using various models

| Method               | Accuracy (%) | Loss     | Val_Accuracy (%) | Val_Loss | Test_Accuracy (%) |
|----------------------|--------------|----------|------------------|----------|-------------------|
| Resnet18_pretrained  | 99.926       | 6.78e-05 | 96.907           | 0.1106   | 95.886            |
| Resnet18             | 99.486       | 0.0003   | 80.154           | 0.7141   | 78.663            |
| VGG16_pretrained     | 99.266       | 0.0005   | 94.587           | 0.4035   | 95.758            |
| VGG16                | 95.229       | 0.0030   | 78.350           | 0.6939   | 77.763            |
| miniVGG              | 99.926       | 0.0001   | 82.989           | 0.6325   | 87.917            |
| SimpleCNN            | 99.559       | 0.0008   | 86.597           | 0.3855   | 86.632            |
| MLP_4hidden512node   | 53.651       | 0.0678   | 45.103           | 2.8904   | 47.043            |
| MLP_3hidden1024node  | 44.403       | 0.1080   | 34.278           | 4.8297   | 38.946            |
| MLP_3hidden512node   | 55.486       | 0.0707   | 40.721           | 5.5563   | 44.987            |
| MLP_4hidden          | 47.706       | 0.0583   | 37.886           | 2.3706   | 38.303            |
| MLP_3hidden          | 49.761       | 0.0512   | 36.082           | 3.0187   | 41.902            |
| MLP_2hidden          | 48.844       | 0.0438   | 40.979           | 1.6916   | 41.516            |
Table 3 shows that the pre-trained models achieve the highest results: ResNet18 and VGG16 both exceed 95% accuracy on the test set. Among the networks trained from scratch, miniVGG achieves the best results, better than VGG16 and ResNet18 trained from scratch.
We therefore continue experimenting with the miniVGG network to improve its results:
  • Experiment with different optimization algorithms: Adam, SGD, and RMSProp
  • Use L2 regularization
  • Take the best optimizer and experiment with different learning rates
  • Apply a learning-rate reduction schedule
  • Use augmentation (a sketch of this setup follows the list):
    ◦ RandomHorizontalFlip
    ◦ RandomGrayscale
    ◦ RandomAdjustSharpness
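A hedged sketch of this tuning setup in PyTorch is shown below; the hyperparameter values and the placeholder `model` are illustrative assumptions, not the authors' exact configuration.

```python
import torch.nn as nn
from torch import optim
from torchvision import transforms

# Augmentations listed above, applied only to the training split.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomGrayscale(p=0.1),
    transforms.RandomAdjustSharpness(sharpness_factor=2, p=0.3),
    transforms.ToTensor(),
])

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 4))  # placeholder; use the miniVGG network here

# Adam with weight_decay acts as L2 regularization; ReduceLROnPlateau lowers the
# learning rate when the validation loss stops improving.
optimizer = optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-4)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=3)
```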
Table 4: Experimental results on miniVGG

| Method                        | Accuracy (%) | Loss   | Val_Accuracy (%) | Val_Loss | Test_Accuracy (%) |
|-------------------------------|--------------|--------|------------------|----------|-------------------|
| miniVGG_adam_l2_lr_0.0001_aug | 92.587       | 0.0072 | 89.948           | 0.3201   | 86.246            |
| miniVGG_adam_l2_lr_0.0003     | 95.045       | 0.0050 | 91.494           | 0.2484   | 88.431            |
| miniVGG_adam_l2_lr_0.0001     | 98.458       | 0.0024 | 87.628           | 0.3348   | 88.817            |
| miniVGG_adam_l2_lr_0.001      | 99.853       | 0.0004 | 88.144           | 0.349    | 87.917            |
| miniVGG_RMS                   | 86.165       | 0.0118 | 77.061           | 0.6142   | 74.293            |
| miniVGG_SGD                   | 95.559       | 0.0054 | 84.278           | 0.4104   | 84.190            |
| miniVGG_adam                  | 95.486       | 0.0043 | 87.886           | 0.3675   | 86.118            |
Table 4 shows that Adam is the best optimization algorithm for this problem. We then tested different learning rates, which give similar accuracy on the test set, and finally applied augmentation, but the result is not better than the original.
Figure 3: Loss on the validation set of miniVGG with different parameters
Parameters used for the segmentation problem:
  • Optimizer: Adam
  • A learning-rate reduction schedule
  • Augmentation (a sketch follows this list):
    ◦ RandomBrightnessContrast
    ◦ HueSaturationValue
    ◦ HorizontalFlip
    ◦ IAAAdditiveGaussianNoise
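These transform names suggest the albumentations library, so the sketch below assumes it; note that IAAAdditiveGaussianNoise comes from older albumentations releases, and GaussNoise is substituted here for recent versions.

```python
import albumentations as A

# Segmentation augmentations listed above; probabilities are illustrative.
train_aug = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.HueSaturationValue(p=0.3),
    A.GaussNoise(p=0.2),  # stand-in for IAAAdditiveGaussianNoise on recent versions
])

# Usage: the same spatial transform is applied to the image and its mask.
# augmented = train_aug(image=image, mask=mask)
```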
Table 5: Final results of the segmentation models on the training set

| Name          | iou/train | iou_banhmi | iou_banhtrang | iou_comtam | iou_pho | iou_clutter |
|---------------|-----------|------------|---------------|------------|---------|-------------|
| Unet_ResNet34 | 0.8526    | 0.8262     | 0.8207        | 0.6916     | 0.7174  | 0.9037      |
| Unet-ResNet18 | 0.9158    | 0.9087     | 0.8832        | 0.8760     | 0.8744  | 0.9375      |
| Unet-VGG16    | 0.8818    | 0.8771     | 0.8636        | 0.7854     | 0.8211  | 0.9173      |
Table 6: Final results of the segmentation models on the validation set

| Name          | iou/valid | iou_banhmi | iou_banhtrang | iou_comtam | iou_pho | iou_clutter |
|---------------|-----------|------------|---------------|------------|---------|-------------|
| Unet_ResNet34 | 0.8625    | 0.8273     | 0.8529        | 0.7083     | 0.7099  | 0.9084      |
| Unet-ResNet18 | 0.8828    | 0.8655     | 0.8897        | 0.7893     | 0.7571  | 0.9214      |
| Unet-VGG16    | 0.8716    | 0.8627     | 0.8713        | 0.7395     | 0.7463  | 0.9146      |
Tables 5 and 6 confirm once again that the U-Net model with a ResNet18 encoder pre-trained on ImageNet achieves the best results compared to the other two encoders. The clutter class, which is the background, is segmented easily, with IoU above 0.9. Cơm tấm (broken rice) and phở are the two dishes with the lowest scores, with IoU roughly in the 0.70 to 0.79 range on the validation set.
Figure 4: IoU of each model during evaluation on the validation set
Figure 5: Testing results

Thank you

  • These are the results we achieved in a short time; thank you for reading the article to the end.
  • If you find this post useful, please give the GitHub repo a star. I appreciate it!
 
Minh-Hai Tran (Harly)