Plant Seedlings Classification (Classification)

Competition Overview


A competition to classify images of crop seedlings and weeds

Plant Seedlings Classification

Determine the species of a seedling from an image
https://www.kaggle.com/c/plant-seedlings-classification


Description

Can you differentiate a weed from a crop seedling?

The ability to do so effectively can mean better crop yields and better stewardship of the environment.

The Aarhus University Signal Processing group, in collaboration with University of Southern Denmark, has recently released a dataset containing images of approximately 960 unique plants belonging to 12 species at several growth stages.

We’re hosting this dataset as a Kaggle competition in order to give it wider exposure, to give the community an opportunity to experiment with different image recognition techniques, as well as to provide a place to cross-pollinate ideas.


Baseline Code

import numpy as np
import pandas as pd
import os
import glob
from PIL import Image
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense
from tensorflow.keras.applications.efficientnet import EfficientNetB1  # B1 is used most of the time; the models get heavier from B2 onward
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

Loading the Data

path = glob.glob("/kaggle/input/plant-seedlings-classification/train/*/*")
train = pd.DataFrame({"path" : path})
train['label'] = train['path'].apply(lambda x : x.split("/")[-2])
train
path label
0 /kaggle/input/plant-seedlings-classification/t... Scentless Mayweed
1 /kaggle/input/plant-seedlings-classification/t... Scentless Mayweed
2 /kaggle/input/plant-seedlings-classification/t... Scentless Mayweed
3 /kaggle/input/plant-seedlings-classification/t... Scentless Mayweed
4 /kaggle/input/plant-seedlings-classification/t... Scentless Mayweed
... ... ...
4745 /kaggle/input/plant-seedlings-classification/t... Shepherds Purse
4746 /kaggle/input/plant-seedlings-classification/t... Shepherds Purse
4747 /kaggle/input/plant-seedlings-classification/t... Shepherds Purse
4748 /kaggle/input/plant-seedlings-classification/t... Shepherds Purse
4749 /kaggle/input/plant-seedlings-classification/t... Shepherds Purse

4750 rows × 2 columns
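Each label is the name of the folder that contains the image, and `x.split("/")[-2]` extracts exactly that. A minimal equivalent sketch using `pathlib` follows. The example file name is made up.

```python
from pathlib import PurePosixPath

def label_from_path(path: str) -> str:
    """Return the name of the directory that contains the file (the species label)."""
    return PurePosixPath(path).parent.name

# Hypothetical path following the train/<species>/<file>.png layout
example = "/kaggle/input/plant-seedlings-classification/train/Sugar beet/0a1b2c3d4.png"
print(label_from_path(example))  # Sugar beet
```

`PurePosixPath` always treats `/` as the separator, which matches the Kaggle paths.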


Inspecting the Data

Image.open(path[100])



Building the Train and Valid Datasets

  • Splitting into Train and Valid datasets
x_train, x_valid = train_test_split(train, 
                                    test_size = 0.2, 
                                    stratify = train['label'],
                                    random_state=42)
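With `stratify = train['label']`, the validation set keeps roughly the same class proportions as the full training data. This matters when some species have more images than others. The pure-Python sketch below illustrates the idea on hypothetical, imbalanced labels. It is a simplified stand-in for the concept, not scikit-learn's actual allocation algorithm.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Split indices so each class keeps about the same share in train and valid."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, valid_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                           # random order within each class
        n_valid = round(len(idx) * test_size)      # same fraction taken from every class
        valid_idx.extend(idx[:n_valid])
        train_idx.extend(idx[n_valid:])
    return sorted(train_idx), sorted(valid_idx)

# Hypothetical, imbalanced labels: 50 / 100 / 250 images
labels = ["Maize"] * 50 + ["Charlock"] * 100 + ["Fat Hen"] * 250
tr, va = stratified_split(labels)
print(sorted(Counter(labels[i] for i in va).items()))
# [('Charlock', 20), ('Fat Hen', 50), ('Maize', 10)]
```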

  • Preprocessing the data with ImageDataGenerator
# No augmentation or rescaling: Keras EfficientNet models normalize their inputs internally
# and expect raw pixel values in the [0, 255] range
idg = ImageDataGenerator()
idg2 = ImageDataGenerator()

train_generator = idg.flow_from_dataframe(x_train,
                                         x_col = 'path',
                                         y_col = 'label',
                                         target_size = (256,256),
                                         batch_size= 32)

valid_generator = idg2.flow_from_dataframe(x_valid,
                                         x_col = 'path',
                                         y_col = 'label',
                                         target_size = (256,256),
                                         batch_size= 32)
Found 3800 validated image filenames belonging to 12 classes.
Found 950 validated image filenames belonging to 12 classes.
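By default, `flow_from_dataframe` uses `class_mode='categorical'`. It sorts the unique label strings, numbers them in that order, and yields each label as a one-hot vector. This is why the model ends in a 12-unit softmax trained with `categorical_crossentropy`. Here is a small NumPy sketch of that encoding, using hypothetical labels:

```python
import numpy as np

# Hypothetical labels for a tiny batch
labels = ["Maize", "Charlock", "Maize", "Sugar beet"]

# With classes=None, the unique labels are sorted alphabetically and numbered in that order
classes = sorted(set(labels))
class_indices = {c: i for i, c in enumerate(classes)}

# Categorical mode: each label becomes a one-hot row
idx = np.array([class_indices[y] for y in labels])
one_hot = np.eye(len(classes), dtype="float32")[idx]

print(class_indices)  # {'Charlock': 0, 'Maize': 1, 'Sugar beet': 2}
print(one_hot)
```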

Training the Model

# # Declare the model
# model = Sequential()

# # Stack the convolutional layers that learn image features
# model.add(Conv2D(16, (3,3), activation = 'relu', input_shape = (256,256,3))) 
# model.add(Conv2D(16, (3,3), activation = 'relu')) 
# model.add(MaxPooling2D()) 

# model.add(Conv2D(32, (3,3), activation = 'relu')) 
# model.add(Conv2D(32, (3,3), activation = 'relu')) 
# model.add(MaxPooling2D())

# model.add(Conv2D(64, (3,3), activation = 'relu')) 
# model.add(MaxPooling2D())

# model.add(Conv2D(128, (3,3), activation = 'relu')) 
# model.add(Conv2D(128, (3,3), activation = 'relu')) 
# model.add(MaxPooling2D())

# model.add(Conv2D(256, (3,3), activation = 'relu')) 
# model.add(MaxPooling2D())

# model.add(Conv2D(512, (3,3), activation = 'relu')) 
# model.add(MaxPooling2D())

# model.add(Conv2D(1024, (1,1), activation = 'relu')) 

# # Dimensionality reduction (collapse each feature map to a single value)
# model.add(GlobalAveragePooling2D()) 
# model.add(Dense(12, activation = 'softmax'))


# Early stopping: stop when val_loss has not improved for 3 consecutive epochs
es = EarlyStopping(patience=3, 
                   verbose=True)

mc = ModelCheckpoint("best.h5", 
                    save_best_only = True, 
                    verbose=True
                    )

# transfer learning
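model = Sequential() # Always declare the model anew before retraining; calling fit again on an existing model continues from its current weights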

model.add(EfficientNetB1(include_top=False, weights = 'imagenet', pooling = 'avg')) 

model.add(Dense(12, activation='softmax')) 

# Define the optimizer and loss function
model.compile(metrics = ['acc'], optimizer= 'adam', loss='categorical_crossentropy')

model.fit(train_generator,
         validation_data  = valid_generator,
         epochs = 100, 
         callbacks = [es, mc], 
          ) 

model.load_weights('best.h5')
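With the default `monitor='val_loss'`, `EarlyStopping(patience=3)` stops training once val_loss has failed to improve for three epochs in a row. `ModelCheckpoint(save_best_only=True)` writes `best.h5` only when val_loss improves. Because `restore_best_weights` keeps its default of `False`, the weights in memory after `fit` come from the last epoch that ran. Calling `load_weights('best.h5')` brings back the best epoch. The simulation below is a simplified model of how the two callbacks interact (hypothetical loss values, `min_delta=0`):

```python
def simulate_callbacks(val_losses, patience=3):
    """Simplified model of EarlyStopping + ModelCheckpoint(save_best_only=True) on val_loss.

    Returns (best_epoch, last_epoch_run), both 0-indexed.
    """
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: ModelCheckpoint would overwrite best.h5 here
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # EarlyStopping: `patience` epochs in a row without improvement
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Hypothetical val_loss curve: best at epoch 2, then three epochs without improvement
print(simulate_callbacks([0.9, 0.6, 0.5, 0.55, 0.52, 0.51, 0.4]))  # (2, 5)
```

Training stops at epoch 5 even though epoch 6 would have improved. This is the trade-off that `patience` controls.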

  • Model Summary
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
efficientnetb1 (Functional)  (None, 1280)              6575239   
_________________________________________________________________
dense (Dense)                (None, 12)                15372     
=================================================================
Total params: 6,590,611
Trainable params: 6,528,556
Non-trainable params: 62,055
_________________________________________________________________
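The parameter counts in the summary can be checked by hand. A `Dense` layer has one weight per input-output pair plus one bias per unit. The 62,055 non-trainable parameters sit inside the EfficientNetB1 backbone, and most of them are batch-normalization moving statistics.

```python
# Dense layer parameters = inputs * units (weights) + units (biases)
pooled_features = 1280   # EfficientNetB1 output size with pooling='avg'
num_classes = 12
dense_params = pooled_features * num_classes + num_classes
print(dense_params)  # 15372

backbone_params = 6575239
total = backbone_params + dense_params
print(total)          # 6590611
print(total - 62055)  # 6528556 trainable
```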

Building the Test Dataset

  • Preprocess the test images with ImageDataGenerator, just as for the train and valid datasets.
test_path = glob.glob("/kaggle/input/plant-seedlings-classification/test/*")
test = pd.DataFrame({"path" : test_path})
test
path
0 /kaggle/input/plant-seedlings-classification/t...
1 /kaggle/input/plant-seedlings-classification/t...
2 /kaggle/input/plant-seedlings-classification/t...
3 /kaggle/input/plant-seedlings-classification/t...
4 /kaggle/input/plant-seedlings-classification/t...
... ...
789 /kaggle/input/plant-seedlings-classification/t...
790 /kaggle/input/plant-seedlings-classification/t...
791 /kaggle/input/plant-seedlings-classification/t...
792 /kaggle/input/plant-seedlings-classification/t...
793 /kaggle/input/plant-seedlings-classification/t...

794 rows × 1 columns

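# No labels for the test set (class_mode=None); shuffle=False keeps predictions in the same order as `test`
test_generator = idg.flow_from_dataframe(test,
                                        x_col = 'path',
                                        y_col = None,
                                        class_mode=None,
                                        shuffle = False,
                                        target_size = (256,256)
                                       )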
Found 794 validated image filenames.
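`test_generator` does not set `batch_size`, so the `flow_from_dataframe` default of 32 applies. With `shuffle = False`, row i of the predictions belongs to `test['path'][i]`. The number of prediction steps follows from simple arithmetic:

```python
import math

num_test_images = 794
batch_size = 32  # flow_from_dataframe default, since test_generator does not set it
steps = math.ceil(num_test_images / batch_size)
print(steps)  # 25

# The final batch is smaller than the rest
last_batch = num_test_images - (steps - 1) * batch_size
print(last_batch)  # 26
```

This matches the `25/25` progress bar printed by `model.predict`.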

Predicting with the Model

result = model.predict(test_generator, verbose = 1)
result
25/25 [==============================] - 9s 376ms/step





array([[3.1431966e-02, 1.3168590e-06, 1.8106741e-06, ..., 3.4729105e-06,
        2.8560698e-05, 6.0173708e-05],
       [1.1053331e-04, 1.5426338e-04, 2.5204610e-04, ..., 2.1230977e-05,
        1.5883794e-05, 8.8891864e-01],
       [1.6521607e-06, 6.2318693e-05, 2.8655048e-05, ..., 4.6870503e-01,
        1.3315211e-04, 4.2114169e-03],
       ...,
       [2.1159883e-06, 2.1564020e-01, 7.7060443e-01, ..., 4.1908701e-03,
        9.3793590e-03, 1.1505652e-05],
       [6.5076878e-05, 3.2120446e-05, 3.2295244e-05, ..., 4.1470761e-05,
        5.1161887e-06, 9.9024123e-01],
       [1.2335427e-06, 6.4034253e-01, 4.4173703e-05, ..., 3.2450294e-05,
        3.5616100e-01, 4.7782612e-07]], dtype=float32)
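Each row of `result` is a softmax distribution over the 12 classes, with columns ordered by `class_indices`. The predicted class is the column with the highest probability. Here is a sketch on a hypothetical slice of 3 images and 4 classes:

```python
import numpy as np

# Hypothetical softmax outputs for 3 images over 4 classes (each row sums to 1)
probs = np.array([
    [0.05, 0.80, 0.10, 0.05],
    [0.60, 0.10, 0.20, 0.10],
    [0.10, 0.15, 0.25, 0.50],
], dtype=np.float32)
class_list = ["Black-grass", "Charlock", "Cleavers", "Common Chickweed"]

pred_idx = np.argmax(probs, axis=1)           # most likely column per row
pred_names = [class_list[i] for i in pred_idx]
confidence = probs.max(axis=1)                # probability of the chosen class
print(pred_names)  # ['Charlock', 'Black-grass', 'Common Chickweed']
```

`confidence` is not needed for the submission, but low values can point to images worth inspecting by hand.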

Submission

  • For each image, the class name (a string) with the highest predicted probability must be extracted as the prediction.
  • train_generator.class_indices gives the index assigned to each class.
class_dict = train_generator.class_indices
class_dict
{'Black-grass': 0,
 'Charlock': 1,
 'Cleavers': 2,
 'Common Chickweed': 3,
 'Common wheat': 4,
 'Fat Hen': 5,
 'Loose Silky-bent': 6,
 'Maize': 7,
 'Scentless Mayweed': 8,
 'Shepherds Purse': 9,
 'Small-flowered Cranesbill': 10,
 'Sugar beet': 11}

class_list =[]
for i in class_dict:
    class_list.append(i)
class_list 
['Black-grass',
 'Charlock',
 'Cleavers',
 'Common Chickweed',
 'Common wheat',
 'Fat Hen',
 'Loose Silky-bent',
 'Maize',
 'Scentless Mayweed',
 'Shepherds Purse',
 'Small-flowered Cranesbill',
 'Sugar beet']
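The loop works because Keras builds `class_indices` from the sorted class names, so the dict's insertion order already matches the index order. A sketch that does not depend on insertion order sorts the names by their index explicitly. The dict here is a deliberately shuffled subset of the real mapping:

```python
# Shuffled subset of the class_indices mapping (insertion order does not match the indices)
class_dict = {"Cleavers": 2, "Black-grass": 0, "Common Chickweed": 3, "Charlock": 1}

# Sort names by their index so that position i holds the class of output column i
class_list = sorted(class_dict, key=class_dict.get)
print(class_list)  # ['Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed']

# Plain iteration would follow insertion order instead
print(list(class_dict))  # ['Cleavers', 'Black-grass', 'Common Chickweed', 'Charlock']
```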

sub = pd.read_csv('/kaggle/input/plant-seedlings-classification/sample_submission.csv')
# Both columns are overwritten, so row i follows test.iloc[i] (the same order as result)
sub['species'] = [class_list[i] for i in np.argmax(result, 1)]
sub['file'] = test['path'].apply(lambda x : x.split("/")[-1])
sub
file species
0 fd87b36ae.png Loose Silky-bent
1 0e8492cb1.png Sugar beet
2 8d6acbe9b.png Common Chickweed
3 54b3afd58.png Cleavers
4 6049234e6.png Fat Hen
... ... ...
789 4c7838de4.png Common Chickweed
790 fda39e16f.png Loose Silky-bent
791 da4ed3a28.png Charlock
792 a83820a2c.png Sugar beet
793 e4a76885b.png Maize

794 rows × 2 columns

sub.to_csv('plant.csv',index=False)
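A few cheap checks before uploading can catch common mistakes: wrong columns, a missing or extra row, duplicate file names, or a species string that is not one of the 12 classes. `check_submission` is a hypothetical helper, sketched with the standard library and demonstrated on two rows taken from the `sub` output. For the real file, you would pass the contents of `plant.csv` and `n_expected=794`.

```python
import csv
import io

def check_submission(csv_text, class_names, n_expected):
    """Hypothetical sanity check for a file,species submission; returns a list of problems."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["file", "species"]:
        return [f"unexpected columns: {reader.fieldnames}"]
    rows = list(reader)

    problems = []
    if len(rows) != n_expected:
        problems.append(f"expected {n_expected} rows, got {len(rows)}")
    files = [r["file"] for r in rows]
    if len(set(files)) != len(files):
        problems.append("duplicate file names")
    unknown = {r["species"] for r in rows} - set(class_names)
    if unknown:
        problems.append(f"unknown species: {sorted(unknown)}")
    return problems

# Two rows in the same format as plant.csv (header, no index column)
classes = ["Loose Silky-bent", "Sugar beet"]  # subset of class_list, for illustration
text = "file,species\nfd87b36ae.png,Loose Silky-bent\n0e8492cb1.png,Sugar beet\n"
print(check_submission(text, classes, n_expected=2))  # []
```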