Plant Seedlings Classification (Classification)

Competition Overview

A competition to classify images of crop seedlings and weeds.

Plant Seedlings Classification

Determine the species of a seedling from an image
https://www.kaggle.com/c/plant-seedlings-classification

Description

Can you differentiate a weed from a crop seedling?

The ability to do so effectively can mean better crop yields and better stewardship of the environment.

The Aarhus University Signal Processing group, in collaboration with University of Southern Denmark, has recently released a dataset containing images of approximately 960 unique plants belonging to 12 species at several growth stages.

We’re hosting this dataset as a Kaggle competition in order to give it wider exposure, to give the community an opportunity to experiment with different image recognition techniques, as well as to provide a place to cross-pollinate ideas.

Baseline Code

import numpy as np
import pandas as pd
import os
import glob
from PIL import Image
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
# Conv2D, MaxPooling2D and GlobalAveragePooling2D are only needed for the commented-out CNN
from tensorflow.keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense
from tensorflow.keras.applications.efficientnet import EfficientNetB1  # B1 is used most often; models get heavier from B2 onward
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

Loading the Data

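# Each training image is stored as train/<species>/<file>
path = glob.glob("/kaggle/input/plant-seedlings-classification/train/*/*")
train = pd.DataFrame({"path" : path})
# The species label is the name of the parent folder
train['label'] = train['path'].apply(lambda x : x.split("/")[-2])
train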
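The label comes from the name of the folder that contains each image, which is what `x.split("/")[-2]` extracts. An equivalent sketch with `pathlib` is shown below; the helper name `label_from_path` and the example path are hypothetical, and it assumes POSIX-style paths as on Kaggle.

```python
from pathlib import PurePosixPath


def label_from_path(path: str) -> str:
    """Return the parent folder name, which the dataset uses as the class label."""
    return PurePosixPath(path).parent.name


# Hypothetical example path following the train/<species>/<file> layout
example = "/kaggle/input/plant-seedlings-classification/train/Maize/abc123.png"
print(label_from_path(example))  # Maize
```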

	path	label
0	/kaggle/input/plant-seedlings-classification/t...	Scentless Mayweed
1	/kaggle/input/plant-seedlings-classification/t...	Scentless Mayweed
2	/kaggle/input/plant-seedlings-classification/t...	Scentless Mayweed
3	/kaggle/input/plant-seedlings-classification/t...	Scentless Mayweed
4	/kaggle/input/plant-seedlings-classification/t...	Scentless Mayweed
...	...	...
4745	/kaggle/input/plant-seedlings-classification/t...	Shepherds Purse
4746	/kaggle/input/plant-seedlings-classification/t...	Shepherds Purse
4747	/kaggle/input/plant-seedlings-classification/t...	Shepherds Purse
4748	/kaggle/input/plant-seedlings-classification/t...	Shepherds Purse
4749	/kaggle/input/plant-seedlings-classification/t...	Shepherds Purse

4750 rows × 2 columns

Checking the Data

Image.open(path[100])

Building the Train and Valid Datasets

Splitting into Train and Valid Datasets

# Hold out 20% for validation while keeping each species' proportion the same (stratify)
x_train, x_valid = train_test_split(train,
                                    test_size = 0.2,
                                    stratify = train['label'],
                                    random_state=42)
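`stratify=train['label']` keeps each species' share roughly the same in both splits, which matters when classes differ in size. The stdlib sketch below illustrates the idea by splitting each class separately; it is not scikit-learn's exact allocation algorithm, and the function name `stratified_split` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Split row indices so every class keeps roughly the same proportion in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_valid = round(len(indices) * test_size)  # per-class share for validation
        valid_idx.extend(indices[:n_valid])
        train_idx.extend(indices[n_valid:])
    return sorted(train_idx), sorted(valid_idx)


# Toy, imbalanced labels: 50 of class "A" and 10 of class "B"
labels = ["A"] * 50 + ["B"] * 10
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))  # 48 12
```

Both classes end up with 20% of their rows in the validation part (10 of "A", 2 of "B"), so the minority class is not accidentally left out.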

Preprocessing the Data with ImageDataGenerator

# No rescaling or augmentation: Keras EfficientNet models rescale inputs internally
# and expect raw pixel values in [0, 255]
idg = ImageDataGenerator()
idg2 = ImageDataGenerator()

train_generator = idg.flow_from_dataframe(x_train,
                                          x_col = 'path',
                                          y_col = 'label',
                                          target_size = (256,256),
                                          batch_size = 32)

valid_generator = idg2.flow_from_dataframe(x_valid,
                                           x_col = 'path',
                                           y_col = 'label',
                                           target_size = (256,256),
                                           batch_size = 32)

Found 3800 validated image filenames belonging to 12 classes.
Found 950 validated image filenames belonging to 12 classes.
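`flow_from_dataframe` uses `class_mode='categorical'` by default, so each label is delivered as a one-hot vector over the 12 classes; that is why the model is compiled with `loss='categorical_crossentropy'`. A small stdlib sketch of both pieces, with made-up probabilities and hypothetical helper names:

```python
import math


def one_hot(index, num_classes):
    """One-hot vector with 1.0 at the class index, like class_mode='categorical' labels."""
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and a probability vector (eps avoids log(0))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


target = one_hot(0, 3)
probs = [0.7, 0.2, 0.1]  # made-up softmax output
print(categorical_crossentropy(target, probs))  # equals -ln(0.7), about 0.357
```

With a one-hot target only the true class contributes, so the loss reduces to minus the log of the probability assigned to the correct species.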

Model Training

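# # A CNN built from scratch, left commented out (the transfer-learning model below is used instead)
# # Declare the model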
# model = Sequential()

# # Stack the convolutional layers that learn image features
# model.add(Conv2D(16, (3,3), activation = 'relu', input_shape = (256,256,3))) 
# model.add(Conv2D(16, (3,3), activation = 'relu')) 
# model.add(MaxPooling2D()) 

# model.add(Conv2D(32, (3,3), activation = 'relu')) 
# model.add(Conv2D(32, (3,3), activation = 'relu')) 
# model.add(MaxPooling2D())

# model.add(Conv2D(64, (3,3), activation = 'relu')) 
# model.add(MaxPooling2D())

# model.add(Conv2D(128, (3,3), activation = 'relu')) 
# model.add(Conv2D(128, (3,3), activation = 'relu')) 
# model.add(MaxPooling2D())

# model.add(Conv2D(256, (3,3), activation = 'relu')) 
# model.add(MaxPooling2D())

# model.add(Conv2D(512, (3,3), activation = 'relu')) 
# model.add(MaxPooling2D())

# model.add(Conv2D(1024, (1,1), activation = 'relu')) 

# # Reduce dimensions: collapse the feature map into one vector per image
# model.add(GlobalAveragePooling2D()) 
# model.add(Dense(12, activation = 'softmax'))
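To see why the commented-out CNN reaches a 1×1 feature map before `GlobalAveragePooling2D`, the spatial size can be traced layer by layer. This sketch assumes the Keras defaults used above: a 3×3 convolution with `padding='valid'` shrinks each side by 2, and `MaxPooling2D()` halves it, rounding down.

```python
def conv_valid(size, kernel=3):
    """Output side length of a stride-1 convolution with padding='valid'."""
    return size - kernel + 1


def max_pool(size, pool=2):
    """Output side length of MaxPooling2D with the default pool_size=2, strides=2."""
    return size // pool


# Layer sequence of the commented-out model: "c3" = 3x3 conv, "c1" = 1x1 conv, "p" = pooling
layers = ["c3", "c3", "p", "c3", "c3", "p", "c3", "p",
          "c3", "c3", "p", "c3", "p", "c3", "p", "c1"]

size = 256
trace = [size]
for layer in layers:
    if layer == "c3":
        size = conv_valid(size, 3)
    elif layer == "c1":
        size = conv_valid(size, 1)
    else:
        size = max_pool(size)
    trace.append(size)

print(trace)  # [256, 254, 252, 126, 124, 122, 61, 59, 29, 27, 25, 12, 10, 5, 3, 1, 1]
```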


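# Early stopping: stop when val_loss (the default monitor) has not improved for 3 epochs
es = EarlyStopping(patience=3,
                   verbose=True)

# Save the model from the epoch with the lowest val_loss to best.h5
mc = ModelCheckpoint("best.h5",
                     save_best_only = True,
                     verbose=True)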

# transfer learning
model = Sequential()  # Always re-create the model before training it again.

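# ImageNet-pretrained backbone without its classifier; pooling='avg' outputs one 1280-d vector per image
model.add(EfficientNetB1(include_top=False, weights = 'imagenet', pooling = 'avg'))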

model.add(Dense(12, activation='softmax')) 

# Define the optimizer and loss function
model.compile(metrics = ['acc'], optimizer= 'adam', loss='categorical_crossentropy')

model.fit(train_generator,
          validation_data = valid_generator,
          epochs = 100,
          callbacks = [es, mc])

# Restore the best checkpoint; after fit, the model in memory holds the last epoch's weights
model.load_weights('best.h5')
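`EarlyStopping` and `ModelCheckpoint` both watch `val_loss` by default. The sketch below mimics how the two callbacks interact with `patience=3`: training stops once three consecutive epochs fail to beat the best loss, and the best epoch is the one whose model `best.h5` holds. It is a simplified illustration that ignores options such as `min_delta`, not Keras's source code, and the validation losses are invented.

```python
def simulate_early_stopping(val_losses, patience=3):
    """Return (stopped_epoch, best_epoch), both 0-indexed, for a patience-based rule."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # ModelCheckpoint would save here
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # EarlyStopping halts training here
    return len(val_losses) - 1, best_epoch  # ran out of epochs without stopping


# Invented validation losses: best at epoch 1, then three epochs without improvement
losses = [1.0, 0.8, 0.9, 0.85, 0.82, 0.5]
print(simulate_early_stopping(losses))  # (4, 1)
```

Training stops at epoch 4 even though epoch 5 would have improved; patience trades that risk against wasted epochs.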

Model Summary

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
efficientnetb1 (Functional)  (None, 1280)              6575239   
_________________________________________________________________
dense (Dense)                (None, 12)                15372     
=================================================================
Total params: 6,590,611
Trainable params: 6,528,556
Non-trainable params: 62,055
_________________________________________________________________
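The parameter counts in the summary can be checked by hand. A `Dense` layer has one weight per input-output pair plus one bias per output, so the 12-way head on top of the 1280-dimensional pooled EfficientNetB1 features has 1280 × 12 + 12 parameters. The helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus biases (n_units) of a Dense layer."""
    return n_inputs * n_units + n_units


head = dense_params(1280, 12)
backbone = 6_575_239  # efficientnetb1 row from model.summary()
print(head)  # 15372
print(backbone + head)  # 6590611, the reported total
print(6_528_556 + 62_055)  # trainable + non-trainable, also 6590611
```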

Building the Test Dataset

Preprocess the test images with ImageDataGenerator, the same way as the train and valid datasets.

test_path = glob.glob("/kaggle/input/plant-seedlings-classification/test/*")
test = pd.DataFrame({"path" : test_path})
test

	path
0	/kaggle/input/plant-seedlings-classification/t...
1	/kaggle/input/plant-seedlings-classification/t...
2	/kaggle/input/plant-seedlings-classification/t...
3	/kaggle/input/plant-seedlings-classification/t...
4	/kaggle/input/plant-seedlings-classification/t...
...	...
789	/kaggle/input/plant-seedlings-classification/t...
790	/kaggle/input/plant-seedlings-classification/t...
791	/kaggle/input/plant-seedlings-classification/t...
792	/kaggle/input/plant-seedlings-classification/t...
793	/kaggle/input/plant-seedlings-classification/t...

794 rows × 1 columns

test_generator = idg.flow_from_dataframe(test,
                                         x_col = 'path',
                                         y_col = None,
                                         class_mode = None,   # the test set has no labels
                                         shuffle = False,     # keep predictions in the same order as `test`
                                         target_size = (256,256))

Found 794 validated image filenames.
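The test generator does not set a batch size, so `flow_from_dataframe` falls back to its default `batch_size=32`. The number of batches per pass is the image count divided by the batch size, rounded up, which matches the `25/25` progress bar printed during prediction. A quick check, with a hypothetical helper name:

```python
import math


def num_batches(num_images, batch_size=32):
    """Batches needed to cover every image once; the last batch may be partial."""
    return math.ceil(num_images / batch_size)


print(num_batches(3800))  # 119 training steps per epoch
print(num_batches(950))   # 30 validation steps
print(num_batches(794))   # 25 test steps, matching the 25/25 progress bar
```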

Predicting with the Model

result = model.predict(test_generator, verbose = 1)
result

25/25 [==============================] - 9s 376ms/step

array([[3.1431966e-02, 1.3168590e-06, 1.8106741e-06, ..., 3.4729105e-06,
        2.8560698e-05, 6.0173708e-05],
       [1.1053331e-04, 1.5426338e-04, 2.5204610e-04, ..., 2.1230977e-05,
        1.5883794e-05, 8.8891864e-01],
       [1.6521607e-06, 6.2318693e-05, 2.8655048e-05, ..., 4.6870503e-01,
        1.3315211e-04, 4.2114169e-03],
       ...,
       [2.1159883e-06, 2.1564020e-01, 7.7060443e-01, ..., 4.1908701e-03,
        9.3793590e-03, 1.1505652e-05],
       [6.5076878e-05, 3.2120446e-05, 3.2295244e-05, ..., 4.1470761e-05,
        5.1161887e-06, 9.9024123e-01],
       [1.2335427e-06, 6.4034253e-01, 4.4173703e-05, ..., 3.2450294e-05,
        3.5616100e-01, 4.7782612e-07]], dtype=float32)
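Each row of `result` is the softmax output of the final `Dense(12, activation='softmax')` layer: 12 non-negative scores for one test image that sum to 1. A stdlib sketch of softmax on made-up logits:

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # made-up logits
print(probs)
print(sum(probs))
```

Subtracting the maximum does not change the result but keeps `math.exp` from overflowing on large logits.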

Submission

For each image, the prediction must be the name (string) of the class with the highest probability.
train_generator.class_indices gives the index assigned to each class.

class_dict = train_generator.class_indices
class_dict

{'Black-grass': 0,
 'Charlock': 1,
 'Cleavers': 2,
 'Common Chickweed': 3,
 'Common wheat': 4,
 'Fat Hen': 5,
 'Loose Silky-bent': 6,
 'Maize': 7,
 'Scentless Mayweed': 8,
 'Shepherds Purse': 9,
 'Small-flowered Cranesbill': 10,
 'Sugar beet': 11}

# Class names ordered by their index (0 to 11)
class_list = sorted(class_dict, key=class_dict.get)
class_list

['Black-grass',
 'Charlock',
 'Cleavers',
 'Common Chickweed',
 'Common wheat',
 'Fat Hen',
 'Loose Silky-bent',
 'Maize',
 'Scentless Mayweed',
 'Shepherds Purse',
 'Small-flowered Cranesbill',
 'Sugar beet']
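`np.argmax(result, 1)` returns the column index of the highest probability in each row, and `class_list[i]` turns that index back into a species name. An equivalent stdlib sketch that inverts `class_indices` explicitly is shown below; the function name `decode_predictions`, the three-class subset, and the probability rows are made up for illustration.

```python
def decode_predictions(probabilities, class_indices):
    """Map each row of class probabilities to the name of its most likely class."""
    index_to_class = {idx: name for name, idx in class_indices.items()}
    names = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)  # argmax of the row
        names.append(index_to_class[best])
    return names


# Toy subset of the real mapping and made-up probability rows
class_indices = {"Black-grass": 0, "Charlock": 1, "Cleavers": 2}
rows = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
print(decode_predictions(rows, class_indices))  # ['Charlock', 'Black-grass']
```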

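sub = pd.read_csv('/kaggle/input/plant-seedlings-classification/sample_submission.csv')
# Convert each row's highest-probability index into its species name
sub['species'] = [class_list[i] for i in np.argmax(result, 1)]
# Overwrite 'file' with the test file names in prediction order so each row matches its prediction
sub['file'] = test['path'].apply(lambda x : x.split("/")[-1])
sub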

	file	species
0	fd87b36ae.png	Loose Silky-bent
1	0e8492cb1.png	Sugar beet
2	8d6acbe9b.png	Common Chickweed
3	54b3afd58.png	Cleavers
4	6049234e6.png	Fat Hen
...	...	...
789	4c7838de4.png	Common Chickweed
790	fda39e16f.png	Loose Silky-bent
791	da4ed3a28.png	Charlock
792	a83820a2c.png	Sugar beet
793	e4a76885b.png	Maize

794 rows × 2 columns

sub.to_csv('plant.csv',index=False)
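Before uploading, a quick sanity check of the file can catch ordering or formatting mistakes. The hypothetical `check_submission` below parses CSV text and verifies the `file,species` header, one row per test image, unique file names, and that every species is a known class. It is a minimal sketch; the competition's actual validation rules may differ. The usage example builds a tiny in-memory CSV from the first two rows shown above instead of reading `plant.csv`.

```python
import csv
import io


def check_submission(csv_text, expected_rows, valid_species):
    """Return a list of problems found in a file,species submission (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    if reader.fieldnames != ["file", "species"]:
        problems.append(f"unexpected header: {reader.fieldnames}")
    rows = list(reader)
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(rows)}")
    files = [row.get("file") for row in rows]
    if len(set(files)) != len(files):
        problems.append("duplicate file names")
    unknown = {row.get("species") for row in rows} - set(valid_species)
    if unknown:
        problems.append(f"unknown species: {sorted(unknown, key=str)}")
    return problems


sample = "file,species\nfd87b36ae.png,Loose Silky-bent\n0e8492cb1.png,Sugar beet\n"
print(check_submission(sample, 2, ["Loose Silky-bent", "Sugar beet"]))  # []
```

For the real file, the call would be along the lines of `check_submission(open('plant.csv').read(), 794, class_list)`.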