State Farm Distracted Driver Detection (Classification)

Competition Overview


A competition to classify a driver's behavior from images of the driver at the wheel.

State Farm Distracted Driver Detection

Can computer vision spot distracted drivers?
https://www.kaggle.com/c/state-farm-distracted-driver-detection/overview


Description

We’ve all been there: a light turns green and the car in front of you doesn’t budge. Or, a previously unremarkable vehicle suddenly slows and starts swerving from side-to-side.

When you pass the offending driver, what do you expect to see? You certainly aren’t surprised when you spot a driver who is texting, seemingly enraptured by social media, or in a lively hand-held conversation on their phone.

According to the CDC motor vehicle safety division, one in five car accidents is caused by a distracted driver. Sadly, this translates to 425,000 people injured and 3,000 people killed by distracted driving every year.

State Farm hopes to improve these alarming statistics, and better insure their customers, by testing whether dashboard cameras can automatically detect drivers engaging in distracted behaviors. Given a dataset of 2D dashboard camera images, State Farm is challenging Kagglers to classify each driver’s behavior. Are they driving attentively, wearing their seatbelt, or taking a selfie with their friends in the backseat?


What to do

The 10 classes to predict are:

c0: normal driving
c1: texting - right
c2: talking on the phone - right
c3: texting - left
c4: talking on the phone - left
c5: operating the radio
c6: drinking
c7: reaching behind
c8: hair and makeup
c9: talking to passenger
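
For inspecting predictions later, it can help to keep these codes and their descriptions together. A small, hypothetical helper (not part of the original notebook) might look like this:

# Hypothetical lookup table: class code -> description (taken from the list above)
CLASS_NAMES = {
    'c0': 'normal driving',
    'c1': 'texting - right',
    'c2': 'talking on the phone - right',
    'c3': 'texting - left',
    'c4': 'talking on the phone - left',
    'c5': 'operating the radio',
    'c6': 'drinking',
    'c7': 'reaching behind',
    'c8': 'hair and makeup',
    'c9': 'talking to passenger',
}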


Import Library

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import glob
from PIL import Image
from tensorflow.keras import *
from tensorflow.keras.layers import *
from tensorflow.keras.applications import EfficientNetB1
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from sklearn.model_selection import StratifiedKFold

pd.options.display.max_colwidth = 999

Loading the Image List

total = pd.read_csv('/kaggle/input/state-farm-distracted-driver-detection/driver_imgs_list.csv')
total
subject classname img
0 p002 c0 img_44733.jpg
1 p002 c0 img_72999.jpg
2 p002 c0 img_25094.jpg
3 p002 c0 img_69092.jpg
4 p002 c0 img_92629.jpg
... ... ... ...
22419 p081 c9 img_56936.jpg
22420 p081 c9 img_46218.jpg
22421 p081 c9 img_25946.jpg
22422 p081 c9 img_67850.jpg
22423 p081 c9 img_9684.jpg

22424 rows × 3 columns
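
Besides the class, driver_imgs_list.csv also records which driver (subject) each image comes from, which matters for the cross-validation discussion later. A quick optional check (a sketch, assuming `total` as loaded above):

# Sketch: how many distinct drivers there are and how many images each contributes
print(total['subject'].nunique())       # number of distinct drivers
print(total.groupby('subject').size())  # images per driver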


Creating the Train Dataset

train = glob.glob('../input/state-farm-distracted-driver-detection/imgs/train/*/*')
train_df = pd.DataFrame({'path' : train})
train_df['img'] = train_df['path'].apply(lambda x : x.split('/')[-1])
train_df['classname'] = train_df['path'].apply(lambda x : x.split('/')[-2])
train_df
path img classname
0 ../input/state-farm-distracted-driver-detection/imgs/train/c5/img_68208.jpg img_68208.jpg c5
1 ../input/state-farm-distracted-driver-detection/imgs/train/c5/img_77583.jpg img_77583.jpg c5
2 ../input/state-farm-distracted-driver-detection/imgs/train/c5/img_49189.jpg img_49189.jpg c5
3 ../input/state-farm-distracted-driver-detection/imgs/train/c5/img_6690.jpg img_6690.jpg c5
4 ../input/state-farm-distracted-driver-detection/imgs/train/c5/img_95740.jpg img_95740.jpg c5
... ... ... ...
22419 ../input/state-farm-distracted-driver-detection/imgs/train/c0/img_6087.jpg img_6087.jpg c0
22420 ../input/state-farm-distracted-driver-detection/imgs/train/c0/img_36959.jpg img_36959.jpg c0
22421 ../input/state-farm-distracted-driver-detection/imgs/train/c0/img_19429.jpg img_19429.jpg c0
22422 ../input/state-farm-distracted-driver-detection/imgs/train/c0/img_99342.jpg img_99342.jpg c0
22423 ../input/state-farm-distracted-driver-detection/imgs/train/c0/img_48589.jpg img_48589.jpg c0

22424 rows × 3 columns

train_df['classname'].nunique() # number of classes
10

Checking the Data

Image.open(train[10])

(sample training image)


Creating the Test Dataset

test = glob.glob('../input/state-farm-distracted-driver-detection/imgs/test/*')
test_df = pd.DataFrame({'path' : test})
test_df['img'] = test_df['path'].apply(lambda x : x.split('/')[-1])
test_df
path img
0 ../input/state-farm-distracted-driver-detection/imgs/test/img_96590.jpg img_96590.jpg
1 ../input/state-farm-distracted-driver-detection/imgs/test/img_32366.jpg img_32366.jpg
2 ../input/state-farm-distracted-driver-detection/imgs/test/img_99675.jpg img_99675.jpg
3 ../input/state-farm-distracted-driver-detection/imgs/test/img_85937.jpg img_85937.jpg
4 ../input/state-farm-distracted-driver-detection/imgs/test/img_73903.jpg img_73903.jpg
... ... ...
79721 ../input/state-farm-distracted-driver-detection/imgs/test/img_109.jpg img_109.jpg
79722 ../input/state-farm-distracted-driver-detection/imgs/test/img_53257.jpg img_53257.jpg
79723 ../input/state-farm-distracted-driver-detection/imgs/test/img_90376.jpg img_90376.jpg
79724 ../input/state-farm-distracted-driver-detection/imgs/test/img_28000.jpg img_28000.jpg
79725 ../input/state-farm-distracted-driver-detection/imgs/test/img_93083.jpg img_93083.jpg

79726 rows × 2 columns

idg_test = ImageDataGenerator()

test_generator = idg_test.flow_from_dataframe(test_df,
                                             x_col = 'path',
                                             y_col = None,
                                             class_mode =None,
                                             shuffle = False,
                                             target_size = (256,256))
Found 79726 validated image filenames.

Training the Model with K-fold Cross-Validation

kfold = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 42)

Training on the entire dataset to obtain more stable model performance

  • In this competition, a model trained without cross-validation tends to overfit to particular subjects (drivers); the sketch after this list shows how one could split by driver instead.
  • In other words, the model's classname predictions can be strongly influenced by the subject.
  • Even if train_test_split is used with the stratify option, only a portion of the data is sampled, so the test data's distribution may differ from the validation set's.
  • A single split also gives up part of the training data, so cross-validation is used when we want the entire dataset to contribute to training.
  • Characteristics of k-fold cross-validation:
    • The whole dataset is used for both training and evaluation.
    • It yields an ensemble effect: think of each fold as training an independent model.
    • It takes longer, since the same work is repeated several times.
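
The loop below follows the notebook's choice of StratifiedKFold on classname. As a hedged alternative sketch (not what this notebook does), the folds could instead be grouped by driver so that the same subject never appears in both the training and validation parts of a fold; it assumes train_df and total from the cells above and merges them on the img column.

# Sketch only: group the folds by driver (subject) instead of stratifying by class.
# Assumes `train_df` and `total` (driver_imgs_list.csv) defined above.
from sklearn.model_selection import GroupKFold

merged = train_df.merge(total[['img', 'subject']], on='img', how='left')
gkf = GroupKFold(n_splits=5)
for fold, (tr_idx, va_idx) in enumerate(gkf.split(merged, merged['classname'], groups=merged['subject'])):
    tr_subjects = set(merged.iloc[tr_idx]['subject'])
    va_subjects = set(merged.iloc[va_idx]['subject'])
    # With GroupKFold the two sets are disjoint, so validation scores are not
    # inflated by having already seen the same driver during training.
    print('fold', fold, ':', len(va_subjects), 'held-out drivers, overlap =', tr_subjects & va_subjects)
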
result = 0
index = 0
for train_index, valid_index in kfold.split(train_df, train_df['classname']):

    # Split this fold's rows into train / validation parts
    X_train = train_df.iloc[train_index]
    X_val = train_df.iloc[valid_index]

    idg_train = ImageDataGenerator()
    idg_val = ImageDataGenerator()

    train_generator = idg_train.flow_from_dataframe(X_train,
                                                   x_col = 'path',
                                                   y_col = 'classname',
                                                   batch_size = 16,
                                                   target_size = (256,256))

    val_generator = idg_val.flow_from_dataframe(X_val,
                                               x_col = 'path',
                                               y_col = 'classname',
                                               batch_size = 16,
                                               target_size = (256,256))

    # Stop when val_loss stops improving, keep only the best weights,
    # and reduce the learning rate on a plateau
    early_stop = EarlyStopping(patience = 5,
                               verbose = True)

    model_ckpt = ModelCheckpoint(filepath = 'best.h5',
                                 monitor = 'val_loss',
                                 save_best_only = True,
                                 verbose = True)

    rl = ReduceLROnPlateau(verbose = True, patience = 4)

    # EfficientNetB1 backbone (ImageNet weights, global average pooling)
    # followed by a 10-way softmax classification head
    model = Sequential()
    model.add(EfficientNetB1(include_top = False, weights = 'imagenet', pooling = 'avg'))
    model.add(Dense(10, activation = 'softmax'))

    model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['acc'])

    model.fit(train_generator,
              validation_data = val_generator,
              epochs = 20,
              callbacks = [early_stop, model_ckpt, rl])

    # Restore this fold's best checkpoint and average its test predictions
    model.load_weights('best.h5')

    result += model.predict(test_generator, verbose = True) / 5
    index += 1
    print('fin : ', index)
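
Because result accumulates each fold's softmax output divided by 5, it ends up as the average of five probability distributions, so every row should still sum to roughly 1. A quick sanity check (a sketch, assuming `result` from the loop above):

# Sketch: averaged softmax probabilities should still sum to ~1 per test image
import numpy as np
print(result.shape)                                    # (number of test images, 10)
print(np.allclose(result.sum(axis=1), 1.0, atol=1e-3))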

Submission

  • Because the image order in sample_submission.csv differs from the image order in result, it is easier to build a new DataFrame; a merge-based alternative is sketched at the end of this post.
result_df = pd.DataFrame(result, columns=['c0', 'c1', 'c2', 'c3','c4', 'c5', 'c6', 'c7','c8', 'c9'])
result_df                                        
c0 c1 c2 c3 c4 c5 c6 c7 c8 c9
0 4.898581e-03 6.054179e-03 5.146383e-03 5.325366e-04 7.062429e-04 0.012463 0.642532 2.142958e-02 3.042254e-01 0.002012
1 4.047657e-08 4.258116e-08 7.385115e-08 7.353977e-08 5.030099e-08 0.999987 0.000007 2.678771e-08 2.975033e-07 0.000006
2 4.653947e-01 9.502587e-03 3.015550e-02 7.141132e-03 8.581814e-03 0.001376 0.228859 1.078819e-03 2.433791e-01 0.004531
3 8.255903e-05 1.916165e-05 1.645608e-05 2.139845e-03 2.219971e-04 0.990963 0.000140 5.377778e-06 3.694886e-05 0.006375
4 3.502117e-07 1.061771e-04 4.572770e-05 2.403757e-06 1.187768e-05 0.000053 0.000408 9.991777e-01 1.797632e-04 0.000014
... ... ... ... ... ... ... ... ... ... ...
79721 7.563246e-03 1.897852e-02 2.062093e-01 1.308421e-03 5.482526e-03 0.002233 0.118231 1.006306e-01 5.321654e-01 0.007198
79722 2.018641e-03 3.363243e-04 4.372783e-04 4.747344e-01 4.001005e-01 0.004480 0.000733 3.581994e-04 8.031728e-04 0.115998
79723 8.465707e-02 5.784687e-01 3.608754e-03 3.662695e-02 1.371328e-02 0.011587 0.017968 7.717087e-02 3.644365e-02 0.139756
79724 6.226664e-08 4.983243e-08 7.004503e-09 2.172765e-08 6.979324e-08 0.999966 0.000004 6.643240e-09 2.172691e-06 0.000028
79725 5.966864e-02 2.801355e-02 7.123795e-03 6.038536e-04 2.516853e-03 0.004967 0.027987 1.176820e-03 8.555837e-01 0.012359

79726 rows × 10 columns


sub = pd.read_csv('../input/state-farm-distracted-driver-detection/sample_submission.csv')
sub
img c0 c1 c2 c3 c4 c5 c6 c7 c8 c9
0 img_1.jpg 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
1 img_10.jpg 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
2 img_100.jpg 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
3 img_1000.jpg 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
4 img_100000.jpg 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
... ... ... ... ... ... ... ... ... ... ... ...
79721 img_99994.jpg 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
79722 img_99995.jpg 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
79723 img_99996.jpg 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
79724 img_99998.jpg 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
79725 img_99999.jpg 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

79726 rows × 11 columns


my_sub = pd.concat([test_df, result_df], axis = 1).drop(['path'], axis =1)
my_sub
img c0 c1 c2 c3 c4 c5 c6 c7 c8 c9
0 img_96590.jpg 4.898581e-03 6.054179e-03 5.146383e-03 5.325366e-04 7.062429e-04 0.012463 0.642532 2.142958e-02 3.042254e-01 0.002012
1 img_32366.jpg 4.047657e-08 4.258116e-08 7.385115e-08 7.353977e-08 5.030099e-08 0.999987 0.000007 2.678771e-08 2.975033e-07 0.000006
2 img_99675.jpg 4.653947e-01 9.502587e-03 3.015550e-02 7.141132e-03 8.581814e-03 0.001376 0.228859 1.078819e-03 2.433791e-01 0.004531
3 img_85937.jpg 8.255903e-05 1.916165e-05 1.645608e-05 2.139845e-03 2.219971e-04 0.990963 0.000140 5.377778e-06 3.694886e-05 0.006375
4 img_73903.jpg 3.502117e-07 1.061771e-04 4.572770e-05 2.403757e-06 1.187768e-05 0.000053 0.000408 9.991777e-01 1.797632e-04 0.000014
... ... ... ... ... ... ... ... ... ... ... ...
79721 img_109.jpg 7.563246e-03 1.897852e-02 2.062093e-01 1.308421e-03 5.482526e-03 0.002233 0.118231 1.006306e-01 5.321654e-01 0.007198
79722 img_53257.jpg 2.018641e-03 3.363243e-04 4.372783e-04 4.747344e-01 4.001005e-01 0.004480 0.000733 3.581994e-04 8.031728e-04 0.115998
79723 img_90376.jpg 8.465707e-02 5.784687e-01 3.608754e-03 3.662695e-02 1.371328e-02 0.011587 0.017968 7.717087e-02 3.644365e-02 0.139756
79724 img_28000.jpg 6.226664e-08 4.983243e-08 7.004503e-09 2.172765e-08 6.979324e-08 0.999966 0.000004 6.643240e-09 2.172691e-06 0.000028
79725 img_93083.jpg 5.966864e-02 2.801355e-02 7.123795e-03 6.038536e-04 2.516853e-03 0.004967 0.027987 1.176820e-03 8.555837e-01 0.012359

79726 rows × 11 columns

my_sub.to_csv('sub.csv', index=False) # 0.18165
  • The final score, evaluated with multi-class logarithmic loss, is 0.18165.
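
If one preferred the rows to follow the order of sample_submission.csv, the same predictions could also be aligned by merging on img. A hedged sketch (assumes `sub` and `my_sub` from the cells above; the output file name is arbitrary):

# Sketch: reorder predictions to sample_submission's row order via a merge on 'img'
aligned = sub[['img']].merge(my_sub, on='img', how='left')
aligned.to_csv('sub_aligned.csv', index=False)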