Shipping Sentiment Index

(2021~2022) Shipping Sentiment Index: a news-data-based index for forecasting shipping-industry business conditions
Update: 2022-04-28

Index

About this project

  • Project name: Development of a nowcasting index for shipping-industry business conditions using news data
  • Project purpose: training project for the 2021 Public Big Data Internship
  • Project period: September 2021 ~ February 2022
  • Project members: 1

Overview

Goal

  • (Objective) Use text-mining and sentiment-analysis techniques on news data to produce a news-data index and apply it to business-cycle forecasting.
  • (Motivation) Quickly grasp the current state and direction of the real economy in the relatively unfamiliar shipping industry, so that economic agents can prepare agile responses.

Flow

Detailed Functions

Analysis Sentimental

(1) Crawling and Merging
ํŒŒ์ผ ์œ„์น˜: Developing-CurrentForecastIndex-for-ShippingIndustry/1. Analysis Sentimental/(1) Crawling & Merging/

  • ๋ชจ๋ธ ํ•™์Šต ๋ฐ์ดํ„ฐ ๊ตฌ์ถ•์„ ์œ„ํ•ด ๋„ค์ด๋ฒ„, ๋‹ค์Œ์—์„œ ๋‹ค์Œ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ํฌ๋กค๋งํ•จ.
    ๊ฒ€์ƒ‰ ๋‰ด์Šค: ๋งŽ์ด ๋ณธ ๋‰ด์Šค/ ๋Œ“๊ธ€ ๋งŽ์€ ๋‰ด์Šค (2021๋…„ 3์›” ~ 2021๋…„ 11์›”)
    • ๋‰ด์Šค ๋‚ ์งœ
    • ๋‰ด์Šค ์ œ๋ชฉ
    • ๋‰ด์Šค ๋ณธ๋ฌธ
    • ๋‰ด์Šค URL
    • ๋‰ด์Šค ์•„๋ž˜ ๊ฐ์„ฑ rating ์ˆ˜์น˜
      • ์ข‹์•„์š”
      • ๊ฐ๋™์ด์—์š”
      • ์Šฌํผ์š”
      • ํ™”๊ฐ€ ๋‚˜์š”.
  • ๊ฐ ๊ธฐ์‚ฌ์˜ ๊ฐ์„ฑ์ง€์ˆ˜๋Š” ๋‹ค์Œ์˜ ์‹์„ ํ†ตํ•ด ์‚ฐ์ถœํ•จ.
    • ๊ธ์ • rating(์ข‹์•„์š”, ๊ฐ๋™์ด์—์š”) - ๋ถ€์ • rating (์Šฌํผ์š”, ํ™”๊ฐ€ ๋‚˜์š”)
    • ์–‘์ˆ˜์ด๋ฉด 1(๊ธ์ •) tag, ์Œ์ˆ˜์ด๋ฉด 0(๋ถ€์ •) tag
  • ํฌ๋กค๋ง ํ›„ ์ „์ฒด ๊ธฐ์‚ฌ Merge, ์›”๋ณ„๋กœ Merge
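
As a minimal sketch of the tagging rule above; the reaction-count column names (like, moved, sad, angry) are hypothetical, not the crawler's actual field names:

# Minimal sketch of the tagging rule; column names are hypothetical.
def label_article(row):
    score = (row['like'] + row['moved']) - (row['sad'] + row['angry'])  # positive ratings - negative ratings
    if score > 0:
        return 1   # positive
    if score < 0:
        return 0   # negative
    return None    # ties left unlabeled in this sketch

# articles['tag'] = articles.apply(label_article, axis=1)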

(2) Modeling
์ „์ฒ˜๋ฆฌ

def common_word_list(common_num,neg,pos):
    # Collect the top-N most frequent words of the negative and positive corpora
    negative_word=[]; positive_word=[]
    n_list=neg.most_common(common_num); p_list=pos.most_common(common_num)

    for i in range(common_num):
        negative_word.append(n_list[i][0])
        positive_word.append(p_list[i][0])

    # Words appearing in both top-N lists carry little sentiment signal
    common_list=list(set(negative_word) & set(positive_word))

    print(common_list)
    print('common_list length:', len(common_list))

    return common_list

# Tokenize each Sentence into a list of morphemes
from konlpy.tag import Mecab

mecab=Mecab()
stopwords = ['했','있','으로','로','것','씨','말','도', '는', '다', '의', '가', '이', '은', '수', '에서', '한', '에', '하', '고', '을', '를', '인', '듯', '과', '와', '네', '들', '듯', '지', '임', '게', '만', '겜', '되', '음', '면']

train_data['tokenized']=train_data['Sentence'].apply(mecab.morphs) # morphological analysis of each Sentence (type: list)
train_data['tokenized'] = train_data['tokenized'].apply(lambda x: [item for item in x if item not in stopwords]) # drop stopwords
train_data['tokenized'] = train_data['tokenized'].apply(lambda x: [item for item in x if len(item)>1]) # keep only tokens of length >= 2
train_data['tokenized'] = train_data['tokenized'].apply(lambda x: [item for item in x if item not in common_list]) # drop words from the common word list
  • ๋‹ค์Œ์˜ ๊ธฐ์ค€์œผ๋กœ ๊ธฐ์‚ฌ ๋ณธ๋ฌธ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ์ „์ฒ˜๋ฆฌ๋ฅผ ์ง„ํ–‰ํ•จ.
    • (1) ์กฐ์‚ฌ, ์–ด๋ฏธ ๋“ฑ์œผ๋กœ ๊ตฌ์„ฑ๋œ stopword ์ œ๊ฑฐ
    • (2) ๋‹จ์–ด์˜ ๊ธธ์ด๊ฐ€ 2๋ณด๋‹ค ์ž‘์€ ๊ฒฝ์šฐ ์ œ๊ฑฐ
    • (3) common word list๋ฅผ ์ƒ์„ฑํ•˜๊ณ (ํ•จ์ˆ˜ common_word_list), ๊ทธ์— ํ•ด๋‹นํ•˜๋Š” ๋‹จ์–ด ์ œ๊ฑฐ

Integer Encoding and Padding

### Integer encoding ###
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_train) # build the word index from the training texts; each word gets an integer index

# total_cnt: number of distinct words; rare_cnt: number of words whose frequency is below the chosen threshold
vocab_size = total_cnt - rare_cnt + 2 # size of the word set actually used (+2 for padding and OOV)

tokenizer = Tokenizer(vocab_size, oov_token = 'OOV') # re-fit the tokenizer with the reduced vocab_size
tokenizer.fit_on_texts(X_train)

# Encode the X_train and X_test data with the fitted tokenizer
X_train = tokenizer.texts_to_sequences(X_train)
X_test = tokenizer.texts_to_sequences(X_test)

### Padding ###
def below_threshold_len(max_len, nested_list):
  # Report the proportion of samples no longer than max_len;
  # max_len is chosen by comparing the maximum and average article lengths
  count = 0
  for sentence in nested_list:
    if(len(sentence) <= max_len):
        count = count + 1
  print('Proportion of samples with length <= %s: %s' % (max_len, (count / len(nested_list))*100))

max_len = 1000
below_threshold_len(max_len, X_train)
X_train = pad_sequences(X_train, maxlen = max_len)
X_test = pad_sequences(X_test, maxlen = max_len)
  • Set the integer-encoding range.
    • Compute the total number of words (total_cnt) and the number of rare words (rare_cnt) whose frequency falls below a threshold; a sketch of this step follows the list below.
  • Padding
    • Vary max_len, check the proportion of samples it covers (function below_threshold_len), then run pad_sequences
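
A minimal sketch of how total_cnt, rare_cnt, and vocab_size can be derived from the first tokenizer pass above; the threshold value here is an assumption, since the README does not state the one actually used:

threshold = 3                           # assumed cut-off; the actual value is not stated here
total_cnt = len(tokenizer.word_index)   # total number of distinct words
rare_cnt = 0                            # words that appear fewer than `threshold` times

for word, freq in tokenizer.word_counts.items():
    if freq < threshold:
        rare_cnt += 1

vocab_size = total_cnt - rare_cnt + 2   # +2 reserves indices for padding and the OOV token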

๋ชจ๋ธ ์ƒ์„ฑ

embedding_dim = 100
hidden_units = 128

model = Sequential()
model.add(Embedding(vocab_size, embedding_dim))
model.add(Bidirectional(LSTM(hidden_units)))
model.add(Dense(1, activation='sigmoid'))

es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=4)
mc = ModelCheckpoint('best_model.h5', monitor='val_acc', mode='max', verbose=1, save_best_only=True)

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history = model.fit(X_train, y_train, epochs=15, callbacks=[es, mc], batch_size=256, validation_split=0.2)

loaded_model = load_model('best_model.h5')
print("ํ…Œ์ŠคํŠธ ์ •ํ™•๋„: %.4f" % (loaded_model.evaluate(X_test, y_test)[1]))
  • ๋‹ค์Œ ํŒจํ‚ค์ง€๋ฅผ ์„ค์น˜ํ•˜๊ณ  ๋ชจ๋ธ๋ง ์‹ค์‹œ
    • from tensorflow.keras.layers import Embedding, Dense, LSTM, Bidirectional
    • from tensorflow.keras.models import Sequential, load_model
    • from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
  • Bidirectional-LSTM ๋ฐฉ์‹์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ํ•™์Šตํ•จ.
  • ์†์‹ค์œจ: 0.4597 // ์ •ํ™•๋„: 0.8141

Handling Shipping News

(1) Crawling
ํŒŒ์ผ ์œ„์น˜: Developing-CurrentForecastIndex-for-ShippingIndustry/2. Handling Shipping News/Crawling_๋‰ด์Šค๋ฐ์ดํ„ฐ_Shipping.ipynb

  • ๊ฐ์„ฑ ๋ถ„๋ฅ˜๊ธฐ ๋ชจ๋ธ์— input์œผ๋กœ ๋“ค์–ด๊ฐˆ ๋ฐ์ดํ„ฐ ๊ตฌ์ถ•์„ ์œ„ํ•ด bigkinds ์‚ฌ์ดํŠธ์—์„œ ๋‰ด์Šค๋ฐ์ดํ„ฐ๋ฅผ ํฌ๋กค๋งํ•จ.
  • ํฌ๋กค๋งํ•œ ๋ฐ์ดํ„ฐ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Œ.
    • ๊ฒ€์ƒ‰ ํ‚ค์›Œ๋“œ: ํ•ด์šด์—…,ํ•ด์šด์‚ฐ์—…,ํ•ด์šด๊ฒฝ๊ธฐ,ํ•ด์šด์—…๊ณ„
    • ๊ฒ€์ƒ‰ ๊ธฐ๊ฐ„: 2000๋…„ 1์›” ~ 2021๋…„ 11์›”
    • ๋‰ด์Šค ์ œ๋ชฉ
    • ๋‰ด์Šค ๋‚ ์งœ
    • ๋‰ด์Šค ๋ณธ๋ฌธ
    • ๋‰ด์Šค url

(2) Topic Modeling
๊ฐ 80๊ฐœ ํ† ํ”ฝ์˜ ์ƒ์œ„ 25๊ฐœ ์—ฐ๊ด€์–ด๋ฅผ ์ถ”์ถœ ํ›„ ์ •ํ•ฉ์„ฑ ๊ฒ€์ฆ ํ›„ NMF ํ† ํ”ฝ์„ ์‚ฌ์šฉํ•˜์˜€์Œ.
LDA Topic Modeling

# Required packages
from gensim import corpora, models
from gensim.models.coherencemodel import CoherenceModel
from gensim.models.ldamodel import LdaModel
from gensim.corpora.dictionary import Dictionary
from gensim.test.utils import common_texts
from gensim.test.utils import datapath

# Sanity check on gensim's bundled sample data (common_texts): build a dictionary and corpus
common_dictionary = Dictionary(common_texts)
common_corpus = [common_dictionary.doc2bow(text) for text in common_texts]

# Fit an LdaModel on the sample corpus
lda = LdaModel(common_corpus, num_topics=80)

# Extract words from the documents (news articles) and build the corpus
data_word=[[word for word in x.split(' ')] for x in document]
id2word=corpora.Dictionary(data_word)

texts=data_word
corpus=[id2word.doc2bow(text) for text in texts]

print("Corpus Ready")

# Run LDA on the news corpus
num_topics = 80
lda = LdaModel(corpus=corpus, id2word=id2word, num_topics=num_topics)
print("lda done, please wait")

# Output: top related words per topic
num_words = 25
word_dict = {}
for i in range(num_topics):
    words = lda.show_topic(i, topn=num_words) # number of related words returned per topic
    word_dict['Topic # ' + '{:02d}'.format(i+1)] = [w[0] for w in words]

print("Result_out")
  • LDA topic modeling with the gensim package
  • The news articles are grouped into 80 topics
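
The CoherenceModel imported in the code above is the tool the coherence check mentioned earlier would typically rely on; a minimal sketch, reusing the lda, data_word, and id2word objects from that code:

# Minimal coherence-check sketch with gensim (c_v coherence).
coherence_model = CoherenceModel(model=lda, texts=data_word, dictionary=id2word, coherence='c_v')
print('Coherence score:', coherence_model.get_coherence())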

NMF Topic Modeling

from sklearn.feature_extraction.text import CountVectorizer,TfidfTransformer
from sklearn.decomposition import NMF
from sklearn.preprocessing import normalize
import pandas as pd

# Build the count vectors
vectorizer=CountVectorizer(analyzer='word')
x_counts=vectorizer.fit_transform(text)

transformer=TfidfTransformer(smooth_idf=False)
x_tfidf=transformer.fit_transform(x_counts)

xtfidf_norm=normalize(x_tfidf,norm='l2',axis=1)

print("xtfidf_norm Ready")

model=NMF(n_components=80,init='nndsvd')
model.fit(xtfidf_norm) # fit the normalized tf-idf matrix

print("Model Ready")

# Output: top 25 words for each topic
# (use get_feature_names() instead on older scikit-learn versions)
components_df = pd.DataFrame(model.components_, columns=vectorizer.get_feature_names_out())
for topic in range(components_df.shape[0]):
    tmp = components_df.iloc[topic]

    print(f'For topic {topic+1} the words with the highest value are:')

    print(tmp.nlargest(25))
  • NMF topic modeling with sklearn
  • Likewise grouped into 80 topics

Calculating Index

(1) Topic Count

  • Count the occurrences of each topic's related words (extracted with NMF) in the news data from January 2000 to October 2021
  • To produce a monthly index, aggregate the topic-word counts of each article by day, as sketched below
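
A minimal sketch of this aggregation; the news DataFrame with date and tokenized columns is hypothetical, and word_dict is assumed to hold the topic-word lists built in the topic-modeling step:

# Minimal sketch: count topic-word occurrences per article, then aggregate by day and month.
import pandas as pd

topic_words = set(w for words in word_dict.values() for w in words)   # all topic-related words

news['date'] = pd.to_datetime(news['date'])
news['topic_word_count'] = news['tokenized'].apply(lambda toks: sum(t in topic_words for t in toks))

daily_counts = news.groupby('date')['topic_word_count'].sum()   # per-day counts
monthly_counts = daily_counts.resample('M').sum()                # per-month counts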

(2) Sentimental Index

Daily Sentimental

# ๊ฐ์„ฑ์ง€์ˆ˜๋ฅผ ๋ถ„์„ํ•˜๋Š” ํ•จ์ˆ˜ 
def sentiment_predict(new_sentence):
    encoded = tokenizer.texts_to_sequences([new_sentence]) # ์ •์ˆ˜ ์ธ์ฝ”๋”ฉ

    pad_new = pad_sequences(encoded, maxlen = max_len) # ํŒจ๋”ฉ
    score = float(loaded_model.predict(pad_new)) # ์˜ˆ์ธก
    return score
  • The function above predicts a positive/negative score for each news article (see the sketch below).
  • After prediction, the positive and negative values that make up the simple sentiment score were attached to each article.
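
A minimal sketch of scoring every article with sentiment_predict and attaching the values; the Body column is hypothetical, and taking Neg as the complement of the sigmoid output is an assumption rather than the documented procedure:

# Minimal sketch; the model's sigmoid outputs a single positive-class probability,
# so the negative value is taken here as its complement (an assumption).
df_Sentimental['Pos'] = df_Sentimental['Body'].apply(sentiment_predict)
df_Sentimental['Neg'] = 1 - df_Sentimental['Pos']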

Monthly Sentimental

import numpy as np

pre = 0; LSenti = []                   # window start and list of monthly sentiment indices
for i in LCount: # LCount: number of news articles in each month
    index=LCount.index(i)
    df_tmp=df_Sentimental[pre:pre+i]   # articles belonging to this month

    # Monthly sentiment index = mean of the per-article sentiment scores
    pos=df_tmp['Pos'].tolist(); MeanPos=np.mean(pos);
    neg=df_tmp['Neg'].tolist(); MeanNeg=np.mean(neg)

    SentiIndex=(MeanPos-MeanNeg)*100
    LSenti.append(round(SentiIndex,1))
    pre=pre+i                          # advance the window to the next month's articles
  • ๋‹ค์Œ์˜ ์ ˆ์ฐจ๋ฅผ ํ†ตํ•ด ์ผ๋ณ„๋กœ ์˜ˆ์ธกํ•œ ๊ฐ์„ฑ์ง€์ˆ˜๋ฅผ ์›”๋ณ„ ์ง€์ˆ˜๋กœ ๋ณ€ํ™˜ํ•˜์˜€์Œ.
    • (1) ์›”๋ณ„ ๋‰ด์Šค ๊ฐœ์ˆ˜ ๋งŒํผ ๊ธ์ • ์ˆ˜์น˜์™€ ๋ถ€์ •์ˆ˜์น˜์˜ ํ‰๊ท ์„ ๊ตฌํ•จ.
    • (2) (ํ‰๊ท  ๊ธ์ • - ํ‰๊ท  ๋ถ€์ •)*100 ์œผ๋กœ ๊ฐ์„ฑ์ง€์ˆ˜๋ฅผ ์‚ฐ์ถœ

(3) Index

  • The news-data index was designed following prior research (see the sketch below).
    • Combined index 1: sentiment index * share (%) of the 10 related words of the top-20 topics by topic weight
    • Combined index 2: sentiment index * share (%) of the words of the top-20 topics by inter-topic correlation
    • Combined index 3: sentiment index * share (%) of the words of the top-20 topics most correlated with production
      Here the water transport production index (Statistics Korea) was used as the reference for correlation with shipping production.
  • Combined index 3, which shows the strongest correlation with the actual indicator among the three, was selected as the news-data index.
    • Actual indicator: Korea's industrial production index published by the OECD
    • Correlation between each index and the actual indicator
      Combined index (1)   Combined index (2)   Combined index (3)
      -0.295               -0.343               -0.493
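
A minimal sketch of the combined-index formula above; the monthly DataFrame and its values are placeholders purely for illustration:

import pandas as pd

# Placeholder monthly values purely for illustration.
monthly = pd.DataFrame({
    'senti': [12.5, -3.0, 8.1],               # monthly sentiment index
    'topic_share': [4.2, 3.9, 5.1],           # share (%) of words from the selected top-20 topics
    'production_index': [101.2, 99.8, 102.4], # reference indicator
})

# Combined index = sentiment index * topic-word share (%)
monthly['combined_index'] = monthly['senti'] * monthly['topic_share']

# Correlation against the reference indicator
print(monthly['combined_index'].corr(monthly['production_index']))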

Data Analysis

(1) Data Set Ready

  • Before modeling, a real-economy data set for the shipping industry was built to feed the nowcasting model alongside the news data.
  • Following Stopford (2008), Chen et al. (2015), and Choi, Kim and Han (2018), indicators were collected for the shipping market's supply side, demand side, freight rates and prices, and general economic conditions; because the goal is to predict Korea's shipping production index, KOSPI, CLI (Korea), and similar series were added, giving 26 real indicators in total.
  • Since these are monthly time series, they were made stationary with Bpanel (library tseries) in R, as illustrated below.
    • Transformation code 3, i.e. the year-over-year growth rate
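
As a minimal illustration of transformation code 3 (year-over-year growth of a monthly series); the actual pipeline applies it through Bpanel in R, so this pandas snippet on placeholder data is only a sketch:

import pandas as pd

# Transformation code 3: year-over-year growth rate of a monthly series.
# The series below is a placeholder purely for illustration.
idx = pd.date_range('2019-01-31', periods=24, freq='M')
series = pd.Series(range(100, 124), index=idx, dtype=float)

yoy_growth = series.pct_change(12)   # (x_t - x_{t-12}) / x_{t-12}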

(2) Modeling

  • Shipping-industry business conditions were nowcast with the nowcasting model proposed by Domenico Giannone (2008).
  • Three models were built to evaluate forecasting performance.
    • Autoregressive model
    • Dynamic factor model: real indicators only
    • Nowcasting model: real indicators + news-data index
  • RMSE and MAE of each model (the metrics are sketched below)
    Metric   Autoregressive model   Dynamic factor model   Nowcasting model
    RMSE     0.06661                0.03762                0.03754
    MAE      0.04841                0.02794                0.02784
  • The nowcasting model (real indicators + news-data index) performed best.
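
For reference, the two comparison metrics; the arrays of actual values and nowcasts below are placeholders:

import numpy as np

# RMSE and MAE between actual values and model nowcasts.
y_true = np.array([0.01, 0.02, -0.01, 0.03])
y_pred = np.array([0.02, 0.01, -0.02, 0.02])

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # root mean squared error
mae = np.mean(np.abs(y_true - y_pred))            # mean absolute error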

Environment

  • Python (3.7.3)
  • R (4.1.2)
  • Jupyter Notebook
