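Time Series Analysis
- Time series data: observations recorded as time passes, by year, season, month, day, hour, minute, or second
- Time series data comes in many forms
- Time series plot: a plot of how the values of the series change over time
- Purpose
- 1) Forecasting the future
- 2) Understanding and controlling a system or stochastic process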
1. Regression analysis with NumPy's polyfit
- 1) yfinance, for fetching Yahoo Finance data
* pip install yfinance
- 2) prophet, for predictive modeling and visualization
* pip install prophet
- 3) plotly, because Prophet's interactive plots are built on Plotly
* pip install plotly
In [1]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from prophet import Prophet
In [2]:
import matplotlib.pyplot as plt
from matplotlib import font_manager, rc

plt.rcParams['axes.unicode_minus'] = False  # keep the minus sign from rendering as a broken glyph
# f_path = "/Library/Fonts/AppleGothic.ttf"  # macOS
f_path = "C:/Windows/Fonts/malgun.ttf"       # Windows
font_name = font_manager.FontProperties(fname=f_path).get_name()
rc('font', family=font_name)
- Website visitor counts (date, number of visitors)
In [3]:
pweb = pd.read_csv('../../data/08. PinkWink Web Traffic.csv',
                   encoding='utf-8', thousands=',',
                   names=['date', 'hit'], index_col=0)
pweb = pweb[pweb['hit'].notnull()]
pweb.head()
Out[3]:
date | hit |
---|---|
16. 7. 1. | 766.0 |
16. 7. 2. | 377.0 |
16. 7. 3. | 427.0 |
16. 7. 4. | 902.0 |
16. 7. 5. | 850.0 |
In [4]:
pweb['hit'].plot(figsize=(12,4), grid=True);
In [5]:
time = np.arange(0, len(pweb))        # build the time axis (0 to 364)
# print(time)
traffic = pweb['hit'].values          # store the web traffic counts in traffic
fx = np.linspace(0, time[-1], 1000)   # 1000 evenly spaced points for drawing smooth curves
print(fx)
[ 0. 0.36436436 0.72872873 1.09309309 1.45745746 1.82182182 2.18618619 2.55055055 2.91491491 3.27927928 3.64364364 4.00800801 4.37237237 4.73673674 5.1011011 5.46546547 5.82982983 6.19419419 6.55855856 6.92292292 7.28728729 7.65165165 8.01601602 8.38038038 8.74474474 9.10910911 9.47347347 9.83783784 10.2022022 10.56656657 10.93093093 11.2952953 11.65965966 12.02402402 12.38838839 12.75275275 13.11711712 13.48148148 13.84584585 14.21021021 14.57457457 14.93893894 15.3033033 15.66766767 16.03203203 16.3963964 16.76076076 17.12512513 17.48948949 17.85385385 18.21821822 18.58258258 18.94694695 19.31131131 19.67567568 20.04004004 20.4044044 20.76876877 21.13313313 21.4974975 21.86186186 22.22622623 22.59059059 22.95495495 23.31931932 23.68368368 24.04804805 24.41241241 24.77677678 25.14114114 25.50550551 25.86986987 26.23423423 26.5985986 26.96296296 27.32732733 27.69169169 28.05605606 28.42042042 28.78478478 29.14914915 29.51351351 29.87787788 30.24224224 30.60660661 30.97097097 31.33533534 31.6996997 32.06406406 32.42842843 32.79279279 33.15715716 33.52152152 33.88588589 34.25025025 34.61461461 34.97897898 35.34334334 35.70770771 36.07207207 36.43643644 36.8008008 37.16516517 37.52952953 37.89389389 38.25825826 38.62262262 38.98698699 39.35135135 39.71571572 40.08008008 40.44444444 40.80880881 41.17317317 41.53753754 41.9019019 42.26626627 42.63063063 42.99499499 43.35935936 43.72372372 44.08808809 44.45245245 44.81681682 45.18118118 45.54554555 45.90990991 46.27427427 46.63863864 47.003003 47.36736737 47.73173173 48.0960961 48.46046046 48.82482482 49.18918919 49.55355355 49.91791792 50.28228228 50.64664665 51.01101101 51.37537538 51.73973974 52.1041041 52.46846847 52.83283283 53.1971972 53.56156156 53.92592593 54.29029029 54.65465465 55.01901902 55.38338338 55.74774775 56.11211211 56.47647648 56.84084084 57.20520521 57.56956957 57.93393393 58.2982983 58.66266266 59.02702703 59.39139139 59.75575576 60.12012012 60.48448448 60.84884885 61.21321321 61.57757758 
61.94194194 62.30630631 62.67067067 63.03503504 63.3993994 63.76376376 64.12812813 64.49249249 64.85685686 65.22122122 65.58558559 65.94994995 66.31431431 66.67867868 67.04304304 67.40740741 67.77177177 68.13613614 68.5005005 68.86486486 69.22922923 69.59359359 69.95795796 70.32232232 70.68668669 71.05105105 71.41541542 71.77977978 72.14414414 72.50850851 72.87287287 73.23723724 73.6016016 73.96596597 74.33033033 74.69469469 75.05905906 75.42342342 75.78778779 76.15215215 76.51651652 76.88088088 77.24524525 77.60960961 77.97397397 78.33833834 78.7027027 79.06706707 79.43143143 79.7957958 80.16016016 80.52452452 80.88888889 81.25325325 81.61761762 81.98198198 82.34634635 82.71071071 83.07507508 83.43943944 83.8038038 84.16816817 84.53253253 84.8968969 85.26126126 85.62562563 85.98998999 86.35435435 86.71871872 87.08308308 87.44744745 87.81181181 88.17617618 88.54054054 88.9049049 89.26926927 89.63363363 89.997998 90.36236236 90.72672673 91.09109109 91.45545546 91.81981982 92.18418418 92.54854855 92.91291291 93.27727728 93.64164164 94.00600601 94.37037037 94.73473473 95.0990991 95.46346346 95.82782783 96.19219219 96.55655656 96.92092092 97.28528529 97.64964965 98.01401401 98.37837838 98.74274274 99.10710711 99.47147147 99.83583584 100.2002002 100.56456456 100.92892893 101.29329329 101.65765766 102.02202202 102.38638639 102.75075075 103.11511512 103.47947948 103.84384384 104.20820821 104.57257257 104.93693694 105.3013013 105.66566567 106.03003003 106.39439439 106.75875876 107.12312312 107.48748749 107.85185185 108.21621622 108.58058058 108.94494494 109.30930931 109.67367367 110.03803804 110.4024024 110.76676677 111.13113113 111.4954955 111.85985986 112.22422422 112.58858859 112.95295295 113.31731732 113.68168168 114.04604605 114.41041041 114.77477477 115.13913914 115.5035035 115.86786787 116.23223223 116.5965966 116.96096096 117.32532533 117.68968969 118.05405405 118.41841842 118.78278278 119.14714715 119.51151151 119.87587588 120.24024024 120.6046046 120.96896897 
121.33333333 121.6976977 122.06206206 122.42642643 122.79079079 123.15515516 123.51951952 123.88388388 124.24824825 124.61261261 124.97697698 125.34134134 125.70570571 126.07007007 126.43443443 126.7987988 127.16316316 127.52752753 127.89189189 128.25625626 128.62062062 128.98498498 129.34934935 129.71371371 130.07807808 130.44244244 130.80680681 131.17117117 131.53553554 131.8998999 132.26426426 132.62862863 132.99299299 133.35735736 133.72172172 134.08608609 134.45045045 134.81481481 135.17917918 135.54354354 135.90790791 136.27227227 136.63663664 137.001001 137.36536537 137.72972973 138.09409409 138.45845846 138.82282282 139.18718719 139.55155155 139.91591592 140.28028028 140.64464464 141.00900901 141.37337337 141.73773774 142.1021021 142.46646647 142.83083083 143.1951952 143.55955956 143.92392392 144.28828829 144.65265265 145.01701702 145.38138138 145.74574575 146.11011011 146.47447447 146.83883884 147.2032032 147.56756757 147.93193193 148.2962963 148.66066066 149.02502503 149.38938939 149.75375375 150.11811812 150.48248248 150.84684685 151.21121121 151.57557558 151.93993994 152.3043043 152.66866867 153.03303303 153.3973974 153.76176176 154.12612613 154.49049049 154.85485485 155.21921922 155.58358358 155.94794795 156.31231231 156.67667668 157.04104104 157.40540541 157.76976977 158.13413413 158.4984985 158.86286286 159.22722723 159.59159159 159.95595596 160.32032032 160.68468468 161.04904905 161.41341341 161.77777778 162.14214214 162.50650651 162.87087087 163.23523524 163.5995996 163.96396396 164.32832833 164.69269269 165.05705706 165.42142142 165.78578579 166.15015015 166.51451451 166.87887888 167.24324324 167.60760761 167.97197197 168.33633634 168.7007007 169.06506507 169.42942943 169.79379379 170.15815816 170.52252252 170.88688689 171.25125125 171.61561562 171.97997998 172.34434434 172.70870871 173.07307307 173.43743744 173.8018018 174.16616617 174.53053053 174.89489489 175.25925926 175.62362362 175.98798799 176.35235235 176.71671672 177.08108108 177.44544545 
177.80980981 178.17417417 178.53853854 178.9029029 179.26726727 179.63163163 179.995996 180.36036036 180.72472472 181.08908909 181.45345345 181.81781782 182.18218218 182.54654655 182.91091091 183.27527528 183.63963964 184.004004 184.36836837 184.73273273 185.0970971 185.46146146 185.82582583 186.19019019 186.55455455 186.91891892 187.28328328 187.64764765 188.01201201 188.37637638 188.74074074 189.10510511 189.46946947 189.83383383 190.1981982 190.56256256 190.92692693 191.29129129 191.65565566 192.02002002 192.38438438 192.74874875 193.11311311 193.47747748 193.84184184 194.20620621 194.57057057 194.93493493 195.2992993 195.66366366 196.02802803 196.39239239 196.75675676 197.12112112 197.48548549 197.84984985 198.21421421 198.57857858 198.94294294 199.30730731 199.67167167 200.03603604 200.4004004 200.76476476 201.12912913 201.49349349 201.85785786 202.22222222 202.58658659 202.95095095 203.31531532 203.67967968 204.04404404 204.40840841 204.77277277 205.13713714 205.5015015 205.86586587 206.23023023 206.59459459 206.95895896 207.32332332 207.68768769 208.05205205 208.41641642 208.78078078 209.14514515 209.50950951 209.87387387 210.23823824 210.6026026 210.96696697 211.33133133 211.6956957 212.06006006 212.42442442 212.78878879 213.15315315 213.51751752 213.88188188 214.24624625 214.61061061 214.97497497 215.33933934 215.7037037 216.06806807 216.43243243 216.7967968 217.16116116 217.52552553 217.88988989 218.25425425 218.61861862 218.98298298 219.34734735 219.71171171 220.07607608 220.44044044 220.8048048 221.16916917 221.53353353 221.8978979 222.26226226 222.62662663 222.99099099 223.35535536 223.71971972 224.08408408 224.44844845 224.81281281 225.17717718 225.54154154 225.90590591 226.27027027 226.63463463 226.998999 227.36336336 227.72772773 228.09209209 228.45645646 228.82082082 229.18518519 229.54954955 229.91391391 230.27827828 230.64264264 231.00700701 231.37137137 231.73573574 232.1001001 232.46446446 232.82882883 233.19319319 233.55755756 233.92192192 
234.28628629 234.65065065 235.01501502 235.37937938 235.74374374 236.10810811 236.47247247 236.83683684 237.2012012 237.56556557 237.92992993 238.29429429 238.65865866 239.02302302 239.38738739 239.75175175 240.11611612 240.48048048 240.84484484 241.20920921 241.57357357 241.93793794 242.3023023 242.66666667 243.03103103 243.3953954 243.75975976 244.12412412 244.48848849 244.85285285 245.21721722 245.58158158 245.94594595 246.31031031 246.67467467 247.03903904 247.4034034 247.76776777 248.13213213 248.4964965 248.86086086 249.22522523 249.58958959 249.95395395 250.31831832 250.68268268 251.04704705 251.41141141 251.77577578 252.14014014 252.5045045 252.86886887 253.23323323 253.5975976 253.96196196 254.32632633 254.69069069 255.05505506 255.41941942 255.78378378 256.14814815 256.51251251 256.87687688 257.24124124 257.60560561 257.96996997 258.33433433 258.6986987 259.06306306 259.42742743 259.79179179 260.15615616 260.52052052 260.88488488 261.24924925 261.61361361 261.97797798 262.34234234 262.70670671 263.07107107 263.43543544 263.7997998 264.16416416 264.52852853 264.89289289 265.25725726 265.62162162 265.98598599 266.35035035 266.71471471 267.07907908 267.44344344 267.80780781 268.17217217 268.53653654 268.9009009 269.26526527 269.62962963 269.99399399 270.35835836 270.72272272 271.08708709 271.45145145 271.81581582 272.18018018 272.54454454 272.90890891 273.27327327 273.63763764 274.002002 274.36636637 274.73073073 275.0950951 275.45945946 275.82382382 276.18818819 276.55255255 276.91691692 277.28128128 277.64564565 278.01001001 278.37437437 278.73873874 279.1031031 279.46746747 279.83183183 280.1961962 280.56056056 280.92492492 281.28928929 281.65365365 282.01801802 282.38238238 282.74674675 283.11111111 283.47547548 283.83983984 284.2042042 284.56856857 284.93293293 285.2972973 285.66166166 286.02602603 286.39039039 286.75475475 287.11911912 287.48348348 287.84784785 288.21221221 288.57657658 288.94094094 289.30530531 289.66966967 290.03403403 290.3983984 
290.76276276 291.12712713 291.49149149 291.85585586 292.22022022 292.58458458 292.94894895 293.31331331 293.67767768 294.04204204 294.40640641 294.77077077 295.13513514 295.4994995 295.86386386 296.22822823 296.59259259 296.95695696 297.32132132 297.68568569 298.05005005 298.41441441 298.77877878 299.14314314 299.50750751 299.87187187 300.23623624 300.6006006 300.96496496 301.32932933 301.69369369 302.05805806 302.42242242 302.78678679 303.15115115 303.51551552 303.87987988 304.24424424 304.60860861 304.97297297 305.33733734 305.7017017 306.06606607 306.43043043 306.79479479 307.15915916 307.52352352 307.88788789 308.25225225 308.61661662 308.98098098 309.34534535 309.70970971 310.07407407 310.43843844 310.8028028 311.16716717 311.53153153 311.8958959 312.26026026 312.62462462 312.98898899 313.35335335 313.71771772 314.08208208 314.44644645 314.81081081 315.17517518 315.53953954 315.9039039 316.26826827 316.63263263 316.996997 317.36136136 317.72572573 318.09009009 318.45445445 318.81881882 319.18318318 319.54754755 319.91191191 320.27627628 320.64064064 321.00500501 321.36936937 321.73373373 322.0980981 322.46246246 322.82682683 323.19119119 323.55555556 323.91991992 324.28428428 324.64864865 325.01301301 325.37737738 325.74174174 326.10610611 326.47047047 326.83483483 327.1991992 327.56356356 327.92792793 328.29229229 328.65665666 329.02102102 329.38538539 329.74974975 330.11411411 330.47847848 330.84284284 331.20720721 331.57157157 331.93593594 332.3003003 332.66466466 333.02902903 333.39339339 333.75775776 334.12212212 334.48648649 334.85085085 335.21521522 335.57957958 335.94394394 336.30830831 336.67267267 337.03703704 337.4014014 337.76576577 338.13013013 338.49449449 338.85885886 339.22322322 339.58758759 339.95195195 340.31631632 340.68068068 341.04504505 341.40940941 341.77377377 342.13813814 342.5025025 342.86686687 343.23123123 343.5955956 343.95995996 344.32432432 344.68868869 345.05305305 345.41741742 345.78178178 346.14614615 346.51051051 
346.87487487 347.23923924 347.6036036 347.96796797 348.33233233 348.6966967 349.06106106 349.42542543 349.78978979 350.15415415 350.51851852 350.88288288 351.24724725 351.61161161 351.97597598 352.34034034 352.7047047 353.06906907 353.43343343 353.7977978 354.16216216 354.52652653 354.89089089 355.25525526 355.61961962 355.98398398 356.34834835 356.71271271 357.07707708 357.44144144 357.80580581 358.17017017 358.53453453 358.8988989 359.26326326 359.62762763 359.99199199 360.35635636 360.72072072 361.08508509 361.44944945 361.81381381 362.17817818 362.54254254 362.90690691 363.27127127 363.63563564 364. ]
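- Mean squared error (MSE): the average of the squared errors
- squared error = (predicted value - actual value) ** 2; the error function in the next cell takes the square root of that mean, so the numbers it prints are root mean squared errors (RMSE)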
In [6]:
# RMSE: square root of the mean squared error
def error(f, x, y):
    return np.sqrt(np.mean((f(x) - y) ** 2))
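To make the difference between MSE and RMSE concrete, here is a small sketch. The `actual` values are the first four `hit` counts from the table above; the `pred` values are made up for illustration and do not come from any model in this post.

```python
import numpy as np

def mse(pred, actual):
    """Mean squared error: the average of the squared differences."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.mean((pred - actual) ** 2)

def rmse(pred, actual):
    """Root mean squared error: same units as the data itself."""
    return np.sqrt(mse(pred, actual))

actual = [766.0, 377.0, 427.0, 902.0]   # first four 'hit' values from the data
pred = [700.0, 400.0, 500.0, 900.0]     # hypothetical predictions, for illustration only

print("MSE :", mse(pred, actual))       # squared units (visitors ** 2)
print("RMSE:", rmse(pred, actual))      # back in units of visitors
```

RMSE is usually easier to read because it is on the same scale as the visitor counts, which is presumably why the notebook's `error` function takes the square root.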
In [7]:
# polyfit(x, y, degree)
fp1 = np.polyfit(time, traffic, 1)     # fit a 1st-degree polynomial to (x, y)
f1 = np.poly1d(fp1)
fp2 = np.polyfit(time, traffic, 2)     # 2nd degree
f2 = np.poly1d(fp2)
fp3 = np.polyfit(time, traffic, 3)     # 3rd degree
f3 = np.poly1d(fp3)
fp15 = np.polyfit(time, traffic, 15)   # 15th degree
f15 = np.poly1d(fp15)

print(error(f1, time, traffic))
print(error(f2, time, traffic))
print(error(f3, time, traffic))
print(error(f15, time, traffic))
430.85973081109626
430.6284101894695
429.53280466762925
330.4777304274343
In [8]:
plt.figure(figsize=(10,6))
plt.scatter(time, traffic, s=10)
plt.plot(fx, f1(fx), lw=4, label='f1')
plt.plot(fx, f2(fx), lw=4, label='f2')
plt.plot(fx, f3(fx), lw=4, label='f3')
plt.plot(fx, f15(fx), lw=4, label='f15')  # overfitting
plt.grid(True, linestyle='-', color='0.75')
plt.legend(loc=2)
plt.show()
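The 15th-degree fit has the lowest error above, but that error is measured on the same points used for fitting, so it cannot show overfitting. One common way to check is to hold out the tail of the series. The sketch below uses synthetic data (a linear trend plus noise, not the web traffic file), and the 80/20 split and the degrees tried are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the run is repeatable
x = np.linspace(0.0, 1.0, 100)           # rescaled time axis keeps high-degree fits well conditioned
y = 500 + 300 * x + rng.normal(0, 50, size=x.size)  # synthetic: linear trend + noise

train = slice(0, 80)                     # first 80 points are used for fitting
test = slice(80, 100)                    # last 20 points are held out

def rmse(f, x, y):
    return np.sqrt(np.mean((f(x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    f = np.poly1d(np.polyfit(x[train], y[train], degree))
    results[degree] = (rmse(f, x[train], y[train]), rmse(f, x[test], y[test]))
    print(f"degree={degree}: train RMSE={results[degree][0]:.1f}, "
          f"held-out RMSE={results[degree][1]:.1f}")
```

The training error can only go down as the degree rises, because each higher-degree polynomial contains the lower-degree ones. On data like this, the held-out error usually gets worse for the high-degree fit, which is the overfitting that the f15 curve hints at.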
2. Forecasting with the Prophet module
In [9]:
df = pd.DataFrame({'ds': pweb.index, 'y': pweb['hit']})
df.reset_index(inplace=True)
df['ds'] = pd.to_datetime(df['ds'], format="%y. %m. %d.")
df.head()
Out[9]:
date | ds | y | |
---|---|---|---|
0 | 16. 7. 1. | 2016-07-01 | 766.0 |
1 | 16. 7. 2. | 2016-07-02 | 377.0 |
2 | 16. 7. 3. | 2016-07-03 | 427.0 |
3 | 16. 7. 4. | 2016-07-04 | 902.0 |
4 | 16. 7. 5. | 2016-07-05 | 850.0 |
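The `format="%y. %m. %d."` string passed to `pd.to_datetime` uses the usual strftime codes. As a minimal sketch, the standard library's `datetime.strptime` accepts the same codes, so it can be used to see how a raw string such as `16. 7. 1.` is read. The three sample strings are taken from the table above.

```python
from datetime import datetime

raw_dates = ["16. 7. 1.", "16. 7. 2.", "16. 7. 3."]
fmt = "%y. %m. %d."   # two-digit year, month, day, each followed by a period

# %m and %d also accept single-digit values such as "7" and "1"
parsed = [datetime.strptime(s, fmt) for s in raw_dates]

for raw, d in zip(raw_dates, parsed):
    print(raw, "->", d.date().isoformat())
```

Prophet expects the date column to be named `ds` and the value column `y`, which is why the DataFrame is built with those names.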
In [10]:
del df['date']  # the raw 'date' column is no longer needed now that df has 'ds'
# Tell Prophet that the data has yearly (yearly_seasonality=True)
# and daily (daily_seasonality=True) periodicity.
# Create a Prophet object and call fit() with the time series data.
m = Prophet(yearly_seasonality=True, daily_seasonality=True)
m.fit(df)
16:45:53 - cmdstanpy - INFO - Chain [1] start processing
16:45:53 - cmdstanpy - INFO - Chain [1] done processing
Out[10]:
<prophet.forecaster.Prophet at 0x1b874521d50>
In [11]:
# make_future_dataframe: build the range of dates to predict over
# The observed visitor counts cover 16. 7. 1. ~ 17. 6. 30.; periods=60 appends 60 more days
future = m.make_future_dataframe(periods=60)
future.head()
Out[11]:
ds | |
---|---|
0 | 2016-07-01 |
1 | 2016-07-02 |
2 | 2016-07-03 |
3 | 2016-07-04 |
4 | 2016-07-05 |
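To see which dates `periods=60` adds, the date arithmetic can be sketched with the standard library alone. This assumes the history is 365 consecutive daily rows starting on 2016-07-01 (the time axis earlier runs from 0 to 364); it only mimics the resulting date range, not Prophet's implementation.

```python
from datetime import date, timedelta

history_start = date(2016, 7, 1)
n_history = 365    # assumed: one row per day, no gaps
periods = 60       # same value as passed to make_future_dataframe

history = [history_start + timedelta(days=i) for i in range(n_history)]
future_only = [history[-1] + timedelta(days=i) for i in range(1, periods + 1)]
all_dates = history + future_only   # history is kept, future days are appended

print("last observed day :", history[-1])
print("last forecast day :", all_dates[-1])
print("total rows        :", len(all_dates))
```

Under these assumptions the last row is 2017-08-29 at index 424, which is consistent with the tail of the forecast table in Out[12].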
- predict: run the forecast, including its uncertainty interval (yhat_lower to yhat_upper)
In [12]:
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
Out[12]:
ds | yhat | yhat_lower | yhat_upper | |
---|---|---|---|---|
420 | 2017-08-25 | 901.398379 | 703.775237 | 1110.598991 |
421 | 2017-08-26 | 486.266408 | 260.578665 | 691.996784 |
422 | 2017-08-27 | 618.928059 | 414.174886 | 825.380619 |
423 | 2017-08-28 | 1171.734066 | 963.785863 | 1375.587713 |
424 | 2017-08-29 | 1207.743873 | 987.267201 | 1429.176432 |
In [13]:
# plot: the original time series together with the forecast
m.plot(forecast)
Out[13]:
In [14]:
# plot_components: split the forecast into its trend and seasonality components
m.plot_components(forecast)
Out[14]:
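The idea behind a component plot is that the series is treated as a sum of parts, roughly trend + seasonality + noise. The sketch below is not how Prophet fits its model; it is a simplified illustration on synthetic data, using a straight-line trend from `np.polyfit` and a per-weekday average for the weekly pattern. All numbers in it are invented.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days = 7 * 24                                   # 24 full weeks of synthetic daily data
t = np.arange(n_days)

# Invented weekly pattern (sums to zero): busy weekdays, quiet weekend
weekly_true = np.array([30, 40, 35, 30, 10, -70, -75], dtype=float)
y = 500 + 1.5 * t + weekly_true[t % 7] + rng.normal(0, 5, n_days)

# 1) Trend component: straight-line fit over time
slope, intercept = np.polyfit(t, y, 1)
trend = slope * t + intercept

# 2) Weekly component: average of the detrended values for each weekday slot
detrended = y - trend
weekly_est = np.array([detrended[t % 7 == d].mean() for d in range(7)])

print("estimated slope per day:", round(slope, 2))
print("estimated weekly effect:", np.round(weekly_est, 1))
```

The recovered weekly effect should be close to the invented pattern, which is the same kind of information the weekly panel of `plot_components` shows for the real data.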
3. Analyzing stock data with seasonal time series analysis
In [15]:
# https://finance.yahoo.com/
import datetime as dt
import yfinance as yf

company = 'TATAELXSI.NS'

# Define a start date and end date
start = dt.datetime(2021, 1, 1)
end = dt.datetime(2023, 5, 1)

# Read stock price data
data = yf.download(company, start, end)
data.tail(10)
# Open: opening price, High: daily high, Low: daily low, Close: closing price,
# Adj Close: adjusted closing price, Volume: trading volume
[*********************100%***********************] 1 of 1 completed
Out[15]:
Date | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|
2023-04-17 | 6237.950195 | 6300.0 | 6180.000000 | 6289.750000 | 6240.825195 | 108512 |
2023-04-18 | 6298.750000 | 6396.0 | 6295.000000 | 6367.299805 | 6317.771973 | 100611 |
2023-04-19 | 6368.000000 | 6375.0 | 6290.049805 | 6307.200195 | 6258.139648 | 69527 |
2023-04-20 | 6316.000000 | 6364.5 | 6235.350098 | 6251.649902 | 6203.021484 | 90809 |
2023-04-21 | 6269.000000 | 6300.0 | 6205.549805 | 6281.000000 | 6232.143555 | 66308 |
2023-04-24 | 6280.850098 | 6330.0 | 6220.049805 | 6252.950195 | 6204.312012 | 61632 |
2023-04-25 | 6253.000000 | 6307.0 | 6235.299805 | 6271.200195 | 6222.419922 | 47904 |
2023-04-26 | 6269.000000 | 6300.0 | 6240.000000 | 6290.850098 | 6241.916992 | 56123 |
2023-04-27 | 6284.000000 | 6635.0 | 6265.149902 | 6578.200195 | 6527.031738 | 493002 |
2023-04-28 | 6629.000000 | 6687.0 | 6565.649902 | 6643.600098 | 6591.922852 | 191519 |
In [16]:
start = '2012-7-1'
end = '2022-7-31'
KIA = yf.download('000270.KS', start, end)  # Kia
KIA.head()
[*********************100%***********************] 1 of 1 completed
Out[16]:
Date | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|
2012-07-02 | 75600.0 | 75900.0 | 75000.0 | 75300.0 | 57019.429688 | 950363 |
2012-07-03 | 75800.0 | 75800.0 | 73100.0 | 73500.0 | 55656.425781 | 2542262 |
2012-07-04 | 74300.0 | 74500.0 | 73600.0 | 74500.0 | 56413.644531 | 1786898 |
2012-07-05 | 74600.0 | 74800.0 | 73800.0 | 74100.0 | 56110.750000 | 835637 |
2012-07-06 | 74400.0 | 74500.0 | 73500.0 | 73700.0 | 55807.871094 | 758448 |
In [17]:
start = '2012-7-1'
end = '2022-7-31'
KAKAO = yf.download('035720.KS', start, end)  # Kakao
KAKAO.head()
[*********************100%***********************] 1 of 1 completed
Out[17]:
Date | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|
2012-07-02 | 20348.544922 | 20993.250000 | 20207.515625 | 20207.515625 | 19494.722656 | 396841 |
2012-07-03 | 20267.957031 | 20650.750000 | 20227.662109 | 20550.015625 | 19825.144531 | 296579 |
2012-07-04 | 20670.898438 | 20751.486328 | 20288.103516 | 20408.986328 | 19689.089844 | 351768 |
2012-07-05 | 20449.279297 | 20711.191406 | 20348.544922 | 20590.308594 | 19864.013672 | 239518 |
2012-07-06 | 20529.867188 | 20691.044922 | 20429.132812 | 20691.044922 | 19961.195312 | 132778 |
In [18]:
start = '2012-7-1'
end = '2022-7-31'
SAM = yf.download('005930.KS', start, end)  # Samsung Electronics
SAM.head()
[*********************100%***********************] 1 of 1 completed
Out[18]:
Date | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|
2012-07-02 | 24160.0 | 24180.0 | 23420.0 | 23480.0 | 18562.669922 | 19106750 |
2012-07-03 | 23640.0 | 23860.0 | 23400.0 | 23500.0 | 18578.484375 | 15801550 |
2012-07-04 | 23640.0 | 23940.0 | 23560.0 | 23820.0 | 18831.460938 | 19158750 |
2012-07-05 | 23740.0 | 23880.0 | 23540.0 | 23700.0 | 18736.601562 | 9068550 |
2012-07-06 | 23880.0 | 23880.0 | 23060.0 | 23220.0 | 18357.119141 | 21158900 |
In [19]:
SAM['Close'].plot(figsize=(12,6), grid=True)
Out[19]:
<Axes: xlabel='Date'>
In [20]:
SAM_trunc = SAM[:'2022-12-31']
SAM_trunc
Out[20]:
Date | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|
2012-07-02 | 24160.0 | 24180.0 | 23420.0 | 23480.0 | 18562.669922 | 19106750 |
2012-07-03 | 23640.0 | 23860.0 | 23400.0 | 23500.0 | 18578.484375 | 15801550 |
2012-07-04 | 23640.0 | 23940.0 | 23560.0 | 23820.0 | 18831.460938 | 19158750 |
2012-07-05 | 23740.0 | 23880.0 | 23540.0 | 23700.0 | 18736.601562 | 9068550 |
2012-07-06 | 23880.0 | 23880.0 | 23060.0 | 23220.0 | 18357.119141 | 21158900 |
... | ... | ... | ... | ... | ... | ... |
2022-07-25 | 60900.0 | 61900.0 | 60800.0 | 61100.0 | 59958.777344 | 9193681 |
2022-07-26 | 60800.0 | 61900.0 | 60800.0 | 61700.0 | 60547.570312 | 6597211 |
2022-07-27 | 61300.0 | 61900.0 | 61200.0 | 61800.0 | 60645.703125 | 7320997 |
2022-07-28 | 62300.0 | 62600.0 | 61600.0 | 61900.0 | 60743.835938 | 10745302 |
2022-07-29 | 62400.0 | 62600.0 | 61300.0 | 61400.0 | 60253.171875 | 15093120 |
2479 rows × 6 columns
In [21]:
df = pd.DataFrame({'ds': SAM_trunc.index, 'y': SAM_trunc['Close']})
df.reset_index(inplace=True)
del df['Date']
df.head()
Out[21]:
ds | y | |
---|---|---|
0 | 2012-07-02 | 23480.0 |
1 | 2012-07-03 | 23500.0 |
2 | 2012-07-04 | 23820.0 |
3 | 2012-07-05 | 23700.0 |
4 | 2012-07-06 | 23220.0 |
In [22]:
# Tell Prophet that the data has daily periodicity
m = Prophet(daily_seasonality=True)
m.fit(df)
16:45:59 - cmdstanpy - INFO - Chain [1] start processing
16:46:01 - cmdstanpy - INFO - Chain [1] done processing
Out[22]:
<prophet.forecaster.Prophet at 0x1b876b5ccd0>
In [23]:
# Forecast one year (365 days) beyond the end of df above
future = m.make_future_dataframe(periods=365)
future.tail()
Out[23]:
ds | |
---|---|
2839 | 2023-07-25 |
2840 | 2023-07-26 |
2841 | 2023-07-27 |
2842 | 2023-07-28 |
2843 | 2023-07-29 |
In [24]:
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
Out[24]:
ds | yhat | yhat_lower | yhat_upper | |
---|---|---|---|---|
2839 | 2023-07-25 | 79309.637559 | 71225.228519 | 87876.370676 |
2840 | 2023-07-26 | 79304.453351 | 71186.993432 | 88040.565048 |
2841 | 2023-07-27 | 79392.731979 | 71285.285187 | 87915.863145 |
2842 | 2023-07-28 | 79357.274911 | 71483.087512 | 87833.151436 |
2843 | 2023-07-29 | 78919.623670 | 70468.300319 | 87604.851549 |
In [25]:
m.plot(forecast)
Out[25]:
In [26]:
m.plot_components(forecast)
Out[26]:
In [27]:
start = '2014-1-1'
end = '2017-7-31'
SAM = yf.download('005930.KS', start, end)  # Samsung Electronics
SAM['Close'].plot(figsize=(12,6), grid=True)
[*********************100%***********************] 1 of 1 completed
Out[27]:
<Axes: xlabel='Date'>
In [28]:
# Fit on a truncated subset: keep data through 2017-05-31
# out of the full 2014-1-1 ~ 2017-7-31 range, then forecast the rest
SAM_trunc = SAM[:'2017-05-31']
SAM_trunc['Close'].plot(figsize=(12,6), grid=True)
Out[28]:
<Axes: xlabel='Date'>
In [29]:
df = pd.DataFrame({'ds': SAM_trunc.index, 'y': SAM_trunc['Close']})
df.reset_index(inplace=True)
del df['Date']
In [30]:
m = Prophet(daily_seasonality=True)
m.fit(df)
16:46:06 - cmdstanpy - INFO - Chain [1] start processing
16:46:06 - cmdstanpy - INFO - Chain [1] done processing
Out[30]:
<prophet.forecaster.Prophet at 0x1b876b5d6c0>
In [31]:
future = m.make_future_dataframe(periods=61)
future.tail()
Out[31]:
ds | |
---|---|
897 | 2017-07-27 |
898 | 2017-07-28 |
899 | 2017-07-29 |
900 | 2017-07-30 |
901 | 2017-07-31 |
In [32]:
forecast = m.predict(future)
m.plot(forecast);
In [33]:
plt.figure(figsize=(12,6))
plt.plot(SAM.index, SAM['Close'], label='real')
plt.plot(forecast['ds'], forecast['yhat'], label='forecast')
plt.grid()
plt.legend()
plt.show()
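The plot compares the forecast with the real prices by eye. A number can be put on that comparison by aligning the two series on date and computing RMSE over the held-out period. One detail to watch: the forecast has a row for every calendar day (the tail above includes 2017-07-29 and 2017-07-30, a weekend), while the stock data only has trading days, so an inner join on the date is needed. The sketch below uses tiny made-up frames as stand-ins for `SAM['Close']` and `forecast[['ds', 'yhat']]`; the prices are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for SAM['Close']: trading days only (no weekend rows)
actual = pd.Series(
    [100.0, 102.0, 101.0, 105.0],
    index=pd.to_datetime(["2017-06-01", "2017-06-02", "2017-06-05", "2017-06-06"]),
    name="y",
)

# Hypothetical stand-in for forecast[['ds', 'yhat']]: every calendar day
forecast_demo = pd.DataFrame({
    "ds": pd.date_range("2017-06-01", periods=6, freq="D"),
    "yhat": [101.0, 101.0, 102.0, 102.0, 103.0, 103.0],
})

# Inner join keeps only dates present in both (drops the weekend forecasts)
actual_df = actual.rename_axis("ds").reset_index()
merged = forecast_demo.merge(actual_df, on="ds", how="inner")

holdout_rmse = np.sqrt(np.mean((merged["yhat"] - merged["y"]) ** 2))
print(merged)
print("held-out RMSE:", round(holdout_rmse, 4))
```

With the real data, the same join could be restricted to dates after 2017-05-31 so that only the period the model never saw is scored.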
4. Growth Model
- A model for data that shows periodicity while gradually growing
In [34]:
df = pd.read_csv('../../data/08. example_wp_R.csv')
df['y'] = np.log(df['y'])  # log transform
df
Out[34]:
ds | y | |
---|---|---|
0 | 2008-01-30 | 5.976351 |
1 | 2008-01-16 | 6.049733 |
2 | 2008-01-17 | 6.011267 |
3 | 2008-01-14 | 5.953243 |
4 | 2008-01-15 | 5.910797 |
... | ... | ... |
2858 | 2015-12-11 | 7.834788 |
2859 | 2015-12-12 | 7.360104 |
2860 | 2015-12-13 | 7.479864 |
2861 | 2015-12-18 | 7.765145 |
2862 | 2015-12-19 | 7.220374 |
2863 rows × 2 columns
In [35]:
# The forecast can be bounded from above and below
df['cap'] = 8.5   # upper bound of the forecast
df['floor'] = 6   # lower bound of the forecast
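Because `y` was log-transformed with `np.log`, the `cap` and `floor` values are on the log scale too. A quick sketch of what they mean in the original units, using `math.exp` as the inverse of the natural log:

```python
import math

cap_log, floor_log = 8.5, 6.0       # bounds as set on the log-transformed data

cap_raw = math.exp(cap_log)         # upper bound in original units
floor_raw = math.exp(floor_log)     # lower bound in original units

print(f"floor: log {floor_log} -> about {floor_raw:.1f}")
print(f"cap  : log {cap_log} -> about {cap_raw:.1f}")
```

Any forecast produced on the log scale needs the same `exp` step (for example `np.exp(forecast['yhat'])`) before it can be read in the original units.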
In [36]:
# daily_seasonality=True: tell Prophet the data has daily periodicity
# growth='logistic': use a logistic growth trend
m = Prophet(growth='logistic', daily_seasonality=True)
m.fit(df)
16:46:08 - cmdstanpy - INFO - Chain [1] start processing
16:46:08 - cmdstanpy - INFO - Chain [1] done processing
Out[36]:
<prophet.forecaster.Prophet at 0x1b878463460>
In [37]:
# Prophet's default trend is linear, so without a cap the forecast can
# grow past the maximum plausible size (1826 days is about 5 years)
future = m.make_future_dataframe(periods=1826)
future['cap'] = 8.5
future['floor'] = 6
fcst = m.predict(future)
m.plot(fcst);
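To illustrate why a logistic trend stays inside the bounds while a linear one does not, here is a simplified saturating curve. It is only a sketch of the general shape: the rate `k` and midpoint `m_mid` are hypothetical values chosen by hand, and Prophet's own trend additionally handles changepoints, which this ignores.

```python
import math

def logistic_trend(t, cap, floor, k, m_mid):
    """Saturating curve that rises from `floor` toward `cap` as t grows."""
    return floor + (cap - floor) / (1.0 + math.exp(-k * (t - m_mid)))

cap, floor = 8.5, 6.0        # same bounds as in the cells above
k, m_mid = 0.002, 1000.0     # hypothetical growth rate and midpoint (in days)

days = [0, 1000, 2863, 2863 + 1826]   # start, midpoint, end of data, end of forecast
values = [logistic_trend(t, cap, floor, k, m_mid) for t in days]

for t, v in zip(days, values):
    print(f"day {t:>4}: trend = {v:.3f}")
```

However far the horizon is extended, the curve approaches `cap` without crossing it, which is the behavior the `cap` column asks Prophet for.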
In [38]:
forecast = m.predict(future)
m.plot_components(forecast);
In [39]:
start = '2012-7-1'
end = '2023-7-31'
hanwhaAero = yf.download('012450.KS', start, end)  # Hanwha Aerospace

df_han = pd.DataFrame({'ds': hanwhaAero.index, 'y': hanwhaAero['Close']})
df_han.reset_index(inplace=True)
del df_han['Date']

m = Prophet(daily_seasonality=True)
m.fit(df_han)

future = m.make_future_dataframe(periods=181)
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
[*********************100%***********************] 1 of 1 completed
16:46:40 - cmdstanpy - INFO - Chain [1] start processing
16:46:41 - cmdstanpy - INFO - Chain [1] done processing
Out[39]:
ds | yhat | yhat_lower | yhat_upper | |
---|---|---|---|---|
2883 | 2023-12-28 | 105717.788834 | 96303.582508 | 114361.737294 |
2884 | 2023-12-29 | 105756.688165 | 97064.901554 | 114597.808848 |
2885 | 2023-12-30 | 107089.836948 | 98152.359358 | 115753.696234 |
2886 | 2023-12-31 | 107187.987911 | 98441.743063 | 115587.641210 |
2887 | 2024-01-01 | 106143.506882 | 97443.648866 | 115126.941530 |
In [40]:
plt.figure(figsize=(12,6))
plt.plot(hanwhaAero.index, hanwhaAero['Close'], label='real')
plt.plot(forecast['ds'], forecast['yhat'], label='forecast')
plt.grid()
plt.legend()
plt.show()
In [41]:
forecast = m.predict(future)
m.plot(forecast);