Converting data types freely - the astype method
In [1]:
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)  # hide FutureWarning messages (e.g., pandas deprecation notices)
In [2]:
import pandas as pd
import seaborn as sns
tips = sns.load_dataset('tips')
tips
Out[2]:
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 |
| 240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 |
| 241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 |
| 242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |
| 243 | 18.78 | 3.00 | Female | No | Thur | Dinner | 2 |
244 rows × 7 columns
In [3]:
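tips.info  # without (), this only returns the bound method (shown below); call tips.info() to print the column summary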
Out[3]:
<bound method DataFrame.info of total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
.. ... ... ... ... ... ... ...
239 29.03 5.92 Male No Sat Dinner 3
240 27.18 2.00 Female Yes Sat Dinner 2
241 22.67 2.00 Male Yes Sat Dinner 2
242 17.82 1.75 Male No Sat Dinner 2
243 18.78 3.00 Female No Thur Dinner 2
[244 rows x 7 columns]>
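The difference between referencing `info` and calling it can be seen on a small hand-made DataFrame (a hypothetical stand-in for `tips`, since `sns.load_dataset` needs a download). This sketch captures the printed summary through the `buf` parameter of `DataFrame.info` so it can be inspected as a string:

```python
import io

import pandas as pd

# Small hypothetical stand-in for tips
df = pd.DataFrame({
    "total_bill": [16.99, 10.34],
    "sex": pd.Categorical(["Female", "Male"]),
    "size": [2, 3],
})

print(type(df.info).__name__)  # without (), df.info is just a method object; nothing runs

buffer = io.StringIO()
df.info(buf=buffer)            # with (), the summary is written to buf (stdout by default)
summary = buffer.getvalue()
print(summary)
```

In a notebook, plain `tips.info()` is usually enough; `buf` is only used here to make the output inspectable.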
Converting various data types to strings
In [4]:
# Use the astype method to convert a data type
# Convert the sex column from category to string and store the result in a new column, sex_str
tips['sex_str'] = tips['sex'].astype(str)
tips.dtypes
Out[4]:
total_bill float64
tip float64
sex category
smoker category
day category
time category
size int64
sex_str object
dtype: object
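A minimal sketch of the same conversion and its reverse, on a hypothetical three-row frame: `astype(str)` turns the categories into plain strings (reported as `object` in the pandas version used above; the exact string dtype name can differ between pandas versions), and `astype('category')` turns them back into a categorical column.

```python
import pandas as pd

# Hypothetical data standing in for tips['sex']
df = pd.DataFrame({"sex": pd.Categorical(["Female", "Male", "Male"])})

df["sex_str"] = df["sex"].astype(str)             # category -> plain strings
df["sex_cat"] = df["sex_str"].astype("category")  # strings -> category again
print(df.dtypes)
```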
Converting the data back to its original type
In [5]:
# Convert total_bill to strings
tips['total_bill'] = tips['total_bill'].astype(str)
tips.dtypes
Out[5]:
total_bill object
tip float64
sex category
smoker category
day category
time category
size int64
sex_str object
dtype: object
In [6]:
# Convert the string total_bill column back to float
tips['total_bill'] = tips['total_bill'].astype(float)
tips.dtypes
Out[6]:
total_bill float64
tip float64
sex category
smoker category
day category
time category
size int64
sex_str object
dtype: object
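The round trip above can be checked on hypothetical data: converting floats to strings and back gives the same values. The sketch also shows that `astype` accepts a dict mapping column names to target types, so several columns can be converted in one call.

```python
import pandas as pd

# Hypothetical values copied from the first rows of tips
df = pd.DataFrame({"total_bill": [16.99, 10.34, 21.01], "size": [2, 3, 3]})

as_text = df["total_bill"].astype(str)  # float64 -> strings such as "16.99"
back = as_text.astype(float)            # strings -> float64 again

# A dict converts several columns at once: {column name: target type}
converted = df.astype({"total_bill": str, "size": "float64"})
print(converted.dtypes)
```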
Handling incorrectly entered strings - the to_numeric function
In [7]:
# Take the first 10 rows as an independent copy, then replace total_bill in rows 1, 3, 5 and 7 with 'missing'
tips_sub_miss = tips.head(10).copy()
tips_sub_miss.loc[[1, 3, 5, 7], 'total_bill'] = 'missing'
tips_sub_miss
Out[7]:
| | total_bill | tip | sex | smoker | day | time | size | sex_str |
|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | Female |
| 1 | missing | 1.66 | Male | No | Sun | Dinner | 3 | Male |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 | Male |
| 3 | missing | 3.31 | Male | No | Sun | Dinner | 2 | Male |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | Female |
| 5 | missing | 4.71 | Male | No | Sun | Dinner | 4 | Male |
| 6 | 8.77 | 2.00 | Male | No | Sun | Dinner | 2 | Male |
| 7 | missing | 3.12 | Male | No | Sun | Dinner | 4 | Male |
| 8 | 15.04 | 1.96 | Male | No | Sun | Dinner | 2 | Male |
| 9 | 14.78 | 3.23 | Male | No | Sun | Dinner | 2 | Male |
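In recent pandas versions, writing a string into a `float64` column with `.loc` emits a FutureWarning about an incompatible dtype, which the filter in In [1] hides. A sketch of a more explicit alternative, on hypothetical data: take an independent copy and cast the column to `object` before inserting the placeholder strings.

```python
import pandas as pd

# Hypothetical data standing in for the first rows of tips
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "tip": [1.01, 1.66, 3.50, 3.31],
})

sub = df.head(4).copy()                               # independent copy, not a view of df
sub["total_bill"] = sub["total_bill"].astype(object)  # column may now hold numbers and strings
sub.loc[[1, 3], "total_bill"] = "missing"
print(sub)
```

The original `df` keeps its `float64` column; only the copy is modified.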
In [8]:
# Because of the 'missing' strings, total_bill is now object (string) dtype instead of float
tips_sub_miss.dtypes
Out[8]:
total_bill object
tip float64
sex category
smoker category
day category
time category
size int64
sex_str object
dtype: object
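Before deciding how to convert, it can help to find which entries are not numbers. One way to sketch this (hypothetical data mimicking `tips_sub_miss['total_bill']`) is to parse with `errors='coerce'` and keep the entries that were present but became NaN:

```python
import pandas as pd

# Hypothetical object column mimicking tips_sub_miss['total_bill']
s = pd.Series([16.99, "missing", 21.01, "missing", 24.59], dtype=object)

parsed = pd.to_numeric(s, errors="coerce")  # unparseable entries become NaN
bad_rows = s[parsed.isna() & s.notna()]     # present in s, but not a number
print(bad_rows)
```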
In [9]:
# pandas does not know how to convert the string 'missing' to a float
# tips_sub_miss['total_bill'].astype(float)
# output: ValueError (could not convert string to float: 'missing')
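Instead of leaving the failing line commented out, the error can be caught explicitly. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical mixed column: numbers plus a placeholder string
s = pd.Series([16.99, "missing", 21.01], dtype=object)

try:
    s.astype(float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)  # the message names the offending string

print(error_message)
```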
In [10]:
# pd.to_numeric cannot convert it either with the default errors='raise'
# pd.to_numeric(tips_sub_miss['total_bill'])
# output: ValueError (Unable to parse string "missing" at position 1)
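`pd.to_numeric` does parse strings that look like numbers; it only fails when an entry cannot be parsed. A sketch contrasting both cases on hypothetical data:

```python
import pandas as pd

ok = pd.Series(["16.99", "10.34", "21.01"], dtype=object)
converted = pd.to_numeric(ok)  # numeric strings parse fine -> float64

bad = pd.Series(["16.99", "missing", "21.01"], dtype=object)
raised = False
message = ""
try:
    pd.to_numeric(bad)  # default errors='raise'
except ValueError as exc:
    raised = True
    message = str(exc)

print(converted.dtype, raised, message)
```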
In [11]:
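# errors='ignore' suppresses the error, but the data type is not changed either (still object)
# (errors='ignore' is deprecated since pandas 2.2; its FutureWarning is hidden by the filter in In [1])
tips_sub_miss['total_bill'] = pd.to_numeric(tips_sub_miss['total_bill'], errors='ignore')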
tips_sub_miss.dtypes
Out[11]:
total_bill object
tip float64
sex category
smoker category
day category
time category
size int64
sex_str object
dtype: object
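Because `errors='ignore'` is deprecated, the same "convert if possible, otherwise keep the original" behavior can be written with an explicit `try`/`except`. The helper name `to_numeric_or_original` below is hypothetical, a sketch rather than a pandas API:

```python
import pandas as pd


def to_numeric_or_original(s: pd.Series) -> pd.Series:
    """Hypothetical helper: convert s to numbers, or return it unchanged on failure."""
    try:
        return pd.to_numeric(s)
    except (ValueError, TypeError):
        return s


bad = pd.Series(["16.99", "missing"], dtype=object)
good = pd.Series(["16.99", "10.34"], dtype=object)

print(to_numeric_or_original(bad).dtype)   # object: left as is
print(to_numeric_or_original(good).dtype)  # float64: converted
```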
In [12]:
# errors='coerce': values that cannot be converted to numbers are set to missing values (NaN)
tips_sub_miss['total_bill'] = pd.to_numeric(tips_sub_miss['total_bill'], errors='coerce')
tips_sub_miss.dtypes
Out[12]:
total_bill float64
tip float64
sex category
smoker category
day category
time category
size int64
sex_str object
dtype: object
In [13]:
tips_sub_miss.head()
Out[13]:
| | total_bill | tip | sex | smoker | day | time | size | sex_str |
|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | Female |
| 1 | NaN | 1.66 | Male | No | Sun | Dinner | 3 | Male |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 | Male |
| 3 | NaN | 3.31 | Male | No | Sun | Dinner | 2 | Male |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | Female |
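After coercion the column is numeric, so the NaN values can be counted and handled with the usual missing-value tools. A sketch on hypothetical data; filling with the mean is only one possible choice and is not part of the original walkthrough:

```python
import pandas as pd

# Hypothetical column with two placeholder strings
s = pd.Series(["16.99", "missing", "21.01", "missing", "24.59"], dtype=object)

clean = pd.to_numeric(s, errors="coerce")  # 'missing' -> NaN, dtype float64
n_missing = int(clean.isna().sum())        # how many values could not be parsed
filled = clean.fillna(clean.mean())        # one possible choice: fill with the mean
print(n_missing)
print(filled)
```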