Patrick's Data World

Python Data Analysis - Pandas

patrick610 2020. 6. 24. 21:59

Pandas

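The most common data file formats are .csv (comma-separated) and .tsv (tab-separated).
pandas is a tool optimized for comma- and tab-delimited data, and it is a very powerful one.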
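
As a minimal sketch of the point above, the same pd.read_csv function reads both formats; only the sep argument changes. The sample text and the column names (name, age) are made up for illustration and are not part of the original post.

```python
import io
import pandas as pd

# Hypothetical sample data, inlined so the sketch runs without any files.
csv_text = "name,age\nKim,30\nLee,25\n"
tsv_text = "name\tage\nKim\t30\nLee\t25\n"

csv_df = pd.read_csv(io.StringIO(csv_text))            # comma is the default separator
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # tab-separated input

# Both inputs describe the same table, so the two frames compare equal.
print(csv_df.equals(tsv_df))
```

With a real file, the io.StringIO(...) argument would simply be replaced by the file path.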

* Installing pandas (the name is short for "panel data")

1. conda install pandas
2. pip install pandas

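Try conda first and fall back to pip if that does not work (conda searches more deeply for the files an installation needs, i.e. the dependency packages, and installs them on its own).
(pip is more general-purpose and faster.)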

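◉ Pandas (short for "panel data")

pandas is centered on two-dimensional, table-shaped data (a DataFrame whose columns are one-dimensional Series).
NumPy handles only the values, whereas pandas can also handle the schema (structure), such as row and column labels.
e.g. join, group by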
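
The two operations named above can be sketched roughly as follows. The tables, column names, and values here are hypothetical and exist only for this illustration; merge and groupby are the pandas counterparts of SQL's join and group by.

```python
import pandas as pd

# Hypothetical tables made up for this sketch.
passengers = pd.DataFrame({
    "passenger_id": [1, 2, 3, 4],
    "class_id": [1, 3, 3, 2],
    "fare": [80.0, 8.0, 10.0, 20.0],
})
classes = pd.DataFrame({
    "class_id": [1, 2, 3],
    "class_name": ["first", "second", "third"],
})

# Join: attach class_name to each passenger through the shared class_id column.
joined = passengers.merge(classes, on="class_id", how="left")

# Group by: average fare per class name.
mean_fare = joined.groupby("class_name")["fare"].mean()
print(mean_fare)
```

Both steps rely on column names rather than array positions, which is the "schema" aspect that plain NumPy arrays do not carry.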

◎ Reading titanic-train.csv from the data_files directory with pandas

import pandas as pd # conventional alias


titanic_df = pd.read_csv("data_files/titanic-train.csv")   # load the csv into a DataFrame

titanic_df.head()   # first 5 rows

# write the DataFrame back out as csv, without the index column
titanic_df.to_csv("data_files/titanic-train2.csv", 
                  sep=",", 
                  encoding="utf-8",
                  index=False
                  )

◎ Building a DataFrame from Series

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d) 

df['three'] = pd.Series([10,20,30], index=['a','b','c']) 
df['four'] = df['one'] + df['three'] 
df

- DataFrame
A collection of Series.

- Series
Every value always comes with an index label; if none is given, pandas assigns 0, 1, 2, ... automatically.
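
The two notes above can be checked with a short sketch. It reuses the 'one' and 'two' Series from the example in this section; the standalone Series s is added here only for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30])                    # no index given
print(list(s.index))                           # default positions: [0, 1, 2]

one = pd.Series([1, 2, 3], index=["a", "b", "c"])
two = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
df = pd.DataFrame({"one": one, "two": two})

print(type(df["one"]).__name__)                # each column is a Series
print(df["one"].isna().tolist())               # row 'd' has no value in 'one'
```

Because the Series are aligned by index label when the DataFrame is built, the shorter column is padded with NaN at label 'd' instead of raising an error.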

◎ Deleting columns with drop and del

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']), 
     'three' : pd.Series([10,20,30], index=['a','b','c'])} 

df = pd.DataFrame(d) 

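# df2 = df.drop('two', axis=1)   # returns a new DataFrame without the column
df.drop('two', axis=1, inplace=True) # removes the column from the original DataFrame
              # axis=1 drops columns, axis=0 drops rows
df

del df['one']   # delete column 'one'
print(df) 

# df.pop('three')   # removes column 'three' and returns it ('two' was already dropped above)
# print(df)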
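
As a small sketch of the difference between the three approaches, the following uses a hypothetical three-column frame (not the one from the post) to show that drop without inplace leaves the original untouched, while pop removes the column and hands it back.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2], "two": [3, 4], "three": [5, 6]})

smaller = df.drop(columns="two")   # returns a new frame; df itself is unchanged
print(list(df.columns), list(smaller.columns))

popped = df.pop("three")           # removes the column from df and returns it as a Series
print(list(df.columns), popped.tolist())
```

drop(columns="two") is equivalent to drop('two', axis=1) and avoids having to remember which axis number means columns.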

◎ Column indexing

d = {'one' : pd.Series([1, 2, 3]), 
     'two' : pd.Series([1, 2, 3, 4]), 
     'three' : pd.Series([10,20,30])} 

df = pd.DataFrame(d)

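#df[2]   # error: KeyError, there is no column labeled 2
#df["one"]
#df["one":"three"]   # error: a slice inside df[...] is applied to rows, not columns
df[["one", "three"]]   # a list of names selects several columns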

◎ Indexing rows and columns

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d)
print(df)
print("------------1")
print(df["one"])           # select one column by name
print("------------2")
print(df[["one", "two"]])  # select several columns with a list of names
print("------------3")
print(df.loc['b'])   # select a row by index label
print("------------4")
print(df.iloc[1])   # select a row by integer position
print("------------5")
print(df[0:2])      # row slicing: the first two rows

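Indexing a DataFrame with a single label, as in df["one"], looks up columns only.
To pick several columns, pass a list of names, as in df[["one", "three"]]; a slice such as df[0:2] selects rows instead.
└ This is why loc (selection by index name) and iloc (selection by index order) exist for rows.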
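
A minimal sketch of loc versus iloc follows, using a hypothetical four-row frame. One detail worth noticing is that a label slice with loc includes its end label, while a position slice with iloc excludes its end position.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 2, 3, 4], "two": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

by_label = df.loc["a":"c"]        # label slice: includes the end label 'c'
by_position = df.iloc[0:2]        # position slice: excludes position 2
cell = df.loc["b", "two"]         # one cell by row label and column name
same_cell = df.iloc[1, 1]         # the same cell by integer positions

print(len(by_label), len(by_position), cell, same_cell)
```

Both accessors take a row selector first and an optional column selector second, so they cover rows and columns in one call.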

◎ Setting the pandas DataFrame cell width

import pandas as pd

pd.set_option('display.max_colwidth', 150)   # show up to 150 characters per cell

◎ Aligning text in a pandas DataFrame

dfStyler = df.style.set_properties(**{'text-align': 'left'})
dfStyler.set_table_styles([dict(selector='th', 
                                props=[('text-align', 'left')])])

◎ Setting the number of decimal places in a pandas DataFrame

import pandas as pd

pd.options.display.float_format = '{:.2f}'.format
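
The assignment above changes the display for the whole session. As a rough sketch of a scoped alternative, pd.option_context applies the same option only inside a with-block; the sample values below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"value": [3.14159, 2.0]})

# option_context applies the setting only inside the with-block.
with pd.option_context("display.float_format", "{:.2f}".format):
    text = df.to_string()

print(text)
```

Only the displayed text is rounded; the values stored in the DataFrame keep their full precision.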

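◎ Setting the maximum number of rows displayed for a pandas DataFrame

import pandas as pd
import numpy as np

pd.set_option('display.max_rows', 100)   # example value: show up to 100 rows without truncation
 
df = pd.DataFrame(np.arange(200).reshape(100, 2))   # 100 rows x 2 columns
df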

◎ Setting the maximum number of columns displayed for a pandas DataFrame

import pandas as pd

pd.set_option('display.max_columns', 50) # set the maximum to whatever number you want
