Patrick's Data World

Python Data Analysis - WordCloud

patrick610 2020. 6. 24. 21:59

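◉ WordCloud

A technique that analyzes word frequencies in text data and visualizes them, drawing more frequent words larger.

- nltk
A package built for text analysis

- wordcloud
A package for rendering the word cloud visualization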
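
The frequency analysis behind a word cloud can be sketched with the standard library alone. This is only a rough illustration, not how nltk or wordcloud tokenize text internally; the `count_words` helper and the sample sentence are made up for this example.

```python
import re
from collections import Counter


def count_words(text):
    """Count how often each word occurs (hypothetical helper, not a wordcloud API)."""
    # Lowercase the text and pull out runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Made-up sample text, for illustration only.
sample = "We the people. The people of the nation."
counts = count_words(sample)
print(counts.most_common(2))
```

A word cloud then simply draws the words with the highest counts in the largest font.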

◉ Data visualization

- matplotlib
A package for visualizing data

- seaborn
A package that builds on and complements matplotlib

* Installation
pip install wordcloud
pip install matplotlib

⊙ Read a text file and generate a word cloud from it

from wordcloud import WordCloud

with open('data_files/constitution-en.txt', 'r') as f:
    constitution_en = f.read()   # read the entire file into one string

constitution_en

Result

... (omitted)

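wcloud = WordCloud()   # create a WordCloud object with the default settings

wcloud = wcloud.generate(constitution_en)   # generate() returns the same object

wcloud.words_   # dict mapping each word to its relative frequency (most frequent word = 1.0)
wcloud.words_['People']   # relative frequency of a single word
#len(wcloud.words_)   # 200 (the default max_words)

Result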
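
The values in `words_` are relative frequencies: each count is divided by the count of the most frequent word, and only the top `max_words` entries (200 by default) are kept. The sketch below mimics that step in plain Python. `relative_frequencies` is a hypothetical helper, not part of wordcloud, and the counts are made-up sample data.

```python
def relative_frequencies(counts, max_words=200):
    """Scale counts so the most frequent word is 1.0 and keep the top max_words."""
    # Sort by count, highest first, and keep only the top entries.
    top = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:max_words]
    if not top:
        return {}
    highest = top[0][1]
    return {word: count / highest for word, count in top}


# Made-up counts, for illustration only.
sample_counts = {"people": 40, "state": 20, "law": 10, "court": 5}
freqs = relative_frequencies(sample_counts, max_words=3)
print(freqs)
```

This is why the top word in `words_` always has the value 1.0 and why the dict never holds more than `max_words` entries.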

⊙ Create a figure, adjust how the image is displayed, and render it

import matplotlib.pyplot as plt

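plt.figure(figsize=(12, 12))  # set the figure size (in inches)
plt.imshow(wcloud)   # imshow: draw the word cloud as an image
plt.axis('off')      # hide the axes
plt.show()           # render the figure; in a notebook this also hides the text repr of the last call

Result

The resulting word cloud, with each word sized according to its frequency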

⊙ Drawing a word cloud inside an image shape (mask)

from PIL import Image
from wordcloud import STOPWORDS

STOPWORDS   # built-in set of stopwords used for filtering

Result

... (omitted)

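with open('data_files/alice.txt', 'r') as f:
    alice = f.read()   # read the entire file into one string

Result

Stopwords, words that carry little meaning on their own (he, she, I, by, or Korean particles such as 은, 는, 이, 가), are excluded using STOPWORDS. Note that the built-in STOPWORDS set contains English words only, so stopwords for other languages have to be added by hand.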
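
Stopword filtering itself is just a set-membership check. The following is a minimal sketch with a tiny hand-written set; `BASE_STOPWORDS`, `remove_stopwords`, and the token list are all made up for illustration, and the real STOPWORDS set is much larger.

```python
# A tiny hypothetical stopword set (the real STOPWORDS set is much larger).
BASE_STOPWORDS = {"he", "she", "i", "by", "the", "a"}


def remove_stopwords(words, stopwords):
    """Return the words that are not in the stopword set (case-insensitive)."""
    return [w for w in words if w.lower() not in stopwords]


# Sets can be extended with extra words before filtering.
custom_stopwords = BASE_STOPWORDS | {"said"}

tokens = ["She", "said", "the", "Rabbit", "ran", "by", "Alice"]
print(remove_stopwords(tokens, custom_stopwords))
```

Because STOPWORDS is an ordinary Python set, the same union trick should carry over to the library, for example `WordCloud(stopwords=STOPWORDS | {'said'})`.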

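import numpy as np

img = Image.open('data_files/alice_mask.png')   # load the mask image
alice_mask = np.array(img)                      # convert it to a NumPy array of pixel values

alice_mask.shape   # array dimensions: (height, width) or (height, width, channels)

Result

alice_mask[:10, :10]   # top-left 10x10 block of pixel values

Result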

wcloud2 = WordCloud(mask=alice_mask,
                    stopwords=STOPWORDS,
                    background_color='white')

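An image is really just a number (or a few numbers) stored for each pixel.
The image is therefore loaded as an array and passed to WordCloud as the mask: pure-white pixels are treated as masked out, and words are drawn only on the remaining pixels.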
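
To make the "numbers per pixel" idea concrete, here is a tiny grayscale mask written as nested lists instead of a NumPy array, so it runs without any dependencies. It assumes the convention that 255 (pure white) means "masked out". The 4x4 `tiny_mask` and the `drawable_cells` helper are made up for illustration.

```python
# 255 = white (masked out), 0 = black (free to draw on).
tiny_mask = [
    [255, 255, 255, 255],
    [255,   0,   0, 255],
    [255,   0,   0, 255],
    [255, 255, 255, 255],
]


def drawable_cells(mask):
    """Return (row, col) positions whose value is not pure white."""
    return [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value != 255
    ]


cells = drawable_cells(tiny_mask)
print(len(tiny_mask), len(tiny_mask[0]))  # height and width, comparable to alice_mask.shape
print(cells)
```

A real mask works the same way at a much larger scale: the non-white region forms the silhouette that the words fill.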

wcloud2 = wcloud2.generate(alice)

wcloud2.words_

Result

... (omitted)

plt.figure(figsize=(12,12))
plt.imshow(wcloud2)
plt.axis('off')
plt.show()

Result
