Data Visualization | Matplotlib | Neural Networks and Deep Learning | TensorFlow 2.0 Series (5) | Introduction to AI

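Matplotlib plotting basics

Matplotlib: a third-party library for producing high-quality charts quickly and simply

Data visualization

Data analysis stage: understand and observe the relationships within the data
Algorithm debugging stage: spot problems and improve the algorithm
Project summary stage: present the project's results

Installing Matplotlib

Anaconda: Matplotlib is installed automatically as part of Anaconda
pip: pip install matplotlib
Import Matplotlib: import matplotlib.pyplot as plt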

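Figure object

The Figure object creates the canvas: figure( num,figsize,dpi,facecolor,edgecolor,frameon )

num: figure number or name, given as a number or a string
figsize: width and height of the figure, in inches
dpi: resolution of the figure in dots per inch; the default is 100 in Matplotlib 2.0 and later (80 in earlier versions)
facecolor: background color
edgecolor: border color
frameon: whether to draw the figure frame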

import matplotlib.pyplot as plt

plt.figure(figsize=(3,2),facecolor='green') # create the canvas
plt.plot() # draw an empty plot
plt.show() # display the figure

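Dividing the canvas into subplots

The subplot() function divides the canvas into subplots: subplot(number of rows, number of columns, subplot index). The three values can also be written as one three-digit number, so subplot(221) is the same as subplot(2,2,1).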
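The subplot index counts panels row by row, starting at 1 in the top-left corner. As a rough sketch of that numbering (the helper name subplot_position is hypothetical and is not part of Matplotlib), the index can be mapped back to a row and a column like this:

```python
def subplot_position(nrows, ncols, index):
    """Return the 0-based (row, column) of a 1-based subplot index.

    Hypothetical helper for illustration only; Matplotlib does this
    bookkeeping internally when subplot(nrows, ncols, index) is called.
    """
    if not 1 <= index <= nrows * ncols:
        raise ValueError("index must be between 1 and nrows * ncols")
    # Panels are numbered left to right, then top to bottom.
    return divmod(index - 1, ncols)


# Walk through the four panels of a 2 x 2 grid.
for index in range(1, 5):
    row, col = subplot_position(2, 2, index)
    print(f"subplot(2, 2, {index}) -> row {row}, column {col}")
```

Read the other way round, a panel at 0-based row i and column j of a grid with ncols columns has the index ncols*i + (j+1).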

import matplotlib.pyplot as plt

fig = plt.figure()

plt.subplot(221)
plt.subplot(222)
plt.subplot(223)

plt.show()

Output:

Global title and layout

import matplotlib.pyplot as plt

fig = plt.figure(facecolor="lightgrey")

plt.subplot(2,2,1)
plt.title('Subplot title 1')
plt.subplot(2,2,2)
plt.title('Subplot title 2',loc="left",color="b")
plt.subplot(2,2,3)
myfontdict = {"fontsize":12, "color":"g", "rotation":30}
plt.title('Subplot title 3',fontdict=myfontdict)
plt.subplot(2,2,4)
plt.title('Subplot title 4',color="white",backgroundcolor="black")

plt.suptitle("Global title", fontsize=20, color="red", backgroundcolor="yellow")

# tight_layout() checks the axis labels, tick labels, and subplot titles, then adjusts the subplots so that they fill the plotting area without overlapping
plt.tight_layout(rect=[0,0,1,0.9]) # tight_layout( rect=[left, bottom, right, top]); top=0.9 leaves room for the global title

plt.show()

Output:

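scatter() function: scatter plots

scatter( x, y, s, color, marker, label ), where s sets the marker size
The following example draws a scatter plot of the standard normal distribution: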

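import numpy as np # import the NumPy library
import matplotlib.pyplot as plt # import the plotting library

plt.rcParams['axes.unicode_minus']=False # render minus signs correctly

n=1024 # number of random points
x = np.random.normal(0,1,n) # x coordinates of the data points
y = np.random.normal(0,1,n) # y coordinates of the data points

plt.scatter(x, y, color="blue",marker='*' ) # draw the data points

plt.title("standard normal distribution",fontsize=20) # set the title
plt.text(2.5,2.5,"mean value: 0\nstandard deviation: 1") # add a text annotation

plt.xlim(-4,4) # x-axis range
plt.ylim(-4,4) # y-axis range

plt.xlabel('x axis', fontsize=14) # x-axis label
plt.ylabel('y axis', fontsize=14) # y-axis label

plt.show() # display the figure

Output:

The following example draws the standard normal distribution and a uniform distribution in the same scatter plot: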

import numpy as np 
import matplotlib.pyplot as plt 

n=1024 
x1 = np.random.normal(0,1,n) 
y1 = np.random.normal(0,1,n)

x2 = np.random.uniform(-4,4,(1,n))
y2 = np.random.uniform(-4,4,(1,n))


plt.scatter(x1, y1, color="blue",marker='*',label="normal Distribution")
plt.scatter(x2, y2, color="yellow",marker='o',label="uniform distribution")

plt.legend()
plt.title("standard normal distribution",fontsize=20) 

plt.xlim(-4,4) 
plt.ylim(-4,4) 

plt.xlabel('x axis', fontsize=14) 
plt.ylabel('y axis', fontsize=14) 

plt.show()

Line Chart

Line chart: builds on the scatter plot by joining neighboring points with line segments; it is mostly used to show the trend of a variable.
plot() function: plot(x, y, color, marker, label, linewidth, markersize)
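A minimal sketch of plot() with the parameters listed above might look like the following. The data are made up for illustration, and the output file name line_chart.png is an assumption; in an interactive session plt.show() can be used instead of savefig().

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data for illustration only: two noisy trends over 12 steps.
rng = np.random.default_rng(0)
x = np.arange(12)
y1 = 20 + 2 * x + rng.normal(0, 2, x.size)
y2 = 40 - 1.5 * x + rng.normal(0, 2, x.size)

fig = plt.figure(figsize=(6, 4))

# plot() returns a list of Line2D objects; each call here draws one line.
line1, = plt.plot(x, y1, color="blue", marker="o", label="series 1",
                  linewidth=2, markersize=5)
line2, = plt.plot(x, y2, color="red", marker="*", label="series 2",
                  linewidth=1, markersize=8)

plt.title("line chart", fontsize=20)
plt.xlabel("x axis", fontsize=14)
plt.ylabel("y axis", fontsize=14)
plt.legend()

# Assumed file name; replace with plt.show() to open a window instead.
fig.savefig("line_chart.png")
```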

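Bar Chart

bar() function: bar( x, height, width, facecolor, edgecolor, label ); the first parameter gives the x positions of the bars and was named left in older Matplotlib releases
Bar chart example: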

import numpy as np 
import matplotlib.pyplot as plt 

plt.rcParams['axes.unicode_minus']=False 

y1=[32,25,16,30,24,45,40,33,28,17,24,20]
y2=[-23,-35,-26,-35,-45,-43,-35,-32,-23,-17,-22,-28]

plt.bar(range(len(y1)), y1, width=0.8,facecolor='green',edgecolor='white',label='statistics1')
plt.bar(range(len(y2)), y2, width=0.8,facecolor='red',edgecolor='white',label='statistics2')

plt.title("bar chart",fontsize=20)

plt.legend()
plt.show()

Output:

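Visualizing the Boston housing dataset

Introduction to Keras

-Keras is a high-level library for neural networks and deep learning
-It is TensorFlow's official high-level API; neural network models can be built quickly and are easy to debug and extend
-It ships with commonly used public datasets, which can be imported through the keras.datasets module

Datasets in Keras

boston_housing: Boston house price dataset
CIFAR10: image set with 10 classes
CIFAR100: image set with 100 classes
MNIST: images of handwritten digits
Fashion-MNIST: image set with 10 fashion categories
IMDB: movie review dataset
reuters: Reuters newswire dataset

Introduction to the Boston housing dataset

From the StatLib library at CMU (Carnegie Mellon University), 1978
Covers housing data for 506 different suburbs of Boston, Massachusetts
404 training samples and 102 test samples
Each record has 14 fields: 13 attributes and 1 median house price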

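Data description:

CRIM: per-capita crime rate by town, e.g. 0.00632
ZN: proportion of residential land zoned for lots over 25,000 square feet, e.g. 18.0
INDUS: proportion of non-retail business land per town, e.g. 2.31
CHAS: whether the area borders the Charles River (1: yes; 0: no), e.g. 0
NOX: nitric oxides concentration, e.g. 0.538
RM: average number of rooms per dwelling, e.g. 6.575
AGE: proportion of owner-occupied homes built before 1940, e.g. 65.2
DIS: weighted distance to five Boston employment centers, e.g. 4.0900
RAD: index of accessibility to highways, e.g. 1
TAX: full-value property tax rate per $10,000, e.g. 296
PTRATIO: pupil-teacher ratio by town, e.g. 15.3
B: index reflecting the proportion of Black residents in the town; B=1000*(Bk-0.63)^2, where Bk is that proportion, so B gets smaller as Bk approaches 0.63, e.g. 396.90
LSTAT: proportion of lower-income population, e.g. 7.68
MEDV: median value of owner-occupied homes (in units of $1000), e.g. 24.0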
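To make the B formula concrete, here is a small worked sketch. The function name b_index is hypothetical, and Bk is assumed to be a proportion between 0 and 1.

```python
def b_index(bk):
    """Evaluate B = 1000 * (Bk - 0.63)^2 for a proportion Bk in [0, 1].

    Hypothetical helper that only restates the formula from the
    attribute list; it is not part of Keras or the dataset itself.
    """
    return 1000 * (bk - 0.63) ** 2


# B is 0 at Bk = 0.63 and grows as Bk moves away from it in either direction.
for bk in (0.0, 0.3, 0.63, 1.0):
    print(f"Bk = {bk:.2f} -> B = {b_index(bk):.2f}")
```

Under this formula the sample value 396.90 from the list corresponds to Bk = 0, because 1000 * 0.63^2 = 396.9.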

Loading the dataset

Default download path: macOS: ~/.keras/datasets; Windows: \Users\user_name\.keras\datasets

import tensorflow as tf
boston_housing = tf.keras.datasets.boston_housing # prefix: tf.keras.datasets, dataset name: boston_housing
# load_data() returns the training set and the test set; the default is test_split=0.2
(train_x, train_y), (test_x, test_y) = boston_housing.load_data()

print("Training set:", len(train_x))
print("Testing set:", len(test_x))

Output:

Training set: 404
Testing set: 102

Changing how the dataset is divided into training and test sets:

import tensorflow as tf
boston_housing = tf.keras.datasets.boston_housing
(train_x, train_y), (test_x, test_y) = boston_housing.load_data(test_split=0) # use all the data as the training set

print("Training set:", len(train_x))
print("Testing set:", len(test_x))

Output:

Training set: 506
Testing set: 0
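The two sets of counts above can be reproduced with plain arithmetic. This is only a sketch: it assumes the cut point is int(n * (1 - test_split)), and the helper name split_sizes is hypothetical. The real load_data() also shuffles the records before splitting, which this sketch ignores.

```python
def split_sizes(n_samples, test_split=0.2):
    """Return (n_train, n_test) for a given test fraction.

    Hypothetical helper; assumes the training part is the first
    int(n_samples * (1 - test_split)) records.
    """
    n_train = int(n_samples * (1 - test_split))
    return n_train, n_samples - n_train


print("test_split=0.2:", split_sizes(506))     # default split
print("test_split=0:  ", split_sizes(506, 0))  # everything goes to training
```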

Accessing the data

Accessing the data in the dataset:

>>>type(train_x)
numpy.ndarray
>>>type(train_y)
numpy.ndarray

>>>print("Dim of train_x:",train_x.ndim)
Dim of train_x: 2
>>>print("Shape of train_x:",train_x.shape)
Shape of train_x: (506, 13)

>>>print("Dim of train_y:",train_y.ndim)
Dim of train_y: 1
>>>print("Shape of train_y:",train_y.shape)
Shape of train_y: (506,)

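Relationship between the average number of rooms and house price

The following example visualizes the relationship between the average number of rooms and house price: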

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

boston_housing = tf.keras.datasets.boston_housing
(train_x, train_y), (_,_) = boston_housing.load_data(test_split=0)

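plt.figure(figsize=(5,5)) # set the figure size
plt.scatter(train_x[:,5],train_y) # draw the scatter plot
plt.xlabel("RM") # x-axis label
plt.ylabel("Price($1000's)") # y-axis label
plt.title("5. RM-Price") # set the title
plt.show() # display the figure

Output:

Relationship between every attribute and house price

The following example visualizes the relationship between each attribute in the dataset and house price: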

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

boston_housing = tf.keras.datasets.boston_housing
(train_x, train_y), (_,_) = boston_housing.load_data(test_split=0)
plt.rcParams['axes.unicode_minus']=False

titles = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "B-1000", "LSTAT", "MEDV"]

plt.figure(figsize=(12,12))


for i in range(13):
    plt.subplot(4,4,(i+1))
    plt.scatter(train_x[:,i], train_y, s=10) # s=10: marker size
    plt.xlabel(titles[i])
    plt.ylabel("Price($1000's)")
    plt.title(str(i+1)+ "." +titles[i]+" - Price")

plt.tight_layout()
plt.suptitle("Correlation between attributes and house price", x=0.5, y=1.02, fontsize=20)
plt.show()

Output:

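Downloading the Iris dataset

Introduction to the Iris dataset

-Anderson's Iris Data Set: 1936, Ronald Fisher, "The use of multiple measurements in taxonomic problems"
Collected on the Gaspé Peninsula in Canada, on the same day and in the same time window, in the same pasture, by the same person, using the same measuring instrument.
There are 3 classes of iris with 50 samples each; every sample records 4 attributes of the flower plus its species.
Iris dataset: 150 samples, 4 attributes (sepal length, sepal width, petal length, petal width), 3 labels: Setosa, Versicolour, and Virginica.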

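How to download the Iris dataset

get_file() function: tf.keras.utils.get_file(fname, origin, cache_dir)
fname: name of the file after download; origin: URL of the file; cache_dir: where the downloaded file is stored (default ~/.keras/datasets)
Basic code for downloading the dataset: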

import tensorflow as tf
TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv"
train_path = tf.keras.utils.get_file("iris_training.csv", TRAIN_URL)

iris_training.csv: 120 samples, 4 attributes, 3 labels

Improved version: the split() method cuts a string at the given separator and returns a list, for example:

import tensorflow as tf
TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv" 
TRAIN_URL.split('/') # output: ['http:', '', 'download.tensorflow.org', 'data', 'iris_training.csv']

import tensorflow as tf
TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv" # to download a different dataset, only the URL needs to change
train_path = tf.keras.utils.get_file(TRAIN_URL.split('/')[-1], TRAIN_URL) # take the file name from the URL automatically

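Reading a CSV dataset with Pandas

Pandas library (Panel Data & Data Analysis): used for data statistics and analysis; it handles large datasets efficiently and conveniently
Import Pandas: import pandas as pd

Reading a CSV dataset file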

pd.read_csv( filepath_or_buffer, header, names)

import tensorflow as tf
import pandas as pd
TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv"
train_path = tf.keras.utils.get_file(TRAIN_URL.split('/')[-1], TRAIN_URL)
pd.read_csv(train_path)
# df_iris=pd.read_csv(train_path)
# >>>type(df_iris)
# pandas.core.frame.DataFrame (DataFrame, a two-dimensional table, is a commonly used Pandas data type)

Output:

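Setting the column headers

pd.read_csv( filepath_or_buffer, header, names)
header takes a row number, counted from 0. header=0 uses the first row of the file as the column headers (this is the default behavior); header=None means the file has no header row.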
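The effect of header and names can be tried offline before touching the real file. The sketch below reads a made-up four-line CSV from memory with io.StringIO; the sample text and the variable names are assumptions for illustration, not the contents of iris_training.csv.

```python
import io

import pandas as pd

# Made-up sample: one leading row followed by three data rows.
csv_text = (
    "120,4,setosa,versicolor,virginica\n"
    "6.4,2.8,5.6,2.2,2\n"
    "5.0,2.3,3.3,1.0,1\n"
    "4.9,3.1,1.5,0.1,0\n"
)
COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']

# header=0: the first row is consumed as the column headers.
df_header0 = pd.read_csv(io.StringIO(csv_text), header=0)

# header=None: every row is data; columns are numbered 0, 1, 2, ...
df_no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# header=0 plus names: the first row is dropped and replaced by custom names.
df_named = pd.read_csv(io.StringIO(csv_text), header=0, names=COLUMN_NAMES)

print(list(df_header0.columns), df_header0.shape)
print(list(df_no_header.columns), df_no_header.shape)
print(list(df_named.columns), df_named.shape)
```

Passing names without header=0 would keep the first row as a data row, which is why the custom-header case sets both.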
header=0: use the first row of the data as the column headers

import tensorflow as tf
import pandas as pd
TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv"
train_path = tf.keras.utils.get_file(TRAIN_URL.split('/')[-1], TRAIN_URL)
df_iris = pd.read_csv(train_path, header=0) # header=0: the first row becomes the column headers
df_iris.head() # show the first 5 rows

Output:

header=None: the data file has no header row

import tensorflow as tf
import pandas as pd
TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv"
train_path = tf.keras.utils.get_file(TRAIN_URL.split('/')[-1], TRAIN_URL)
df_iris = pd.read_csv(train_path, header=None) # header=None: the data file has no header row
df_iris.head()

Output:

Custom headers: the names parameter replaces the column headers designated by the header parameter

import tensorflow as tf
import pandas as pd
TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv"
train_path = tf.keras.utils.get_file(TRAIN_URL.split('/')[-1], TRAIN_URL)

COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
df_iris = pd.read_csv(train_path, names=COLUMN_NAMES,header=0)
df_iris.head()

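Output:

Accessing the data

head() function: head(n) returns the first n rows; with no argument it returns the first 5 rows of the table, e.g. df_iris.head(8)

tail() function: tail(n) returns the last n rows, e.g. df_iris.tail(8)

Index and slicing: df_iris[10:16] returns the rows with index 10 through 15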
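The three access patterns can be checked on a stand-in table. In this sketch df_demo is a hypothetical DataFrame of random numbers with the same shape as the training file (120 rows, 5 columns); the values are not real measurements.

```python
import numpy as np
import pandas as pd

COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']

# Stand-in for df_iris: 120 rows of made-up numbers.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.uniform(0, 8, size=(120, 5)).round(1),
                       columns=COLUMN_NAMES)

first_rows = df_demo.head()    # no argument: first 5 rows
first_eight = df_demo.head(8)  # first 8 rows
last_eight = df_demo.tail(8)   # last 8 rows
middle = df_demo[10:16]        # positions 10 to 15; the end of a slice is excluded

print(len(first_rows), len(first_eight), len(last_eight))
print(middle.index.tolist())
```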

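Displaying summary statistics

describe() function: shows summary statistics for the table, e.g. df_iris.describe()
count: number of values, mean: average, std: standard deviation, min: minimum, max: maximum
Output: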

Common DataFrame attributes

ndim: number of dimensions of the table, e.g. df_iris.ndim gives 2
shape: shape of the table, e.g. df_iris.shape gives (120, 5)
size: total number of elements in the table, e.g. df_iris.size gives 600
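As a quick check of describe() and the attributes above, the following sketch uses a hypothetical stand-in table, df_demo, filled with random numbers in the same 120 x 5 shape; the statistics it prints describe the made-up values, not the real dataset.

```python
import numpy as np
import pandas as pd

COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']

# Stand-in for df_iris: 120 rows of made-up numbers.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.uniform(0, 8, size=(120, 5)).round(1),
                       columns=COLUMN_NAMES)

# Attributes are read without parentheses.
print("ndim: ", df_demo.ndim)
print("shape:", df_demo.shape)
print("size: ", df_demo.size)   # rows * columns

# describe() returns another DataFrame, one row per statistic.
stats = df_demo.describe()
print(stats.loc[["count", "mean", "std", "min", "max"]])
```

Besides the five statistics listed above, describe() also reports the 25%, 50%, and 75% quantiles for numeric columns.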

Converting to a NumPy array

Using NumPy's array-creation function array()

>>>iris=np.array(df_iris)

>>>type(df_iris)
pandas.core.frame.DataFrame

>>>type(iris)
numpy.ndarray

Using the DataFrame's values attribute or its as_matrix() method

>>>iris = df_iris.values
>>>iris = df_iris.as_matrix() # pandas before 1.0 only; newer versions use df_iris.to_numpy()
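The conversion routes can be compared side by side on a small table. This sketch uses a hypothetical three-row DataFrame of made-up values and calls to_numpy(), which is available in pandas 0.24 and later, in place of the removed as_matrix().

```python
import numpy as np
import pandas as pd

# Stand-in for df_iris: a tiny table of made-up values.
df_demo = pd.DataFrame(
    [[6.4, 2.8, 5.6, 2.2, 2],
     [5.0, 2.3, 3.3, 1.0, 1],
     [4.9, 3.1, 1.5, 0.1, 0]],
    columns=['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species'],
)

via_array = np.array(df_demo)      # NumPy's array() function
via_values = df_demo.values        # DataFrame attribute (no parentheses)
via_to_numpy = df_demo.to_numpy()  # DataFrame method, pandas 0.24 and later

# All three give the same 2-D ndarray; the integer column is upcast to float.
print(type(via_array).__name__, via_array.shape, via_array.dtype)
print(np.array_equal(via_array, via_values), np.array_equal(via_array, via_to_numpy))
```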

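Accessing array elements

Using indexes and slices to access elements of the array:
One-dimensional slice, the first 6 rows: iris[0:6]
Two-dimensional slice, the first 4 columns of the first 6 rows: iris[0:6,0:4]
Two-dimensional slice, the species column of every row: iris[:,4]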
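The same three slices can be tried on a small stand-in array. In this sketch iris_demo is a hypothetical 8 x 5 array of made-up numbers whose last column plays the role of the species label.

```python
import numpy as np

# Stand-in for the iris array: 8 rows, 4 attribute columns plus 1 label column.
iris_demo = np.arange(40, dtype=float).reshape(8, 5)
iris_demo[:, 4] = [0, 1, 2, 0, 1, 2, 0, 1]  # made-up class labels

rows = iris_demo[0:6]          # first 6 rows, all columns
block = iris_demo[0:6, 0:4]    # first 6 rows, first 4 columns
labels = iris_demo[:, 4]       # every row, column 4 only (a 1-D array)

print(rows.shape, block.shape, labels.shape)
print(labels)
```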

Visualizing the Iris dataset

Scatter plot built from petal length and petal width (2 attributes):

import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv"
train_path = tf.keras.utils.get_file(TRAIN_URL.split('/')[-1], TRAIN_URL)

COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
df_iris = pd.read_csv(train_path, names=COLUMN_NAMES,header=0)

iris = np.array(df_iris)
plt.scatter(iris[:,2],iris[:,3],c=iris[:,4],cmap='brg',s=12)
plt.title("Anderson's Iris Data Set\n(Blue: Setosa | Red: Versicolor | Green: Virginica)")
plt.xlabel(COLUMN_NAMES[2])
plt.ylabel(COLUMN_NAMES[3])
plt.show()

Output:

Visualizing the Iris dataset with pairwise scatter plots of all 4 attributes:

import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

TRAIN_URL="http://download.tensorflow.org/data/iris_training.csv"
train_path=tf.keras.utils.get_file(TRAIN_URL.split('/')[-1],TRAIN_URL)

column_names=['SepalLength','SepalWidth','PetalLength','PetalWidth','Species']
df_iris=pd.read_csv(train_path,header=0,names=column_names)

iris=np.array(df_iris)

fig=plt.figure('Iris Data',figsize=(15,15))

plt.suptitle("Anderson's Iris Data Set\n(Blue: Setosa|Red: Versicolor|Green: Virginica)")

for i in range(4):
    for j in range(4):
        plt.subplot(4,4,4*i+(j+1)) # subplot index: 4*i + (j+1)
        if(i==j):
            plt.text(0.3,0.4,column_names[i],fontsize=15) # Output text
        else:
            plt.scatter(iris[:,j],iris[:,i],c=iris[:,4],cmap='brg',s=12)
            
        if(i==0):
            plt.title(column_names[j])
        if(j==0):
            plt.ylabel(column_names[i])

plt.show()

Output: