Seaborn Data Processing: Data Visualization with Seaborn

威哥 2023-02-18 01:55:27


1. Introduction to Seaborn


Seaborn is a statistical plotting library built on matplotlib whose data structures are aligned with those of pandas.

Seaborn aims to make data visualization central to exploring and understanding data.

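Seaborn's dataset-oriented plotting functions operate on DataFrames (with their row and column indexes) and on arrays, and they handle the semantic mapping and statistical aggregation of the whole dataset internally.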

It is hardly an exaggeration to say that Seaborn can draw almost any statistical chart you can imagine!
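To make "semantic mapping and statistical aggregation" concrete, here is a minimal sketch. It assumes a small made-up DataFrame (the values are hypothetical, not the real tips data) and compares a hand-computed group mean with what `sns.barplot` draws when column names are assigned to the `x` and `y` roles.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.0, 20.0, 30.0, 50.0, 15.0, 25.0],
})

# Statistical aggregation done by hand: mean bill per day
means = df.groupby("day", sort=False)["total_bill"].mean()

# The same aggregation done by Seaborn: the column names are mapped to
# the x and y roles, and barplot draws one bar per day at the group mean
fig, ax = plt.subplots()
sns.barplot(data=df, x="day", y="total_bill", ax=ax)
```

The point of the sketch is that the caller never computes the means or positions the bars; naming the columns is enough.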

2. Sample data

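Most of the charts in this article are drawn from tips, Seaborn's example dataset of restaurant customer bills; a few examples use other Seaborn example datasets (fmri, diamonds, titanic, anscombe, iris, flights, and brain_networks).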

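[Figure: the first two rows of the tips dataset]

The columns of the tips dataset are:

(total_bill: total amount of the bill, tip: tip amount, sex: gender, smoker: whether the party included smokers, day: day of the visit, time: time of day of the meal, size: number of diners)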

3. Seaborn overview


Relational plots
Relational plots are generally used to show the relationship between two variables.


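import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style='darkgrid')

# Scatter plot of tip against total bill
tips = sns.load_dataset('tips')
sns.relplot(x='total_bill', y='tip', data=tips)

# Same scatter plot, colored by whether the party included smokers
sns.relplot(x='total_bill', y='tip', hue='smoker', data=tips)

# Line plot of signal over time for the fmri example dataset
fmri = sns.load_dataset('fmri')
sns.relplot(x='timepoint', y='signal', kind='line', data=fmri)

plt.show()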


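Categorical plots
Categorical plots visualize data that can be grouped into categories; they can take the form of scatter plots, distribution plots, or estimate plots.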


import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style='ticks', color_codes=True)
tips = sns.load_dataset('tips')

# Categorical scatter plots: strip plot (default) and swarm plot
sns.catplot(x='day', y='total_bill', data=tips)
sns.catplot(x='day', y='total_bill', kind='swarm', data=tips)

# Categorical distribution plots: box, boxen, and violin
sns.catplot(x='day', y='total_bill', kind='box', data=tips)
diamonds = sns.load_dataset('diamonds')
sns.catplot(x='color', y='price', kind='boxen', data=diamonds.sort_values('color'))
sns.catplot(x='total_bill', y='day', hue='time', kind='violin', data=tips)

# Categorical estimate plots: point, bar, and count
titanic = sns.load_dataset('titanic')
sns.catplot(x='sex', y='survived', hue='class', kind='point', data=titanic)
sns.catplot(x='sex', y='survived', hue='class', kind='bar', data=titanic)
sns.catplot(x='deck', kind='count', palette='ch:.25', data=titanic)

plt.show()


Regression plots

Regression plots fit a regression model to the data and draw the fitted function.


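import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(color_codes=True)

# Scatter plot with a fitted linear regression line
tips = sns.load_dataset("tips")
sns.lmplot(x="total_bill", y="tip", data=tips)

# Residuals of a linear fit on Anscombe's dataset II
# (the original snippet used anscombe without loading it)
anscombe = sns.load_dataset("anscombe")
plt.figure()
sns.residplot(x="x", y="y", data=anscombe.query("dataset == 'II'"),
              scatter_kws={"s": 80})

# Axes-level regression plot drawn on an explicitly sized figure
f, ax = plt.subplots(figsize=(5, 6))
sns.regplot(x="total_bill", y="tip", data=tips, ax=ax)

plt.show()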
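The straight line in these plots is, by default, an ordinary least-squares fit. The sketch below reproduces that kind of fit with `np.polyfit` on synthetic data; the generating slope of 0.15 and intercept of 1.0 are assumptions made up for this example, not properties of the tips dataset. It also computes the residuals, which are the quantity `residplot` displays.

```python
import numpy as np

# Synthetic data with a known linear relationship plus noise
rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, size=200)
tip = 1.0 + 0.15 * total_bill + rng.normal(0, 0.5, size=200)

# Least-squares fit of a degree-1 polynomial: tip = intercept + slope * bill
slope, intercept = np.polyfit(total_bill, tip, deg=1)

# Residuals: observed value minus fitted value
predicted = intercept + slope * total_bill
residuals = tip - predicted

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

If the linear model is adequate, the residuals scatter around zero with no visible pattern; a curved pattern, as in Anscombe's dataset II, suggests the relationship is not linear.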


Distribution plots
Distribution plots are used to examine univariate or bivariate distributions.


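import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(color_codes=True)

# Univariate distribution: histogram with a density curve.
# The original used sns.distplot(x), which is deprecated in recent
# Seaborn releases; histplot with kde=True gives a comparable figure.
x = np.random.normal(size=100)
sns.histplot(x, kde=True, stat="density")

# Kernel density estimate on its own (fill=True replaces the older shade=True)
plt.figure()
sns.kdeplot(x, fill=True)

# Bivariate distribution: scatter plot with marginal histograms
mean, cov = [0, 1], [(1, .5), (.5, 1)]
data = np.random.multivariate_normal(mean, cov, 200)
df = pd.DataFrame(data, columns=["x", "y"])
sns.jointplot(x="x", y="y", data=df)

# Pairwise relationships between all numeric columns
iris = sns.load_dataset("iris")
sns.pairplot(iris)

plt.show()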
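For intuition about the two curves in a univariate distribution plot, here is a NumPy-only sketch of a density-normalized histogram and a Gaussian kernel density estimate. The bandwidth uses Scott's rule for one-dimensional data, which is an assumption of this sketch; it is not meant to reproduce Seaborn's output exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)

# Histogram normalized so that the bar areas sum to 1
density, edges = np.histogram(x, bins=10, density=True)
hist_area = np.sum(density * np.diff(edges))

# Gaussian KDE: the average of Gaussian bumps centered on each observation
bandwidth = x.std(ddof=1) * len(x) ** (-1 / 5)  # Scott's rule, 1-D
grid = np.linspace(x.min() - 4 * bandwidth, x.max() + 4 * bandwidth, 400)
z = (grid[:, None] - x[None, :]) / bandwidth
kde = np.exp(-0.5 * z ** 2).sum(axis=1) / (len(x) * bandwidth * np.sqrt(2 * np.pi))

# Riemann-sum check that the estimated density integrates to about 1
kde_area = np.sum(kde) * (grid[1] - grid[0])
```

A smaller bandwidth makes the curve follow individual observations more closely, while a larger one smooths them out.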


Matrix plots
Matrix plots present a dataset visually in the form of a matrix.


import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

sns.set_theme()

# Load the long-form example flights dataset and pivot it to wide form
# (keyword arguments are required by recent pandas versions)
flights_long = sns.load_dataset("flights")
flights = flights_long.pivot(index="month", columns="year", values="passengers")

# Draw a heatmap with the numeric values in each cell
f, ax = plt.subplots(figsize=(9, 6))
sns.heatmap(flights, annot=True, fmt="d", linewidths=.5, ax=ax)

# Load the brain networks example dataset
df = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0)

# Select a subset of the networks
used_networks = [1, 5, 6, 7, 8, 12, 13, 17]
used_columns = (df.columns.get_level_values("network")
                          .astype(int)
                          .isin(used_networks))
df = df.loc[:, used_columns]

# Create a categorical palette to identify the networks
network_pal = sns.husl_palette(8, s=.45)
network_lut = dict(zip(map(str, used_networks), network_pal))

# Convert the palette to vectors that will be drawn on the side of the matrix
networks = df.columns.get_level_values("network")
network_colors = pd.Series(networks, index=df.columns).map(network_lut)

# Draw the full plot
g = sns.clustermap(df.corr(), center=0, cmap="vlag",
                   row_colors=network_colors, col_colors=network_colors,
                   dendrogram_ratio=(.1, .2),
                   cbar_pos=(.02, .32, .03, .2),
                   linewidths=.75, figsize=(12, 13))

g.ax_row_dendrogram.remove()

plt.show()
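Both matrix plots above need their input as a two-dimensional table: the heatmap gets one by pivoting long-form data, and the cluster map gets one from a correlation matrix. The sketch below shows both reshaping steps on a tiny hypothetical table; the passenger numbers are made up for illustration and are not taken from the flights dataset.

```python
import pandas as pd

# Hypothetical long-form data: one row per (year, month) observation
long_df = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "passengers": [100, 110, 130, 120, 125, 150],
})

# Long to wide: months become rows, years become columns
wide = long_df.pivot(index="month", columns="year", values="passengers")

# Put the months in calendar order regardless of how pivot ordered them
wide = wide.reindex(["Jan", "Feb", "Mar"])

# Correlation between the year columns: a square matrix of the kind
# passed to a cluster map
corr = wide.corr()
print(wide)
print(corr)
```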
