Tags: #R/R可视化 #R/R数据科学 #R/index/01
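This is not a line-by-line translation. Once you have read an introductory ggplot book, I strongly recommend working through the original tutorial yourself.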
Before you start
Load the tidyverse package suite directly with `library(tidyverse)`.
The data used here:
```r
chic <- readr::read_csv("https://raw.githubusercontent.com/Z3tt/R-Tutorials/master/ggplot2/chicago-nmmaps.csv")
```
P.S. `read_csv()` can read a file directly from a URL.
A quick look at the data:
```r
> glimpse(chic)
Rows: 1,461
Columns: 10
$ city     <chr> "chic", "chic", "chic", "chic", "chic", "chic", "chic", "ch…
$ date     <date> 1997-01-01, 1997-01-02, 1997-01-03, 1997-01-04, 1997-01-05…
$ death    <dbl> 137, 123, 127, 146, 102, 127, 116, 118, 148, 121, 110, 127,…
$ temp     <dbl> 36.0, 45.0, 40.0, 51.5, 27.0, 17.0, 16.0, 19.0, 26.0, 16.0,…
$ dewpoint <dbl> 37.500, 47.250, 38.000, 45.500, 11.250, 5.750, 7.000, 17.75…
$ pm10     <dbl> 13.052268, 41.948600, 27.041751, 25.072573, 15.343121, 9.36…
$ o3       <dbl> 5.659256, 5.525417, 6.288548, 7.537758, 20.760798, 14.94087…
$ time     <dbl> 3654, 3655, 3656, 3657, 3658, 3659, 3660, 3661, 3662, 3663,…
$ season   <chr> "Winter", "Winter", "Winter", "Winter", "Winter", "Winter",…
$ year     <dbl> 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997,…
```
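1. The components of a ggplot
Not every component is required, and each one controls a different part of the plot. In general, though, Data and Geometries are mandatory: we must tell ggplot which data to use and what kind of plot to draw.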
```
Data: The raw data that you want to plot.
Geometries geom_: The geometric shapes that will represent the data.
Aesthetics aes(): Aesthetics of the geometric and statistical objects, such as position, color, size, shape, and transparency.
Scales scale_: Maps between the data and the aesthetic dimensions, such as data range to plot width or factor values to colors.
Statistical transformations stat_: Statistical summaries of the data, such as quantiles, fitted curves, and sums.
Coordinate system coord_: The transformation used for mapping data coordinates into the plane of the data rectangle.
Facets facet_: The arrangement of the data into a grid of plots.
Visual themes theme(): The overall visual defaults of a plot, such as background, grids, axes, default typeface, sizes and colors.
```
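The following minimal sketch shows how these components combine, using one function from each family. It runs on ggplot2's built-in `mpg` data rather than `chic`, and the specific functions and settings are my own illustrative choices:

```r
library(ggplot2)

# Data + Aesthetics: displacement vs. highway mileage, colored by drive type
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  # Geometries: draw each observation as a point
  geom_point() +
  # Statistical transformation: a linear fit per drive type
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Scales: the colors used for the three drive types
  scale_color_brewer(palette = "Dark2") +
  # Coordinate system: zoom the y axis without dropping data
  coord_cartesian(ylim = c(10, 45)) +
  # Facets: one panel per model year
  facet_wrap(~ year) +
  # Visual theme: white background with a thin frame
  theme_bw()

print(p)
```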
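2. Geometric objects in ggplot2
3. Aesthetic mappings
Inside `aes()`, each aesthetic can be assigned a variable. ggplot then maps the variable's values to aesthetic values (colors, shapes and so on) using the default scale for that aesthetic.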
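```r
library(ggplot2)
test <- iris  # assumption: `test` is undefined in this note; its columns match iris

ggplot(data = test) +
  geom_point(mapping = aes(x = Sepal.Length,
                           y = Petal.Length,
                           color = Species))
```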
To set an aesthetic to a fixed, manually chosen value instead, move it out of `aes()` and pass it to the geom directly:
```r
ggplot(data = test) +
  geom_point(mapping = aes(x = Sepal.Length,
                           y = Petal.Length),
             color = "red")
```
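A common slip is to put a fixed value inside `aes()`. This sketch compares the two placements with `layer_data()`, which returns the data ggplot actually draws. It uses the built-in iris data on the assumption that `test` above is iris:

```r
library(ggplot2)

# Inside aes(): "red" is treated as a one-level categorical variable,
# mapped through the default hue scale, and listed in a legend
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(color = "red"))

# Outside aes(): the value is used literally and no legend is created
p_set <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(color = "red")

unique(layer_data(p_mapped)$colour)  # a default hue color, not "red"
unique(layer_data(p_set)$colour)     # "red"
```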
Manual settings vs. mappings
shape
color/fill
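To distinguish a shape's outline from its interior, use `color` for the outline and `fill` for the interior. Only point shapes 21 to 25 have a separate fill: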
```r
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(shape = 21, size = 2, stroke = 1,
             color = "#3cc08f", fill = "#c08f3c") +
  labs(x = "Year", y = "Temperature (°F)")
```
Colors for categorical variables
To customize the colors of a mapped categorical variable, use `scale_color_manual()`:
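```r
# assumed definition: `ga` is not defined in this note
ga <- ggplot(chic, aes(x = date, y = temp, color = season)) + geom_point()
ga + scale_color_manual(values = c("dodgerblue4",
                                   "darkolivegreen4",
                                   "darkorchid3",
                                   "goldenrod1"))
```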
Alternatively, use a predefined palette:
```r
# ga + scale_color_brewer(palette = "Set1")
library(ggthemes)  # provides scale_color_tableau()
ga + scale_color_tableau()
```
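If colors must match specific categories, a named vector is safer than relying on the order of the factor levels. Here is a minimal sketch on the built-in iris data; the colors themselves are arbitrary:

```r
library(ggplot2)

# Named values: each color is tied to a level by name, in any order
species_cols <- c(virginica  = "darkorchid3",
                  setosa     = "dodgerblue4",
                  versicolor = "darkolivegreen4")

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +
  scale_color_manual(values = species_cols)

# Group 1 corresponds to the first factor level, setosa
d <- layer_data(p)
unique(d$colour[d$group == 1])  # "dodgerblue4"
```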
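Colors for continuous variables
ggplot recognizes continuous variables automatically and maps them to a color gradient. We cannot assign one color per value as we can with categories, but we can change the gradient with `scale_color_gradient()`: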
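```r
# assumed definition: `gb` is not defined in this note
gb <- ggplot(chic, aes(x = date, y = temp, color = temp)) + geom_point()
gb + scale_color_gradient(low = "darkkhaki",
                          high = "darkgreen")
```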
We can also set the midpoint of a diverging gradient with `scale_color_gradient2()`:
```r
mid <- mean(chic$temp)  # midpoint
gb + scale_color_gradient2(midpoint = mid)
# the colors can be set explicitly as well
gb + scale_color_gradient2(midpoint = mid, low = "#dd8a0b",
                           mid = "grey92", high = "#32a676")
```
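The following small sketch checks what the midpoint does. It uses a made-up three-row data frame. The point sitting exactly at the midpoint should receive the `mid` color, up to rounding in the color interpolation:

```r
library(ggplot2)

# Toy data: one value below, one at, and one above the midpoint 0
df <- data.frame(x = 1:3, y = 1:3, v = c(-1, 0, 1))

p <- ggplot(df, aes(x, y, color = v)) +
  geom_point(size = 5) +
  scale_color_gradient2(midpoint = 0, low = "#dd8a0b",
                        mid = "grey92", high = "#32a676")

cols <- layer_data(p)$colour
cols  # the second color (v = 0) is approximately grey92
```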
In newer versions of ggplot2, we can
More on colors
Here is a PDF about colors in R: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
For more on colors, see my dedicated notes:
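4. Facets
Faceting adds one more mapping on top of x, y and the rest: the data are split into panels by one or more variables, arranged by rows and/or columns. There are two main functions. `facet_wrap()` wraps a one-dimensional sequence of panels (usually from a single variable), and you can set how many rows and columns it uses. `facet_grid()` accepts two variables, one for the rows and one for the columns.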
facet_grid()
```r
ggplot(mpg) +
  geom_point(aes(displ, hwy, color = drv)) +
  facet_grid(drv ~ cyl)
```
facet_wrap()
`facet_grid()` is not very convenient when there are many panels, whereas `facet_wrap()` lets you set the number of rows and columns.
A comparison:
```r
ggplot(mpg) +
  geom_point(aes(displ, hwy, color = drv)) +
  facet_grid(class ~ .)
```
```r
ggplot(mpg) +
  geom_point(aes(displ, hwy, color = drv)) +
  facet_wrap(~ class, ncol = 3)
```
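Counting panels is another way to see the difference. In this sketch on `mpg`, `facet_grid()` builds every row-by-column combination, including empty ones. `facet_wrap()` creates panels only for combinations present in the data. I read the panel table through `ggplot_build()`, which is my own choice for illustration:

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# facet_grid: a full drv x cyl matrix, empty combinations included
p_grid <- base + facet_grid(drv ~ cyl)
# facet_wrap: only existing combinations, wrapped into 3 columns
p_wrap <- base + facet_wrap(~ drv + cyl, ncol = 3)

n_grid <- nrow(ggplot_build(p_grid)$layout$layout)
n_wrap <- nrow(ggplot_build(p_wrap)$layout$layout)
c(grid = n_grid, wrap = n_wrap)
```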
Differences between wrap and grid
Some arguments
```r
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "orangered", alpha = .3) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  labs(x = "Year", y = "Temperature (°F)") +
  facet_wrap(~ year, ncol = 2, scales = "free")
```
Changing the `scales` argument (together with the facet layout) makes it look a bit nicer:
```r
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "orangered", alpha = .3) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  labs(x = "Year", y = "Temperature (°F)") +
  facet_wrap(season ~ year, ncol = 4, scales = "free_x")
```
5. Axes
Limiting the axis range
There are two ways to limit the axis range:
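```r
# limit the data range: values outside the limits are removed from the plot
scale_y_continuous(limits = c(0, 50))
# limit only the visible coordinates: all data are kept and the plot is zoomed
coord_cartesian(ylim = c(0, 50))
```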
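This difference matters once statistics are involved. Scale limits remove data before any statistics are computed, whereas coordinate limits only zoom. The following minimal sketch on `mpg` counts the observations that survive in each case:

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Scale limits: observations outside 20-30 are censored (ggplot warns
# about removed rows when the plot is drawn)
p_scale <- base + scale_y_continuous(limits = c(20, 30))
# Coordinate limits: every observation is kept, the view is just zoomed
p_coord <- base + coord_cartesian(ylim = c(20, 30))

sum(!is.na(layer_data(p_scale)$y))  # only the in-range observations
sum(!is.na(layer_data(p_coord)$y))  # all observations
```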
Fixing the axis ratio
```r
ggplot(chic, aes(x = temp, y = temp + rnorm(nrow(chic), sd = 20))) +
  geom_point(color = "sienna") +
  labs(x = "Temperature (°F)", y = "Temperature (°F) + random noise") +
  xlim(c(0, 100)) + ylim(c(0, 150)) +
  coord_fixed()  # one unit on x is drawn as long as one unit on y
```
Formatting labels with a function
This is handy for transforming all the tick labels on an axis at once:
```r
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = NULL) +
  scale_y_continuous(labels = function(x) paste(x, "Degrees Fahrenheit"))
```
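Instead of writing the function by hand, you can use the ready-made label functions in the scales package, which is installed with ggplot2. Here is a sketch using `label_number()`; the suffix and data are illustrative:

```r
library(ggplot2)
library(scales)

# label_number() returns a function that formats numbers with a suffix
mpg_label <- label_number(accuracy = 1, suffix = " mpg")
mpg_label(c(10, 20))  # "10 mpg" "20 mpg"

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_y_continuous(labels = mpg_label)
```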
Other arguments
Other useful settings include:
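```r
# force the axes to include the origin (0, 0);
# same effect as coord_cartesian(xlim = c(0, NA), ylim = c(0, NA))
expand_limits(x = 0, y = 0)
# turn off clipping so that points at the edge can be drawn over the axes
coord_cartesian(clip = "off")
```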
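To confirm what `expand_limits()` does, compare the trained y-scale limits with and without it. This sketch uses `mpg` and `layer_scales()`:

```r
library(ggplot2)

p_default <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
# expand_limits() adds an invisible layer containing x = 0 and y = 0,
# so the scales are trained to include the origin
p_origin <- p_default + expand_limits(x = 0, y = 0)

layer_scales(p_default)$y$get_limits()  # starts at the minimum of hwy
layer_scales(p_origin)$y$get_limits()   # starts at 0
```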
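6. Themes
Text properties
With `theme()` we can modify the elements of a theme. For example, text added via `labs()` can be repositioned, resized and recolored through `theme()`. The elements include: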
```r
axis.title.x       # x-axis title
axis.text          # axis tick labels
axis.ticks         # axis tick marks
plot.subtitle      # subtitle
plot.caption       # caption
legend.title       # legend title
legend.text        # legend text
legend.background  # legend background
legend.key         # background of the legend keys
```
Their arguments include:
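```r
vjust       # vertical justification: 0 = bottom, 0.5 = center, 1 = top
hjust       # horizontal justification: 0 = left, 0.5 = center, 1 = right
lineheight  # line spacing of multi-line text; larger values spread lines apart,
            # values near 0 make the lines overlap
size        # font size; rel() makes it relative, e.g. rel(1.5) = 1.5x the inherited size
angle       # rotation angle; 0 (horizontal) by default
margin = margin(t = 10)
# adds space above the element (pushes the x-axis title away from the axis)
margin = margin(r = 10)
# adds space to the right of the element (pushes the y-axis title away from the axis)
margin = margin(10, 10, 10, 10)
## order is t, r, b, l ("trouble"): top, right, bottom, left
face = "italic"      # font face: "plain", "italic", "bold" or "bold.italic"
color = "firebrick"  # color
```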
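Two of the points above can be checked directly: how `rel()` resolves against the inherited size, and the t, r, b, l order of `margin()`. This sketch assumes the default `theme_grey()`, whose base size is 11 pt:

```r
library(ggplot2)

# rel(1.5) multiplies the inherited size: theme_grey() uses base_size = 11,
# so the resolved x-axis title size is 11 * 1.5 = 16.5
th <- theme_grey() + theme(axis.title.x = element_text(size = rel(1.5)))
calc_element("axis.title.x", th)$size  # 16.5

# margin()'s unnamed arguments follow the order t, r, b, l ("trouble")
identical(margin(5, 10, 15, 20), margin(t = 5, r = 10, b = 15, l = 20))  # TRUE
```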
Example:
```r
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(axis.title.x = element_text(margin = margin(t = 10), size = 15),
        axis.title.y = element_text(margin = margin(r = 10), size = 15))
```
Removing text entirely:
```r
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(axis.ticks.y = element_blank(),
        axis.text.y = element_blank())
```
```r
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = NULL, y = "")  # NULL removes the x title; "" draws an empty y title
```
To remove a legend title, set `legend.title = element_blank()` in `theme()`:
```r
ggplot(chic, aes(x = date, y = temp, color = season)) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(legend.title = element_blank())
```
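Text position
Besides hjust and similar settings, we can use the `plot.xx.position` arguments, such as `plot.title.position` and `plot.caption.position`. They take one of two values, "plot" or "panel".
For the legend, there is also "none", which hides the legend: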
```r
# "gochi" must be registered as a font first, e.g.
# sysfonts::font_add_google("Gochi Hand", "gochi"); showtext::showtext_auto()
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(aes(color = season)) +
  labs(x = "Year", y = "Temperature (°F)",
       title = "Temperatures in Chicago",
       subtitle = "Seasonal pattern of daily temperatures from 1997 to 2001",
       caption = "Data: NMMAPS",
       tag = "Fig. 1") +
  theme_classic() +
  theme(text = element_text(family = "gochi"), legend.position = "none")
```
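The example above only uses `legend.position`. Here is a minimal sketch of the `plot.xx.position` arguments themselves, on `mpg`; the titles and settings are illustrative:

```r
library(ggplot2)

# "plot" aligns the title/caption with the whole plot,
# "panel" (the default) aligns them with the panel
th <- theme(plot.title.position = "plot",
            plot.caption.position = "plot",
            legend.position = "bottom")  # also "top", "left", "right" or "none"

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(title = "Highway mileage vs. engine displacement",
       caption = "Data: ggplot2::mpg") +
  th
print(p)
```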
Legend placement is covered in the next part.
Wrestling with legends
See:
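Backgrounds and panels
We can change the background with one of ggplot's built-in themes. My personal favorite, `theme_classic()`, gives a clean white canvas.
We can, of course, also customize the background ourselves.
The relevant elements are: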
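```r
panel.background  # background of the plotting panel
panel.border      # border drawn on top of the panel (use fill = NA to keep the data visible)
plot.background   # background of the whole plot
```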
To change the background color:
```r
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "#1D8565", size = 2) +
  labs(x = "Year", y = "Temperature (°F)") +
  # `linewidth` replaced `size` for borders and lines in ggplot2 3.4.0
  theme(panel.background = element_rect(
    fill = "#64D2AA", color = "#64D2AA", linewidth = 2)
  )
```
```r
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(plot.background = element_rect(fill = "gray60",
                                       color = "gray30", linewidth = 2))
```
Grid lines
To adjust the grid lines of the panel:
```r
# panel.grid        # all grid lines
# panel.grid.major  # major grid lines
# panel.grid.minor  # minor grid lines
```
```r
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(panel.grid.major = element_line(linewidth = .5, linetype = "dashed"),
        panel.grid.minor = element_line(linewidth = .25, linetype = "dotted"),
        panel.grid.major.x = element_line(color = "red1"),
        panel.grid.major.y = element_line(color = "blue1"),
        panel.grid.minor.x = element_line(color = "red4"),
        panel.grid.minor.y = element_line(color = "blue4"))
```
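We can also set the grid spacing through the axis function `scale_y_continuous()`: `breaks` sets where major grid lines are drawn and `minor_breaks` sets where minor ones are drawn: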
```r
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  scale_y_continuous(breaks = seq(0, 100, 10),
                     minor_breaks = seq(0, 100, 2.5))
```
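Margins
In base graphics we might use `par(mai = ...)` or `par(mar = ...)` to control the margins. Likewise, ggplot provides the `plot.margin` theme element, which is set with `margin()`: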
```r
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)") +
  theme(plot.background = element_rect(fill = "gray60"),
        plot.margin = margin(t = 1, r = 3, b = 1, l = 8, unit = "cm"))
```
Styling facet strips
For example:
```r
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "orangered", alpha = .3) +
  labs(x = "Year", y = "Temperature (°F)") +
  facet_grid(season ~ year) +
  theme(strip.text = element_text(face = "bold", color = "chartreuse4",
                                  hjust = 0, size = 20),
        strip.background = element_rect(fill = "chartreuse3", linetype = "dotted"))
```
The tutorial's author also provides two functions, built on the ggtext package, for styling strip text:
```r
library(ggtext)
library(rlang)  # for %||%

# assumed definition: `g` is not defined in this note
g <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "orangered", alpha = .3) +
  labs(x = "Year", y = "Temperature (°F)")

# a text-box element that carries extra "highlight" settings
element_textbox_highlight <- function(..., hi.labels = NULL, hi.fill = NULL,
                                      hi.col = NULL, hi.box.col = NULL, hi.family = NULL) {
  structure(
    c(element_textbox(...),
      list(hi.labels = hi.labels, hi.fill = hi.fill, hi.col = hi.col,
           hi.box.col = hi.box.col, hi.family = hi.family)
    ),
    class = c("element_textbox_highlight", "element_textbox", "element_text", "element")
  )
}

# drawing method: labels listed in hi.labels get the highlight styling
element_grob.element_textbox_highlight <- function(element, label = "", ...) {
  if (label %in% element$hi.labels) {
    element$fill <- element$hi.fill %||% element$fill
    element$colour <- element$hi.col %||% element$colour
    element$box.colour <- element$hi.box.col %||% element$box.colour
    element$family <- element$hi.family %||% element$family
  }
  NextMethod()
}

# plot ("Playfair" must be registered as a font first)
g + facet_wrap(year ~ season, nrow = 4, scales = "free_x") +
  theme(
    strip.background = element_blank(),
    strip.text = element_textbox_highlight(
      family = "Playfair", size = 12, face = "bold",
      fill = "white", box.color = "chartreuse4", color = "chartreuse4",
      halign = .5, linetype = 1, r = unit(5, "pt"), width = unit(1, "npc"),
      padding = margin(5, 0, 3, 0), margin = margin(0, 1, 3, 1),
      hi.labels = c("1997", "1998", "1999", "2000"),
      hi.fill = "chartreuse4", hi.box.col = "black", hi.col = "white"
    )
  )
```
Alternatively, we can highlight a single panel of the faceted plot:
```r
# "Bangers" must be registered as a font first
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(aes(color = season == "Summer"), alpha = .3) +
  labs(x = "Year", y = "Temperature (°F)") +
  facet_wrap(~ season, nrow = 1) +
  scale_color_manual(values = c("gray40", "firebrick"), guide = "none") +
  theme(
    axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
    strip.background = element_blank(),
    strip.text = element_textbox_highlight(
      size = 12, face = "bold",
      fill = "white", box.color = "white", color = "gray40",
      halign = .5, linetype = 1, r = unit(0, "pt"), width = unit(1, "npc"),
      padding = margin(2, 0, 1, 0), margin = margin(0, 1, 3, 1),
      hi.labels = "Summer", hi.family = "Bangers",
      hi.fill = "firebrick", hi.box.col = "firebrick", hi.col = "white"
    )
  )
```
Built-in themes
ggplot2 ships with several complete themes that can be used directly:
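```r
theme_gray()     # the default theme, with a gray panel background
theme_bw()       # black on white; works well for mappings that use transparency
theme_void()     # removes everything except the data
theme_classic()  # white background with axis lines and no grid lines
```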
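The dedicated R package ggthemes provides themes modeled on various magazines and publications, for example `theme_economist()`.
Note that adding a complete built-in theme overrides all earlier `theme()` settings. Any extra tweaks therefore have to be added after the built-in theme.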
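The ordering rule is easy to see directly. In this sketch, `theme(legend.position = "none")` survives only when it is added after `theme_bw()`, and combining themes without a plot behaves the same way:

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point()

# Complete theme added last: it replaces the earlier tweak, the legend returns
p_lost <- base + theme(legend.position = "none") + theme_bw()
# Tweak added after the complete theme: the legend stays hidden
p_kept <- base + theme_bw() + theme(legend.position = "none")

# The same rule applies when themes are combined on their own
t_lost <- theme(legend.position = "none") + theme_bw()
t_kept <- theme_bw() + theme(legend.position = "none")
t_lost$legend.position  # theme_bw()'s own default, not "none"
t_kept$legend.position  # "none"
```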
7. Standalone components of a ggplot
title
labs
`labs()` holds the various text elements of a ggplot:
```r
ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "firebrick") +
  labs(x = "Year", y = "Temperature (°F)",
       title = "Temperatures in Chicago",
       subtitle = "Seasonal pattern of daily temperatures from 1997 to 2001",
       caption = "Data: NMMAPS",
       tag = "Fig. 1")
```
To change a legend title, use the aesthetic that created the legend. For example, if `color` is defined in `aes()`, set `color` in `labs()`:
```r
ggplot(chic, aes(x = date, y = temp, color = season)) +
  geom_point() +
  labs(x = "Year", y = "Temperature (°F)",
       color = "Seasons\nindicated\nby colors:")
```
8. Combining plots
For now I mainly use aplot and patchwork.
Personally, I find its syntax less concise than patchwork's:
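Here is a minimal sketch of patchwork's operator syntax on `mpg`. The `|` operator places plots side by side and `/` stacks them; the three plots are arbitrary examples:

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mpg, aes(displ, hwy)) + geom_point()
p2 <- ggplot(mpg, aes(class)) + geom_bar()
p3 <- ggplot(mpg, aes(hwy)) + geom_histogram(bins = 20)

# p1 and p2 side by side on top, p3 below; panels tagged A, B, C
combined <- (p1 | p2) / p3 +
  plot_annotation(tag_levels = "A")
print(combined)
```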
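Common pitfalls
Aesthetics designed to tell discrete groups apart, such as shape (or a discrete color palette), are not suited to continuous variables. They cannot show a continuous trend, and only a limited number of distinguishable shapes or hues exists. For continuous variables, use size, alpha, or a color gradient instead.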
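The shape case is quick to reproduce. In this sketch on `mpg`, mapping the continuous `cty` to `shape` makes ggplot stop with an error when the plot is built. `size` and `alpha` accept the same variable without complaint:

```r
library(ggplot2)

# Continuous variable mapped to shape: ggplot_build() raises an error
p_shape <- ggplot(mpg, aes(displ, hwy, shape = cty)) + geom_point()
res <- tryCatch(ggplot_build(p_shape), error = function(e) conditionMessage(e))
res  # the error message about mapping a continuous variable to shape

# Size and alpha can represent the same continuous variable
p_size <- ggplot(mpg, aes(displ, hwy, size = cty, alpha = cty)) + geom_point()
d <- layer_data(p_size)
range(d$size)
```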