table1
# A tibble: 6 x 4
country year cases population
<chr> <int> <int> <int>
1 Afghanistan 1999 745 19987071
2 Afghanistan 2000 2666 20595360
3 Brazil 1999 37737 172006362
4 Brazil 2000 80488 174504898
5 China 1999 212258 1272915272
6 China 2000 213766 1280428583
Each row represents one (country, year, type) combination. The count column holds the value for the variable named in type.
table2
# A tibble: 12 x 4
country year type count
<chr> <int> <chr> <int>
1 Afghanistan 1999 cases 745
2 Afghanistan 1999 population 19987071
3 Afghanistan 2000 cases 2666
4 Afghanistan 2000 population 20595360
5 Brazil 1999 cases 37737
6 Brazil 1999 population 172006362
# … with 6 more rows
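Because each observation in table2 is spread across two rows, one way to recover a one-row-per-observation layout like table1 is to spread type and count into columns. This is a minimal sketch that assumes the table2 dataset bundled with tidyr; the name table2_wide is only illustrative.

```r
library(tidyr)

# Turn the values of `type` into column names and fill them from `count`,
# giving one row per (country, year) pair.
table2_wide <- spread(table2, key = type, value = count)
table2_wide
```

The result should have the columns country, year, cases and population, matching the shape of table1.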
table3
# A tibble: 6 x 3
country year rate
* <chr> <int> <chr>
1 Afghanistan 1999 745/19987071
2 Afghanistan 2000 2666/20595360
3 Brazil 1999 37737/172006362
4 Brazil 2000 80488/174504898
5 China 1999 212258/1272915272
6 China 2000 213766/1280428583
people <- tribble(
  ~name,             ~key,     ~value,
  #-----------------|---------|------
  "Phillip Woods",   "age",    45,
  "Phillip Woods",   "height", 186,
  "Phillip Woods",   "age",    50,
  "Jessica Cordero", "age",    37,
  "Jessica Cordero", "height", 156
)
people %>%
  spread(key = key, value = value)
Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 2 rows:
* 1, 3
Do you need to create unique ID with tibble::rowid_to_column()?
Call `rlang::last_error()` to see a backtrace
In a case like this, add an id column so that name, key, and id together form a composite key that uniquely identifies each row.
people %>%
  group_by(name, key) %>%
  mutate(id = row_number()) %>%
  spread(key = key, value = value, fill = NA)
# A tibble: 3 x 4
# Groups: name [2]
name id age height
<chr> <int> <dbl> <dbl>
1 Jessica Cordero 1 37 156
2 Phillip Woods 1 45 186
3 Phillip Woods 2 50 NA
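The same composite-key idea can also be sketched with pivot_wider(), which is available in tidyr 1.0.0 and later. This is an alternative shown for comparison, not the approach used above; the name people_wide is hypothetical.

```r
library(dplyr)
library(tidyr)

people <- tribble(
  ~name,             ~key,     ~value,
  "Phillip Woods",   "age",    45,
  "Phillip Woods",   "height", 186,
  "Phillip Woods",   "age",    50,
  "Jessica Cordero", "age",    37,
  "Jessica Cordero", "height", 156
)

people_wide <- people %>%
  group_by(name, key) %>%
  mutate(id = row_number()) %>%  # number repeated (name, key) pairs
  ungroup() %>%
  # name and id together identify each output row
  pivot_wider(names_from = key, values_from = value)
people_wide
```

Phillip Woods ends up with two rows because "age" was recorded twice, and the second row has no matching height, so that cell is NA.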
preg <- tribble(
  ~pregnant, ~male, ~female,
  "yes",     NA,    10,
  "no",      20,    12
)
preg
# A tibble: 2 x 3
pregnant male female
<chr> <dbl> <dbl>
1 yes NA 10
2 no 20 12
preg %>%
  gather(male, female, key = "sex", value = "count")
# A tibble: 4 x 3
pregnant sex count
<chr> <chr> <dbl>
1 yes male NA
2 no male 20
3 yes female 10
4 no female 12
9.4 Separating and uniting
Exercise 1: What do the extra and fill arguments of separate() do?
tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
  separate(x, c("one", "two", "three"))
# A tibble: 3 x 3
one two three
<chr> <chr> <chr>
1 a b c
2 d e f
3 h i j
Warning message:
Expected 3 pieces. Additional pieces discarded in 1 rows [2].
tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"))
# A tibble: 3 x 3
one two three
<chr> <chr> <chr>
1 a b c
2 d e NA
3 f g i
Warning message:
Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [2].
tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
  separate(x, c("one", "two", "three"), extra = "drop")
# A tibble: 3 x 3
one two three
<chr> <chr> <chr>
1 a b c
2 d e f
3 h i j
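With extra = "merge", separate() does not discard surplus pieces when a row has more pieces than there are columns. It splits at most length(into) times, so the remainder stays in the last column.
tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
  separate(x, c("one", "two", "three"), extra = "merge")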
# A tibble: 3 x 3
one two three
<chr> <chr> <chr>
1 a b c
2 d e f,g
3 h i j
In this example the second row has fewer pieces than there are columns, so the missing piece is filled with NA.
tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"))
# A tibble: 3 x 3
one two three
<chr> <chr> <chr>
1 a b c
2 d e NA
3 f g i
Warning message:
Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [2].
tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"), fill = "right")
# A tibble: 3 x 3
one two three
<chr> <chr> <chr>
1 a b c
2 d e NA
3 f g i
tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"), fill = "left")
# A tibble: 3 x 3
one two three
<chr> <chr> <chr>
1 a b c
2 NA d e
3 f g i
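The two arguments can be combined in one call when a column contains rows with too many pieces as well as rows with too few. The following is a small sketch on made-up data; the names df and res are illustrative only.

```r
library(tidyr)
library(tibble)

# Row 2 has one piece too many, row 3 has one piece too few.
df <- tibble(x = c("a,b,c", "d,e,f,g", "h,i"))

res <- separate(
  df, x, c("one", "two", "three"),
  extra = "merge",  # keep surplus pieces in the last column
  fill  = "right"   # pad missing pieces with NA on the right
)
res
```

Since both situations are handled explicitly, this call should run without the "Expected 3 pieces" warnings shown above.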
tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"), remove = FALSE)
# A tibble: 3 x 4
x one two three
<chr> <chr> <chr> <chr>
1 a,b,c a b c
2 d,e d e NA
3 f,g,i f g i
Warning message:
Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [2].
tibble(x = c("a,b,c", "d,e", "h,i,j")) %>%
  separate(x, c("one", "two", "three")) %>%
  gather(key = no, value = val, one:three) %>%
  mutate(id = row_number(),
         noid = paste(no, id)) %>%
  select(-id) %>%
  spread(key = no, value = val, fill = 0)
# A tibble: 9 x 4
noid one three two
<chr> <chr> <chr> <chr>
1 one 1 a 0 0
2 one 2 d 0 0
3 one 3 h 0 0
4 three 7 0 c 0
5 three 8 0 0 0
6 three 9 0 j 0
7 two 4 0 0 b
8 two 5 0 0 e
9 two 6 0 0 i
The fill argument of complete() takes a named list that gives, for each column, the value to use in place of missing values.
tibble(group = c(1:2, 1),
       id = c(1:2, 2),
       val1 = 1:3,
       val2 = 1:3) %>%
  complete(group, id,
           fill = list(val1 = 100, val2 = 1000))
# A tibble: 4 x 4
group id val1 val2
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 1
2 1 2 3 3
3 2 1 100 1000
4 2 2 2 2
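For comparison, the sketch below runs complete() on the same data without fill, and then with a fill list that names only one column. The variable names df, no_fill and partial_fill are illustrative, and 0L is an arbitrary replacement value chosen to match the integer type of val1.

```r
library(tidyr)
library(tibble)

df <- tibble(
  group = c(1:2, 1),
  id    = c(1:2, 2),
  val1  = 1:3,
  val2  = 1:3
)

# Without fill, the combination that was absent (group 2, id 1) gets NA.
no_fill <- complete(df, group, id)

# Naming only val1 in the list fills that column and leaves val2 as NA.
partial_fill <- complete(df, group, id, fill = list(val1 = 0L))

no_fill
partial_fill
```

Columns that are not named in the fill list keep NA in the newly created rows.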