Pivot data from a wide to a long format suitable for plotting Sankey diagrams.
Usage
pivot_stages_longer(
  data,
  stages_from,
  values_from,
  additional_aes_from,
  invert_nodes = FALSE
)
Arguments
- data
A data.frame (or an object inheriting the data.frame class), which needs to be pivoted.
- stages_from
A vector of column names, which represent the stages.
- values_from
A vector of column names, which contain numeric values that represent the size of the edges in Sankey diagrams. When there are multiple values for a single edge, they are summed.
- additional_aes_from
A vector of column names of data that you want to use to decorate elements in your Sankey diagram. This argument is optional. See also vignette("data_management") and vignette("decorating").
- invert_nodes
When pivoting information from stages_from, its data is converted into a factor. Set invert_nodes to TRUE if you want to invert the order of the levels of the factor.
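To make the effect of invert_nodes concrete, the base R sketch below shows what inverting the order of factor levels means. It is only an illustration: the node labels are hypothetical, and the package may determine the initial level order differently from the alphabetical default used here.

```r
# Hypothetical node labels for a single stage (not taken from any real data set).
nodes <- c("fishing", "shipping", "fishing", "dredging")

# By default, factor() sorts the levels alphabetically.
f <- factor(nodes)
levels(f)

# Inverting keeps the values as they are but reverses the order of the levels.
f_inverted <- factor(nodes, levels = rev(levels(f)))
levels(f_inverted)
```

Only the level order changes; the values stored in the factor stay the same.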
Value
Returns a dplyr::tibble with all the selected columns from data pivoted.
The stages will be listed in the column named stage and nodes in the column named node. The result will contain two new columns: a column named connector indicating whether the row in the tibble reflects the source of an edge (value 'from') or destination of an edge (value 'to'); and a column named edge_id, containing a unique identifier for each edge. The edge_id is required for the plotting routine in order to identify which edge source should be connected with which edge destination.
Details
Typically, data to be displayed as a Sankey diagram is collected and stored
in a wide format, where each stage (i.e., a position on the x-axis of a
Sankey diagram) is in a column. The ggplot2 philosophy requires the data to
be in a long format, such that diagram decorations (aesthetics) can be
mapped to specific columns.
This function pivots wide data into an appropriate long format. You indicate which columns contain the stages and in which order they should appear in the Sankey diagram.
For more details, see vignette("data_management").
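The wide-to-long idea described above can be sketched in base R. This is a conceptual illustration, not the package's implementation: the toy data and all names in it (wide, stage_a, stage_b, stage_c, flow) are hypothetical, and it assumes that edges only run between neighbouring stages. It sums values that share an edge, assigns an edge_id, and writes two rows per edge, one for each connector value.

```r
# Toy wide data: one column per stage plus a numeric value column.
# All names here are hypothetical and only serve the illustration.
wide <- data.frame(
  stage_a = c("x", "x", "y", "x"),
  stage_b = c("p", "q", "p", "p"),
  stage_c = c("m", "m", "n", "m"),
  flow    = c(1, 2, 3, 4)
)
stages <- c("stage_a", "stage_b", "stage_c")

# Build one block of edges per pair of neighbouring stages
# (assumption: edges never skip a stage).
edges <- do.call(rbind, lapply(seq_len(length(stages) - 1), function(i) {
  # Sum the values of rows that share the same source and destination node.
  agg <- aggregate(
    wide["flow"],
    by = list(from = wide[[stages[i]]], to = wide[[stages[i + 1]]]),
    FUN = sum
  )
  agg$stage_from <- stages[i]
  agg$stage_to   <- stages[i + 1]
  agg
}))

# Give every edge a unique identifier.
edges$edge_id <- seq_len(nrow(edges))

# Long format: two rows per edge, one for its source and one for its destination.
long <- rbind(
  data.frame(edge_id = edges$edge_id, connector = "from",
             stage = edges$stage_from, node = edges$from, flow = edges$flow),
  data.frame(edge_id = edges$edge_id, connector = "to",
             stage = edges$stage_to, node = edges$to, flow = edges$flow)
)
long <- long[order(long$edge_id, long$connector), ]
long
```

Each edge_id occurs exactly twice in long, which is what allows a plotting routine to match the source of an edge with its destination.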
Examples
data("ecosystem_services")
ecosystem_services_p1 <-
  pivot_stages_longer(
    data        = ecosystem_services,
    stages_from = c("activity_type", "pressure_cat",
                    "biotic_group", "service_division"),
    values_from = "RCSES")
## suppose we want to decorate our Sankey
## with information on the 'section' of the services:
ecosystem_services_p2 <-
  pivot_stages_longer(
    data                = ecosystem_services,
    stages_from         = c("activity_type", "pressure_cat",
                            "biotic_group", "service_division"),
    values_from         = "RCSES",
    additional_aes_from = "service_section")
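As a possible next step, the pivoted data can be handed to ggplot2. The sketch below assumes that this function and the ecosystem_services data come from the ggsankeyfier package, that ggplot2 is installed, and that the package's geom_sankeyedge() and geom_sankeynode() layers accept the connector and edge_id aesthetics. The object names has_pkgs, es_long and p are hypothetical. The code skips itself when either package is missing.

```r
# Assumption: the function documented here is provided by 'ggsankeyfier'.
has_pkgs <- requireNamespace("ggplot2", quietly = TRUE) &&
  requireNamespace("ggsankeyfier", quietly = TRUE)

if (has_pkgs) {
  library(ggplot2)
  library(ggsankeyfier)

  data("ecosystem_services")

  # Pivot the wide data to the long format described on this page.
  es_long <- pivot_stages_longer(
    data        = ecosystem_services,
    stages_from = c("activity_type", "pressure_cat",
                    "biotic_group", "service_division"),
    values_from = "RCSES"
  )

  # Map the columns created by the pivot to the Sankey aesthetics.
  p <- ggplot(
    es_long,
    aes(x = stage, y = RCSES, group = node,
        connector = connector, edge_id = edge_id)
  ) +
    geom_sankeyedge(v_space = "auto") +
    geom_sankeynode(v_space = "auto")

  print(p)
}
```

The column given in values_from keeps its name (here RCSES), so it is mapped to y directly.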