Similar to as.numeric(), but it also performs some text sanitising before coercing text to numerics.
Details
The following steps are performed to sanitise text before coercing it to numerics:
Notes labelled with "x" or "*" are removed.
Operators (">", ">=", "<", "<=", "~", "=", "ca", "er") are removed.
Text between brackets ("()") is removed, including the brackets.
Commas are considered to be thousands separators, and are removed, when they are located at any fourth character position counted from the right. A comma at any other position is assumed to be a decimal separator and is replaced by a period. For example, "2,333" becomes 2333, whereas "2,33" becomes 2.33.
If a hyphen is present that is not preceded by "e" or "E", it probably represents a range of values. When range_fun is NULL, the result is NA. Otherwise, the numbers are split at the hyphen and aggregated with range_fun.
It is your own responsibility to check if the sanitising steps are appropriate for your analyses.
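The steps above can be sketched in base R as follows. This is an illustration only, not the package's implementation: sanitise_sketch is a hypothetical helper, the order in which the steps are applied is an assumption, and the has_notation and has_brackets attributes are not reproduced. As a further assumption, a hyphen only counts as a range separator here when it directly follows a digit, so that the exponent in "1e-3" and a leading minus sign are left alone.

```r
## Hypothetical helper (NOT part of the package): a base-R sketch of the
## sanitising steps listed above, under the assumptions stated in the text.
sanitise_sketch <- function(x, range_fun = NULL) {
  x <- trimws(x)
  x <- gsub("\\([^)]*\\)", "", x)                    # text between brackets
  x <- gsub("[x*]", "", x)                           # note labels "x" and "*"
  x <- gsub(">=|<=|>|<|~|=|ca|er", "", x)            # operators
  ## commas followed by exactly three digits (then a comma or the end of
  ## the string) are treated as thousands separators and dropped
  x <- gsub(",(?=\\d{3}(,|$))", "", x, perl = TRUE)
  x <- gsub(",", ".", x, fixed = TRUE)               # remaining commas: decimals
  x <- trimws(x)

  ## assumption: a range is a hyphen that directly follows a digit
  range_pat <- "(?<=\\d)\\s*-\\s*"
  is_range  <- grepl(range_pat, x, perl = TRUE)

  out <- rep(NA_real_, length(x))
  out[!is_range] <- suppressWarnings(as.numeric(x[!is_range]))
  if (!is.null(range_fun) && any(is_range)) {
    parts <- strsplit(x[is_range], range_pat, perl = TRUE)
    out[is_range] <- vapply(parts, function(p) range_fun(as.numeric(p)),
                            numeric(1))
  }
  out
}

char_num <- c("10", " 2", "3 ", "~5", "9.2*", "2,33",
              "2,333", "2.1(1.0 - 3.2)", "1-5", "1e-3")
sanitise_sketch(char_num)                      # "1-5" yields NA
sanitise_sketch(char_num, range_fun = median)  # "1-5" yields 3
```

Comparing the output of such a sketch with that of as_numeric_ecotox() on a sample of your own data is one way to check whether the sanitising steps suit your analysis.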
Examples
## a vector of commonly used notations in the database to represent
## numeric values
char_num <- c("10", " 2", "3 ", "~5", "9.2*", "2,33",
              "2,333", "2.1(1.0 - 3.2)", "1-5", "1e-3")
## Text fields reported as ranges are returned as `NA`:
as_numeric_ecotox(char_num, warn = FALSE)
#> [1] 10.000 2.000 3.000 5.000 9.200 2.330 2333.000 2.100
#> [9] NA 0.001
#> attr(,"has_notation")
#> [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
#> attr(,"has_brackets")
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## Text fields reported as ranges are processed with `range_fun`
as_numeric_ecotox(char_num, range_fun = median)
#> [1] 10.000 2.000 3.000 5.000 9.200 2.330 2333.000 2.100
#> [9] 3.000 0.001
#> attr(,"has_notation")
#> [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
#> attr(,"has_brackets")
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE