Build an SQLite database from zip-archived tables downloaded from the EPA website
Source: R/init.r
This function is called automatically after download_ecotox_data(). The database files can also be downloaded manually from the EPA website, after which a local database can be built using this function.
Usage
build_ecotox_sqlite(source, destination = get_ecotox_path(), write_log = TRUE)
Arguments
- source
A character string pointing to the directory path where the text files with the raw tables are located. These can be obtained by extracting the zip archive from https://cfpub.epa.gov/ecotox/ (look for 'Download ASCII Data').
- destination
A character string representing the destination path for the SQLite file. By default this is get_ecotox_path().
- write_log
A logical value indicating whether a log file should be written in the destination path (default: TRUE). The log contains information on the source and destination path, the version of this package, the creation date, and the operating system on which the database was created.
Details
Raw data downloaded from the EPA website is in itself not very efficient to work with in R. The files are large and would put a heavy strain on R when loaded completely into the system's memory. Instead, use this function to build an SQLite database from the tables. That way, the data can be queried without having to load it all into memory.
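To illustrate why an SQLite database avoids loading everything into memory, here is a minimal sketch that streams one pipe-delimited table into SQLite in chunks. It assumes the DBI and RSQLite packages are installed; the helpers split_pipes() and import_pipe_table() and the chunk_size argument are hypothetical and do not reflect how build_ecotox_sqlite() works internally. It also assumes every line already has as many fields as the header.

```r
## Minimal sketch (assumes DBI and RSQLite are installed; hypothetical helpers,
## not the ECOTOXr implementation). Streams a pipe-delimited text file into an
## SQLite table in chunks, so the whole table is never held in memory at once.
library(DBI)

## Split lines on '|' while keeping empty trailing fields: strsplit() drops a
## trailing empty string, so a sentinel field is appended and then removed.
split_pipes <- function(lines) {
  lapply(strsplit(paste0(lines, "|x"), "|", fixed = TRUE),
         function(f) f[-length(f)])
}

import_pipe_table <- function(con, path, table, chunk_size = 10000L) {
  inp <- file(path, open = "r", encoding = "UTF-8")
  on.exit(close(inp))
  header <- split_pipes(readLines(inp, n = 1L))[[1]]
  repeat {
    lines <- readLines(inp, n = chunk_size)  # read at most chunk_size lines
    if (length(lines) == 0L) break           # end of file reached
    ## assumes every line has as many fields as the header
    chunk <- as.data.frame(do.call(rbind, split_pipes(lines)),
                           stringsAsFactors = FALSE)
    names(chunk) <- header
    ## append = TRUE creates the table on the first chunk, then appends
    dbWriteTable(con, table, chunk, append = TRUE)
  }
  invisible(table)
}
```

Once a table is in SQLite, DBI::dbGetQuery() returns only the rows a query selects, rather than the full table.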
EPA provides the raw tables from the ECOTOX database as text files that use the pipe character ('|') as column separator. Although this is not documented, the tables appear not to contain comment or quotation characters. However, some records contain the reserved pipe character within a field, which confuses the table parser. In these records, the pipe character is replaced with a dash ('-').
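The pipe-to-dash replacement can be sketched as follows. This is a hypothetical illustration, not the package's own code: it assumes that the expected number of columns (n_cols) and the position of the free-text column that may contain stray pipes (text_col) are known, and it merges the surplus fields back into that column, joined by dashes.

```r
## Hypothetical sketch of the pipe-to-dash repair; not the ECOTOXr code.
## Assumes the expected column count (n_cols) and the position of the
## free-text column that may contain stray '|' characters (text_col) are known.
repair_pipes <- function(lines, n_cols, text_col) {
  vapply(lines, function(ln) {
    ## sentinel keeps empty trailing fields that strsplit() would drop
    fields <- strsplit(paste0(ln, "|x"), "|", fixed = TRUE)[[1]]
    fields <- fields[-length(fields)]
    extra <- length(fields) - n_cols
    if (extra > 0) {
      ## glue the surplus pieces back into the text column, using '-' instead of '|'
      merged <- paste(fields[text_col:(text_col + extra)], collapse = "-")
      fields <- c(head(fields, text_col - 1), merged, tail(fields, n_cols - text_col))
    }
    paste(fields, collapse = "|")
  }, character(1), USE.NAMES = FALSE)
}

## returns c("1|normal text|5", "2|a-b text|7")
repair_pipes(c("1|normal text|5", "2|a|b text|7"), n_cols = 3, text_col = 2)
```

Lines that already have the expected number of fields pass through unchanged; lines with fewer fields are not touched either.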
In addition, while reading the tables as text files, this package attempts to decode the text as UTF-8. Unfortunately, this process appears to be platform-dependent and may therefore produce different results on different platforms. The problem only seems to occur for characters that are classified as 'control characters' in UTF-8. This has consequences for reproducibility, but only if you build search queries that look for such special characters. For the sake of reproducibility, it is therefore advisable to stick to common (non-accented) alphanumeric characters in your searches.
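To check whether a search term or a retrieved value contains such control characters, a small base-R helper can flag them by inspecting Unicode code points. The function name has_control_chars() is hypothetical, and treating the Unicode Cc ranges (U+0000 to U+001F and U+007F to U+009F) as the affected characters is an assumption.

```r
## Hypothetical helper (base R only): flags strings that contain Unicode
## control characters (category Cc: U+0000-U+001F and U+007F-U+009F).
## Returns NA for strings that are NA or not valid UTF-8.
has_control_chars <- function(x) {
  vapply(x, function(s) {
    cp <- utf8ToInt(enc2utf8(s))  # code points of the string
    if (anyNA(cp)) return(NA)
    any(cp < 32L | (cp >= 127L & cp <= 159L))
  }, logical(1), USE.NAMES = FALSE)
}

## returns c(FALSE, TRUE)
has_control_chars(c("Daphnia magna", "tab\there"))
```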
Use 'suppressMessages()' to suppress the progress report.
Examples
if (FALSE) { # \dontrun{
## This example will only work properly if 'dir' points to an existing directory
## with the raw tables from the ECOTOX database. This function will be called
## automatically after a call to 'download_ecotox_data()'.
test <- check_ecotox_availability()
if (test) {
  files <- attributes(test)$files[1, ]
  dir <- gsub(".sqlite", "", files$database, fixed = TRUE)
  path <- files$path
  if (dir.exists(file.path(path, dir))) {
    ## This will build the database in your temp directory:
    build_ecotox_sqlite(source = file.path(path, dir), destination = tempdir())
  }
}
} # }