Major API change
extract_ functions are now glean_.
If tidyverse is loaded after ohvbd, there are no direct namespace collisions.
Full list of function name changes:
extract() -> glean()
extract_ad() -> glean_ad()
extract_gbif() -> glean_gbif()
extract_vd() -> glean_vd()
extract_vt() -> glean_vt()
fetch_extract_vd_chunked() -> fetch_glean_vd_chunked()
fetch_extract_vt_chunked() -> fetch_glean_vt_chunked()
New functions & arguments:
ohvbd now interfaces with GBIF for occurrence data.
*_gbif functions (e.g. fetch_gbif()) allow for retrieving and extracting data from GBIF.
rgbif package are required to retrieve data from GBIF.
The tee() command allows one to extract data from the middle of a pipeline and save it to an environment.
ohvbd workflows, and can be used in any base R pipeline (|>). It has not been tested in magrittr pipelines but should work as-is.
The filter_db() command allows for filtering hub search results down to a single database’s results.
check_db_status() now returns (invisibly) whether all databases are up or not.
The fetch_citation() and fetch_citation_* commands provide an interface to attempt to retrieve citations from a VectorByte dataset.
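The tee() entry above can be pictured with a small base R sketch. This is not the ohvbd implementation: tee_sketch() and its arguments are hypothetical names chosen for illustration, and the real tee() may take different arguments. The sketch only shows the general idea of saving an intermediate value to an environment while passing it on unchanged (the base pipe needs R >= 4.1).

```r
# Hypothetical stand-in for tee(); NOT the ohvbd implementation.
tee_sketch <- function(x, name, envir = globalenv()) {
  assign(name, x, envir = envir)  # save the intermediate value under `name`
  x                               # pass the value down the pipeline unchanged
}

# A dedicated environment keeps the demonstration tidy.
store <- new.env()

result <- c(3, 1, 2) |>
  sort() |>
  tee_sketch("sorted_values", envir = store) |>
  rev()

# `result` holds the end of the pipeline, while
# `store$sorted_values` holds the value captured mid-pipeline.
```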
The force_db() function enables one to force ohvbd to consider a particular object as having a particular provenance.
The simplify argument to search_hub() makes hub searches return an ohvbd.ids object if only one database was searched for. This behaviour is on by default.
filter_db() will now transparently return ohvbd.ids objects if it gets them.
The taxonomy argument to search_hub() allows for filtering searches by GBIF backbone IDs.
The match_species() function allows for quick and flexible matching of species names to their GBIF backbone IDs.
The match_country() function allows for matching of country names to WKT polygons via naturalearth.
The ohvbd_db(), has_db(), and is_from() functions allow for quick testing of object provenance (according to ohvbd).
The get_default_ohvbd_cache() function allows for custom functions that interface with cached ohvbd data files.
The list_ohvbd_cache() and clean_ohvbd_cache() functions enable better interactive cache management.
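As a rough illustration of what interactive cache management can look like, here is a base R sketch. The helper names (default_cache_sketch(), list_cache_sketch(), clean_cache_sketch()) are hypothetical and are not the ohvbd functions named above. The use of tools::R_user_dir() as the cache location is an assumption based on the user-directory caching mentioned elsewhere in this changelog.

```r
# Hypothetical helpers; names and arguments are illustrative only.
default_cache_sketch <- function() {
  # tools::R_user_dir() requires R >= 4.0.0. Assumed cache location.
  tools::R_user_dir("ohvbd", which = "cache")
}

list_cache_sketch <- function(dir = default_cache_sketch()) {
  if (!dir.exists(dir)) return(character(0))
  list.files(dir, recursive = TRUE)
}

clean_cache_sketch <- function(dir = default_cache_sketch()) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  removed <- file.remove(files)
  invisible(sum(removed))  # number of files actually deleted
}

# Demonstration against a throwaway directory rather than a real cache.
demo_dir <- file.path(tempdir(), "ohvbd-cache-demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("a", file.path(demo_dir, "one.rda"))
writeLines("b", file.path(demo_dir, "two.rda"))

before <- list_cache_sketch(demo_dir)
n_removed <- clean_cache_sketch(demo_dir)
after <- list_cache_sketch(demo_dir)
```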
clean_ad_cache() has been removed as it is now unnecessary.
search_x_smart() functions can now take "tags" as a search field, enabling support for tagged datasets.
Other:
\dontrun{} so they should be runnable from an installed version of the package.
ohvbd is now covered with unit tests (using the vcr package).
fetch_vd() no longer tries to retrieve ids with no pages of data.
set_ohvbd_compat() as unexpected SSL errors should break pipelines by default.
fetch() on an ohvbd.hub.search or
glean() on an ohvbd.ids object now provides a
hint that you may have forgotten something.
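A hint of this kind can be produced with ordinary S3 dispatch. The sketch below is an assumption about the general mechanism, not the package's code: fetch_sketch() is a hypothetical generic, the class names are taken from this changelog, and the wording of the hint is invented for illustration.

```r
# Hypothetical generic; NOT the ohvbd fetch() implementation.
fetch_sketch <- function(x, ...) UseMethod("fetch_sketch")

# Normal path: ids can be fetched.
fetch_sketch.ohvbd.ids <- function(x, ...) {
  paste("fetching", length(x), "ids")
}

# Unsupported path: fail early with a hint instead of an opaque error.
# The message text is invented for this sketch.
fetch_sketch.ohvbd.hub.search <- function(x, ...) {
  stop(
    "Cannot fetch a hub search directly. Did you forget a filtering step?",
    call. = FALSE
  )
}

ids <- structure(c(1L, 2L, 3L), class = "ohvbd.ids")
hub <- structure(list(), class = "ohvbd.hub.search")

ok <- fetch_sketch(ids)
msg <- tryCatch(fetch_sketch(hub), error = function(e) conditionMessage(e))
```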
fetch() command and run search_hub() |> glean() which didn’t previously give an interpretable error.
vcr to massively reduce their build time. This should only matter to developers of ohvbd, or users who download from GitHub and build the vignettes themselves.
ohvbd.ids() now warns you and fixes the problem if you provide ids with duplicate values.
glean_vt() and glean_vd() now force the inclusion of the dataset ID when filtering columns (using the cols argument).
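The effect of forcing the dataset ID into a column filter can be sketched in base R as follows. select_cols_sketch() is a hypothetical helper, and the column name "DatasetID" is an assumption made for the example. Neither is taken from the package.

```r
# Hypothetical helper; "DatasetID" is an assumed column name.
select_cols_sketch <- function(df, cols = NULL, id_col = "DatasetID") {
  if (is.null(cols)) return(df)      # no filter requested: keep everything
  keep <- union(id_col, cols)        # always include the ID column
  df[, intersect(keep, names(df)), drop = FALSE]
}

df <- data.frame(
  DatasetID = c(1, 1, 2),
  Species = c("a", "b", "c"),
  Value = c(0.1, 0.2, 0.3)
)

# Only "Value" is requested, but the ID column is retained as well.
out <- select_cols_sketch(df, cols = "Value")
```

Keeping the ID alongside the requested columns means rows can still be traced back to their source dataset after filtering.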
glean_ad() now correctly returns a matrix even when there is only 1 row or column.
fetch_vd_counts() is now significantly faster, more robust, and temporarily caches data.
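On the glean_ad() matrix fix noted above: one common cause of this class of bug in R is that subsetting a matrix down to a single row or column drops its dimensions by default. Whether that was the cause here is an assumption. The base R behaviour itself is shown below.

```r
m <- matrix(
  1:6,
  nrow = 2,
  dimnames = list(c("r1", "r2"), c("a", "b", "c"))
)

# Default subsetting drops the matrix down to a named vector.
dropped <- m["r1", ]

# drop = FALSE keeps the result as a 1 x 3 matrix.
kept <- m["r1", , drop = FALSE]
```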
fetch_vd() under the hood, particularly if you are running it multiple times in a day.
fetch_ad() for metrics and search_vt_smart() for operators and fields) is now fuzzy, allowing for a small amount of deviation from the actual term name.
assoc_ad() now tries to guess LatLong column names if none (or the wrong ones) are provided.
NULL rather than NA for default missing values (except date arguments to AD-related functions, where NA is more reasonable in the grand scheme).
fetch_ad() now caches and tries to read from cache by default.
refresh_cache = TRUE or use_cache = FALSE (depending on if you want to replace your existing cache or not).
The search_hub() function enables searching across multiple databases at once via vbdhub.
generate_vt_template() which quickly generates a VecTraits template for later upload.
ohvbd now only uses base R pipes (|>).
%>%) is no longer used internally, nor is it exported for use.
httr2 v1.1.1 deprecated the pool argument of req_perform_parallel() which broke fetch() commands across ohvbd.
max_active argument, which does simplify everything a bit.
httr2 to be v1.1.1.
fetch_ad() now searches for and retrieves the most up-to-date GID2 files from AREAdata.
timeout parameter of fetch_ad() to control timeouts of AD downloads. Defaults to 4 minutes.
assoc_ad() now correctly extracts data (this functionality regressed in 0.5.0 as a consequence of the new dynamic method dispatch approach to data retrieval).
assoc_ad() now also gives consistent output even when a 1-dimensional output is returned from extract_ad().
fetch_ functions now have a default connections argument of 2, leading to faster retrieval across the board.
The check_src argument has been removed from all functions. It no longer serves much of a purpose due to the sanity checking changes implemented in 0.5.0.
fetch_vd() now correctly returns all data from datasets over 50 rows.
fetch_vd() also now tells you how much data you are retrieving and a coarse estimate of how long this will take.
fetch_vd_counts() allows for quick checking of dataset sizes. This is very important as some datasets in VecDyn are over 40,000 rows long!
fetch_ functions (and thus also fetch()) now use parallel data retrieval, even when only 1 connection is used. This seems to lead to a 20% gain in download speed for no cost.
get_ functions have been split into two new types of function, based upon exact usage.
find_ functions retrieve metadata such as column definitions and ids.
fetch_ functions retrieve actual datasets.
ohvbd.ids, ohvbd.responses, ohvbd.data.frame, ohvbd.ad.matrix) to allow for nicer checks of data integrity.
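To illustrate how a lightweight S3 class can support integrity checks, here is a minimal base R sketch. The constructor new_ids_sketch(), the "db" attribute, and the check_ids_sketch() helper are all hypothetical. Only the class name ohvbd.ids comes from this changelog, and the package's real objects may be structured differently.

```r
# Hypothetical constructor; the "db" attribute name is an assumption.
new_ids_sketch <- function(ids, db) {
  structure(ids, db = db, class = c("ohvbd.ids", class(ids)))
}

# Subsetting method so that indexing keeps the class and provenance.
`[.ohvbd.ids` <- function(x, i) {
  new_ids_sketch(unclass(x)[i], db = attr(x, "db"))
}

# Hypothetical integrity check: correct class and expected provenance.
check_ids_sketch <- function(x, db) {
  inherits(x, "ohvbd.ids") && identical(attr(x, "db"), db)
}

ids <- new_ids_sketch(c(5L, 7L, 9L), db = "vt")
sub <- ids[2:3]

ok_ids <- check_ids_sketch(sub, "vt")       # classed ids from the right source
bad_ids <- check_ids_sketch(c(5L, 7L), "vt") # plain integers fail the check
```

Without the subsetting method, indexing would still keep the class in base R but would drop the provenance attribute, which is the sort of silent loss a class-aware check is meant to catch.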
fetch_ functions when indexing the output of find_x_ids() functions.
fetch() and extract() leverage dynamic method dispatch along with the above classes to infer the correct underlying fetch_ and extract_ functions to use.
find_vt_ids() |> fetch() |> extract() without having to remember the correct extractor to use.
ohvbd.data.frame in the same way as just subsetting a normal df).
ohvbd.ids() allows users to create objects of the same S3 class as output by the find_ and search_ functions.
The is_cached() function enables a simple check to see if an object has been loaded from the cache by ohvbd.
get_ad() -> fetch_ad()
get_extract_vd_chunked() -> fetch_extract_vd_chunked()
get_extract_vt_chunked() -> fetch_extract_vt_chunked()
get_gadm_sfs() -> fetch_gadm_sfs()
get_vd() -> fetch_vd()
get_vt() -> fetch_vt()
get_vd_columns() -> find_vd_columns()
get_vd_current_ids() -> find_vd_ids()
get_vt_current_ids() -> find_vt_ids()
check_ohvbd_config() allows easy printing of the current status of ohvbd’s options.
The clean_ad_cache() function enables users to clean their cached AREAdata files easily.
tools::R_user_dir()).
The use-areadata vignette now has part of its content complete.
get_ and search_ function error handling
set_ohvbd_compat() if these are detected.
get_ calls requesting more than 10 ids run a pre-flight SSL check before attempting the whole thing.
get_vd() and get_vt() now also return a list of ids that were missing and any curl errors that were found in the process of trying to get data.
set_ohvbd_compat() now asks for user confirmation in interactive mode. This makes running on Linux a little annoying, but is worth it due to the seriousness of disabling SSL identity verification.
The retrieving-data vignette now only enables compatibility mode if running under Linux. Generally it is best to keep package usage of set_ohvbd_compat() to an absolute minimum.
get_x() and
get_extract_x() functions.
check_src allows for toggling of id-sanity checking for most functions.
The retrieving-data vignette now contains instructions for the use of search_x_smart().
get_x_byid() -> get_x()
extract_x_data() -> extract_x()
assoc_x_y() -> assoc_x()
get_extract_x_byid_chunked() -> get_extract_x_chunked()
verb_target_modifier().
get_x_y() functions always retrieve data from database x with y specifying any special type of data.
extract_x() functions always extract data.
get_extract_x_chunked()
get_vd_current_ids() |> get_vt().
format_time_overlap_bar() allows for visually formatting a range of dates combined with another set of target dates to see where overlaps do or do not take place.
extract_ad() however it can also be used independently. It was designed to fill a more general role within UI design using the cli package, and should be usable (or hackable) by others needing the same tool.
extract_ad() now errors when all targetdate entries are outside of the range of the AREAdata dataset.
assoc_ad() associates arbitrary data including lon/lat columns with AREAdata.
get_vd_columns() provides quick reference about the currently present VecDyn columns. (This is currently not possible for VecTraits, but the feasibility is being investigated.)
The assoc_gadm() function associates GADM ids at all spatial scales with arbitrary data that include lon/lat columns.
*_basereq() calls are no longer required as the first argument for functions.
vb_basereq() |>.
basereq argument of these functions, which can be generated using vb_basereq().
unsafe = TRUE for vb_basereq().
set_ohvbd_compat().
extract_ad() now allows targetdate to be specified as a vector of full dates, e.g. c("2023-08-04", "2023-09-21").
cli package to provide
a nicer cli interface.
cli package to provide a nicer cli interface.
cli package to provide a nicer cli interface.
The retrieving-data vignette now builds significantly quicker.
get_ad() now caches data from AREAdata to reduce extraneous data downloading and speed up re-execution and development.
use_cache=TRUE and caches by default in the user directory.
check_db_status() allows for easy checking of the online status of various data providers.
ohvbd now interfaces with the AREAdata repository for historical climate data.
retrieving-data vignette (courtesy of @willpearse).
retrieving-data vignette to explain the basic process of downloading and extracting data from VecTraits and VecDyn.
search_vd() and search_vd_smart() now allow for searching of VecDyn in the same manner as for VecTraits.
get_vb_basereq() renamed to vb_basereq() for ease of writing.
ohvbd now interfaces with the VecDyn database for vector population dynamic data.
vd replacing vt in the function names (e.g. get_vd())
search_vt() allows for keyword-based searching of VecTraits.
get_vt_current_ids() now handles 404 responses gracefully.
search_vt_smart() allows for field-based searching of VecTraits.
get_vt() now leverages httr2::req_perform_sequential() for more efficient dataset retrieval.