Getting Started

rsynthbio is an R package that provides a convenient interface to the Synthesize Bio API, allowing users to generate realistic gene expression data based on specified biological conditions. This package enables researchers to easily access AI-generated transcriptomic data for various modalities including bulk RNA-seq and single-cell RNA-seq.

Alternatively, you can generate datasets with AI on our web platform.

How to install

You can install rsynthbio from CRAN:

install.packages("rsynthbio")

To get the development version, install it from GitHub using the remotes package:

# Install remotes only if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("synthesizebio/rsynthbio")

Once installed, load the package:

library(rsynthbio)

Authentication

Before using the Synthesize Bio API, you need to set up your API token. The package provides a secure way to handle authentication:

# Securely prompt for and store your API token
# The token will not be visible in the console
set_synthesize_token()

# You can also store the token in your system keyring for persistence
# across R sessions (requires the 'keyring' package)
set_synthesize_token(use_keyring = TRUE)

In later sessions, load the stored token and check that it is set:

# In future sessions, load the stored token
load_synthesize_token_from_keyring()

# Check if a token is already set
has_synthesize_token()
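
The two calls above can be combined into a single check at the top of a script. The following is a minimal sketch, not part of rsynthbio: `ensure_synthesize_token()` is a hypothetical helper, and it assumes that `has_synthesize_token()` returns `TRUE` or `FALSE`. Its two arguments default to the package functions and are exposed only so the logic can be tried without a keyring.

```r
# Hypothetical helper (not part of rsynthbio): make sure a token is available.
# Assumes has_synthesize_token() returns TRUE/FALSE.
ensure_synthesize_token <- function(
    has_token = rsynthbio::has_synthesize_token,
    load_token = rsynthbio::load_synthesize_token_from_keyring) {
  # Reuse a token that is already set in this session.
  if (isTRUE(has_token())) {
    return(invisible(TRUE))
  }
  # Otherwise try the keyring; a failure here is reported below.
  try(load_token(), silent = TRUE)
  if (!isTRUE(has_token())) {
    stop("No API token found. Run set_synthesize_token(use_keyring = TRUE) once.")
  }
  invisible(TRUE)
}
```

Calling `ensure_synthesize_token()` before any API request stops the script early with a clear message instead of failing later on an unauthenticated call.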

You can manually set the token, but don’t commit it to version control!

set_synthesize_token(token = "your-token-here")
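
One way to keep the token out of your scripts is to read it from an environment variable, for example one defined in a `~/.Renviron` file that is not tracked by version control. This is a sketch under assumptions: `token_from_env()` is a hypothetical helper, and the variable name `SYNTHESIZE_API_TOKEN` is chosen only for illustration.

```r
# Hypothetical helper: read the token from an environment variable so that it
# never appears in a script. The variable name is an assumption; pick your own.
token_from_env <- function(var = "SYNTHESIZE_API_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  # Fail loudly rather than passing an empty token to the API
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set.")
  }
  token
}

# Usage (requires rsynthbio):
# set_synthesize_token(token = token_from_env())
```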

You can obtain an API token by registering at Synthesize Bio.

Available Model Types

Synthesize Bio provides several types of models for different use cases:

Baseline Models

Generate synthetic gene expression data from metadata alone. You describe the biological conditions (tissue type, disease state, perturbations, etc.) and the model generates realistic expression profiles.

See the Baseline Models vignette for detailed usage.

Reference Conditioning Models

Generate expression data conditioned on a real reference sample. This allows you to “anchor” to an existing expression profile while applying perturbations or modifications.

See the Reference Conditioning vignette for detailed usage.

Metadata Prediction Models

Infer metadata from observed expression data. Given a gene expression profile, predict the likely biological characteristics (cell type, tissue, disease state, etc.).

See the Metadata Prediction vignette for detailed usage.

Only baseline models are available to all users. To check which models are available programmatically, use list_models(). Contact us if you have any questions.

Listing Available Models

You can check which models are available programmatically:

# Check available models
list_models()

Quick Start

Here’s a quick example using a baseline model:

# Get an example query structure
query <- get_example_query(model_id = "gem-1-bulk")$example_query

# Submit the query and get results
result <- predict_query(query, model_id = "gem-1-bulk")

# Access the results
metadata <- result$metadata
expression <- result$expression
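
Before using the results, it can help to confirm that the two pieces line up. The sketch below assumes that `metadata` and `expression` are both data frames with one row per generated sample, which this page does not state explicitly; `check_result()` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: basic sanity checks on a predict_query() result.
# Assumes both elements are data frames with one row per generated sample.
check_result <- function(result) {
  stopifnot(
    is.list(result),
    all(c("metadata", "expression") %in% names(result))
  )
  if (nrow(result$metadata) != nrow(result$expression)) {
    stop("metadata and expression have different numbers of rows")
  }
  # Report the dimensions under the one-row-per-sample assumption
  c(samples = nrow(result$expression), genes = ncol(result$expression))
}

# Usage, after the query above has returned:
# check_result(result)
# head(metadata)
```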

For more detailed examples and advanced usage, see the model-specific vignettes linked above.

Session info

sessionInfo()

Additional Resources