S3 is where datom turns into shared, governed infrastructure. By the end of this article you will be able to:
- Give a teammate read-only access to the project: your cloud admin generates scoped credentials, the teammate passes them to datom_store_s3(), and datom's reader role works out of the box.
- Register the project in a shared governance portfolio, discoverable by name across your organization.
Moving to S3 also lays the foundation for capabilities datom will integrate over time: access logs, automated retention rules, cross-account replication. None of that requires any code changes on your part – it follows from choosing object storage as the backend.
From your code, nothing changes: the same datom_write(), datom_read(), and datom_history() calls you've used so far.
Two ways to read this article
- Starting fresh on S3 – skip the promotion sections (labeled below); jump straight to Set up AWS credentials.
- Promoting an existing local-filesystem project to S3 – follow in order. You’ll snapshot the current data, retire the local project, then re-establish it on S3.
Both paths converge at Build the S3 store.
Where we left off (promotion path): STUDY-001 has four tables (dm, ex, lb, ae), all in a local filesystem store. The data git repo is on GitHub. No governance layer is attached yet; articles 1-3 stayed deliberately local-only.
What promotion looks like today (promotion path only)
Starting fresh on S3? Skip to Set up AWS credentials.
A built-in, history-preserving migration (datom_migrate_data()) is planned but not yet shipped.
Today, promoting a project means:
- Snapshot the current version of each table.
- Retire the local project (datom_decommission()).
- Initialize a new project on S3 with the same name.
- Re-write the snapshotted tables as version 1 on S3.
The trade-off is that per-table version history from the local era is not carried forward – only the latest version of each table is. For a study with a few months of extracts this is cheap; the git commit log preserves the narrative even when the data history restarts.
If preserving full per-version history across the move matters to you right now, the cleaner path is to start a new project directly on S3 (fresh-start path) and write your data there going forward.
Set up AWS credentials
datom_store_s3() takes access_key and secret_key as plain strings. How you supply them is up to you:
# Option A: inline (fine for interactive sessions; don't commit to git)
access_key <- "AKIAIOSFODNN7EXAMPLE"
secret_key <- "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
# Option B: environment variables (CI/CD, Docker)
access_key <- Sys.getenv("AWS_ACCESS_KEY_ID")
secret_key <- Sys.getenv("AWS_SECRET_ACCESS_KEY")
# Option C: keyring (recommended for interactive developer machines)
access_key <- keyring::key_get("AWS_ACCESS_KEY_ID")
secret_key <- keyring::key_get("AWS_SECRET_ACCESS_KEY")

The rest of this article uses the keyring form as a placeholder – substitute whichever pattern fits your environment.
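If the keys are not in your keyring yet, the one-time setup uses keyring's standard API. Each call prompts interactively for the secret, so the value never lands in a script or your shell history:

```r
# One-time, interactive setup: store each AWS key in the OS credential store.
# keyring::key_set() prompts for the value rather than taking it as an argument.
keyring::key_set("AWS_ACCESS_KEY_ID")
keyring::key_set("AWS_SECRET_ACCESS_KEY")
```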
You will also need:
- A bucket you can read and write to. datom does not create buckets – bucket lifecycle (encryption, versioning, retention) is your organization’s policy domain, not datom’s.
- A prefix within the bucket. For raw clinical data we recommend one bucket per study with an empty prefix at the bucket root. Derived data products (ADaM, TLF) then live under named prefixes (adam/, tlf/) in the same bucket. See Buckets and Prefixes for the full convention and alternatives.
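Under that convention, a derived-data store for the same study would differ from the raw-data store only in its prefix. A sketch (the adam/ store is hypothetical; this article only builds the raw-data store):

```r
# Hypothetical ADaM store: same bucket as the raw data, named prefix.
adam_data <- datom_store_s3(
  bucket     = "study-001-datom",
  prefix     = "adam/",            # derived products under a named prefix
  region     = "us-east-1",
  access_key = keyring::key_get("AWS_ACCESS_KEY_ID"),
  secret_key = keyring::key_get("AWS_SECRET_ACCESS_KEY")
)
```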
The full credential reference – including scoped reader credentials and how to handle assume-role flows – is in Credentials in Practice.
Resume the prior state (promotion path only)
Starting fresh on S3? Skip to Build the S3 store.
state <- source(
system.file("vignette-setup", "resume_article_4.R", package = "datom")
)$value
old_conn <- state$conn
dev_dir <- state$dev_dir

old_conn is the local-backend conn from article 3. We'll use it to read the four current tables, then decommission it.
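Since the local-era version history will not survive the move, you may want an archive of it first. One option, a sketch rather than a built-in feature, assuming datom_history() returns a data frame as in article 3:

```r
# Hypothetical: dump each table's local-era history to CSV, to be committed
# alongside your analysis code before decommissioning.
dir.create("history-archive", showWarnings = FALSE)
for (tbl in c("dm", "ex", "lb", "ae")) {
  write.csv(
    datom_history(old_conn, tbl),
    file.path("history-archive", paste0(tbl, "-local-history.csv")),
    row.names = FALSE
  )
}
```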
Snapshot the current data (promotion path only)
Before tearing anything down, capture the latest version of each table in memory:
snapshot <- list(
dm = datom_read(old_conn, "dm"),
ex = datom_read(old_conn, "ex"),
lb = datom_read(old_conn, "lb"),
ae = datom_read(old_conn, "ae")
)

Decommission the local project (promotion path only)
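Before the destructive step, a quick sanity check that the snapshot actually captured rows is cheap insurance (a sketch using the snapshot list built above):

```r
# Stop before anything destructive happens if a snapshotted table is empty.
rows <- vapply(snapshot, nrow, integer(1))
print(rows)
stopifnot(all(rows > 0))
```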
datom_decommission() deletes the data GitHub repo and clears the local clone and parquet store. It is destructive and requires you to type the project name as confirm to proceed.
datom_decommission(old_conn, confirm = "STUDY_001")
#> i Deleting data storage objects...
#> v Data storage objects deleted.
#> i Deleting GitHub repo "your-org/study-001-data"...
#> v Deleted GitHub repo "your-org/study-001-data".
#> i Removing local clone /tmp/.../study_001_dev...
#> v Removed local clone.
#> v Decommissioned "STUDY_001".

Because no governance layer was attached, decommission is data-side only: nothing to unregister from gov.
Build the S3 store
Both paths resume here.
library(datom)
library(fs)
dev_dir <- path(tempdir(), "study_001_dev") # fresh local clone target
aws_data <- datom_store_s3(
bucket = "study-001-datom", # <-- one bucket per study (Pattern A)
prefix = "", # raw data at the bucket root
region = "us-east-1",
access_key = keyring::key_get("AWS_ACCESS_KEY_ID"),
secret_key = keyring::key_get("AWS_SECRET_ACCESS_KEY")
)
store <- datom_store(
governance = NULL, # no gov yet; attached below
data = aws_data,
github_pat = keyring::key_get("GITHUB_PAT")
)

Initialize STUDY_001 on S3
datom_init_repo(
path = dev_dir,
project_name = "STUDY_001",
store = store,
create_repo = TRUE,
repo_name = "study-001-data"
)
conn <- datom_get_conn(path = dev_dir, store = store)
print(conn)
#> -- datom connection
#> * Project: "STUDY_001"
#> * Role: "developer"
#> * Backend: "s3"
#> * Root: "study-001-datom"
#> * Prefix: ""
#> * Governance: not attached

The data backend is now "s3". Every datom_write() will upload parquet to S3 and every datom_read() will stream it back from S3.
Write your first tables
Promotion path: re-write the snapshotted tables as version 1 on S3.
# Promotion path only -- snapshot was captured above
datom_write(conn, snapshot$dm, "dm",
message = "Re-establish dm on S3 (was local through 2026-03-28)")
datom_write(conn, snapshot$ex, "ex",
message = "Re-establish ex on S3 (was local through 2026-03-28)")
datom_write(conn, snapshot$lb, "lb",
message = "Re-establish lb on S3 (was local through 2026-03-28)")
datom_write(conn, snapshot$ae, "ae",
message = "Re-establish ae on S3 (was local through 2026-03-28)")

Per-table version history from the local era is not carried forward – the commit messages above are your audit trail.
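A quick spot check that the S3 round trip preserved the promoted data, assuming datom_read() returns the table as a data frame:

```r
# Read dm back from S3 and compare against the in-memory snapshot.
dm_s3 <- datom_read(conn, "dm")
stopifnot(isTRUE(all.equal(dm_s3, snapshot$dm, check.attributes = FALSE)))
```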
Fresh-start path: write the first extract directly.
# Fresh-start path only -- use the built-in example data
cutoff <- "2026-01-28"
datom_write(conn, datom_example_data("dm", cutoff_date = cutoff), "dm",
message = paste("dm: first extract, cutoff", cutoff))
datom_write(conn, datom_example_data("ex", cutoff_date = cutoff), "ex",
message = paste("ex: first extract, cutoff", cutoff))
datom_write(conn, datom_example_data("lb", cutoff_date = cutoff), "lb",
message = paste("lb: first extract, cutoff", cutoff))
datom_write(conn, datom_example_data("ae", cutoff_date = cutoff), "ae",
message = paste("ae: first extract, cutoff", cutoff))

Attach the governance layer
Now that STUDY-001 lives somewhere a teammate can reach, register it in a shared governance repo. Governance is a two-step setup:
- Once per organization: datom_init_gov() creates the gov GitHub repo and seeds the skeleton (projects/ directory, README, etc.). Run this the very first time anyone in your org adopts governance.
- Once per project: datom_attach_gov() records STUDY-001's data location in projects/STUDY_001/ref.json and updates project.yaml so any future conn from this clone knows where gov lives.
gov_store <- datom_store_s3(
bucket = "acme-datom-gov", # <-- one dedicated gov bucket per organization
prefix = "", # dedicated bucket -> empty prefix
region = "us-east-1",
access_key = keyring::key_get("AWS_ACCESS_KEY_ID"),
secret_key = keyring::key_get("AWS_SECRET_ACCESS_KEY")
)
gov_dir <- path(tempdir(), "datom-governance") # explicit local path
# Step 1: seed the gov repo (once per organization)
gov_repo_url <- datom_init_gov(
gov_store = gov_store,
gov_local_path = gov_dir,
create_repo = TRUE,
repo_name = "datom-governance",
github_pat = keyring::key_get("GITHUB_PAT")
)
#> v Created gov GitHub repo `datom-governance`
#> v Seeded skeleton (projects/, README.md)
#> v Pushed initial commit
# Step 2: attach this project to gov (once per project)
conn <- datom_attach_gov(
conn = conn,
gov_store = gov_store,
gov_repo_url = gov_repo_url,
gov_local_path = gov_dir
)
#> v Registered STUDY_001 in governance
#> v Updated project.yaml with governance pointer
print(conn)
#> -- datom connection
#> * Project: "STUDY_001"
#> * Role: "developer"
#> * Backend: "s3"
#> * Root: "study-001-datom"
#> * Prefix: ""
#> * Gov backend: "s3"
#> * Gov root: "acme-datom-gov"

Once attached, gov cannot be detached – project.yaml's storage.governance block is permanent. Subsequent projects in the same organization reuse the same gov repo and bucket; you only run create_repo = TRUE once.
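For the next project in the organization, step 1 is skipped entirely and the attach call reuses the existing gov repo and bucket. A sketch, assuming a hypothetical second project's conn named conn2:

```r
# Hypothetical second project: no datom_init_gov(), no create_repo = TRUE.
conn2 <- datom_attach_gov(
  conn           = conn2,         # conn for the new project (hypothetical)
  gov_store      = gov_store,     # same gov bucket as above
  gov_repo_url   = gov_repo_url,  # the existing gov repo
  gov_local_path = gov_dir
)
```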
Confirm
datom_list(conn)
#> name current_version current_data_sha last_updated
#> 1 ae 19f44e3a e91d04ff 2026-04-29T...
#> 2 dm 8a3b21cc c2e80a14 2026-04-29T...
#> 3 ex 5d72e0f1 88a73e02 2026-04-29T...
#> 4 lb c1ffea90 4c3812dd 2026-04-29T...

Where you are
- STUDY_001 lives on S3. Your local clone is just a working copy of the git metadata.
- Governance is attached: STUDY_001 is registered in a shared gov repo and gov bucket. Future projects in the same organization reuse both.
- ref.json in the gov repo points at the S3 bucket; any teammate who clones the gov repo and has S3 read credentials can discover and read the data.
In the next article, you hand the project off to a statistician who needs to read the data without write access – the canonical reader role.
Teardown
Skip this if you plan to continue to the next article – the S3 project and gov registration are the starting state for Article 5: Handing Off.
If you want to clean up:
# Remove all datom artefacts for this project (S3 data, GitHub data repo,
# local clone, gov storage prefix, gov git registration).
datom_decommission(conn, confirm = "STUDY_001")

After decommission, the only remaining artefact is the governance infrastructure itself (gov GitHub repo and gov S3 bucket root). These are shared across projects and datom does not destroy them automatically. Delete them manually once you are done with all gov-dependent articles:
# Delete the gov GitHub repo
system2("gh", c("repo", "delete", "your-org/datom-governance", "--yes"))

The gov S3 bucket content was already removed by datom_decommission(). If the bucket is otherwise empty, delete it via your AWS console or S3 management tooling.
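If you prefer to stay in R for that last step, one option is to shell out to the AWS CLI, mirroring the gh call above (assumes the CLI is installed and configured; aws s3 rb refuses to delete a non-empty bucket, which is a useful safety net here):

```r
# Remove the now-empty gov bucket via the AWS CLI.
system2("aws", c("s3", "rb", "s3://acme-datom-gov", "--region", "us-east-1"))
```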