Where we left off: STUDY-001 is on S3. The statistician can read.
The study has grown enough that you bring in a second data engineer. Unlike the statistician, the second engineer needs to write – to register the month-4 lab extract while you’re on vacation.
This article introduces datom_clone() (the way a new
engineer enters an existing project) and demonstrates the
pull-before-push discipline that keeps two engineers
from clobbering each other.
This article assumes you’ve completed Article 4. Resume the prior state if needed:
state <- source(
  system.file("vignette-setup", "resume_article_6.R", package = "datom")
)$value
conn <- state$conn # original engineer
dev_dir <- state$dev_dir
The second engineer clones
The second engineer (call her Bea) sets up her credentials the same way you did:
keyring::key_set("GITHUB_PAT")
keyring::key_set("AWS_ACCESS_KEY_ID")
keyring::key_set("AWS_SECRET_ACCESS_KEY")
Bea then clones into a fresh directory on her laptop. She uses the engineer store (read+write):
library(datom)
library(fs)
bea_dir <- path(tempdir(), "bea_study_001_data") # different from yours
bea_store <- datom_store(
  governance = state$gov_component, # same gov as you
  data = state$data_s3, # same bucket as you
  github_pat = keyring::key_get("GITHUB_PAT"),
  data_repo_url = state$data_repo_url,
  gov_repo_url = state$gov_repo_url,
  gov_local_path = path(tempdir(), "bea_gov_clone")
)
bea_conn <- datom_clone(path = bea_dir, store = bea_store)
print(bea_conn)
#> -- datom connection
#> * Project: "STUDY_001"
#> * Role: "developer"
#> * Backend: "s3"
datom_clone() does three things in order: clones the
data git repo into bea_dir, clones the gov repo (if not
already present), and returns a developer conn. Bea now sees the same
four tables you do.
datom_list(bea_conn)
#>   name current_version current_data_sha   last_updated
#> 1   ae        19f44e3a         e91d04ff 2026-04-29T...
#> 2   dm        8a3b21cc         c2e80a14 2026-04-29T...
#> 3   ex        5d72e0f1         88a73e02 2026-04-29T...
#> 4   lb        c1ffea90         4c3812dd 2026-04-29T...
Concurrent writes
You and Bea both have month-4 extracts in your inboxes. You write
lb:
# In your session
lb_m4 <- datom_example_data("lb", cutoff_date = "2026-04-28")
datom_write(conn, lb_m4, "lb",
            message = "lb extract through 2026-04-28")
#> v Wrote "lb" (full): "..."
At the same time, Bea writes ae from her clone:
# In Bea's session
ae_m4 <- datom_example_data("ae", cutoff_date = "2026-04-28")
datom_write(bea_conn, ae_m4, "ae",
            message = "ae extract through 2026-04-28")
What happens to Bea’s write? It depends on which of you pushed first. Whoever pushes second is rejected:
#> x Failed to push to origin/main: rejected (non-fast-forward)
#> i Run datom_pull(conn) to incorporate origin's changes, then retry the write.
This is the same conflict signature any git user has seen. datom’s response is the standard one: pull, then retry.
Pull, then retry
Bea runs datom_pull() to bring her clone up to date with
your lb write:
datom_pull(bea_conn)
#> i Fetching from origin...
#> v 1 commit pulled.
#> v Manifest updated: lb -> new version
Now Bea retries her ae write. Because ae is
a different table, there’s no conflict at the data level – the two
writes commute, and after the pull they layer cleanly:
datom_write(bea_conn, ae_m4, "ae",
            message = "ae extract through 2026-04-28")
#> v Wrote "ae" (full): "..."
If Bea had been writing lb (the same table you wrote),
her retry would produce a third version of
lb – datom does not have a same-table merge concept,
because there’s no semantic merge of two parquet files. Each
datom_write() is a snapshot; the version order tells you
who wrote when.
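To make the snapshot model concrete, here is a toy version log in base R. It is not datom's storage format: new_log() and record_write() are hypothetical names, and integer version numbers stand in for the hashes datom actually reports. The sketch only illustrates that same-table writes stack up in push order and are never merged.

```r
# Toy model (hypothetical, not datom internals): every write appends a full
# snapshot record; nothing is ever merged with an earlier version.
new_log <- function() {
  data.frame(table = character(), version = integer(), author = character(),
             n_rows = integer(), stringsAsFactors = FALSE)
}

record_write <- function(version_log, table, author, data) {
  # Count the versions already recorded for this table.
  prior <- sum(version_log$table == table)
  rbind(
    version_log,
    data.frame(table = table, version = prior + 1L, author = author,
               n_rows = nrow(data), stringsAsFactors = FALSE)
  )
}

version_log <- new_log()
# The lb version that already existed before month 4.
version_log <- record_write(version_log, "lb", "you", data.frame(x = 1:3))
# Your month-4 push lands first.
version_log <- record_write(version_log, "lb", "you", data.frame(x = 1:4))
# Bea pulls, then retries her own lb write: a third snapshot, not a merge.
version_log <- record_write(version_log, "lb", "bea", data.frame(x = 1:5))

lb_history <- version_log[version_log$table == "lb", ]
latest <- lb_history[nrow(lb_history), ]
print(lb_history)
```

In this model the current lb is simply the last row: Bea's snapshot replaces yours as the latest version, and your version remains in the history beneath it.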
Push discipline
The rule for two-engineer projects:
- Pull before each writing session, not just after a conflict. A clean pull at the top of the day prevents most conflicts.
- Pull again before push if the writing session was long. Conflicts detected at push time are the more annoying kind.
- Communicate which tables you’re touching when working in parallel. datom doesn’t have row-level locks, so coordination is still a human problem; the rejected push only guarantees that a missed pull cannot go unnoticed.
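The first two rules can be folded into a small wrapper: pull, write, and if the push is rejected, pull once more and retry. The sketch below is a hypothetical helper, not part of datom. It takes the pull and write steps as functions, and it assumes a rejected push surfaces as an ordinary R error, which you should confirm against your datom version. The demo uses stand-in functions so it runs without a project.

```r
# Hypothetical helper (not a datom function): pull first, then write,
# retrying after another pull if the write fails.
# Assumption: a rejected push is signalled as a regular R error.
write_with_pull <- function(write_fn, pull_fn, max_retries = 1) {
  pull_fn()  # rule 1: start from an up-to-date clone
  retries <- 0
  repeat {
    result <- tryCatch(
      list(ok = TRUE, value = write_fn()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) {
      return(invisible(result$value))
    }
    if (retries >= max_retries) {
      stop(result$error)  # give up and re-signal the original error
    }
    retries <- retries + 1
    message("Write failed (", conditionMessage(result$error),
            "); pulling and retrying.")
    pull_fn()
  }
}

# Stand-ins that simulate a colleague pushing between our pull and our push.
sim <- new.env()
sim$pulls <- 0
sim$attempts <- 0
fake_pull <- function() sim$pulls <- sim$pulls + 1
fake_write <- function() {
  sim$attempts <- sim$attempts + 1
  if (sim$attempts == 1) stop("rejected (non-fast-forward)")
  "ae written"
}

outcome <- write_with_pull(fake_write, fake_pull)
```

With a real project, the two arguments would be closures such as function() datom_pull(bea_conn) and function() datom_write(bea_conn, ae_m4, "ae", message = "ae extract through 2026-04-28"). The wrapper does not replace the third rule: an automatic retry on a table your colleague also wrote will add another version on top of theirs.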
Where you are
- Two engineers, one project, on the same S3 store.
- Concurrent writes to different tables are conflict-free after a pull.
- Concurrent writes to the same table produce sequential versions, ordered by who pushed first.
- Bea has the same capabilities you do – there is no “primary” engineer; the project is shared.
The user-journey track continues with the manager view in Governing a Study Portfolio, where STUDY-002 joins your gov repo and you start caring about the registry shape across projects.