Extends the mlr3 package with data backends that work transparently with databases. Two additional backends are currently implemented:

  • DataBackendDplyr: Relies internally on the abstraction layer provided by dplyr and dbplyr.
  • DataBackendDuckDB: Connector to DuckDB.
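
As a rough illustration of the first backend, the sketch below converts a task to an SQLite database accessed through DataBackendDplyr. It assumes the RSQLite, dplyr and dbplyr packages are installed and that the converter `as_sqlite_backend()` is called with its default arguments; treat it as a minimal example rather than the recommended setup.

```r
library("mlr3")
library("mlr3db")

# Start from the in-memory spam task shipped with mlr3.
task = tsk("spam")

# Assumption: default arguments, so the SQLite file is created in a
# temporary location and removed with the R session's temp directory.
task$backend = as_sqlite_backend(task$backend)

# The task API is unchanged; only the storage behind it differs.
print(class(task$backend)[1])
print(task$nrow)

# Fetch the target column for the first three rows from the database.
print(task$data(rows = task$row_ids[1:3], cols = task$target_names))
```

The task is used exactly as before; only `task$backend` changes class.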


You can install the released version of mlr3db from CRAN with:

install.packages("mlr3db")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("mlr-org/mlr3db")



# Load mlr3db (this also attaches mlr3):
library("mlr3db")

# Create a classification task:
task = tsk("spam")

# Convert the task backend from a data.table backend to a DuckDB backend.
# By default, a temporary directory is used to store the database files.
# Note that the in-memory data is not used anymore; its memory will be freed
# by the garbage collector.
task$backend = as_duckdb_backend(task$backend)

# The requested data will be queried from the database in the background:
learner = lrn("classif.rpart")
ids = sample(task$row_ids, 3000)
learner$train(task, row_ids = ids)
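
Prediction goes through the same backend. The following sketch repeats the steps above in self-contained form and then evaluates the model on the rows that were not used for training. The names `test_ids` and `prediction` and the fixed seed are illustrative choices, not part of the package, and the resulting error depends on the random split.

```r
library("mlr3")
library("mlr3db")

set.seed(1)  # illustrative seed so the split is reproducible

# Same setup as above: spam task stored in a DuckDB backend.
task = tsk("spam")
task$backend = as_duckdb_backend(task$backend)

learner = lrn("classif.rpart")
ids = sample(task$row_ids, 3000)
learner$train(task, row_ids = ids)

# Rows not used for training are fetched from the database on demand.
test_ids = setdiff(task$row_ids, ids)
prediction = learner$predict(task, row_ids = test_ids)

# Classification error on the held-out rows.
print(prediction$score(msr("classif.ce")))
```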