Getting Started with qryflow

library(qryflow)

What is `qryflow`?

qryflow lets you write multi-step SQL workflows in plain .sql files and run them from R with a single function call. Specially formatted tags tell R how to execute each SQL chunk and what to name the results. This allows you to:

Keep multiple SQL statements in the same file.
Control how each SQL “chunk” is executed.
Return results as named R objects.
Pass metadata that can be used later in R workflows

In short: You can define and run multi-step SQL workflows with one function call, and get your results back as a structured R object.

Basic usage

The main function is qryflow, which accepts SQL tagged with special comments and a connection to DBI-compliant database. Note, the SQL can be a character vector, like in the example below, or a filepath to a file that contains SQL.

# Connection to In-Memory DB with table populated from mtcars
con <- example_db_connect(mtcars)

sql <- "
-- @exec: drop_cyl_6
DROP TABLE IF EXISTS cyl_6;

-- @exec: prep_cyl_6
CREATE TABLE cyl_6 AS
SELECT *
FROM mtcars
WHERE cyl = 6;

-- @query: df_cyl_6
SELECT *
FROM cyl_6;
"

# Pass tagged SQL to `qryflow`
results <- qryflow(con, sql, verbose = TRUE)
#> Running 3 chunks
#> [1/3] drop_cyl_6 [exec]
#>       ✓ success  0s
#> [2/3] prep_cyl_6 [exec]
#>       ✓ success  0s
#> [3/3] df_cyl_6 [query]
#>       ✓ success  0s
#> Done in 0s — 3 success, 0 error, 0 skipped

# Access the results from the chunk named `df_cyl_6`
head(results$df_cyl_6)
#>    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> 3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> 4 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> 5 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> 6 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4

By default, the package supports @exec tags, which are executed with DBI::dbExecute() and @query tags, which are executed with DBI::dbGetQuery().

When you run qryflow():

The SQL script is split into chunks using tag lines like -- @query: df_mtcars.
Each chunk is assigned a type (e.g., query or exec)
Chunks are executed in order, using the associated execution type
The results are returned as named objects

Defining a Chunk

In qryflow, a chunk is a grouped section of SQL code, representing a single executable unit within a larger multi-step SQL workflow, and preceded by one or more tag lines (e.g., the pattern -- @<tag>: <value>).

Tagged lines act as markers that start a new chunk.
All lines (comments and SQL) immediately following a contiguous group of tagged lines belong to that chunk until another tag line starts the next chunk.
If the script has no tags, the entire script is treated as one single chunk.

Tags and Aliases

Each SQL chunk must be tagged with a type so qryflow knows how to execute it. If a chunk is not provided with a tag, the qryflow engine will use the value of the default_type argument, which can be provided directly or set with getOption("qryflow.default_type", "query"). It defaults to “query”, as getting data out is the most common use case.

Tags use SQL-style comments (--) and follow the format:

-- @<tag>: <value>

Important Tags

Each chunk should have both a name (the name of the object when returned to R) and a type (execution mode for the chunk). Users can set these explicitly with the following tags:

@type apecifies execution type (-- @type: query)
@name assigns a name to the chunk’s result (-- @name: df_users)

For registered types, users can use shorthand to supply both name and type in one line. For example, @query and @exec are aliases for setting both @type and @name in one line.

Aliased form (preferred):

-- @query: df_mtcars
SELECT *
FROM mtcars;

Explicit form (equivalent):

-- @type: query
-- @name: df_mtcars
SELECT *
FROM mtcars;

Type Identification

During parsing, qryflow determines its type using the following rules:

If a chunk includes an explicit -- @type: tag, that value is used as the chunk type.
If there is no @type: tag, qryflow checks for any other tag that matches a registered type (@query:, @exec:, etc.) . The first match found is used as the type.
If no recognized tag is found, the type defaults to the value of getOption("qryflow.default_type", "query").

Passing Additional Tags

You can include additional tags to carry metadata into your R workflow, that follow the tagging structure:

-- @exec: df_mtcars
-- @src: dbo.mtcars
-- @topic: cars
SELECT *
FROM mtcars;

Important Arguments

on_error

The on_error argument controls what happens when a single chunk fails:

"stop" (default): halts execution immediately and raises an error.
"warn": records the error and signals a warning, but continues running remaining chunks.
"collect": silently collects all errors across all chunks and raises a single combined error at the end.

# on_error = "stop" (default): halts on first failure
bad_sql <- "
-- @exec: prep_cyl_6
CREATE TABLE cyl_6 AS SELECT * FROM mtcars WHERE cyl = 6;

-- @query: df_missing
SELECT * FROM nonexistent_table;

-- @query: df_mtcars
SELECT * FROM mtcars;
"

qryflow(con, bad_sql, on_error = "stop")
#> Error:
#> ! Chunk 'prep_cyl_6' failed with message: 'table cyl_6 already exists'

# Warn collects errors and signals a warning
qryflow(con, bad_sql, on_error = "warn")
#> Warning: table cyl_6 already exists
#> Warning: no such table: nonexistent_table
#> $prep_cyl_6
#> NULL
#> 
#> $df_missing
#> NULL
#> 
#> $df_mtcars
#>     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> 9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> 10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> 11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> 12 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#> 13 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#> 14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#> 15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#> 16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> 17 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> 18 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> 19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> 20 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> 21 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> 22 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#> 23 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#> 24 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#> 25 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#> 26 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> 27 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#> 28 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> 29 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#> 30 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#> 31 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
#> 32 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

# on_error = "collect": runs everything, then reports all failures together
qryflow(con, bad_sql, verbose = TRUE, on_error = "collect")
#> Running 3 chunks
#> [1/3] prep_cyl_6 [exec]
#>       ✗ error    0s
#> [2/3] df_missing [query]
#>       ✗ error    0s
#> [3/3] df_mtcars [query]
#>       ✓ success  0s
#> Done in 0s — 1 success, 2 error, 0 skipped
#> Error:
#> ! 2 chunks failed:
#>   - prep_cyl_6: table cyl_6 already exists
#>   - df_missing: no such table: nonexistent_table

verbose

By default, qryflow is quiet. However, for long running queries with multiple chunks, you may want feedback on which chunks are currently running. You can use verbose = TRUE to get updates during execution.

simplify

When simplify = TRUE, in the case where there is only one chunk, qryflow() will return a single object (as opposed to a named list of results). For example:

sql1 <- "
-- @query: df_mtcars
SELECT *
FROM mtcars;
"

sql2 <- "
-- @query: df_mtcars
SELECT *
FROM mtcars;

-- @query: df_mtcars_cyl6
SELECT *
FROM mtcars
WHERE cyl = 6;
"

# Pass tagged SQL to `qryflow`
res1 <- qryflow(con, sql1, simplify = TRUE)
res2 <- qryflow(con, sql2, simplify = TRUE)
res3 <- qryflow(con, sql1, simplify = FALSE)

class(res1) # simplifies the result to the single data.frame() because only one chunk
#> [1] "data.frame"
class(res2) # returns named list
#> [1] "list"
class(res3) # returns named list, because simplify = FALSE
#> [1] "list"

This design choice is to facilitate easy interactive use and is a common use-case. Because qryflow() might return a named list or a single data.frame depending on the input, the qryflow package exports other functions so users can prioritize reliability in return objects. The next section explores functions like qryflow_run() and qryflow_results() further.

The Core API

While qryflow() covers most use cases, users who want more control and consistency may prefer to use the functions that qryflow() leverages:

qryflow_run()
qrflow_results()
qryflow_parse()
qryflow_execute

`qryflow_run()` and `qryflow_results()`

qryflow_run() performs parsing and execution, returning a full qryflow object - including all chunk metadata, not just the query results.

obj <- qryflow_run(con, sql)

# A qryflow object
class(obj)
#> [1] "qryflow"

# Chunk names are top-level list names
names(obj)
#> [1] "drop_cyl_6" "prep_cyl_6" "df_cyl_6"

obj # Print Method
#> ── qryflow ───────────────────────────────────────────────────────────────────── 
#>   chunks: 3 | status: success | duration: 0s 
#> 
#>   drop_cyl_6  [exec ]  ✓ success  0s
#>   prep_cyl_6  [exec ]  ✓ success  0s
#>   df_cyl_6    [query]  ✓ success  0s

Each element is a qryflow_chunk:

class(obj$df_cyl_6)
#> [1] "qryflow_chunk"

# Print the chunk
obj$df_cyl_6
#> ── qryflow_chunk: df_cyl_6 ───────────────────────────────────────────────────── 
#>   type: query | ✓ success | duration: 0s
#> 
#>   SELECT *
#>   FROM cyl_6;

To extract only the query results (equivalent to what qryflow() returns), use qryflow_results():

results <- qryflow_results(obj)
class(results$df_cyl_6)
#> [1] "data.frame"
head(results$df_cyl_6)
#>    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> 3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> 4 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> 5 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> 6 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4

`qryflow_parse()` and `qryflow_execute()`

For even more control, you can parse and execute separately:

# Step 1: Parse the SQL into structured chunks
filepath <- example_sql_path()
workflow <- qryflow_parse(filepath)

class(workflow)
#> [1] "qryflow"
length(workflow)
#> [1] 4
names(workflow)
#> [1] "drop_cyl_6" "prep_cyl_6" "df_mtcars"  "df_cyl_6"

# Inspect a chunk before execution
workflow$df_mtcars
#> ── qryflow_chunk: df_mtcars ──────────────────────────────────────────────────── 
#>   type: query |   pending  
#> 
#>   SELECT *
#>   FROM mtcars;

Each qryflow_chunk contains:

$type: the execution type (e.g., "query")
$name: the chunk name
$sql: the SQL body
$tags: any additional tags
$results: NULL before execution; populated after

# Step 2: Execute the parsed workflow
executed <- qryflow_execute(con, workflow)

class(executed)
#> [1] "qryflow"
names(executed)
#> [1] "drop_cyl_6" "prep_cyl_6" "df_mtcars"  "df_cyl_6"
executed
#> ── qryflow ───────────────────────────────────────────────────────────────────── 
#>   chunks: 4 | status: success | duration: 0s 
#> 
#>   drop_cyl_6  [exec ]  ✓ success  0s
#>   prep_cyl_6  [exec ]  ✓ success  0s
#>   df_mtcars   [query]  ✓ success  0s
#>   df_cyl_6    [query]  ✓ success  0s

Metadata

Both the worfklow object (qryflow) and the chunk objects (qryflow_chunk) store metadata about the execution. You can access this information with the qryflow_meta() function:

qryflow_meta(executed) # The whole workflow
#> $source
#> [1] "-- @exec: drop_cyl_6\nDROP TABLE IF EXISTS cyl_6;\n\n-- @exec: prep_cyl_6\nCREATE TABLE cyl_6 AS\nSELECT *\nFROM mtcars\nWHERE cyl = 6;\n\n-- @query: df_mtcars\nSELECT *\nFROM mtcars;\n\n-- @query: df_cyl_6\nSELECT *\nFROM cyl_6;\n"
#> 
#> $start_time
#> [1] "2026-06-08 14:41:14 UTC"
#> 
#> $end_time
#> [1] "2026-06-08 14:41:14 UTC"
#> 
#> $duration
#> [1] 0.001773834
#> 
#> $status
#> [1] "success"

qryflow_meta(executed[[1]]) # The whole chunk
#> $source
#> [1] "-- @exec: drop_cyl_6\nDROP TABLE IF EXISTS cyl_6;\n"
#> 
#> $start_time
#> [1] "2026-06-08 14:41:14 UTC"
#> 
#> $end_time
#> [1] "2026-06-08 14:41:14 UTC"
#> 
#> $duration
#> [1] 0.0004479885
#> 
#> $status
#> [1] "success"

Summary

Function	What it does
`qryflow()`	Parse + execute + return query results.
`qryflow_run()`	Parse + execute, returning a full `qryflow` object with metadata.
`qryflow_results()`	Extract query results from a `qryflow` object.
`qryflow_parse()`	Parse SQL into structured `qryflow` object - No execution.
`qryflow_execute()`	Execute a parsed `qryflow` object against a connection.
`qryflow_meta()`	Access metadata (status, duration, timing) on a workflow or chunk.

For a guide on registering custom chunk types and extending qryflow’s behaviour, see vignette("extend-qryflow", package = "qryflow").

Examples

Example 1 - Script with no tags

CREATE TABLE cyl_6 AS
SELECT *
FROM mtcars
WHERE cyl = 6;

Result - The entire script is one chunk containing all lines.

Why? Without tags, qryflow treats the whole script as a single step.

Example 2 - Script with one tag at the start

-- @query: get_6cyl
SELECT *
FROM mtcars
WHERE cyl = 6;

Result - One chunk starting at the tag, containing the rest of the script.

Because the tag is at line 1, the chunk starts there and continues to the end.

Example 3 - Script with one tag in the middle

SELECT *
FROM mtcars
WHERE cyl = 6;

-- @query: df_mtcars
SELECT *
FROM mtcars;

Result - Two chunks:

Chunk 1: lines before the tag (untagged SQL).
Chunk 2: from the tag line to the end.

This preserves any pre-tag SQL as a separate chunk.

Example 4 - Script with multiple tags

-- @exec: drop_cyl_6
DROP TABLE IF EXISTS cyl_6;

-- @exec: prep_cyl_6
CREATE TABLE cyl_6 AS
SELECT *
FROM mtcars
WHERE cyl = 6;

-- @query: df_mtcars
SELECT *
FROM mtcars;

-- @query: df_cyl_6
SELECT *
FROM cyl_6

Result - Four chunks, each starting at its respective tag line.
Each chunk is parsed and executed independently in sequence.

What is qryflow?