---
title: "Advanced Usage: reticulate-backed TDT Workflows"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Advanced Usage: reticulate-backed TDT Workflows}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(tdtr)
```

This vignette is for workflows where the getting-started interface is not enough: large recordings, direct Python `tdt` reader controls, live Python objects, event-aligned reads, and memory diagnostics.

```{r}
example_path <- tdtr_example_block_path()
```

## Python State And Saving

Since TDT data are read using Python under the hood, there are some details to consider when saving analysis state.

`read_block()` returns a `tdt_block_py` object by default. This object is a small R handle to a live Python `tdt` object in the current R session. In this context, "handle" means the R object points to data managed by Python. It does not mean all of the underlying stream arrays are stored inside the R object.

That design is useful because it avoids copying every stream into R when you only want to inspect names, metadata, or a bounded subset. It also means ordinary R save functions should not be used as the durable record of a Python-backed block. Saving the handle does not save the associated Python session.

A durable workflow has two parts:

1. save the path and read parameters needed to recreate the Python-backed block;
2. save explicitly collected R objects when you need durable analysis data.

```{r}
read_plan <- list(
  block_path = example_path,
  args = list(
    evtype = c("epocs", "streams"),
    t1 = 0,
    t2 = 5,
    verbose = 0
  )
)

block <- do.call(
  read_block,
  c(list(block_path = read_plan$block_path), read_plan$args)
)

class(block)
stream_names(block)
epoc_names(block)
summary(block)$streams
```

Save a read plan or a collected subset, not the live Python-backed handle as long-term state.

```{r, eval = FALSE}
saveRDS(read_plan, "tdt-read-plan.rds")

selected <- collect_block(block, stores = c("_405A", "_465A"))
saveRDS(selected, "selected-tdt-block.rds")
```

## Python-side Processing

For Python-backed objects, `stream()` returns a live Python stream object. The stream's `data` field is a NumPy array. An advanced R/Python user can do work on that array in Python and return only a small result to R.

```{r}
py_stream <- stream(block, "_465A")
py_data <- py_stream$data
class(py_data)
```

This example computes small quality-control summaries in Python. The full trace stays on the Python side; only the small dictionary of results crosses back into R.

```{r}
reticulate::py_run_string("
import numpy as np

def qc_trace(x):
    x = np.asarray(x)
    return {
        'n_samples': int(x.shape[0]),
        'mean': float(np.nanmean(x)),
        'sd': float(np.nanstd(x)),
        'p01': float(np.nanpercentile(x, 1)),
        'p99': float(np.nanpercentile(x, 99)),
    }
")

reticulate::py$qc_trace(py_data)
```

This pattern is useful when the next few operations are naturally NumPy/Python operations, or when returning a small result to R avoids copying a large array.

## Reader Controls

`read_block()` exposes common `tdt.read_block()` controls directly. This example uses `evtype`, `channel`, `t1`, and `t2`; the event-aligned example below uses `ranges`. Other controls, including `store`, `headers`, `nodata`, `sortname`, and export options, are documented in `?read_block_py` and are forwarded to Python `tdt` rather than reimplemented in R.

```{r}
bounded <- read_block(
  example_path,
  evtype = "streams",
  channel = 1,
  t1 = 1,
  t2 = 3,
  verbose = 0
)

stream_names(bounded)
summary(bounded)$streams
```

Extra keyword arguments can be passed through `...` when the underlying Python package supports an option that the R wrapper has not named explicitly.

## Store Filters And Names

Python `tdt` distinguishes the original four-character TDT store ID from the sanitized Python `StructType` key used in returned objects.
In this fixture, the original photometry store IDs are `405A` and `465A`, while the returned stream names are `_405A` and `_465A` because Python identifiers cannot start with a digit. Header stores retain the original IDs in their `.name` fields; after data are read, Python `tdt` exposes the sanitized stream names, and those are the names returned by `stream_names()`.

`read_block()` inspects the block header before a filtered read and accepts either spelling. Ordinary store names such as `Fi1r` still work as expected.

```{r}
fi1r_only <- read_block(
  example_path,
  evtype = "streams",
  store = "Fi1r",
  t1 = 1,
  t2 = 3,
  verbose = 0
)

stream_names(fi1r_only)
summary(fi1r_only)$streams
```

The same high-level reader can filter by the stream name returned from `stream_names()`.

```{r}
photometry_filter <- read_block(
  example_path,
  evtype = "streams",
  store = "_465A",
  t1 = 1,
  t2 = 3,
  verbose = 0
)

stream_names(photometry_filter)
```

Passing the original TDT store ID is equivalent.

```{r}
photometry_original <- read_block(
  example_path,
  evtype = "streams",
  store = "465A",
  t1 = 1,
  t2 = 3,
  verbose = 0
)

stream_names(photometry_original)
```

Use `read_block_py()` when you need the underlying Python behavior exactly. In Python `tdt` 0.7.3, read-time `store` matching happens before key sanitization, so the raw wrapper expects `465A`, not `_465A`, for this store.

```{r}
# The raw wrapper matches against the original store ID, so pass "465A" here.
raw_python_filter <- read_block_py(
  example_path,
  evtype = "streams",
  store = "465A",
  t1 = 1,
  t2 = 3,
  verbose = 0
)

stream_names(raw_python_filter)
```

After reading, tdtr accessors and collection helpers use the stream names shown by `stream_names()`.

```{r}
photometry <- collect_stream(
  photometry_filter,
  "_465A",
  as = "list",
  include_time = TRUE
)

list(
  samples = nrow(photometry$data),
  channels = ncol(photometry$data),
  first_times = head(photometry$time, 3)
)
```

## Explicit Collection

Collection is the point where data become ordinary R objects.

```{r}
signal <- collect_stream(bounded, "_465A", as = "list", include_time = TRUE)

list(
  samples = nrow(signal$data),
  channels = ncol(signal$data),
  first_times = head(signal$time, 3),
  r_size = format(utils::object.size(signal$data), units = "auto")
)
```

Use `collect_block()` when you want a durable R object for a selected subset. The collected object no longer depends on a live Python session.

```{r}
selected <- collect_block(bounded, stores = "_465A")
class(selected)
summary(selected)$streams
```

## Event-aligned Reads

Event tables can be translated to the `2 x N` range matrix expected by Python `tdt`. This lets you define event windows in R and still reduce the read before collecting stream samples.

```{r}
ticks <- collect_epocs(block, store = "Tick")
ranges <- ranges_from_epocs(head(ticks, 2), pre = -0.1, post = 0.2)
ranges
```

```{r}
aligned <- read_block(
  example_path,
  evtype = "streams",
  ranges = ranges,
  verbose = 0
)

summary(aligned)$streams
```

## Profiling Before Scaling Up

Use `profile_tdt_memory()` on a bounded read before trying to load a large recording. It profiles the read, metadata accessors, optional summary, optional stream collection, and event collection.

```{r}
profile_tdt_memory(
  example_path,
  evtype = c("epocs", "streams"),
  t1 = 0,
  t2 = 5,
  stream = "_465A",
  verbose = 0
)
```

Interpret the result as a local diagnostic, not as exact total process memory. The helper uses `Rprofmem()` for R allocations and Python `tracemalloc` for Python allocations. Those tools do not perfectly measure every allocation made by R, reticulate, NumPy, Python C extensions, or the operating system.

Treat the output as a way to compare alternative reads on the same machine and in the same session. Differences smaller than roughly 10-30% should be treated cautiously; large differences are usually informative.

The expected shape is:

- `metadata_accessors` stays small because it reads names and metadata;
- `collect_stream:*` allocates R memory because it intentionally copies selected stream samples into R;
- the Python peak column is useful context, but it is not the same as total resident memory for the R process.

Use the high-level workflow when selected data fit comfortably in memory and the downstream analysis expects matrices or tibbles. Use the Python-backed workflow when you need to inspect first, reduce at read time, keep large arrays on the Python side, or persist a reproducible read plan separately from collected R analysis data.
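
As a sketch of what persisting a reproducible read plan can look like end to end, the chunk below round-trips a plan through an `.rds` file and rebuilds the Python-backed block from it. It assumes `read_block()` accepts the same arguments used earlier in this vignette. The names `plan`, `plan_file`, `restored_plan`, and `rebuilt` are illustrative, and the temporary file stands in for wherever a project would actually store its plans.

```{r}
# Describe the read with plain R values only; nothing here points at Python.
plan <- list(
  block_path = tdtr_example_block_path(),
  args = list(evtype = "streams", t1 = 0, t2 = 5, verbose = 0)
)

# Illustrative location: a temporary file, so the working directory is untouched.
plan_file <- tempfile(fileext = ".rds")
saveRDS(plan, plan_file)

# A later session needs the plan file and the original block directory.
restored_plan <- readRDS(plan_file)
stopifnot(identical(restored_plan, plan))

# Recreate the live handle from the restored plan.
rebuilt <- do.call(
  read_block,
  c(list(block_path = restored_plan$block_path), restored_plan$args)
)
stream_names(rebuilt)
```

Because the plan stores a path, it is only as portable as that path. A project that moves between machines may prefer to store a path relative to the project root and resolve it before calling `read_block()`.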