Deep Reinforcement Learning with Time-Scale Invariant Memory

Abstract

Temporal credit assignment is difficult for both biological and artificial agents, especially when the same task appears at different time scales. This work integrates a scale-invariant memory model into deep reinforcement learning and demonstrates strong, stable learning across temporally rescaled conditions. The key idea is to build a log-compressed memory of past inputs so that temporal rescaling appears as translation in internal state, reducing the need to retune agents for each scale.

What We Did

We replaced the standard recurrent memory with CogRNN, a memory module that combines Laplace-domain encoding with an approximate inverse reconstruction.

Why It Matters

Animal timing behavior is approximately scale invariant, while typical RL memory models often learn in a scale-dependent way.

What We Found

CogRNN agents maintain high performance and more consistent learning speed across task scales, with temporal activity patterns that align with scale-invariant coding principles.

Time-Scale Invariant Memory

Let $f(t)$ denote encoded observations over time. CogRNN builds a bank of exponentially decaying traces, equivalent to a real-domain Laplace transform, then applies an analytic inverse approximation to recover a sequence of temporal basis functions $\tilde{f}$ over log-spaced internal time constants.

Continuous-Time Memory Encoding

The first stage accumulates history using exponentially weighted traces over a spectrum of decay rates $s$.

F(s;t)=\int_0^t e^{-s(t-t')} f(t')\,dt'

Small $s$ keeps long-range context; large $s$ emphasizes recent input.

Differential Form

The same transform can be written as a linear differential equation, which is convenient for recurrent implementation.

\frac{dF(s;t)}{dt}=-sF(s;t)+f(t)

This shows a simple decay-plus-drive process at each temporal scale.
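The equivalence between the integral and differential forms can be checked numerically. The sketch below is illustrative only: the decay rates, step size, unit-step input, and the function name `integrate_traces` are assumptions, not values from the paper. It integrates the differential equation with forward Euler and compares the result with the closed-form integral for a constant input, $F(s;t)=(1-e^{-st})/s$.

```python
import numpy as np

def integrate_traces(f, s, dt):
    """Forward-Euler integration of dF/dt = -s F + f for a bank of decay rates s."""
    F = np.zeros_like(s, dtype=float)
    history = np.empty((len(f), len(s)))
    for n, f_n in enumerate(f):
        F = F + dt * (-s * F + f_n)  # decay plus drive, one Euler step
        history[n] = F
    return history

dt = 1e-3                            # hypothetical step size
t = np.arange(1, 5001) * dt          # times 0.001 ... 5.0
s = np.array([0.5, 1.0, 2.0, 4.0])   # hypothetical decay rates
f = np.ones_like(t)                  # unit-step input f(t) = 1

F_euler = integrate_traces(f, s, dt)
# Closed form of the integral for f = 1: (1 - exp(-s t)) / s
F_exact = (1.0 - np.exp(-s[None, :] * t[:, None])) / s[None, :]
max_err = np.max(np.abs(F_euler - F_exact))
print(f"max |Euler - exact| = {max_err:.2e}")
```

Channels with large $s$ saturate quickly near $1/s$, while channels with small $s$ are still integrating at the end of the window, which is the long-range versus recent-input trade-off described above.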

Approximate Inverse Transform

The second stage reconstructs a temporally localized code from Laplace-domain traces.

\tilde{f}(\overset{*}{\tau};t)=\mathcal{L}_k^{-1}F(s;t)=\frac{(-1)^k}{k!}s^{k+1}\frac{d^k}{ds^k}F(s;t)

Using $\overset{*}{\tau}=k/s$, units tile time on a compressed axis and support sequential time-cell-like activity.
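As a worked check of this operator, consider a derivation sketch that assumes a unit impulse presented at $t'=0$, so that the encoding stage gives $F(s;t)=e^{-st}$ for $t>0$. Differentiating $k$ times with respect to $s$:

\frac{d^k}{ds^k}e^{-st}=(-t)^k e^{-st}

Substituting into the inverse operator, the two sign factors cancel:

\tilde{f}(\overset{*}{\tau};t)=\frac{(-1)^k}{k!}s^{k+1}(-t)^k e^{-st}=\frac{s^{k+1}t^k}{k!}e^{-st}

Setting the time derivative to zero gives $k\,t^{k-1}-s\,t^{k}=0$, so the response peaks at $t=k/s=\overset{*}{\tau}$. Each unit is therefore most active when the elapsed time since the impulse matches its own $\overset{*}{\tau}$, which is why the substitution $\overset{*}{\tau}=k/s$ is the natural label for the units.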

Discrete-Time Recurrent Update

For neural networks, the memory update is implemented directly as a recurrence.

F_{s,t}=\mathbf{L}\,F_{s,t-1}+f_t

The diagonal operator $\mathbf{L}$ stores analytically chosen decay rates across memory channels.
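A minimal sketch of this recurrence follows. It assumes $\mathbf{L}=\mathrm{diag}(e^{-s\,\Delta t})$, the decay factor obtained by integrating the homogeneous part of the differential form over one step; the source does not spell out the entries of $\mathbf{L}$, so this choice, the function name `laplace_memory`, and the range of time constants are illustrative assumptions.

```python
import numpy as np

def laplace_memory(inputs, s, dt=1.0):
    """Run F_t = L F_{t-1} + f_t with a diagonal L, stored as a vector of decay factors."""
    decay = np.exp(-s * dt)      # assumed diagonal of L
    F = np.zeros(len(s))
    states = []
    for f_t in inputs:
        F = decay * F + f_t      # elementwise product plays the role of the diagonal matrix
        states.append(F.copy())
    return np.stack(states)      # shape: (time steps, memory channels)

# Hypothetical log-spaced time constants from 1 to 100 steps; s is their inverse.
s = 1.0 / np.geomspace(1.0, 100.0, num=8)

inputs = np.zeros(50)
inputs[0] = 1.0                  # single impulse at the first step
states = laplace_memory(inputs, s)
print(states[10])                # each channel has decayed to exp(-s * 10)
```

Because $\mathbf{L}$ is diagonal, the channels never interact, so the update costs one multiply-add per channel per step regardless of how long the remembered history is.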

Impulse Response of Reconstructed Memory

This expression describes the temporal profile of each reconstructed unit.

\tilde{f}_{\overset{*}{\tau},t}=\frac{1}{t}\frac{k^{k+1}}{k!}\left(\frac{t}{\overset{*}{\tau}}\right)^{k+1}e^{-k\frac{t}{\overset{*}{\tau}}}

\overset{*}{\tau}_i=(1+c)^{i-1}\overset{*}{\tau}_{\min}, \qquad \Delta=\log_{1+c}(a)

f(at)\;\Rightarrow\;\tilde{f}_{i}(at)\approx \tilde{f}_{i+\Delta}(t)

With log-spaced $\overset{*}{\tau}$ values, temporal rescaling in the input produces an approximately constant index shift in memory coordinates; this translation-like behavior is why downstream policies can preserve performance across scales instead of relearning separate dynamics for each temporal stretch factor.
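The translation property can be illustrated numerically with the closed-form impulse response. In the sketch below, the values of `k`, `c`, `tau_min`, the number of units, and the probe time `t` are illustrative assumptions, and `a` is chosen as an integer power of $1+c$ so that $\Delta$ is a whole number of units. The check is that the activity profile across units at elapsed time $a\,t$ is the profile at time $t$ moved by $\Delta$ units along the index axis, up to an overall amplitude factor of $1/a$.

```python
import numpy as np
from math import lgamma, log

def impulse_response(t, tau_star, k):
    """Closed-form response of each reconstructed unit at elapsed time t after a unit impulse."""
    u = t / tau_star
    log_coeff = (k + 1) * log(k) - lgamma(k + 1)   # log of k^(k+1) / k!
    return np.exp(log_coeff + (k + 1) * np.log(u) - k * u) / t

k, c, tau_min, n_units = 8, 0.1, 1.0, 60            # hypothetical settings
tau_star = tau_min * (1 + c) ** np.arange(n_units)  # log-spaced peak times

delta = 10
a = (1 + c) ** delta                                # rescaling factor, about 2.59

t = 5.0
base = impulse_response(t, tau_star, k)             # profile across units at time t
scaled = impulse_response(a * t, tau_star, k)       # profile across units at time a * t

# Same profile, moved by delta units and scaled in amplitude by 1/a.
shift_matches = np.allclose(a * scaled[delta:], base[:-delta])
peak_shift = int(np.argmax(scaled) - np.argmax(base))
print(shift_matches, peak_shift)
```

When $a$ is not an integer power of $1+c$, the shift falls between units and the match is only approximate, which is consistent with the "approximately constant index shift" wording above.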

CogRNN response to impulse inputs, showing decays and time-cell-like peaks. — A: Decay traces and reconstructed sequential activation in CogRNN memory. B: Log-compressed memory states for temporally rescaled inputs. C: Re-indexed memory representation where temporal rescaling appears as translation.

Log-compressed memory for scale=1,2,4 signals.

Experimental Setup

RL Agent Architecture

The RL agent architecture consists of three components:

Encoder: Convolutional layers for feature extraction (3D environments).
Core: Recurrent memory (CogRNN, LSTM, or RNN).
Agent: Policy network ($\pi$) and value network ($V$).
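The data flow through these three components can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: a linear-plus-tanh encoder stands in for the convolutional layers, only the exponential-decay stage of the memory is used as the core, the weights are random and untrained, and the class name `TinyAgent` and all dimensions are hypothetical.

```python
import numpy as np

class TinyAgent:
    """Encoder -> recurrent core -> policy and value heads, with random untrained weights."""

    def __init__(self, obs_dim, feat_dim, n_actions, s, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(scale=0.1, size=(feat_dim, obs_dim))
        self.decay = np.exp(-s)                       # one decay factor per memory channel
        mem_dim = feat_dim * len(s)
        self.W_pi = rng.normal(scale=0.1, size=(n_actions, mem_dim))
        self.w_v = rng.normal(scale=0.1, size=mem_dim)
        self.F = np.zeros((len(s), feat_dim))         # memory state carried across steps

    def step(self, obs):
        f_t = np.tanh(self.W_enc @ obs)               # encoder: observation -> features
        self.F = self.decay[:, None] * self.F + f_t   # core: decaying traces of the features
        h = self.F.ravel()
        logits = self.W_pi @ h                        # policy head
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        value = float(self.w_v @ h)                   # value head
        return probs, value

agent = TinyAgent(obs_dim=4, feat_dim=3, n_actions=2, s=np.array([1.0, 0.3, 0.1]))
obs_sequence = np.eye(4)[[0, 1, 1, 2]]                # four one-hot observations
for obs in obs_sequence:
    probs, value = agent.step(obs)
print(probs, value)
```

Swapping the core for an LSTM or a plain RNN changes only the line that updates the memory state, which is what makes the memory modules directly comparable within the same pipeline.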

Agent architecture with encoder, recurrent memory, policy and value heads. — Encoder, memory core, and policy/value heads in the RL pipeline.

Tasks

Interval timing (1D and 3D): Decides whether an interval is short or long based on sensory input.
Interval discrimination: Distinguishes between different time intervals based on sensory cues.
Delayed match to sample: Determines if two stimuli separated by a delay are the same.
Interval reproduction: Reproduces a time interval after observing it.

T-maze interval timing environment snapshots. — Static snapshots of the interval-timing task: trial start at the red line, post-interval gate opening, and left/right decision endpoints used to classify short vs long intervals.

Environment rollout showing interval timing setup. — Rollout visualization of the interval-timing environment used in training and evaluation.

Results

Stable Learning Across Temporal Scales

Across four tasks, CogRNN shows consistently strong performance under rescaling. LSTM learns in several settings but exhibits stronger scale dependence, especially in harder conditions.

CogRNN panel A: interval timing 1D. — Performance across representative tasks and temporal scales for CogRNN and LSTM models.

CogRNN panel B: interval discrimination.

From Scale Covariance to Approximate Invariance

Memory traces are covariant under temporal rescaling (shift in internal coordinates). Convolution and pooling convert this into approximate invariance, allowing transfer from one scale to others without re-optimizing policy dynamics from scratch.
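The step from covariance to invariance can be sketched with a toy example. The code below is illustrative only: Gaussian bumps stand in for memory states at three temporal scales, the kernel is a fixed hypothetical filter rather than learned weights, and `conv_feature_map` and `bump` are names introduced here. Convolution along the unit axis preserves the shift, and a global max over that axis then discards it.

```python
import numpy as np

n_units = 64
idx = np.arange(n_units)

def bump(center, width=3.0):
    """Synthetic memory profile over log-spaced units, peaked at the given unit index."""
    return np.exp(-0.5 * ((idx - center) / width) ** 2)

def conv_feature_map(memory, kernel):
    """1-D convolution along the unit (log time constant) axis."""
    return np.convolve(memory, kernel, mode="valid")

kernel = np.array([-1.0, 0.0, 2.0, 0.0, -1.0])   # hypothetical filter; learned in practice

centers = (20, 30, 40)                           # same pattern at three temporal scales
feature_maps = [conv_feature_map(bump(c), kernel) for c in centers]

peak_positions = [int(np.argmax(fm)) for fm in feature_maps]   # covariant: moves with scale
pooled = [float(fm.max()) for fm in feature_maps]              # invariant: same at every scale
print(peak_positions, pooled)
```

The invariance holds only while the shifted pattern stays inside the range of units, so the span of time constants limits the range of scales that can be handled this way.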

Legend for Figure 5A scale conditions. — A: Convolution and pooling align shifted memory traces and B: support stronger cross-scale performance than a standard RNN baseline.

Figure 5A: convolution and pooling output across scales.

Temporal Neural Dynamics Align with Scale-Invariant Coding

Time-cell-like responses appear in multiple architectures, but CogRNN shows the clearest log-compressed temporal progression consistent with the intended representation geometry.

RNN neuron activity heatmap. — Representative neural activity maps from trained agents, sorted by peak time.

LSTM neuron activity heatmap.

Conclusion

This research demonstrates that incorporating computational principles from neuroscience into deep learning architectures can enhance their adaptability and robustness. Scale-invariant representations may be crucial for developing AI systems that can flexibly adjust to new environments without extensive hyperparameter tuning, much as biological organisms navigate the world across vastly different spatial and temporal scales.

Future directions

Combining scale-invariant memory with power-law temporal discounting.
Extending to spatial scale invariance for navigation tasks.
Applications beyond timing tasks to general temporal reasoning.

Citation

If this work helps your research, please cite:

@article{Kabir_Mochizuki-Freeman_Tiganj_2025,
  title={Deep Reinforcement Learning with Time-Scale Invariant Memory},
  volume={39},
  url={https://ojs.aaai.org/index.php/AAAI/article/view/32124},
  DOI={10.1609/aaai.v39i2.32124},
  abstractNote={The ability to estimate temporal relationships is critical for both animals and artificial agents. Cognitive science and neuroscience provide remarkable insights into behavioral and neural aspects of temporal credit assignment. In particular, scale invariance of learning dynamics, observed in behavior and supported by neural data, is one of the key principles that governs animal perception: proportional rescaling of temporal relationships does not alter the overall learning efficiency. Here we integrate a computational neuroscience model of scale invariant memory into deep reinforcement learning (RL) agents. We first provide a theoretical analysis and then demonstrate through experiments that such agents can learn robustly across a wide range of temporal scales, unlike agents built with commonly used recurrent memory architectures such as LSTM. This result illustrates that incorporating computational principles from neuroscience and cognitive science into deep neural networks can enhance adaptability to complex temporal dynamics, mirroring some of the core properties of human learning.},
  number={2},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  author={Kabir, Md Rysul and Mochizuki-Freeman, James and Tiganj, Zoran},
  year={2025},
  month={Apr.},
  pages={1345-1354}
}