AAAI 2025 | Proceedings of the AAAI Conference on Artificial Intelligence

Deep Reinforcement Learning with Time-Scale Invariant Memory

Md Rysul Kabir, James Mochizuki-Freeman, Zoran Tiganj

Department of Computer Science, Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington

A neuroscience-inspired memory mechanism that allows RL agents to retain robust performance when temporal structure in the environment is rescaled.

Abstract

Temporal credit assignment is difficult for both biological and artificial agents, especially when the same task appears at different time scales. This work integrates a scale-invariant memory model into deep reinforcement learning and demonstrates strong, stable learning across temporally rescaled conditions. The key idea is to build a log-compressed memory of past inputs so that temporal rescaling appears as translation in internal state, reducing the need to retune agents for each scale.

What We Did

Replaced standard recurrent memory with CogRNN, a memory module based on Laplace-domain encoding and approximate inverse reconstruction.

Why It Matters

Animal timing behavior is approximately scale invariant, while typical RL memory models often learn in a scale-dependent way.

What We Found

CogRNN agents maintain high performance and more consistent learning speed across task scales, with temporal activity patterns that align with scale-invariant coding principles.

Time-Scale Invariant Memory

Let $f(t)$ denote encoded observations over time. CogRNN builds a bank of exponentially decaying traces, equivalent to a real-domain Laplace transform, then applies an analytic inverse approximation to recover a sequence of temporal basis functions $\tilde{f}$ over log-spaced internal time constants.

Continuous-Time Memory Encoding

The first stage accumulates history using exponentially weighted traces over a spectrum of decay rates $s$.

\[ F(s;t)=\int_0^t e^{-s(t-t')} f(t')\,dt' \]

Small $s$ keeps long-range context; large $s$ emphasizes recent input.

Differential Form

The same transform can be written as a linear differential equation, which is convenient for recurrent implementation.

\[ \frac{dF(s;t)}{dt}=-sF(s;t)+f(t) \]

This shows a simple decay-plus-drive process at each temporal scale.
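
As a rough numerical check that the differential form reproduces the integral definition, the sketch below integrates the equation with forward Euler for a constant input $f(t)=1$, for which the integral has the closed form $(1-e^{-st})/s$. The function names, the step size, and the decay rates are illustrative assumptions, not values from the paper.

```python
import math

def euler_trace(s, f, t_end, dt=1e-3):
    """Integrate dF/dt = -s*F + f(t) with forward Euler, starting from F(0) = 0."""
    F = 0.0
    n_steps = int(round(t_end / dt))
    for n in range(n_steps):
        F += dt * (-s * F + f(n * dt))
    return F

def closed_form_constant_input(s, t):
    """Integral of exp(-s*(t - t')) over t' in [0, t] when f(t') = 1."""
    return (1.0 - math.exp(-s * t)) / s

# Illustrative decay rates (assumed): small s integrates over a long window,
# large s saturates quickly and tracks only recent input.
errors = {}
for s in (0.1, 1.0, 10.0):
    numeric = euler_trace(s, lambda t: 1.0, t_end=2.0)
    exact = closed_form_constant_input(s, 2.0)
    errors[s] = abs(numeric - exact)
```

Under these assumptions the Euler solution and the closed-form integral should agree up to discretization error, which shrinks as `dt` is reduced.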

Approximate Inverse Transform

The second stage reconstructs a temporally localized code from Laplace-domain traces.

\[ \tilde{f}(\overset{*}{\tau};t)=\mathcal{L}_k^{-1}F(s;t)=\frac{(-1)^k}{k!}s^{k+1}\frac{d^k}{ds^k}F(s;t) \]

Using $\overset{*}{\tau}=k/s$, units tile time on a compressed axis and support sequential time-cell-like activity.
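
As a worked check of the inverse operator, consider a unit impulse at $t'=0$ (an illustrative input chosen here), for which the transform is simply $F(s;t)=e^{-st}$. Applying $\mathcal{L}_k^{-1}$ and then substituting $s=k/\overset{*}{\tau}$ gives a unimodal response in $t$ that peaks at $t=\overset{*}{\tau}$:

```latex
\[
\frac{d^k}{ds^k}e^{-st}=(-t)^k e^{-st}
\quad\Rightarrow\quad
\tilde{f}(\overset{*}{\tau};t)
=\frac{(-1)^k}{k!}\,s^{k+1}(-t)^k e^{-st}
=\frac{s^{k+1}t^k}{k!}\,e^{-st}
\]
\[
s=\frac{k}{\overset{*}{\tau}}
\quad\Rightarrow\quad
\tilde{f}(\overset{*}{\tau};t)
=\frac{1}{t}\,\frac{k^{k+1}}{k!}\left(\frac{t}{\overset{*}{\tau}}\right)^{k+1}e^{-k\frac{t}{\overset{*}{\tau}}}
\]
```

Setting the derivative of $t^k e^{-kt/\overset{*}{\tau}}$ with respect to $t$ to zero confirms the peak location, and larger $k$ yields a narrower peak around it.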

Discrete-Time Recurrent Update

For neural networks, the memory update is implemented directly as a recurrence.

\[ F_{s,t}=\mathbf{L}\,F_{s,t-1}+f_t \]

The diagonal operator $\mathbf{L}$ stores analytically chosen decay rates across memory channels.
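
A minimal sketch of this recurrence follows. It assumes the diagonal entries of $\mathbf{L}$ are $e^{-s\,\Delta t}$, which is the exact one-step decay of the continuous-time equation; the decay rates, the function names, and the scalar input are hypothetical choices for illustration.

```python
import math

def make_decay_diagonal(decay_rates, dt=1.0):
    """Diagonal entries of L, assumed here to be exp(-s * dt) per channel."""
    return [math.exp(-s * dt) for s in decay_rates]

def step(F_prev, L_diag, f_t):
    """One update F_t = L F_{t-1} + f_t; L is diagonal, so it acts elementwise."""
    return [l * F + f_t for l, F in zip(L_diag, F_prev)]

# Hypothetical log-spaced decay rates, ordered from fast to slow.
decay_rates = [1.0 / (2.0 ** i) for i in range(5)]   # 1, 1/2, 1/4, 1/8, 1/16
L_diag = make_decay_diagonal(decay_rates)

# Drive the memory with a single impulse followed by ten silent steps.
inputs = [1.0] + [0.0] * 10
F = [0.0] * len(decay_rates)
for f_t in inputs:
    F = step(F, L_diag, f_t)
```

After the loop each channel holds $e^{-10s}$ for its own $s$, so slow channels still carry the impulse while fast channels have nearly forgotten it.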

Impulse Response of Reconstructed Memory

This expression describes the temporal profile of each reconstructed unit.

\[ \tilde{f}_{\overset{*}{\tau},t}=\frac{1}{t}\frac{k^{k+1}}{k!}\left(\frac{t}{\overset{*}{\tau}}\right)^{k+1}e^{-k\frac{t}{\overset{*}{\tau}}} \]

\[ \overset{*}{\tau}_i=(1+c)^{i-1}\overset{*}{\tau}_{\min}, \qquad \Delta=\log_{1+c}(a) \]

\[ f(at)\;\Rightarrow\;\tilde{f}_{i}(at)\approx \tilde{f}_{i+\Delta}(t) \]

With log-spaced $\overset{*}{\tau}$ values, temporal rescaling in the input produces an approximately constant index shift in memory coordinates; this translation-like behavior is why downstream policies can preserve performance across scales instead of relearning separate dynamics for each temporal stretch factor.
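
The index shift can be illustrated with a small sketch that evaluates the impulse response on a log-spaced grid and compares the most active unit before and after stretching the elapsed time by a factor $a$. The values of `k`, `c`, `tau_min`, the grid size, and the helper names are assumptions picked so that the expected shift is a whole number; none of them come from the paper.

```python
import math

def unit_activation(t, tau_star, k=8):
    """Impulse response of one reconstructed unit, a time t after the impulse."""
    ratio = t / tau_star
    norm = k ** (k + 1) / math.factorial(k)
    return (1.0 / t) * norm * ratio ** (k + 1) * math.exp(-k * ratio)

def peak_index(t, taus, k=8):
    """Index of the most active unit a time t after the impulse."""
    acts = [unit_activation(t, tau, k) for tau in taus]
    return max(range(len(taus)), key=acts.__getitem__)

# Assumed grid: 1 + c = 2**(1/4), so doubling time should move the peak by 4 units.
c = 2.0 ** 0.25 - 1.0
tau_min = 0.05
taus = [tau_min * (1.0 + c) ** i for i in range(40)]   # list index 0 is unit i = 1

a = 2.0
delta = math.log(a) / math.log(1.0 + c)                # log_{1+c}(a)
shift = peak_index(a * 1.0, taus) - peak_index(1.0, taus)
```

In this idealized single-impulse case the peak moves by `delta` index units, while the peak amplitude scales by $1/a$ because of the $1/t$ prefactor.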

Figure: (A) Decay traces and reconstructed sequential, time-cell-like activation in CogRNN memory in response to impulse inputs. (B) Log-compressed memory states for temporally rescaled inputs (scale = 1, 2, 4). (C) Re-indexed memory representation in which temporal rescaling appears as translation over the memory index.

Experimental Setup

RL Agent Architecture

The RL agent architecture consists of three components:

  1. Encoder: Convolutional layers for feature extraction (3D environments).
  2. Core: Recurrent memory (CogRNN, LSTM, or RNN).
  3. Agent: Policy network ($\pi$) and value network ($V$).

Figure: Agent architecture in the RL pipeline, with encoder, recurrent memory core, and policy/value heads.
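
The three-stage pipeline above can be sketched as a plain-Python forward pass. Everything here is a stand-in for illustration: `LeakyTraceCore` is a bank of leaky traces rather than the full CogRNN, `encode` is an identity map in place of convolutional layers, and the linear heads use random, untrained weights.

```python
import math
import random

class LeakyTraceCore:
    """Stand-in memory core: a bank of leaky traces (not the full CogRNN)."""
    def __init__(self, decay_rates):
        self.decay = [math.exp(-s) for s in decay_rates]
        self.state = [0.0] * len(decay_rates)

    def step(self, x):
        self.state = [d * h + x for d, h in zip(self.decay, self.state)]
        return self.state

def encode(observation):
    """Placeholder encoder: identity on a scalar observation."""
    return float(observation)

def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

class ActorCriticHeads:
    """Linear policy and value readouts with random, untrained weights."""
    def __init__(self, n_features, n_actions, seed=0):
        rng = random.Random(seed)
        self.w_pi = [[rng.gauss(0.0, 0.1) for _ in range(n_features)]
                     for _ in range(n_actions)]
        self.w_v = [rng.gauss(0.0, 0.1) for _ in range(n_features)]

    def __call__(self, features):
        logits = [sum(w * x for w, x in zip(row, features)) for row in self.w_pi]
        value = sum(w * x for w, x in zip(self.w_v, features))
        return softmax(logits), value

# Encoder -> core -> heads, applied to a short hypothetical observation stream.
core = LeakyTraceCore(decay_rates=[1.0, 0.5, 0.25, 0.125])
heads = ActorCriticHeads(n_features=4, n_actions=2)
for observation in [1, 0, 0, 0]:
    action_probs, state_value = heads(core.step(encode(observation)))
```

Keeping the core behind a single `step` method is one way to make the memory module swappable, which mirrors how CogRNN, LSTM, and RNN cores are compared within the same agent.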

Tasks

  • Interval timing (1D and 3D): Decides whether an interval is short or long based on sensory input.
  • Interval discrimination: Distinguishes between different time intervals based on sensory cues.
  • Delayed match to sample: Determines if two stimuli separated by a delay are the same.
  • Interval reproduction: Reproduces a time interval after observing it.

Figure: Static snapshots of the T-maze interval-timing task: trial start at the red line, post-interval gate opening, and left/right decision endpoints used to classify short vs long intervals.

Figure: Rollout visualization of the interval-timing environment used in training and evaluation.

Results

Stable Learning Across Temporal Scales

Across four tasks, CogRNN shows consistently strong performance under rescaling. LSTM learns in several settings but exhibits stronger scale dependence, especially in harder conditions.

Figure: Performance across representative tasks and temporal scales for CogRNN and LSTM models. Panels for each model: (A) interval timing 1D, (B) interval discrimination, (C) delayed match to sample, (D) interval timing 3D. The legend gives the scale values used in the plots.

From Scale Covariance to Approximate Invariance

Memory traces are covariant under temporal rescaling (shift in internal coordinates). Convolution and pooling convert this into approximate invariance, allowing transfer from one scale to others without re-optimizing policy dynamics from scratch.
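
A toy sketch of how convolution followed by pooling removes a shift is given below. It uses an idealized memory bump that is translated along the index axis without any change in amplitude, a fixed hypothetical filter, and global max pooling; edge effects are avoided by keeping the bump away from the boundaries.

```python
def conv1d_valid(signal, kernel):
    """Sliding dot product of `kernel` along `signal` (no padding)."""
    n = len(signal) - len(kernel) + 1
    return [sum(k * signal[i + j] for j, k in enumerate(kernel)) for i in range(n)]

def conv_then_max_pool(signal, kernel):
    """Convolve over the memory index, then take a global max over positions."""
    return max(conv1d_valid(signal, kernel))

def shifted(pattern, offset, length):
    """Place `pattern` at `offset` inside a zero vector of size `length`."""
    out = [0.0] * length
    out[offset:offset + len(pattern)] = pattern
    return out

# Idealized memory bump and the same bump translated by 4 and 8 index units
# (amplitude changes and edge effects are ignored in this sketch).
bump = [0.1, 0.5, 1.0, 0.5, 0.1]
kernel = [0.25, 0.5, 0.25]               # hypothetical fixed filter
outputs = [conv_then_max_pool(shifted(bump, off, 32), kernel) for off in (3, 7, 11)]
```

All three pooled outputs coincide because the convolution output shifts along with the input and the max does not depend on position. With real memory traces the match is only approximate, since amplitudes and boundary effects vary with scale.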

Figure: (A) Convolution and pooling output across scales, aligning shifted memory traces (Figure 5A, with a legend for the scale conditions). (B) Performance comparison between CogRNN and a standard RNN baseline across scales.

Temporal Neural Dynamics Align with Scale-Invariant Coding

Time-cell-like responses appear in multiple architectures, but CogRNN shows the clearest log-compressed temporal progression consistent with the intended representation geometry.

Figure: Representative neural activity heatmaps from trained RNN, LSTM, and CogRNN agents, with neurons sorted by peak time.

Conclusion

This research demonstrates that incorporating computational principles from neuroscience into deep learning architectures can enhance their adaptability and robustness. Scale-invariant representations may be crucial for developing AI systems that can flexibly adjust to new environments without extensive hyperparameter tuning, much like biological organisms navigate the world across vastly different spatial and temporal scales.

Future directions

  • Combining scale-invariant memory with power-law temporal discounting.
  • Extending to spatial scale invariance for navigation tasks.
  • Applications beyond timing tasks to general temporal reasoning.

Citation

If this work helps your research, please cite:

@article{Kabir_Mochizuki-Freeman_Tiganj_2025,
  title={Deep Reinforcement Learning with Time-Scale Invariant Memory},
  volume={39},
  url={https://ojs.aaai.org/index.php/AAAI/article/view/32124},
  DOI={10.1609/aaai.v39i2.32124},
  abstractNote={The ability to estimate temporal relationships is critical for both animals and artificial agents. Cognitive science and neuroscience provide remarkable insights into behavioral and neural aspects of temporal credit assignment. In particular, scale invariance of learning dynamics, observed in behavior and supported by neural data, is one of the key principles that governs animal perception: proportional rescaling of temporal relationships does not alter the overall learning efficiency. Here we integrate a computational neuroscience model of scale invariant memory into deep reinforcement learning (RL) agents. We first provide a theoretical analysis and then demonstrate through experiments that such agents can learn robustly across a wide range of temporal scales, unlike agents built with commonly used recurrent memory architectures such as LSTM. This result illustrates that incorporating computational principles from neuroscience and cognitive science into deep neural networks can enhance adaptability to complex temporal dynamics, mirroring some of the core properties of human learning.},
  number={2},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  author={Kabir, Md Rysul and Mochizuki-Freeman, James and Tiganj, Zoran},
  year={2025},
  month={Apr.},
  pages={1345-1354}
}