Jian Liu, Xiongtao Shi, Thai Duy Nguyen, Haitian Zhang, Tianxiang Zhang, Wei Sun, Yanjie Li,
Athanasios V. Vasilakos, Giovanni Iacca, Arshad Ali Khan, Arvind Kumar, Jae Won Cho,
📢 Note: If you have any suggestions, feel free to post an issue or open a pull request; we will address it as soon as possible!
- [2025.5.13]: 🔥🔥 We released the official repository of the paper Neural Brain: A Neuroscience-inspired Framework for Embodied Agents. This paper is the first to define the Neural Brain of embodied agents through the lens of neuroscience. We not only propose this definition but also provide a comprehensive design framework for this purpose.
Additionally, we revisit the existing literature in alignment with this framework, highlighting gaps and challenges and outlining promising directions for future research. The proposed framework seeks to replicate key principles of biological cognition, such as active sensing and a tightly coupled perception-cognition-action loop. By integrating theoretical insights with practical engineering considerations, we aim to advance AI beyond task-specific optimization, laying the groundwork for generalizable embodied intelligence.
The evolution from AI to embodied AI. (a) AI excels at pattern recognition but lacks physical interaction with the real world. (b) Embodied AI enables robots such as Boston Dynamics' Atlas and the Unitree G1 to perceive and act in their environment. (c) Inspired by the human brain, intelligence arises from neural processes that integrate sensing, perception, cognition, action, and memory. (d) This work proposes the concept of a Neural Brain for Embodied Agents, drawing on neuroscience to achieve generalizable embodied AI.

The human brain comprises four key components: sensing; function (perception, cognition, action); memory (short-term and long-term); and implementation features such as sparse activation, event-driven processing, predictive coding, and distributed, parallel mechanisms. Inspired by these insights from neuroscience, we propose the concept of a Neural Brain for Embodied Agents, which integrates these principles into four distinct modules. The sensing module incorporates multimodal fusion, active sensing, and adaptive calibration to enhance perceptual capabilities. The function module encompasses predictive perception, cognitive reasoning, and action, including a closed action loop that ensures continuous interaction with the environment. The memory module features a hierarchical architecture, neuroplastic adaptation, and context awareness, enabling agents to store and retrieve information dynamically and efficiently. Finally, the hardware/software module is characterized by event-driven processing, neuromorphic architectures, and hardware-software co-design, ensuring robust and flexible operation. These four core ideas, derived from the structure and functionality of the human brain, aim to empower embodied agents to adapt, learn, and perform effectively in real-world environments.
The Neural Brain for embodied agents is a biologically inspired computational framework that synthesizes principles from neuroscience, robotics, and machine learning to facilitate autonomous and adaptive interaction within unstructured environments. Designed to emulate the hierarchical and distributed architecture of the human brain, it integrates multimodal and active sensing (Sensing), closed-loop perception-cognition-action cycles (Function), neuroplasticity-driven memory systems (Memory), and energy-efficient neuromorphic hardware-software co-design (Hardware/Software), as shown below.
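To make the four-module organization above concrete, here is a minimal, hypothetical Python sketch of a perception-cognition-action loop wired through sensing, function, and memory. All class and method names (e.g. `NeuralBrainAgent`, `sensor.read()`, `env.execute()`) are illustrative assumptions for exposition only, not an API defined by the paper or this repository.

```python
# Minimal sketch of the four-module Neural Brain organization (illustrative only).
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hierarchical memory: a short-term buffer plus a long-term key-value store."""
    short_term: list = field(default_factory=list)
    long_term: dict = field(default_factory=dict)

    def store(self, observation, key=None):
        self.short_term.append(observation)       # working-memory context
        if key is not None:
            self.long_term[key] = observation     # consolidated knowledge


class NeuralBrainAgent:
    """Closed perception-cognition-action loop over multimodal sensing."""

    def __init__(self, sensors, policy, memory=None):
        self.sensors = sensors        # Sensing: e.g. {"vision": cam, "tactile": skin}
        self.policy = policy          # Function: cognition mapping (obs, memory) -> action
        self.memory = memory or Memory()

    def step(self, env):
        # Sensing: fuse whatever modalities are available at this step.
        obs = {name: sensor.read() for name, sensor in self.sensors.items()}
        # Memory: retain context for longer-horizon reasoning.
        self.memory.store(obs)
        # Cognition + action: the policy closes the loop by acting on the environment.
        action = self.policy(obs, self.memory)
        feedback = env.execute(action)
        return action, feedback
```

In this sketch the hardware/software module is implicit: the same loop could be deployed on conventional or neuromorphic hardware, with event-driven sensors simply replacing the polled `sensor.read()` calls.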
(a) Vision
(b) Audition
(c) Tactile
- Tactile Sensing—From Humans to Humanoids [Paper]
- Force Push: Robust Single-Point Pushing with Force Feedback [Paper]
- The Feel of MEMS Barometers: Inexpensive and Easily Customized Tactile Array Sensors [Paper]
- GelSlim: A High-Resolution, Compact, Robust, and Calibrated Tactile-sensing Finger [Paper]
(d) Olfaction
- 1D Metal Oxide Semiconductor Materials for Chemiresistive Gas Sensors: A Review [Paper]
- Comparisons between mammalian and artificial olfaction based on arrays of carbon black-polymer composite vapor detectors [Paper]
- Piezoelectric Sensor [Paper]
- Recent Advance in the Design of Colorimetric Sensors for Environmental Monitoring [Paper]
(e) Spatial/Time
(a) Active Sensing
- Designing and Evaluating a Social Gaze-Control System for a Humanoid Robot [Paper]
- Robot Active Neural Sensing and Planning in Unknown Cluttered Environments [Paper]
- Multi-robot coordination through dynamic Voronoi partitioning for informative adaptive sampling in communication-constrained environments [Paper]
- Deep Learning for Omnidirectional Vision: A Survey and New Perspectives [Paper]
(b) Adaptive Calibration
- Deep Learning for Camera Calibration and Beyond: A Survey [Paper]
- Polymer-Based Self-Calibrated Optical Fiber Tactile Sensor [Paper]
- Self-validating sensor technology and its application in artificial olfaction: A review [Paper]
- GVINS: Tightly Coupled GNSS-Visual-Inertial Fusion for Smooth and Consistent State Estimation [Paper]
References
- FAST-LIVO2: Fast, Direct LiDAR-Inertial-Visual Odometry [Paper]
- VILENS: Visual, Inertial, Lidar, and Leg Odometry for All-Terrain Legged Robots [Paper]
- Fast Foveating Cameras for Dense Adaptive Resolution [Paper]
- A Survey on Active Simultaneous Localization and Mapping: State of the Art and New Frontiers [Paper]
- A New Wave in Robotics: Survey on Recent mmWave Radar Applications in Robotics [Paper]
- Electronic Nose and Its Applications: A Survey [Paper]
- Real-Time Temporal and Rotational Calibration of Heterogeneous Sensors Using Motion Correlation Analysis [Paper]
(a) Large Language Models (LLMs)
- UL2: Unifying Language Learning Paradigms [Paper] [Code]
- LLaMA: Open and Efficient Foundation Language Models [Paper] [Code]
- LLaMA 2: Open Foundation and Fine-tuned Chat Models [Paper] [Code]
(b) Large Vision Models (LVMs)
- DINOv2: Learning Robust Visual Features without Supervision [Paper] [Code]
- SAM 2: Segment Anything in Images and Videos [Paper] [Code]
- Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [Paper] [Code]
- PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies [Paper] [Code]
- BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers [Paper] [Code]
(c) Large Multimodal Models (LMMs)
Vision-Language
- Grounding DINO: Marrying Language and Object Detection with Transformers [Paper] [Code]
- Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks [Paper] [Code]
- ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision [Paper] [Code]
- Unifying Vision-and-Language Tasks via Text Generation [Paper] [Code]
- Learning Transferable Visual Models From Natural Language Supervision [Paper] [Code]
- ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [Paper]
Text-Audio
- Robust Speech Recognition via Large-Scale Weak Supervision [Paper] [Code]
- AudioGen: Textually Guided Audio Generation [Paper] [Code]
Text-Video
- Video generation models as world simulators [Paper]
Vision-Tactile-Language
- Multimodal Visual-Tactile Representation Learning through Self-Supervised Contrastive Pre-Training [Paper] [Code]
- Binding Touch to Everything: Learning Unified Multimodal Tactile Representations [Paper] [Code]
- A Touch, Vision, and Language Dataset for Multimodal Alignment [Paper] [Code]
- Touching a NeRF: Leveraging Neural Radiance Fields for Tactile Sensory Data Generation [Paper]
(a) Large Vision-Language Models (LVLMs)
(b) Multimodal Large Language Models (MLLMs)
- Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling [Paper] [Code]
- GPT-4o System Card [Paper]
- Gemini: A Family of Highly Capable Multimodal Models [Paper]
- Visual Instruction Tuning [Paper] [Code]
- The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [Paper]
- Qwen Technical Report [Paper] [Code]
- EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents [Paper] [Code]
- EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents [Paper] [Code]
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding [Paper] [Code]
- MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs [Paper] [Code]
- PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain [Paper] [Code]
(c) Neuro-Symbolic AI
- NeSyC: A Neuro-symbolic Continual Learner For Complex Embodied Tasks In Open Domains [Paper]
- From Understanding the World to Intervening in It: A Unified Multi-Scale Framework for Embodied Cognition [Paper]
- Can-Do! A Dataset and Neuro-Symbolic Grounded Framework for Embodied Planning with Large Multimodal Models [Paper]
- JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents [Paper]
(d) World Models
- WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents [Paper] [Code]
- Grounding Large Language Models In Embodied Environment With Imperfect World Models [Paper]
- GenRL: Multimodal-foundation world models for generalization in embodied agents [Paper] [Code]
- AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models [Paper]
- Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling [Paper]
(a) Vision-Language-Action Models (VLA)
- PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs [Paper] [Code]
- ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation [Paper] [Code]
- DP3: 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations [Paper] [Code]
- RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation [Paper] [Code]
- MBA: Motion Before Action: Diffusing Object Motion as Manipulation Condition [Paper] [Code]
- RVT-2: Learning Precise Manipulation from Few Demonstrations [Paper] [Code]
- DP: Diffusion Policy: Visuomotor Policy Learning via Action Diffusion [Paper] [Code]
- RVT: Robotic View Transformer for 3D Object Manipulation [Paper] [Code]
- ACT: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware [Paper] [Code]
- Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions [Paper] [Code]
(b) Vision-Language-Navigation Models (VLN)
- CM2: Cross-modal Map Learning for Vision and Language Navigation [Paper] [Code]
- Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory [Paper] [Code]
- Waypoints Predictor: Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation [Paper] [Code]
(a) Vision-Language-Action Models (VLA)
(b) Vision-Language-Navigation Models (VLN)
- MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation [Paper] [Code]
- InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment [Paper] [Code]
- NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models [Paper] [Code]
- NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models [Paper] [Code]
- NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation [Paper] [Code]
- CLIP-Nav: Using CLIP for Zero-Shot Vision-and-Language Navigation [Paper]
References
(a) Neural Memory Systems
- Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation [Paper]
- MINDSTORES: Memory-Informed Neural Decision Synthesis for Task-Oriented Reinforcement in Embodied Systems [Paper]
- Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding [Paper]
- KARMA: Augmenting Embodied AI Agents with Long-and-short Term Memory Systems [Paper]
- Skip-SCAR: Hardware-Friendly High-Quality Embodied Visual Navigation [Paper]
- Neural Turing Machines [Paper]
(b) Structured and Symbolic Memory
- LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning [Paper] [Code]
- AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-to-Specific Task Decomposition and Knowledge Refinement [Paper] [Code]
- EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks [Paper]
- Scene-Driven Multimodal Knowledge Graph Construction for Embodied AI [Paper] [Code]
- Aligning Knowledge Graph with Visual Perception for Object-goal Navigation [Paper] [Code]
- Safety Control of Service Robots with LLMs and Embodied Knowledge Graphs [Paper]
- ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding [Paper]
- 3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding [Paper] [Code]
- Embodied-RAG: General Non-Parametric Embodied Memory for Retrieval and Generation [Paper]
(c) Spatial and Episodic Memory
- STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning [Paper]
(a) Adaptive Learning Over Time
- NeSyC: A Neuro-symbolic Continual Learner For Complex Embodied Tasks In Open Domains [Paper]
- Active Learning for Continual Learning: Keeping the Past Alive in the Present [Paper]
- Voyager: An Open-Ended Embodied Agent with Large Language Models [Paper] [Code]
- Embodied Lifelong Learning for Task and Motion Planning [Paper]
- Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation [Paper] [Code]
- Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation [Paper]
(b) Self-Guided and Efficient Learning
- DRESS: Disentangled Representation-based Self-Supervised Meta-Learning for Diverse Tasks [Paper] [Code]
- ReMA: Learning to Meta-Think for LLMs with Multi-Agent Reinforcement Learning [Paper]
- Self-Supervised Meta-Learning for All-Layer DNN-Based Adaptive Control with Stability Guarantees [Paper]
(c) Multimodal Integration and Knowledge Fusion
- UniCL: A Universal Contrastive Learning Framework for Large Time Series Models [Paper]
- Binding Touch to Everything: Learning Unified Multimodal Tactile Representations [Paper] [Code]
References
6.1.1. Neuromorphic Hardware
- Loihi: A neuromorphic manycore processor with on-chip learning [Paper]
- NeuronFlow: A hybrid neuromorphic-dataflow processor architecture for AI workloads [Paper]
- Efficient neuromorphic signal processing with Loihi 2 [Paper]
- The BrainScaleS-2 accelerated neuromorphic system with hybrid plasticity [Paper]
- Neuromorphic artificial intelligence systems [Paper]
- Speck: A smart event-based vision sensor with a low latency 327k neuron convolutional neuronal network processing pipeline [Paper]
- PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory [Paper]
- NeuroSim: A circuit-level macro model for benchmarking neuro-inspired architectures in online learning [Paper]
- A phase-change memory model for neuromorphic computing [Paper]
- PUMA: A programmable ultra-efficient memristor-based accelerator for machine learning inference [Paper]
- ODIN: A bit-parallel stochastic arithmetic based accelerator for in-situ neural network processing in phase change RAM [Paper]
- LightOn optical processing unit: Scaling-up AI and HPC with a non von Neumann co-processor [Paper]
- An on-chip photonic deep neural network for image classification [Paper]
- Quantum reservoir computing in finite dimensions [Paper]
- Theoretical error performance analysis for variational quantum circuit based functional regression [Paper]
6.1.2. Software Frameworks for Neural Brains
- Speaker-follower models for vision-and-language navigation [Paper]
- AudioCLIP: Extending CLIP to Image, Text and Audio [Paper]
- VLN BERT: A recurrent vision-and-language BERT for navigation [Paper]
- BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models [Paper]
- GPT-4 technical report [Paper]
- PaLM 2 technical report [Paper]
- Qwen technical report [Paper]
- RT-2: Vision-language-action models transfer web knowledge to robotic control [Paper]
- DeepSeek-V3 technical report [Paper]
- NEST: A network simulation and prototyping testbed [Paper]
- BindsNET: A machine learning-oriented spiking neural networks library in Python [Paper]
- Brian 2, an intuitive and efficient neural simulator [Paper]
- SpiNNaker: A spiking neural network architecture [Paper]
- Norse: A deep learning library for spiking neural networks [Paper]
- TensorRT inference with TensorFlow [Paper]
- Compiling ONNX neural network models using MLIR [Paper]
- Impact of thermal throttling on long-term visual inference in a CPU-based edge device [Paper]
- Comparison and benchmarking of AI models and frameworks on mobile devices [Paper]
- TensorFlow Lite Micro: Embedded machine learning for TinyML systems [Paper]
6.1.3. Energy-Efficient Learning at the Edge
- Sparse convolutional neural networks [Paper]
- Training sparse neural networks [Paper]
- SCNN: An accelerator for compressed-sparse convolutional neural networks [Paper]
- Sparse computation in adaptive spiking neural networks [Paper]
- SBNet: Sparse blocks network for fast inference [Paper]
- Big Bird: Transformers for longer sequences [Paper]
- GLaM: Efficient scaling of language models with mixture-of-experts [Paper]
- Switch Transformers: Scaling to trillion parameter models with simple and efficient sparsity [Paper]
- BASE Layers: Simplifying training of large, sparse models [Paper]
- Distilling the knowledge in a neural network [Paper]
- Quantization and training of neural networks for efficient integer-arithmetic-only inference [Paper]
- STDP-based pruning of connections and weight quantization in spiking neural networks for energy-efficient recognition [Paper]
- Quantization networks [Paper]
- Quantization framework for fast spiking neural networks [Paper]
- Efficient neural networks for edge devices [Paper]
- A million spiking-neuron integrated circuit with a scalable communication network and interface [Paper]
- Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks [Paper]
- MAERI: Enabling flexible dataflow mapping over DNN accelerators via reconfigurable interconnects [Paper]
- DNNBuilder: An automated tool for building high-performance DNN hardware accelerators for FPGAs [Paper]
- TVM: An automated end-to-end optimizing compiler for deep learning [Paper]
- SECDA: Efficient hardware/software co-design of FPGA-based DNN accelerators for edge inference [Paper]
References
- Challenges for large-scale implementations of spiking neural networks on FPGAs [Paper]
- Progress and Challenges in Large Scale Spiking Neural Networks for AI and Neuroscience [Paper]
- A review on methods, issues and challenges in neuromorphic engineering [Paper]
- Real-Time Neuromorphic Navigation: Guiding Physical Robots with Event-Based Sensing and Task-Specific Reconfigurable Autonomy Stack [Paper]
If you find this work useful, please consider citing our paper:
@article{2025neuralbrain,
  title={Neural Brain: A Neuroscience-inspired Framework for Embodied Agents},
  author={Liu, Jian and Shi, Xiongtao and Nguyen, Thai and Zhang, Haitian and Zhang, Tianxiang and Sun, Wei and Li, Yanjie and Vasilakos, Athanasios and Iacca, Giovanni and Khan, Arshad and others},
  journal={arXiv preprint arXiv:2505.07634},
  year={2025}
}
Given the limits of our own knowledge, if you find any issues or have any suggestions, please feel free to post an issue or contact us via email.