Neural Brain for Embodied Agents

📢 Note: If you have any suggestions, feel free to post an issue or open a pull request; we will address it as soon as possible!

1. Introduction

  • [2025.5.13]: 🔥🔥 We release the official repository of the paper Neural Brain: A Neuroscience-inspired Framework for Embodied Agents. This paper is the first to define the Neural Brain of embodied agents through the lens of neuroscience; beyond proposing this definition, we provide a comprehensive design framework for realizing it.

Additionally, we revisit the existing literature in light of this framework, highlighting gaps and challenges and outlining promising directions for future research. The proposed framework seeks to replicate key principles of biological cognition, including active sensing and a tightly coupled perception-cognition-action loop. By integrating theoretical insights with practical engineering considerations, we aim to advance AI beyond task-specific optimization, laying the groundwork for achieving generalizable embodied intelligence.

Figure: The evolution from AI to embodied AI. (a) AI excels at pattern recognition but lacks physical interaction with the real world. (b) Embodied AI enables robots such as Boston Dynamics' Atlas and the Unitree G1 to perceive and act in their environment. (c) In the human brain, intelligence arises from neural processes that integrate sensing, perception, cognition, action, and memory. (d) This work proposes the concept of a Neural Brain for embodied agents, drawing on neuroscience to achieve generalizable embodied AI.

2. Human Brain to Neural Brain

2.1 Human Brain: Insights from Neuroscience

The human brain comprises four key components: sensing, function (perception, cognition, action), memory (short-term and long-term), and implementation features, such as sparse activation, event-driven processing, predictive coding, and distributed and parallel mechanisms. Inspired by insights from neuroscience, we propose the concept of a Neural Brain for Embodied Agents, which integrates these principles into four distinct modules. The sensing module incorporates multimodal fusion, active sensing, and adaptive calibration to enhance perceptual capabilities. The function module encompasses predictive perception, cognitive reasoning, and action, including an action-closed loop to ensure continuous interaction with the environment. The memory module features a hierarchical architecture, neuroplastic adaptation, and context awareness, enabling agents to store and retrieve information dynamically and efficiently. Finally, the hardware/software module is characterized by event-driven processing, neuromorphic architecture, and hardware-software co-design, ensuring robust and flexible operation. These four core ideas, derived from the structure and functionality of the human brain, aim to empower embodied agents to adapt, learn, and perform effectively in real-world, embodied environments.
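
To make this module decomposition concrete, below is a minimal Python sketch of how the four modules could be wired into a sensing-perception-cognition-action loop with a memory update. It is an illustrative assumption for this README, not an implementation from the paper: all class, method, and function names (Sensing, Function, Memory, neural_brain_step) are hypothetical, the computations are toy placeholders, and the hardware/software module is omitted because it concerns the execution substrate rather than program structure.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sensing:
    """Multimodal fusion with active sensing and adaptive calibration (toy version)."""
    calibration_gain: float = 1.0

    def acquire(self, raw_inputs: Dict[str, float]) -> Dict[str, float]:
        # Adaptive calibration: rescale each modality before fusion.
        return {modality: value * self.calibration_gain for modality, value in raw_inputs.items()}


@dataclass
class Function:
    """Predictive perception, cognitive reasoning, and a closed action loop (toy version)."""

    def perceive(self, observations: Dict[str, float]) -> float:
        # Fuse the calibrated modalities into a single scalar state estimate.
        return sum(observations.values()) / max(len(observations), 1)

    def decide(self, state: float, context: List[float]) -> float:
        # Correct the current estimate toward what recent memory predicts.
        expected = sum(context) / len(context) if context else state
        return state - 0.5 * (state - expected)

    def act(self, command: float) -> float:
        # The action is returned to the environment, closing the loop.
        return command


@dataclass
class Memory:
    """Bounded short-term buffer standing in for hierarchical, context-aware memory."""
    short_term: List[float] = field(default_factory=list)
    capacity: int = 5

    def update(self, state: float) -> None:
        self.short_term.append(state)
        if len(self.short_term) > self.capacity:
            self.short_term.pop(0)  # forget the oldest entry


def neural_brain_step(raw_inputs: Dict[str, float], sensing: Sensing,
                      function: Function, memory: Memory) -> float:
    """One pass of the sensing -> perception -> cognition -> action loop."""
    observations = sensing.acquire(raw_inputs)
    state = function.perceive(observations)
    command = function.decide(state, memory.short_term)
    memory.update(state)
    return function.act(command)


if __name__ == "__main__":
    sensing, function, memory = Sensing(), Function(), Memory()
    for step in range(3):
        action = neural_brain_step({"vision": 0.8, "touch": 0.2}, sensing, function, memory)
        print(f"step {step}: action = {action:.3f}")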


2.2 Definition of Neural Brain from Neuroscience

The Neural Brain for embodied agents is a biologically inspired computational framework that synthesizes principles from neuroscience, robotics, and machine learning to enable autonomous and adaptive interaction within unstructured environments. Designed to emulate the hierarchical and distributed architecture of the human brain, it integrates multimodal and active sensing (Sensing), closed-loop perception-cognition-action cycles (Function), neuroplasticity-driven memory systems (Memory), and energy-efficient neuromorphic hardware-software co-design (Hardware/Software), as detailed in the sections below.


3. Sensing for Neural Brain

3.1 Embodied Agent Sensing

3.1.1 Multimodal and Hierarchical Sensing

(a) Vision
  • Stereo from uncalibrated cameras [Paper]
  • Comparison of Kinect v1 and v2 Depth Images in Terms of Accuracy and Precision [Paper]
  • Event-based Vision: A Survey [Paper]
  • Lidar System Architectures and Circuits [Paper]
(b) Audition
  • Eargle's The Microphone Book From Mono to Stereo to Surround - A Guide to Microphone Design and Application [Paper]
  • DeepEar: Sound Localization With Binaural Microphones [Paper]
  • Springer Handbook of Speech Processing [Paper]
  • Ultrasonic transducers and arrays [Paper]
(c) Tactile
  • Tactile Sensing—From Humans to Humanoids [Paper]
  • Force Push: Robust Single-Point Pushing with Force Feedback [Paper]
  • The Feel of MEMS Barometers: Inexpensive and Easily Customized Tactile Array Sensors [Paper]
  • GelSlim: A High-Resolution, Compact, Robust, and Calibrated Tactile-sensing Finger [Paper]
(d) Olfaction
  • 1D Metal Oxide Semiconductor Materials for Chemiresistive Gas Sensors: A Review [Paper]
  • Comparisons between mammalian and artificial olfaction based on arrays of carbon black-polymer composite vapor detectors [Paper]
  • Piezoelectric Sensor [Paper]
  • Recent Advance in the Design of Colorimetric Sensors for Environmental Monitoring [Paper]
(e) Spatial/Time
  • Global navigation satellite systems [Paper]
  • Robust mobile robot localization using optical flow sensors and encoders [Paper]
  • A Tutorial on Quantitative Trajectory Evaluation for Visual(-Inertial) Odometry [Paper]
  • Spatio-temporal fusion for remote sensing data: an overview and new benchmark [Paper]

3.1.2 Active Sensing and Adaptive Calibration

(a) Active Sensing
  • Designing and Evaluating a Social Gaze-Control System for a Humanoid Robot [Paper]
  • Robot Active Neural Sensing and Planning in Unknown Cluttered Environments [Paper]
  • Multi-robot coordination through dynamic Voronoi partitioning for informative adaptive sampling in communication-constrained environments [Paper]
  • Deep Learning for Omnidirectional Vision: A Survey and New Perspectives [Paper]
(b) Adaptive Calibration
  • Deep Learning for Camera Calibration and Beyond: A Survey [Paper]
  • Polymer-Based Self-Calibrated Optical Fiber Tactile Sensor [Paper]
  • Self-validating sensor technology and its application in artificial olfaction: A review [Paper]
  • GVINS: Tightly Coupled GNSS-Visual-Inertial Fusion for Smooth and Consistent State Estimation [Paper]

3.2 Remarks and Discussions

References
  • FAST-LIVO2: Fast, Direct LiDAR-Inertial-Visual Odometry [Paper]
  • VILENS: Visual, Inertial, Lidar, and Leg Odometry for All-Terrain Legged Robots [Paper]
  • Fast Foveating Cameras for Dense Adaptive Resolution [Paper]
  • A Survey on Active Simultaneous Localization and Mapping: State of the Art and New Frontiers [Paper]
  • A New Wave in Robotics: Survey on Recent mmWave Radar Applications in Robotics [Paper]
  • Electronic Nose and Its Applications: A Survey [Paper]
  • Real-Time Temporal and Rotational Calibration of Heterogeneous Sensors Using Motion Correlation Analysis [Paper]

4. Neural Brain Perception-Cognition-Action (Function)

4.1 Embodied Agent Perception-Cognition-Action

4.1.1 Perception in AI

(a) Large Language Models (LLMs)

2020

  • XLNet: Generalized Autoregressive Pretraining for Language Understanding [Paper] [Code]
  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [Paper] [Code]
  • Language Models are Few-Shot Learners [Paper]

2018

  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [Paper] [Code]
(b) Large Vision Models (LVMs)

2024

  • DINOv2: Learning Robust Visual Features without Supervision [Paper] [Code]
  • SAM 2: Segment Anything in Images and Videos [Paper] [Code]

2022

  • Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [Paper] [Code]
  • PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies [Paper] [Code]
  • BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers [Paper] [Code]

2021

  • Deformable DETR: Deformable Transformers for End-to-End Object Detection [Paper] [Code]
  • Emerging Properties in Self-Supervised Vision Transformers [Paper] [Code]

Backbones

  • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [Paper] [Code]
  • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [Paper] [Code]
  • Deep Residual Learning for Image Recognition [Paper] [Code]
(c) Large Multimodal Models (LMMs)
Vision-Language

2025

  • Perception Encoder: The best visual embeddings are not at the output of the network [Paper] [Code]

2024

  • Grounding DINO: Marrying Language and Object Detection with Transformers [Paper] [Code]
  • Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks [Paper] [Code]

2021

  • ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision [Paper] [Code]
  • Unifying Vision-and-Language Tasks via Text Generation [Paper] [Code]
  • Learning Transferable Visual Models From Natural Language Supervision [Paper] [Code]
  • ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [Paper]

2020

  • VL-BERT: Pre-training of Generic Visual-Linguistic Representations [Paper] [Code]
  • UNITER: UNiversal Image-TExt Representation Learning [Paper] [Code]
  • Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks [Paper] [Code]

2019

  • LXMERT: Learning Cross-Modality Encoder Representations from Transformers [Paper] [Code]
  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks [Paper]
Text-Audio

2023

  • Robust Speech Recognition via Large-Scale Weak Supervision [Paper] [Code]
  • AudioGen: Textually Guided Audio Generation [Paper] [Code]

2022

  • AudioCLIP: Extending CLIP to Image, Text and Audio [Paper] [Code]

2021

  • HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units [Paper] [Code]
  • VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text [Paper] [Code]

2020

  • wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations [Paper] [Code]
Text-Video

2024

  • Video generation models as world simulators [Paper]

2023

  • All in One: Exploring Unified Video-Language Pre-training [Paper] [Code]

2022

  • Make-A-Video: Text-to-Video Generation Without Text-Video Data [Paper]
  • NUWA: Visual Synthesis Pre-training for Neural Visual World Creation [Paper] [Code]
  • Phenaki: Variable Length Video Generation from Open-Domain Textual Description [Paper]

2021

  • VideoCLIP: Contrastive Pretraining for Zero-Shot Video-Text Understanding [Paper] [Code]
  • Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [Paper] [Code]
Vision-Tactile-Language

2024

  • Multimodal Visual-Tactile Representation Learning through Self-Supervised Contrastive Pre-Training [Paper] [Code]
  • Binding Touch to Everything: Learning Unified Multimodal Tactile Representations [Paper] [Code]
  • A Touch, Vision, and Language Dataset for Multimodal Alignment [Paper] [Code]

2023

  • Touching a NeRF: Leveraging Neural Radiance Fields for Tactile Sensory Data Generation [Paper]

2022

  • Self-Supervised Visuo-Tactile Pretraining to Locate and Follow Garment Features [Paper]
  • Touch and Go: Learning from Human-Collected Vision and Touch [Paper] [Code]

4.1.2 Perception-Cognition in AI

(a) Large Vision-Language Models (LVLMs)

2022

  • BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation [Paper] [Code]
  • Flamingo: a Visual Language Model for Few-Shot Learning [Paper]
  • CoCa: Contrastive Captioners are Image-Text Foundation Models [Paper]

2019

  • VisualBERT: A Simple and Performant Baseline for Vision and Language [Paper] [Code]
(b) Multimodal Large Language Models (MLLMs)

2024

  • Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling [Paper] [Code]
  • GPT-4o System Card [Paper]
  • Gemini: A Family of Highly Capable Multimodal Models [Paper]
  • Visual Instruction Tuning [Paper] [Code]

2023

  • The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [Paper]
  • Qwen Technical Report [Paper] [Code]

Benchmarks and Evaluations

  • EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents [Paper] [Code]
  • EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents [Paper] [Code]
  • DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding [Paper] [Code]
  • MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs [Paper] [Code]
  • PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain [Paper] [Code]
(c) Neuro-Symbolic AI

2025

  • NeSyC: A Neuro-symbolic Continual Learner For Complex Embodied Tasks In Open Domains [Paper]
  • From Understanding the World to Intervening in It: A Unified Multi-Scale Framework for Embodied Cognition [Paper]

2024

  • Can-Do! A Dataset and Neuro-Symbolic Grounded Framework for Embodied Planning with Large Multimodal Models [Paper]

2022

  • JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents [Paper]

Other Relevant Pages

Page1, Page2, Page3, Page4, Page5, Page6

(d) World Models

2024

  • WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents [Paper] [Code]
  • Grounding Large Language Models In Embodied Environment With Imperfect World Models [Paper]
  • GenRL: Multimodal-foundation world models for generalization in embodied agents [Paper] [Code]
  • AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models [Paper]

2023

  • Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling [Paper]

4.1.3 Perception-Action in AI

(a) Vision-Language-Action Models (VLA)

2024

  • PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs [Paper] [Code]
  • ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation [Paper] [Code]
  • DP3: 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations [Paper] [Code]
  • RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation [Paper] [Code]
  • MBA: Motion Before Action: Diffusing Object Motion as Manipulation Condition [Paper] [Code]
  • RVT-2: Learning Precise Manipulation from Few Demonstrations [Paper] [Code]

2023

  • DP: Diffusion Policy: Visuomotor Policy Learning via Action Diffusion [Paper] [Code]
  • RVT: Robotic View Transformer for 3D Object Manipulation [Paper] [Code]
  • ACT: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware [Paper] [Code]
  • Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions [Paper] [Code]

2022

  • RT-1: Robotics Transformer for Real-World Control at Scale [Paper] [Code]
  • BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning [Paper] [Code]
  • CLIPort: What and Where Pathways for Robotic Manipulation [Paper] [Code]

2021

  • Transporter Networks: Rearranging the Visual World for Robotic Manipulation [Paper] [Code]

2020

  • MCIL: Language Conditioned Imitation Learning over Unstructured Data [Paper] [Code]
(b) Vision-Language-Navigation Models (VLN)

2023

  • BEVBert: Multimodal Map Pre-training for Language-guided Navigation [Paper] [Code]

2022

  • CM2: Cross-modal Map Learning for Vision and Language Navigation [Paper] [Code]
  • Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory [Paper] [Code]
  • Waypoints Predictor: Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation [Paper] [Code]

2021

  • HAMT: History Aware Multimodal Transformer for Vision-and-Language Navigation [Paper] [Code]
  • WPN: Waypoint Models for Instruction-guided Navigation in Continuous Environments [Paper] [Code]
  • Sim-to-Real: Sim-to-Real Transfer for Vision-and-Language Navigation [Paper] [Code]

2018

  • Speaker-Follower: Speaker-Follower Models for Vision-and-Language Navigation [Paper] [Code]
  • VLN-Graph: Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments [Paper] [Code]

4.1.4 Perception-Cognition-Action in AI

(a) Vision-Language-Action Models (VLA)

2024

  • π0: A Vision-Language-Action Flow Model for General Robot Control [Paper] [Code]
  • RT-H: Action Hierarchies Using Language [Paper] [Code]
  • RT-X: Open X-Embodiment: Robotic Learning Datasets and RT-X Models [Paper] [Code]
  • OpenVLA: An Open-Source Vision-Language-Action Model [Paper] [Code]

2023

  • RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [Paper] [Code]
(b) Vision-Language-Navigation Models (VLN)

2024

  • MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation [Paper] [Code]
  • InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment [Paper] [Code]
  • NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models [Paper] [Code]
  • NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models [Paper] [Code]
  • NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation [Paper] [Code]

2022

  • CLIP-Nav: Using CLIP for Zero-Shot Vision-and-Language Navigation [Paper]

4.2 Remarks and Discussions

References
  • Multimodal Machine Learning: A Survey and Taxonomy [Paper]
  • Multimodal Interaction, Interfaces, and Communication: A Survey [Paper]
  • Measuring Domain Shift for Deep Learning in Histopathology [Paper]
  • Generalized Out-of-Distribution Detection: A Survey [Paper]

5. Neural Brain Memory Storage and Update

5.1 Embodied Agent Knowledge Storage and Update

5.1.1 Memory Architectures for Embodied Agents

(a) Neural Memory Systems

2025

  • Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation [Paper]
  • MINDSTORES: Memory-Informed Neural Decision Synthesis for Task-Oriented Reinforcement in Embodied Systems [Paper]

2024

  • Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding [Paper]
  • KARMA: Augmenting Embodied AI Agents with Long-and-short Term Memory Systems [Paper]
  • Skip-SCAR: Hardware-Friendly High-Quality Embodied Visual Navigation [Paper]

2020

  • Distributed Associative Memory Network with Memory Refreshing Loss [Paper] [Code]

(b) Structured and Symbolic Memory

2025

  • LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning [Paper] [Code]
  • AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-to-Specific Task Decomposition and Knowledge Refinement [Paper] [Code]
  • EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks [Paper]

2024

  • Scene-Driven Multimodal Knowledge Graph Construction for Embodied AI [Paper] [Code]
  • Aligning Knowledge Graph with Visual Perception for Object-goal Navigation [Paper] [Code]
  • Safety Control of Service Robots with LLMs and Embodied Knowledge Graphs [Paper]
  • ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding [Paper]
  • 3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding [Paper] [Code]
  • Embodied-RAG: General Non-Parametric Embodied Memory for Retrieval and Generation [Paper]

2023

  • SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction [Paper]
  • Modeling Dynamic Environments with Scene Graph Memory [Paper] [Code]
  • Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations [Paper] [Code]
(c) Spatial and Episodic Memory

2025

  • STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning [Paper]

2024

  • Spatially-Aware Transformer for Embodied Agents [Paper] [Code]
  • 3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning [Paper] [Code]
  • Planning from Imagination: Episodic Simulation and Episodic Memory for Vision-and-Language Navigation [Paper]

5.1.2 Knowledge Update Mechanisms

(a) Adaptive Learning Over Time

2025

  • NeSyC: A Neuro-symbolic Continual Learner For Complex Embodied Tasks In Open Domains [Paper]
  • Active Learning for Continual Learning: Keeping the Past Alive in the Present [Paper]

2024

  • Online Continual Learning For Interactive Instruction Following Agents [Paper] [Code]

2023

  • Voyager: An Open-Ended Embodied Agent with Large Language Models [Paper] [Code]
  • Embodied Lifelong Learning for Task and Motion Planning [Paper]
  • Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation [Paper] [Code]
  • Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation [Paper]

2021

  • AFEC: Active Forgetting of Negative Transfer in Continual Learning [Paper] [Code]
(b) Self-Guided and Efficient Learning

2025

  • DRESS: Disentangled Representation-based Self-Supervised Meta-Learning for Diverse Tasks [Paper] [Code]
  • ReMA: Learning to Meta-Think for LLMs with Multi-Agent Reinforcement Learning [Paper]

2024

  • Self-Supervised Meta-Learning for All-Layer DNN-Based Adaptive Control with Stability Guarantees [Paper]

2023

  • Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder [Paper] [Code]
  • Unleash Model Potential: Bootstrapped Meta Self-Supervised Learning [Paper]

2022

  • Multimodal Masked Autoencoders Learn Transferable Representations [Paper] [Code]
(c) Multimodal Integration and Knowledge Fusion

2024

  • UniCL: A Universal Contrastive Learning Framework for Large Time Series Models [Paper]
  • Binding Touch to Everything: Learning Unified Multimodal Tactile Representations [Paper] [Code]

2023

  • Meta-Transformer: A Unified Framework for Multimodal Learning [Paper] [Code]
  • MM-ReAct: Prompting ChatGPT for Multimodal Reasoning and Action [Paper] [Code]

2022

  • Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks [Paper] [Code]
  • General-Purpose, Long-Context Autoregressive Modeling with Perceiver AR [Paper] [Code]

5.2 Remarks and Discussions

References
  • Lifelong Learning of Large Language Model based Agents: A Roadmap [Paper] [Code]
  • Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI [Paper] [Code]
  • MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments [Paper]

6. Neural Brain Hardware and Software

6.1 Embodied Agent Hardware & Software

6.1.1 Neuromorphic Hardware

  • Loihi: A neuromorphic manycore processor with on-chip learning [Paper]
  • Neuronflow: A hybrid neuromorphic-dataflow processor architecture for AI workloads [Paper]
  • Efficient neuromorphic signal processing with Loihi 2 [Paper]
  • The BrainScaleS-2 accelerated neuromorphic system with hybrid plasticity [Paper]
  • Neuromorphic artificial intelligence systems [Paper]
  • Speck: A smart event-based vision sensor with a low latency 327k neuron convolutional neuronal network processing pipeline [Paper]
  • PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory [Paper]
  • NeuroSim: A circuit-level macro model for benchmarking neuro-inspired architectures in online learning [Paper]
  • A phase-change memory model for neuromorphic computing [Paper]
  • PUMA: A programmable ultra-efficient memristor-based accelerator for machine learning inference [Paper]
  • ODIN: A bit-parallel stochastic arithmetic based accelerator for in-situ neural network processing in phase change RAM [Paper]
  • LightOn optical processing unit: Scaling-up AI and HPC with a non-von Neumann co-processor [Paper]
  • An on-chip photonic deep neural network for image classification [Paper]
  • Quantum reservoir computing in finite dimensions [Paper]
  • Theoretical error performance analysis for variational quantum circuit based functional regression [Paper]

6.1.2 Software Frameworks for Neural Brains

  • Speaker-follower models for vision-and-language navigation [Paper]
  • AudioCLIP: Extending CLIP to Image, Text and Audio [Paper]
  • VLN BERT: A recurrent vision-and-language BERT for navigation [Paper]
  • BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models [Paper]
  • GPT-4 technical report [Paper]
  • PaLM 2 technical report [Paper]
  • Qwen technical report [Paper]
  • RT-2: Vision-language-action models transfer web knowledge to robotic control [Paper]
  • DeepSeek-V3 technical report [Paper]
  • NEST: A network simulation and prototyping testbed [Paper]
  • BindsNET: A machine learning-oriented spiking neural networks library in Python [Paper]
  • Brian 2, an intuitive and efficient neural simulator [Paper]
  • SpiNNaker: A spiking neural network architecture [Paper]
  • Norse: A deep learning library for spiking neural networks [Paper]
  • TensorRT inference with TensorFlow [Paper]
  • Compiling ONNX neural network models using MLIR [Paper]
  • Impact of thermal throttling on long-term visual inference in a CPU-based edge device [Paper]
  • Comparison and benchmarking of AI models and frameworks on mobile devices [Paper]
  • TensorFlow Lite Micro: Embedded machine learning for TinyML systems [Paper]

6.1.3 Energy-Efficient Learning at the Edge

  • Sparse convolutional neural networks [Paper]
  • Training sparse neural networks [Paper]
  • SCNN: An accelerator for compressed-sparse convolutional neural networks [Paper]
  • Sparse computation in adaptive spiking neural networks [Paper]
  • SBNet: Sparse blocks network for fast inference [Paper]
  • Big Bird: Transformers for longer sequences [Paper]
  • GLaM: Efficient scaling of language models with mixture-of-experts [Paper]
  • Switch Transformers: Scaling to trillion parameter models with simple and efficient sparsity [Paper]
  • BASE Layers: Simplifying training of large, sparse models [Paper]
  • Distilling the knowledge in a neural network [Paper]
  • Quantization and training of neural networks for efficient integer-arithmetic-only inference [Paper]
  • STDP-based pruning of connections and weight quantization in spiking neural networks for energy-efficient recognition [Paper]
  • Quantization networks [Paper]
  • Quantization framework for fast spiking neural networks [Paper]
  • Efficient neural networks for edge devices [Paper]
  • A million spiking-neuron integrated circuit with a scalable communication network and interface [Paper]
  • Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks [Paper]
  • MAERI: Enabling flexible dataflow mapping over DNN accelerators via reconfigurable interconnects [Paper]
  • DNNBuilder: An automated tool for building high-performance DNN hardware accelerators for FPGAs [Paper]
  • TVM: An automated end-to-end optimizing compiler for deep learning [Paper]
  • SECDA: Efficient hardware/software co-design of FPGA-based DNN accelerators for edge inference [Paper]

6.2 Remarks and Discussions

References
  • Challenges for large-scale implementations of spiking neural networks on FPGAs [Paper]
  • Progress and Challenges in Large Scale Spiking Neural Networks for AI and Neuroscience [Paper]
  • A review on methods, issues and challenges in neuromorphic engineering [Paper]
  • Real-Time Neuromorphic Navigation: Guiding Physical Robots with Event-Based Sensing and Task-Specific Reconfigurable Autonomy Stack [Paper]

Citation

If you find this work useful, please consider citing our paper:

@article{2025neuralbrain,
  title={Neural Brain: A Neuroscience-inspired Framework for Embodied Agents},
  author={Liu, Jian and Shi, Xiongtao and Nguyen, Thai and Zhang, Haitian and Zhang, Tianxiang and Sun, Wei and Li, Yanjie and Vasilakos, Athanasios and Iacca, Giovanni and Khan, Arshad and others},
  journal={arXiv preprint arXiv:2505.07634},
  year={2025}
}

Contact

Our knowledge is inevitably limited, so if you find any issues or have any suggestions, please feel free to post an issue or contact us via email.
