Welcome to the mesmerizing world of ant-inspired foraging! This project extends the exploration of how ants generate stable structures in their environment, reducing cognitive load while foraging. It uses the Q-learning algorithm to model this behavior, mirroring the intricate dance of ants as they form epistemic structures.
- Ants strategically deploy and follow pheromone trails as memory aids.
- Random exploration proves valuable for discovering new, rewarding areas.
- Maintaining an optimal exploration rate is pivotal for balancing exploitation vs. exploration.
- Project Overview
- Agents
- World
- Q-Learning Algorithm
- Experimental Conditions
- Analysis
- Results
- Conclusion
- Future Work
- Acknowledgements
Embark on a journey to uncover the mysteries of exploration strategies in reinforcement learning for multi-agent foraging. Agents employ Q-learning to strategically lay pheromone markers, creating a web of guidance for their peers. What secrets lie within these interactions? Let's find out!
Meet our agents, the unsung heroes of this exploration! Each agent, driven by the spirit of ants, detects home and food pheromones and decides whether to drop pheromone, follow a trail, or move randomly. Each agent learns independently through Q-learning, with no communication between agents, which adds a layer of complexity to this multi-agent foraging system.
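As a rough illustration of that decision structure, here is a minimal sketch of an agent's local sensing and its action set. The action names, the sensing threshold, and the discrete state encoding are assumptions for illustration, not the project's exact design.

```python
# Hypothetical action names; the project's encoding of "drop / follow / move randomly" may differ.
ACTIONS = ("drop_pheromone", "follow_pheromone", "move_random")

def local_state(home_level: float, food_level: float, threshold: float = 0.1) -> tuple:
    """Collapse the pheromone readings at the agent's cell into a small discrete state.

    Each agent senses only its own cell and learns on its own; there is no message
    passing between agents. The threshold value is illustrative.
    """
    return (home_level > threshold, food_level > threshold)
```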
Picture a vast 50x50 grid adorned with randomly scattered homes and food sources, echoing the complexity of an ant colony. Agents, starting from their homes, embark on a quest to navigate efficiently to food, using pheromones as their guiding light.
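To make the setup concrete, here is a minimal sketch of how such a world could be generated. The grid size matches the 50x50 described above, but the number of homes and food sources and the per-cell pheromone fields are assumptions.

```python
import random

GRID_SIZE = 50  # 50x50 grid, as described above

def make_world(n_homes=5, n_food=5, seed=0):
    """Scatter homes and food sources over distinct random cells (counts are illustrative)."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(GRID_SIZE) for y in range(GRID_SIZE)]
    picked = rng.sample(cells, n_homes + n_food)
    homes, food = set(picked[:n_homes]), set(picked[n_homes:])
    # Two separate pheromone fields, one value per cell: a "home" signal and a "food" signal.
    home_pheromone = {c: 0.0 for c in cells}
    food_pheromone = {c: 0.0 for c in cells}
    return homes, food, home_pheromone, food_pheromone
```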
Witness the power of Q-learning as agents independently learn optimal policies, updating their Q-values at each step from the immediate reward and the discounted value of the next state.
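For readers who want the update spelled out, here is a minimal sketch of tabular Q-learning with an epsilon-greedy policy. The learning rate, discount factor, and exploration rate are illustrative assumptions, not the project's reported settings.

```python
import random
from collections import defaultdict

ACTIONS = ("drop_pheromone", "follow_pheromone", "move_random")  # as in the agent sketch above

class QLearner:
    """Tabular Q-learning with an epsilon-greedy policy (hyperparameters are illustrative)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration rate
        self.q = defaultdict(float)  # maps (state, action) -> Q-value, defaults to 0.0

    def choose_action(self, state):
        # Explore with probability epsilon; otherwise exploit the best known action.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```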
We throw our agents into the crucible of experimentation, testing four main conditions with varying exploration rates and a daring strategy of replacing the worst-performing agent each iteration. Will the chaos yield surprising revelations?
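To show what the agent-replacement condition might look like mechanically, here is a hedged sketch that resets the least successful learner. It builds on the QLearner sketch above; the performance metric (a count of successful trips) and the reset-to-blank behaviour are assumptions about how the replacement could be done.

```python
def replace_worst_agent(agents, trip_counts):
    """Replace the worst performer by wiping its learned Q-table (QLearner instances assumed).

    trip_counts[i] holds the number of successful trips for agents[i]; the metric is assumed.
    """
    worst = min(range(len(agents)), key=lambda i: trip_counts[i])
    agents[worst].q.clear()   # the "new" agent starts with an empty Q-table
    trip_counts[worst] = 0    # and a fresh trip count
    return worst
```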
Let's dissect the intricate details! A meticulous analysis covers the distribution of actions, successful trips, Pearson correlations, and state/action transition matrices. Unraveling the patterns within the chaos, we seek to understand the underlying dynamics.
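As a pointer to how two of these analyses could be computed, here is a brief sketch: a Pearson correlation between exploration rate and successful trips, and an empirical action-to-action transition matrix. The variable names and toy data below are placeholders, not results from the project.

```python
import numpy as np
from scipy.stats import pearsonr

def action_transition_matrix(action_sequence, n_actions=3):
    """Estimate P(next action | current action) from a sequence of integer-coded actions."""
    counts = np.zeros((n_actions, n_actions))
    for a, b in zip(action_sequence[:-1], action_sequence[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observations stay all-zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Toy example: correlate per-agent exploration rates with successful trip counts.
exploration_rates = np.array([0.05, 0.1, 0.2, 0.4])
successful_trips = np.array([12, 15, 14, 9])
r, p = pearsonr(exploration_rates, successful_trips)
```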
Surprising revelations emerge! Contrary to expectations, random actions outperform systematic pheromone-based actions. The Q-table functions more as condensed experience than as a decision-support tool. Condition 4, with agent replacement, emerges victorious. The unexpected is the new norm!
In the grand finale, we unveil the revelation that the Q-table functions as an experience memory in a stochastic world filled with other agents. Random exploration becomes the unsung hero, aiding agents in effectively navigating their surroundings.
As we bid adieu to this chapter, the path ahead beckons. Future work holds promises of larger worlds, non-grid environments, decentralized learning methods, and continuous action spaces. The journey never truly ends!
A heartfelt thank you to our guiding light and advisor, Prof. Sanjay Chandrasekharan (HBCSE, TIFR, Mumbai, India), whose wisdom illuminated the path of this exploration!