docs/_sphinx_src/examples.action_prior.rst (6 additions & 0 deletions)
@@ -46,3 +46,9 @@ In general, you could express a prior over the action distribution
explicitly through the :code:`sample` and :code:`rollout` functions in
:py:mod:`~pomdp_py.framework.basics.PolicyModel`. Refer to the `Tiger <https://h2r.github.io/pomdp-py/html/examples.tiger.html#:~:text=e.g.%20continuous).-,Next,-%2C%20we%20define%20the>`_
tutorial for more details (the paragraph on PolicyModel).
+
+As described in :cite:`silver2010monte`, you could choose to set an initial visit count and initial value corresponding
+to a preferred action; to take this into account during POMDP planning using POUCT or POMCP,
+you need to supply the :py:mod:`~pomdp_py.algorithms.po_uct.ActionPrior` object
+when you initialize the :py:mod:`~pomdp_py.algorithms.po_uct.POUCT`
+or :py:mod:`~pomdp_py.algorithms.pomcp.POMCP` objects through the :code:`action_prior` argument.
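The added paragraph describes supplying preferred actions with initial visit counts and values. The following is a minimal, self-contained sketch of that pattern. The class and method names (`ActionPrior`, `get_preferred_actions`, the `(action, num_visits_init, value_init)` tuples) mirror the interface named in the diff, but this sketch does not import pomdp_py; `init_node_stats` is a hypothetical stand-in for the planner's tree-node bookkeeping, not the library's actual code.

```python
# Sketch of the preference-based action prior pattern described above.
# ActionPrior / get_preferred_actions mirror the pomdp_py interface the
# docs refer to; init_node_stats is a hypothetical stand-in for the
# planner's node-initialization step, NOT pomdp_py internals.

class ActionPrior:
    """Abstract prior over actions, queried per (state, history)."""
    def get_preferred_actions(self, state, history):
        raise NotImplementedError

class TigerActionPrior(ActionPrior):
    # Prefer "listen" with an initial visit count of 10 and value 0.0,
    # in the spirit of Silver & Veness (2010): preferred actions start
    # with seeded statistics instead of zeros.
    def get_preferred_actions(self, state, history):
        return {("listen", 10, 0.0)}

def init_node_stats(prior, state, history, all_actions):
    """Initialize per-action (visit_count, value) for a new tree node."""
    stats = {a: (0, 0.0) for a in all_actions}
    for action, n_init, v_init in prior.get_preferred_actions(state, history):
        stats[action] = (n_init, v_init)
    return stats

stats = init_node_stats(TigerActionPrior(), state="tiger-left", history=(),
                        all_actions=["open-left", "open-right", "listen"])
print(stats["listen"])   # the preferred action starts at (10, 0.0)
```

In the real library, you would instead subclass the library's `ActionPrior` and pass an instance via the `action_prior` argument when constructing the planner, as the paragraph above states.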
docs/html/_sources/examples.action_prior.rst.txt (6 additions & 0 deletions)
@@ -46,3 +46,9 @@ In general, you could express a prior over the action distribution
explicitly through the :code:`sample` and :code:`rollout` functions in
:py:mod:`~pomdp_py.framework.basics.PolicyModel`. Refer to the `Tiger <https://h2r.github.io/pomdp-py/html/examples.tiger.html#:~:text=e.g.%20continuous).-,Next,-%2C%20we%20define%20the>`_
tutorial for more details (the paragraph on PolicyModel).
+
+As described in :cite:`silver2010monte`, you could choose to set an initial visit count and initial value corresponding
+to a preferred action; to take this into account during POMDP planning using POUCT or POMCP,
+you need to supply the :py:mod:`~pomdp_py.algorithms.po_uct.ActionPrior` object
+when you initialize the :py:mod:`~pomdp_py.algorithms.po_uct.POUCT`
+or :py:mod:`~pomdp_py.algorithms.pomcp.POMCP` objects through the :code:`action_prior` argument.
-<td><p>POUCT (Partially Observable UCT) <a class="bibtex reference internal" href="#silver2010monte" id="id1">[1]</a> is presented in the POMCP paper as an extension of the UCT algorithm to partially-observable domains that combines MCTS and UCB1 for action selection.</p></td>
+<td><p>POUCT (Partially Observable UCT) <a class="bibtex reference internal" href="../examples.tiger.html#silver2010monte" id="id1">[1]</a> is presented in the POMCP paper as an extension of the UCT algorithm to partially-observable domains that combines MCTS and UCB1 for action selection.</p></td>
<span id="pomdp-py-algorithms-po-rollout-module"></span><h2>pomdp_py.algorithms.po_rollout module<a class="headerlink" href="#module-pomdp_py.algorithms.po_rollout" title="Permalink to this headline">¶</a></h2>
-<p>PO-rollout: Baseline algorithm in the POMCP paper <a class="bibtex reference internal" href="#silver2010monte" id="id3">[1]</a>.</p>
+<p>PO-rollout: Baseline algorithm in the POMCP paper <a class="bibtex reference internal" href="../examples.tiger.html#silver2010monte" id="id3">[1]</a>.</p>
<p>Quote from the POMCP paper:</p>
<blockquote>
<div><p>To provide a performance benchmark in these cases, we evaluated the
<span id="pomdp-py-algorithms-po-uct-module"></span><h2>pomdp_py.algorithms.po_uct module<a class="headerlink" href="#module-pomdp_py.algorithms.po_uct" title="Permalink to this headline">¶</a></h2>
<p>This algorithm is PO-UCT (Partially Observable UCT). It is
-presented in the POMCP paper <a class="bibtex reference internal" href="#silver2010monte" id="id4">[1]</a> as an extension to the UCT
+presented in the POMCP paper <a class="bibtex reference internal" href="../examples.tiger.html#silver2010monte" id="id4">[1]</a> as an extension to the UCT
algorithm <a class="bibtex reference internal" href="#kocsis2006bandit" id="id5">[3]</a> that combines MCTS and UCB1
for action selection.</p>
<p>In other words, this is just POMCP without particle belief,
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pomdp_py.algorithms.po_uct.</span></span><span class="sig-name descname"><span class="pre">POUCT</span></span><a class="headerlink" href="#pomdp_py.algorithms.po_uct.POUCT" title="Permalink to this definition">¶</a></dt>
-<p>POUCT (Partially Observable UCT) <a class="bibtex reference internal" href="#silver2010monte" id="id9">[1]</a> is presented in the POMCP
+<p>POUCT (Partially Observable UCT) <a class="bibtex reference internal" href="../examples.tiger.html#silver2010monte" id="id9">[1]</a> is presented in the POMCP
paper as an extension of the UCT algorithm to partially-observable domains
that combines MCTS and UCB1 for action selection.</p>
<p>POUCT only works for problems with action space that can be enumerated.</p>
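The module text above says PO-UCT combines MCTS with UCB1 for action selection. As a hedged illustration, here is a generic UCB1 scoring sketch over per-action `(visit_count, value)` statistics (this is a textbook formulation under an assumed exploration constant `c`, not pomdp_py's internal implementation):

```python
import math

def ucb1(value, n_action, n_parent, c=1.41):
    """Generic UCB1 score: empirical value plus exploration bonus."""
    if n_action == 0:
        return float("inf")   # untried actions are selected first
    return value + c * math.sqrt(math.log(n_parent) / n_action)

# Pick the argmax over actions at a tree node; the (visits, value)
# numbers below are illustrative only.
stats = {"listen": (10, 0.2), "open-left": (2, -1.0)}
n_parent = sum(n for n, _ in stats.values())
best = max(stats, key=lambda a: ucb1(stats[a][1], stats[a][0], n_parent))
print(best)   # "listen": higher value and still a modest bonus
```

Seeding a preferred action's initial visit count and value, as the action-prior diff above does, directly shifts these UCB1 scores at newly created nodes.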
<span id="pomdp-py-algorithms-value-iteration-module"></span><h2>pomdp_py.algorithms.value_iteration module<a class="headerlink" href="#module-pomdp_py.algorithms.value_iteration" title="Permalink to this headline">¶</a></h2>
<p>Implementation of the basic policy tree based value iteration as explained
in section 4.1 of <cite>Planning and acting in partially observable stochastic
<dd><p>António Gusmao and Tapani Raiko. Towards generalizing the success of monte-carlo tree search beyond the game of go. In <em>ECAI</em>, 384–389. 2012.</p>
<dd><p>Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains. <em>Artificial intelligence</em>, 101(1-2):99–134, 1998.</p>
-that supports a rollout policy based on preference-based action prior <a class="bibtex reference internal" href="examples.tiger.html#silver2010monte" id="id1">[1]</a>.
+that supports a rollout policy based on preference-based action prior <a class="bibtex reference internal" href="api/pomdp_py.algorithms.html#silver2010monte" id="id1">[1]</a>.
<p>Note that the notion of “action prior” here is narrow; it
-follows the original POMCP paper <a class="bibtex reference internal" href="examples.tiger.html#silver2010monte" id="id2">[1]</a>.
+follows the original POMCP paper <a class="bibtex reference internal" href="api/pomdp_py.algorithms.html#silver2010monte" id="id2">[1]</a>.
In general, you could express a prior over the action distribution
explicitly through the <code class="code docutils literal notranslate"><span class="pre">sample</span></code> and <code class="code docutils literal notranslate"><span class="pre">rollout</span></code> functions in
<a class="reference internal" href="api/pomdp_py.framework.html#pomdp_py.framework.basics.PolicyModel" title="pomdp_py.framework.basics.PolicyModel"><code class="xref py py-mod docutils literal notranslate"><span class="pre">PolicyModel</span></code></a>. Refer to the <a class="reference external" href="https://h2r.github.io/pomdp-py/html/examples.tiger.html#:~:text=e.g.%20continuous).-,Next,-%2C%20we%20define%20the">Tiger</a>
tutorial for more details (the paragraph on PolicyModel).</p>
+<p>As described in <a class="bibtex reference internal" href="api/pomdp_py.algorithms.html#silver2010monte" id="id3">[1]</a>, you could choose to set an initial visit count and initial value corresponding
+to a preferred action; to take this into account during POMDP planning using POUCT or POMCP,
+you need to supply the <a class="reference internal" href="api/pomdp_py.algorithms.html#pomdp_py.algorithms.po_uct.ActionPrior" title="pomdp_py.algorithms.po_uct.ActionPrior"><code class="xref py py-mod docutils literal notranslate"><span class="pre">ActionPrior</span></code></a> object
+when you initialize the <a class="reference internal" href="api/pomdp_py.algorithms.html#pomdp_py.algorithms.po_uct.POUCT" title="pomdp_py.algorithms.po_uct.POUCT"><code class="xref py py-mod docutils literal notranslate"><span class="pre">POUCT</span></code></a>
+or <a class="reference internal" href="api/pomdp_py.algorithms.html#pomdp_py.algorithms.pomcp.POMCP" title="pomdp_py.algorithms.pomcp.POMCP"><code class="xref py py-mod docutils literal notranslate"><span class="pre">POMCP</span></code></a> objects through the <code class="code docutils literal notranslate"><span class="pre">action_prior</span></code> argument.</p>
<h1>Tiger<a class="headerlink" href="#tiger" title="Permalink to this headline">¶</a></h1>
-<p>This is a classic POMDP problem, introduced in <a class="bibtex reference internal" href="index.html#kaelbling1998planning" id="id1">[7]</a>. The description of the tiger problem is as follows: (Quote from <a class="reference external" href="https://cran.r-project.org/web/packages/pomdp/vignettes/POMDP.pdf">POMDP:
+<p>This is a classic POMDP problem, introduced in <a class="bibtex reference internal" href="#kaelbling1998planning" id="id1">[1]</a>. The description of the tiger problem is as follows: (Quote from <a class="reference external" href="https://cran.r-project.org/web/packages/pomdp/vignettes/POMDP.pdf">POMDP:
Introduction to Partially Observable Markov Decision Processes</a> by
Kamalzadeh and Hahsler):</p>
<p><cite>A tiger is put with equal probability behind one
@@ -327,7 +327,7 @@ <h1>Tiger<a class="headerlink" href="#tiger" title="Permalink to this headline">
provides a way to inject problem-specific action prior
to POMDP planning; pomdp_py allows the user to do this through
<dd><p>David Silver and Joel Veness. Monte-carlo planning in large pomdps. In <em>Advances in neural information processing systems</em>, 2164–2172. 2010.</p>
<dd><p>Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains. <em>Artificial intelligence</em>, 101(1-2):99–134, 1998.</p>
docs/html/index.html (2 additions & 2 deletions)
@@ -138,7 +138,7 @@ <h1>pomdp_py Documentation<a class="headerlink" href="#pomdp-py-documentation" t
<h2>Overview<a class="headerlink" href="#overview" title="Permalink to this headline">¶</a></h2>
<p><a class="reference external" href="https://github.com/h2r/pomdp-py">pomdp_py</a> is a <strong>general purpose POMDP library</strong> written in Python and Cython. It features simple and comprehensive interfaces to describe POMDP or MDP problems. Originally written to support POMDP planning research, the interfaces also allow extensions to model-free or model-based learning in (PO)MDPs, multi-agent POMDP planning/learning, and task transfer or transfer learning.</p>
<p><strong>Why pomdp_py?</strong> It provides a POMDP framework in Python with clean and intuitive interfaces. This makes POMDP-related research or projects accessible to more people. It also helps sharing code and developing a community.</p>
<p>The code is available <a class="reference external" href="https://github.com/h2r/pomdp-py">on github</a>. We welcome contributions to this library in:</p>
<ol class="arabic simple">
<li><p>Implementation of additional POMDP solvers (see <a class="reference internal" href="existing_solvers.html"><span class="doc">Existing POMDP Solvers</span></a>)</p></li>
@@ -228,7 +228,7 @@ <h2>Tools<a class="headerlink" href="#tools" title="Permalink to this headline">
<dd><p>Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains. <em>Artificial intelligence</em>, 101(1-2):99–134, 1998.</p>
-<span class="sig-name descname"><span class="pre">ACTIONS</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">{TigerAction(open-left),</span> <span class="pre">TigerAction(open-right),</span> <span class="pre">TigerAction(listen)}</span></em><a class="headerlink" href="#pomdp_problems.tiger.cythonize.tiger_problem.TigerProblem.ACTIONS" title="Permalink to this definition">¶</a></dt>
+<span class="sig-name descname"><span class="pre">ACTIONS</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">{TigerAction(open-right),</span> <span class="pre">TigerAction(open-left),</span> <span class="pre">TigerAction(listen)}</span></em><a class="headerlink" href="#pomdp_problems.tiger.cythonize.tiger_problem.TigerProblem.ACTIONS" title="Permalink to this definition">¶</a></dt>