Commit 8efc866

committed: "adding a bit of doc on using action prior"
1 parent: 02a36d0

11 files changed: +34 additions, −17 deletions

docs/_sphinx_src/examples.action_prior.rst

Lines changed: 6 additions & 0 deletions
@@ -46,3 +46,9 @@ In general, you could express a prior over the action distribution
 explicitly through the :code:`sample` and :code:`rollout` function in
 :py:mod:`~pomdp_py.framework.basics.PolicyModel`. Refer to the `Tiger <https://h2r.github.io/pomdp-py/html/examples.tiger.html#:~:text=e.g.%20continuous).-,Next,-%2C%20we%20define%20the>`_
 tutorial for more details (the paragraph on PolicyModel).
+
+As described in :cite:`silver2010monte`, you can set an initial visit count and an initial value
+for a preferred action. To take these into account during POMDP planning with POUCT or POMCP,
+supply an :py:mod:`~pomdp_py.algorithms.po_uct.ActionPrior` object
+when you initialize the :py:mod:`~pomdp_py.algorithms.po_uct.POUCT`
+or :py:mod:`~pomdp_py.algorithms.pomcp.POMCP` object through the :code:`action_prior` argument.
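The hunk above only describes the interface in prose; the sketch below shows what such a prior can look like. It assumes pomdp_py's documented convention that `ActionPrior.get_preferred_actions(state, history)` returns a set of `(action, num_visits_init, value_init)` tuples; the base class and the `TigerActionPrior` subclass here are stand-ins written so the snippet runs without pomdp_py installed.

```python
# Sketch of a preference-based action prior in the style of
# pomdp_py.algorithms.po_uct.ActionPrior (Silver & Veness, 2010).
# The base class below is a stand-in; with pomdp_py installed you
# would subclass pomdp_py.algorithms.po_uct.ActionPrior instead.

class ActionPrior:
    """Stand-in for pomdp_py.algorithms.po_uct.ActionPrior."""
    def get_preferred_actions(self, state, history):
        raise NotImplementedError


class TigerActionPrior(ActionPrior):
    """Prefer 'listen': seed it with an initial visit count and an
    initial value, which POUCT/POMCP use when expanding a new node."""
    def get_preferred_actions(self, state, history):
        # Each entry is (action, num_visits_init, value_init).
        return {("listen", 10, 0.0)}


prior = TigerActionPrior()
print(prior.get_preferred_actions(state=None, history=()))
# {('listen', 10, 0.0)}

# With pomdp_py installed, the prior would then be passed to the
# planner through the action_prior argument, e.g. (hypothetical
# parameter values):
# pouct = pomdp_py.POUCT(max_depth=10, discount_factor=0.95,
#                        num_sims=1000, rollout_policy=policy_model,
#                        action_prior=prior)
```

A higher initial visit count makes the planner treat the preferred action as if it had already been tried that many times with the given value, biasing early search toward it.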

docs/_sphinx_src/examples.tiger.rst

Lines changed: 1 addition & 1 deletion
@@ -213,7 +213,7 @@ this policy model through the :code:`sample` function.
 provides a way to inject problem-specific action prior
 to POMDP planning; pomdp_py allows the user to do this through
 defining :py:mod:`~pomdp_py.algorithms.po_uct.ActionPrior`.
-See :doc:`examples.action_prior` for an example.
+See :doc:`examples.action_prior` for details.
 
 .. note::

docs/html/_sources/examples.action_prior.rst.txt

Lines changed: 6 additions & 0 deletions
@@ -46,3 +46,9 @@ In general, you could express a prior over the action distribution
 explicitly through the :code:`sample` and :code:`rollout` function in
 :py:mod:`~pomdp_py.framework.basics.PolicyModel`. Refer to the `Tiger <https://h2r.github.io/pomdp-py/html/examples.tiger.html#:~:text=e.g.%20continuous).-,Next,-%2C%20we%20define%20the>`_
 tutorial for more details (the paragraph on PolicyModel).
+
+As described in :cite:`silver2010monte`, you can set an initial visit count and an initial value
+for a preferred action. To take these into account during POMDP planning with POUCT or POMCP,
+supply an :py:mod:`~pomdp_py.algorithms.po_uct.ActionPrior` object
+when you initialize the :py:mod:`~pomdp_py.algorithms.po_uct.POUCT`
+or :py:mod:`~pomdp_py.algorithms.pomcp.POMCP` object through the :code:`action_prior` argument.

docs/html/_sources/examples.tiger.rst.txt

Lines changed: 1 addition & 1 deletion
@@ -213,7 +213,7 @@ this policy model through the :code:`sample` function.
 provides a way to inject problem-specific action prior
 to POMDP planning; pomdp_py allows the user to do this through
 defining :py:mod:`~pomdp_py.algorithms.po_uct.ActionPrior`.
-See :doc:`examples.action_prior` for an example.
+See :doc:`examples.action_prior` for details.
 
 .. note::

docs/html/api/pomdp_py.algorithms.html

Lines changed: 6 additions & 6 deletions
@@ -161,7 +161,7 @@ <h1>pomdp_py.algorithms package<a class="headerlink" href="#pomdp-py-algorithms-
 <td><p>PO-rollout: Baseline algorithm in the POMCP paper</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="#pomdp_py.algorithms.po_uct.POUCT" title="pomdp_py.algorithms.po_uct.POUCT"><code class="xref py py-obj docutils literal notranslate"><span class="pre">POUCT</span></code></a></p></td>
-<td><p>POUCT (Partially Observable UCT) <a class="bibtex reference internal" href="#silver2010monte" id="id1">[1]</a> is presented in the POMCP paper as an extension of the UCT algorithm to partially-observable domains that combines MCTS and UCB1 for action selection.</p></td>
+<td><p>POUCT (Partially Observable UCT) <a class="bibtex reference internal" href="../examples.tiger.html#silver2010monte" id="id1">[1]</a> is presented in the POMCP paper as an extension of the UCT algorithm to partially-observable domains that combines MCTS and UCB1 for action selection.</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="#pomdp_py.algorithms.pomcp.POMCP" title="pomdp_py.algorithms.pomcp.POMCP"><code class="xref py py-obj docutils literal notranslate"><span class="pre">POMCP</span></code></a></p></td>
 <td><p>POMCP is POUCT + particle belief representation.</p></td>
@@ -185,7 +185,7 @@ <h1>pomdp_py.algorithms package<a class="headerlink" href="#pomdp-py-algorithms-
 </table>
 <section id="module-pomdp_py.algorithms.po_rollout">
 <span id="pomdp-py-algorithms-po-rollout-module"></span><h2>pomdp_py.algorithms.po_rollout module<a class="headerlink" href="#module-pomdp_py.algorithms.po_rollout" title="Permalink to this headline"></a></h2>
-<p>PO-rollout: Baseline algorithm in the POMCP paper <a class="bibtex reference internal" href="#silver2010monte" id="id3">[1]</a>.</p>
+<p>PO-rollout: Baseline algorithm in the POMCP paper <a class="bibtex reference internal" href="../examples.tiger.html#silver2010monte" id="id3">[1]</a>.</p>
 <p>Quote from the POMCP paper:</p>
 <blockquote>
 <div><p>To provide a performance benchmark in these cases, we evaluated the
@@ -240,7 +240,7 @@ <h1>pomdp_py.algorithms package<a class="headerlink" href="#pomdp-py-algorithms-
 <section id="module-pomdp_py.algorithms.po_uct">
 <span id="pomdp-py-algorithms-po-uct-module"></span><h2>pomdp_py.algorithms.po_uct module<a class="headerlink" href="#module-pomdp_py.algorithms.po_uct" title="Permalink to this headline"></a></h2>
 <p>This algorithm is PO-UCT (Partially Observable UCT). It is
-presented in the POMCP paper <a class="bibtex reference internal" href="#silver2010monte" id="id4">[1]</a> as an extension to the UCT
+presented in the POMCP paper <a class="bibtex reference internal" href="../examples.tiger.html#silver2010monte" id="id4">[1]</a> as an extension to the UCT
 algorithm <a class="bibtex reference internal" href="#kocsis2006bandit" id="id5">[3]</a> that combines MCTS and UCB1
 for action selection.</p>
 <p>In other words, this is just POMCP without particle belief,
@@ -297,7 +297,7 @@ <h1>pomdp_py.algorithms package<a class="headerlink" href="#pomdp-py-algorithms-
 <dt class="sig sig-object py" id="pomdp_py.algorithms.po_uct.POUCT">
 <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pomdp_py.algorithms.po_uct.</span></span><span class="sig-name descname"><span class="pre">POUCT</span></span><a class="headerlink" href="#pomdp_py.algorithms.po_uct.POUCT" title="Permalink to this definition"></a></dt>
 <dd><p>Bases: <a class="reference internal" href="pomdp_py.framework.html#pomdp_py.framework.planner.Planner" title="pomdp_py.framework.planner.Planner"><code class="xref py py-class docutils literal notranslate"><span class="pre">pomdp_py.framework.planner.Planner</span></code></a></p>
-<p>POUCT (Partially Observable UCT) <a class="bibtex reference internal" href="#silver2010monte" id="id9">[1]</a> is presented in the POMCP
+<p>POUCT (Partially Observable UCT) <a class="bibtex reference internal" href="../examples.tiger.html#silver2010monte" id="id9">[1]</a> is presented in the POMCP
 paper as an extension of the UCT algorithm to partially-observable domains
 that combines MCTS and UCB1 for action selection.</p>
 <p>POUCT only works for problems with action space that can be enumerated.</p>
@@ -543,7 +543,7 @@ <h1>pomdp_py.algorithms package<a class="headerlink" href="#pomdp-py-algorithms-
 <span id="pomdp-py-algorithms-value-iteration-module"></span><h2>pomdp_py.algorithms.value_iteration module<a class="headerlink" href="#module-pomdp_py.algorithms.value_iteration" title="Permalink to this headline"></a></h2>
 <p>Implementation of the basic policy tree based value iteration as explained
 in section 4.1 of <cite>Planning and acting in partially observable stochastic
-domains</cite> <a class="bibtex reference internal" href="#kaelbling1998planning" id="id10">[2]</a></p>
+domains</cite> <a class="bibtex reference internal" href="../examples.tiger.html#kaelbling1998planning" id="id10">[1]</a></p>
 <p>Warning: No pruning - the number of policy trees explodes very fast.</p>
 <dl class="py class">
 <dt class="sig sig-object py" id="pomdp_py.algorithms.value_iteration.ValueIteration">
@@ -657,7 +657,7 @@ <h2>pomdp_py.algorithms.visual.visual module<a class="headerlink" href="#pomdp-p
 <dt class="bibtex label" id="gusmao2012towards"><span class="brackets"><a class="fn-backref" href="#id8">6</a></span></dt>
 <dd><p>António Gusmao and Tapani Raiko. Towards generalizing the success of monte-carlo tree search beyond the game of go. In <em>ECAI</em>, 384–389. 2012.</p>
 </dd>
-<dt class="bibtex label" id="kaelbling1998planning"><span class="brackets"><a class="fn-backref" href="#id10">2</a></span></dt>
+<dt class="bibtex label" id="kaelbling1998planning"><span class="brackets"><a class="fn-backref" href="#id10">1</a></span></dt>
 <dd><p>Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains. <em>Artificial intelligence</em>, 101(1-2):99–134, 1998.</p>
 </dd>
 </dl>

docs/html/examples.action_prior.html

Lines changed: 7 additions & 2 deletions
@@ -119,7 +119,7 @@ <h3 class="donation">Donate/support</h3>
 <h1>Preference-based Action Prior<a class="headerlink" href="#preference-based-action-prior" title="Permalink to this headline"></a></h1>
 <p>The code below is a minimum example of defining a
 <a class="reference internal" href="api/pomdp_py.framework.html#pomdp_py.framework.basics.PolicyModel" title="pomdp_py.framework.basics.PolicyModel"><code class="xref py py-mod docutils literal notranslate"><span class="pre">PolicyModel</span></code></a>
-that supports a rollout policy based on preference-based action prior <a class="bibtex reference internal" href="examples.tiger.html#silver2010monte" id="id1">[1]</a>.
+that supports a rollout policy based on preference-based action prior <a class="bibtex reference internal" href="api/pomdp_py.algorithms.html#silver2010monte" id="id1">[1]</a>.
 The action prior is specified through the
 <a class="reference internal" href="api/pomdp_py.algorithms.html#pomdp_py.algorithms.po_uct.ActionPrior" title="pomdp_py.algorithms.po_uct.ActionPrior"><code class="xref py py-mod docutils literal notranslate"><span class="pre">ActionPrior</span></code></a> object,
 which returns a set of preferred actions given a state (and/or history).</p>
@@ -155,11 +155,16 @@ <h1>Preference-based Action Prior<a class="headerlink" href="#preference-based-a
 </pre></div>
 </div>
 <p>Note that the notion of “action prior” here is narrow; It
-follows the original POMCP paper <a class="bibtex reference internal" href="examples.tiger.html#silver2010monte" id="id2">[1]</a>.
+follows the original POMCP paper <a class="bibtex reference internal" href="api/pomdp_py.algorithms.html#silver2010monte" id="id2">[1]</a>.
 In general, you could express a prior over the action distribution
 explicitly through the <code class="code docutils literal notranslate"><span class="pre">sample</span></code> and <code class="code docutils literal notranslate"><span class="pre">rollout</span></code> function in
 <a class="reference internal" href="api/pomdp_py.framework.html#pomdp_py.framework.basics.PolicyModel" title="pomdp_py.framework.basics.PolicyModel"><code class="xref py py-mod docutils literal notranslate"><span class="pre">PolicyModel</span></code></a>. Refer to the <a class="reference external" href="https://h2r.github.io/pomdp-py/html/examples.tiger.html#:~:text=e.g.%20continuous).-,Next,-%2C%20we%20define%20the">Tiger</a>
 tutorial for more details (the paragraph on PolicyModel).</p>
+<p>As described in <a class="bibtex reference internal" href="api/pomdp_py.algorithms.html#silver2010monte" id="id3">[1]</a>, you can set an initial visit count and an initial value
+for a preferred action. To take these into account during POMDP planning with POUCT or POMCP,
+supply an <a class="reference internal" href="api/pomdp_py.algorithms.html#pomdp_py.algorithms.po_uct.ActionPrior" title="pomdp_py.algorithms.po_uct.ActionPrior"><code class="xref py py-mod docutils literal notranslate"><span class="pre">ActionPrior</span></code></a> object
+when you initialize the <a class="reference internal" href="api/pomdp_py.algorithms.html#pomdp_py.algorithms.po_uct.POUCT" title="pomdp_py.algorithms.po_uct.POUCT"><code class="xref py py-mod docutils literal notranslate"><span class="pre">POUCT</span></code></a>
+or <a class="reference internal" href="api/pomdp_py.algorithms.html#pomdp_py.algorithms.pomcp.POMCP" title="pomdp_py.algorithms.pomcp.POMCP"><code class="xref py py-mod docutils literal notranslate"><span class="pre">POMCP</span></code></a> object through the <code class="code docutils literal notranslate"><span class="pre">action_prior</span></code> argument.</p>
 </section>

docs/html/examples.tiger.html

Lines changed: 3 additions & 3 deletions
@@ -143,7 +143,7 @@ <h3 class="donation">Donate/support</h3>
 
 <section id="tiger">
 <h1>Tiger<a class="headerlink" href="#tiger" title="Permalink to this headline"></a></h1>
-<p>This is a classic POMDP problem, introduced in <a class="bibtex reference internal" href="index.html#kaelbling1998planning" id="id1">[7]</a>. The description of the tiger problem is as follows: (Quote from <a class="reference external" href="https://cran.r-project.org/web/packages/pomdp/vignettes/POMDP.pdf">POMDP:
+<p>This is a classic POMDP problem, introduced in <a class="bibtex reference internal" href="#kaelbling1998planning" id="id1">[1]</a>. The description of the tiger problem is as follows: (Quote from <a class="reference external" href="https://cran.r-project.org/web/packages/pomdp/vignettes/POMDP.pdf">POMDP:
 Introduction to Partially Observable Markov Decision Processes</a> by
 Kamalzadeh and Hahsler ):</p>
 <p><cite>A tiger is put with equal probability behind one
@@ -327,7 +327,7 @@ <h1>Tiger<a class="headerlink" href="#tiger" title="Permalink to this headline">
 provides a way to inject problem-specific action prior
 to POMDP planning; pomdp_py allows the user to do this through
 defining <a class="reference internal" href="api/pomdp_py.algorithms.html#pomdp_py.algorithms.po_uct.ActionPrior" title="pomdp_py.algorithms.po_uct.ActionPrior"><code class="xref py py-mod docutils literal notranslate"><span class="pre">ActionPrior</span></code></a>.
-See <a class="reference internal" href="examples.action_prior.html"><span class="doc">Preference-based Action Prior</span></a> for an example.</p>
+See <a class="reference internal" href="examples.action_prior.html"><span class="doc">Preference-based Action Prior</span></a> for details.</p>
 </div>
 <div class="admonition note">
 <p class="admonition-title">Note</p>
@@ -526,7 +526,7 @@ <h2>Define the POMDP<a class="headerlink" href="#define-the-pomdp" title="Permal
 <dt class="bibtex label" id="silver2010monte"><span class="brackets">1</span><span class="fn-backref">(<a href="#id6">1</a>,<a href="#id10">2</a>)</span></dt>
 <dd><p>David Silver and Joel Veness. Monte-carlo planning in large pomdps. In <em>Advances in neural information processing systems</em>, 2164–2172. 2010.</p>
 </dd>
-<dt class="bibtex label" id="kaelbling1998planning"><span class="brackets"><a class="fn-backref" href="#id1">7</a></span></dt>
+<dt class="bibtex label" id="kaelbling1998planning"><span class="brackets"><a class="fn-backref" href="#id1">1</a></span></dt>
 <dd><p>Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains. <em>Artificial intelligence</em>, 101(1-2):99–134, 1998.</p>
 </dd>
 <dt class="bibtex label" id="abel2015goal"><span class="brackets"><a class="fn-backref" href="#id7">3</a></span></dt>

docs/html/index.html

Lines changed: 2 additions & 2 deletions
@@ -138,7 +138,7 @@ <h1>pomdp_py Documentation<a class="headerlink" href="#pomdp-py-documentation" t
 <h2>Overview<a class="headerlink" href="#overview" title="Permalink to this headline"></a></h2>
 <p><a class="reference external" href="https://github.com/h2r/pomdp-py">pomdp_py</a> is a <strong>general purpose POMDP library</strong> written in Python and Cython. It features simple and comprehensive interfaces to describe POMDP or MDP problems. Originally written to support POMDP planning research, the interfaces also allow extensions to model-free or model-based learning in (PO)MDPs, multi-agent POMDP planning/learning, and task transfer or transfer learning.</p>
 <p><strong>Why pomdp_py?</strong> It provides a POMDP framework in Python with clean and intuitive interfaces. This makes POMDP-related research or projects accessible to more people. It also helps sharing code and developing a community.</p>
-<p>POMDP stands for <strong>P</strong>artially <strong>O</strong>bservable <strong>M</strong>arkov <strong>D</strong>ecision <strong>P</strong>rocess <a class="bibtex reference internal" href="api/pomdp_py.algorithms.html#kaelbling1998planning" id="id1">[2]</a>.</p>
+<p>POMDP stands for <strong>P</strong>artially <strong>O</strong>bservable <strong>M</strong>arkov <strong>D</strong>ecision <strong>P</strong>rocess <a class="bibtex reference internal" href="examples.tiger.html#kaelbling1998planning" id="id1">[1]</a>.</p>
 <p>The code is available <a class="reference external" href="https://github.com/h2r/pomdp-py">on github</a>. We welcome contributions to this library in:</p>
 <ol class="arabic simple">
 <li><p>Implementation of additional POMDP solvers (see <a class="reference internal" href="existing_solvers.html"><span class="doc">Existing POMDP Solvers</span></a>)</p></li>
@@ -228,7 +228,7 @@ <h2>Tools<a class="headerlink" href="#tools" title="Permalink to this headline">
 <li><p><a class="reference internal" href="search.html"><span class="std std-ref">Search Page</span></a></p></li>
 </ul>
 <p id="bibtex-bibliography-index-0"><dl class="bibtex citation">
-<dt class="bibtex label" id="kaelbling1998planning"><span class="brackets"><a class="fn-backref" href="#id1">2</a></span></dt>
+<dt class="bibtex label" id="kaelbling1998planning"><span class="brackets"><a class="fn-backref" href="#id1">1</a></span></dt>
 <dd><p>Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains. <em>Artificial intelligence</em>, 101(1-2):99–134, 1998.</p>
 </dd>
 </dl>

docs/html/objects.inv

Binary file not shown (2 Bytes changed).

docs/html/problems/pomdp_problems.tiger.cythonize.html

Lines changed: 1 addition & 1 deletion
@@ -347,7 +347,7 @@ <h2>pomdp_problems.tiger.cythonize.tiger_problem.cpython-38-x86_64-linux-gnu mod
 <dd><p>Bases: <a class="reference internal" href="../api/pomdp_py.framework.html#pomdp_py.framework.basics.POMDP" title="pomdp_py.framework.basics.POMDP"><code class="xref py py-class docutils literal notranslate"><span class="pre">pomdp_py.framework.basics.POMDP</span></code></a></p>
 <dl class="py attribute">
 <dt class="sig sig-object py" id="pomdp_problems.tiger.cythonize.tiger_problem.TigerProblem.ACTIONS">
-<span class="sig-name descname"><span class="pre">ACTIONS</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">{TigerAction(open-left),</span> <span class="pre">TigerAction(open-right),</span> <span class="pre">TigerAction(listen)}</span></em><a class="headerlink" href="#pomdp_problems.tiger.cythonize.tiger_problem.TigerProblem.ACTIONS" title="Permalink to this definition"></a></dt>
+<span class="sig-name descname"><span class="pre">ACTIONS</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">{TigerAction(open-right),</span> <span class="pre">TigerAction(open-left),</span> <span class="pre">TigerAction(listen)}</span></em><a class="headerlink" href="#pomdp_problems.tiger.cythonize.tiger_problem.TigerProblem.ACTIONS" title="Permalink to this definition"></a></dt>
 <dd></dd></dl>
 
 <dl class="py attribute">
