Commit 8efc866

committed: "adding a bit of doc on using action prior"
1 parent: 02a36d0

11 files changed: +34 additions, −17 deletions

docs/_sphinx_src/examples.action_prior.rst

Lines changed: 6 additions & 0 deletions
@@ -46,3 +46,9 @@ In general, you could express a prior over the action distribution
 explicitly through the :code:`sample` and :code:`rollout` function in
 :py:mod:`~pomdp_py.framework.basics.PolicyModel`. Refer to the `Tiger <https://h2r.github.io/pomdp-py/html/examples.tiger.html#:~:text=e.g.%20continuous).-,Next,-%2C%20we%20define%20the>`_
 tutorial for more details (the paragraph on PolicyModel).
+
+As described in :cite:`silver2010monte`, you can set an initial visit count and an initial value
+for a preferred action. To take these into account during POMDP planning with POUCT or POMCP,
+supply an :py:mod:`~pomdp_py.algorithms.po_uct.ActionPrior` object
+when you initialize the :py:mod:`~pomdp_py.algorithms.po_uct.POUCT`
+or :py:mod:`~pomdp_py.algorithms.pomcp.POMCP` object through the :code:`action_prior` argument.
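The hunk above only describes the interface in prose; the sketch below shows what such a prior can look like. It assumes pomdp_py's documented convention that `ActionPrior.get_preferred_actions(state, history)` returns a set of `(action, num_visits_init, value_init)` tuples; the base class and the `TigerActionPrior` subclass here are stand-ins written so the snippet runs without pomdp_py installed.

```python
# Sketch of a preference-based action prior in the style of
# pomdp_py.algorithms.po_uct.ActionPrior (Silver & Veness, 2010).
# The base class below is a stand-in; with pomdp_py installed you
# would subclass pomdp_py.algorithms.po_uct.ActionPrior instead.

class ActionPrior:
    """Stand-in for pomdp_py.algorithms.po_uct.ActionPrior."""
    def get_preferred_actions(self, state, history):
        raise NotImplementedError


class TigerActionPrior(ActionPrior):
    """Prefer 'listen': seed it with an initial visit count and an
    initial value, which POUCT/POMCP use when expanding a new node."""
    def get_preferred_actions(self, state, history):
        # Each entry is (action, num_visits_init, value_init).
        return {("listen", 10, 0.0)}


prior = TigerActionPrior()
print(prior.get_preferred_actions(state=None, history=()))
# {('listen', 10, 0.0)}

# With pomdp_py installed, the prior would then be passed to the
# planner through the action_prior argument, e.g. (hypothetical
# parameter values):
# pouct = pomdp_py.POUCT(max_depth=10, discount_factor=0.95,
#                        num_sims=1000, rollout_policy=policy_model,
#                        action_prior=prior)
```

A higher initial visit count makes the planner treat the preferred action as if it had already been tried that many times with the given value, biasing early search toward it.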

docs/_sphinx_src/examples.tiger.rst

Lines changed: 1 addition & 1 deletion
@@ -213,7 +213,7 @@ this policy model through the :code:`sample` function.
 provides a way to inject problem-specific action prior
 to POMDP planning; pomdp_py allows the user to do this through
 defining :py:mod:`~pomdp_py.algorithms.po_uct.ActionPrior`.
-See :doc:`examples.action_prior` for an example.
+See :doc:`examples.action_prior` for details.
 
 .. note::

docs/html/_sources/examples.action_prior.rst.txt

Lines changed: 6 additions & 0 deletions
@@ -46,3 +46,9 @@ In general, you could express a prior over the action distribution
 explicitly through the :code:`sample` and :code:`rollout` function in
 :py:mod:`~pomdp_py.framework.basics.PolicyModel`. Refer to the `Tiger <https://h2r.github.io/pomdp-py/html/examples.tiger.html#:~:text=e.g.%20continuous).-,Next,-%2C%20we%20define%20the>`_
 tutorial for more details (the paragraph on PolicyModel).
+
+As described in :cite:`silver2010monte`, you can set an initial visit count and an initial value
+for a preferred action. To take these into account during POMDP planning with POUCT or POMCP,
+supply an :py:mod:`~pomdp_py.algorithms.po_uct.ActionPrior` object
+when you initialize the :py:mod:`~pomdp_py.algorithms.po_uct.POUCT`
+or :py:mod:`~pomdp_py.algorithms.pomcp.POMCP` object through the :code:`action_prior` argument.

docs/html/_sources/examples.tiger.rst.txt

Lines changed: 1 addition & 1 deletion
@@ -213,7 +213,7 @@ this policy model through the :code:`sample` function.
 provides a way to inject problem-specific action prior
 to POMDP planning; pomdp_py allows the user to do this through
 defining :py:mod:`~pomdp_py.algorithms.po_uct.ActionPrior`.
-See :doc:`examples.action_prior` for an example.
+See :doc:`examples.action_prior` for details.
 
 .. note::

docs/html/api/pomdp_py.algorithms.html

Lines changed: 6 additions & 6 deletions
@@ -161,7 +161,7 @@ <h1>pomdp_py.algorithms package<a class="headerlink" href="#pomdp-py-algorithms-
 <td><p>PO-rollout: Baseline algorithm in the POMCP paper</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="#pomdp_py.algorithms.po_uct.POUCT" title="pomdp_py.algorithms.po_uct.POUCT"><code class="xref py py-obj docutils literal notranslate"><span class="pre">POUCT</span></code></a></p></td>
-<td><p>POUCT (Partially Observable UCT) <a class="bibtex reference internal" href="#silver2010monte" id="id1">[1]</a> is presented in the POMCP paper as an extension of the UCT algorithm to partially-observable domains that combines MCTS and UCB1 for action selection.</p></td>
+<td><p>POUCT (Partially Observable UCT) <a class="bibtex reference internal" href="../examples.tiger.html#silver2010monte" id="id1">[1]</a> is presented in the POMCP paper as an extension of the UCT algorithm to partially-observable domains that combines MCTS and UCB1 for action selection.</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="#pomdp_py.algorithms.pomcp.POMCP" title="pomdp_py.algorithms.pomcp.POMCP"><code class="xref py py-obj docutils literal notranslate"><span class="pre">POMCP</span></code></a></p></td>
 <td><p>POMCP is POUCT + particle belief representation.</p></td>
@@ -185,7 +185,7 @@ <h1>pomdp_py.algorithms package<a class="headerlink" href="#pomdp-py-algorithms-
 </table>
 <section id="module-pomdp_py.algorithms.po_rollout">
 <span id="pomdp-py-algorithms-po-rollout-module"></span><h2>pomdp_py.algorithms.po_rollout module<a class="headerlink" href="#module-pomdp_py.algorithms.po_rollout" title="Permalink to this headline"></a></h2>
-<p>PO-rollout: Baseline algorithm in the POMCP paper <a class="bibtex reference internal" href="#silver2010monte" id="id3">[1]</a>.</p>
+<p>PO-rollout: Baseline algorithm in the POMCP paper <a class="bibtex reference internal" href="../examples.tiger.html#silver2010monte" id="id3">[1]</a>.</p>
 <p>Quote from the POMCP paper:</p>
 <blockquote>
 <div><p>To provide a performance benchmark in these cases, we evaluated the
@@ -240,7 +240,7 @@ <h1>pomdp_py.algorithms package<a class="headerlink" href="#pomdp-py-algorithms-
 <section id="module-pomdp_py.algorithms.po_uct">
 <span id="pomdp-py-algorithms-po-uct-module"></span><h2>pomdp_py.algorithms.po_uct module<a class="headerlink" href="#module-pomdp_py.algorithms.po_uct" title="Permalink to this headline"></a></h2>
 <p>This algorithm is PO-UCT (Partially Observable UCT). It is
-presented in the POMCP paper <a class="bibtex reference internal" href="#silver2010monte" id="id4">[1]</a> as an extension to the UCT
+presented in the POMCP paper <a class="bibtex reference internal" href="../examples.tiger.html#silver2010monte" id="id4">[1]</a> as an extension to the UCT
 algorithm <a class="bibtex reference internal" href="#kocsis2006bandit" id="id5">[3]</a> that combines MCTS and UCB1
 for action selection.</p>
 <p>In other words, this is just POMCP without particle belief,
@@ -297,7 +297,7 @@ <h1>pomdp_py.algorithms package<a class="headerlink" href="#pomdp-py-algorithms-
 <dt class="sig sig-object py" id="pomdp_py.algorithms.po_uct.POUCT">
 <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pomdp_py.algorithms.po_uct.</span></span><span class="sig-name descname"><span class="pre">POUCT</span></span><a class="headerlink" href="#pomdp_py.algorithms.po_uct.POUCT" title="Permalink to this definition"></a></dt>
 <dd><p>Bases: <a class="reference internal" href="pomdp_py.framework.html#pomdp_py.framework.planner.Planner" title="pomdp_py.framework.planner.Planner"><code class="xref py py-class docutils literal notranslate"><span class="pre">pomdp_py.framework.planner.Planner</span></code></a></p>
-<p>POUCT (Partially Observable UCT) <a class="bibtex reference internal" href="#silver2010monte" id="id9">[1]</a> is presented in the POMCP
+<p>POUCT (Partially Observable UCT) <a class="bibtex reference internal" href="../examples.tiger.html#silver2010monte" id="id9">[1]</a> is presented in the POMCP
 paper as an extension of the UCT algorithm to partially-observable domains
 that combines MCTS and UCB1 for action selection.</p>
 <p>POUCT only works for problems with action space that can be enumerated.</p>
@@ -543,7 +543,7 @@ <h1>pomdp_py.algorithms package<a class="headerlink" href="#pomdp-py-algorithms-
 <span id="pomdp-py-algorithms-value-iteration-module"></span><h2>pomdp_py.algorithms.value_iteration module<a class="headerlink" href="#module-pomdp_py.algorithms.value_iteration" title="Permalink to this headline"></a></h2>
 <p>Implementation of the basic policy tree based value iteration as explained
 in section 4.1 of <cite>Planning and acting in partially observable stochastic
-domains</cite> <a class="bibtex reference internal" href="#kaelbling1998planning" id="id10">[2]</a></p>
+domains</cite> <a class="bibtex reference internal" href="../examples.tiger.html#kaelbling1998planning" id="id10">[1]</a></p>
 <p>Warning: No pruning - the number of policy trees explodes very fast.</p>
 <dl class="py class">
 <dt class="sig sig-object py" id="pomdp_py.algorithms.value_iteration.ValueIteration">
@@ -657,7 +657,7 @@ <h2>pomdp_py.algorithms.visual.visual module<a class="headerlink" href="#pomdp-p
 <dt class="bibtex label" id="gusmao2012towards"><span class="brackets"><a class="fn-backref" href="#id8">6</a></span></dt>
 <dd><p>António Gusmao and Tapani Raiko. Towards generalizing the success of monte-carlo tree search beyond the game of go. In <em>ECAI</em>, 384–389. 2012.</p>
 </dd>
-<dt class="bibtex label" id="kaelbling1998planning"><span class="brackets"><a class="fn-backref" href="#id10">2</a></span></dt>
+<dt class="bibtex label" id="kaelbling1998planning"><span class="brackets"><a class="fn-backref" href="#id10">1</a></span></dt>
 <dd><p>Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains. <em>Artificial intelligence</em>, 101(1-2):99–134, 1998.</p>
 </dd>
 </dl>

docs/html/examples.action_prior.html

Lines changed: 7 additions & 2 deletions
@@ -119,7 +119,7 @@ <h3 class="donation">Donate/support</h3>
 <h1>Preference-based Action Prior<a class="headerlink" href="#preference-based-action-prior" title="Permalink to this headline"></a></h1>
 <p>The code below is a minimum example of defining a
 <a class="reference internal" href="api/pomdp_py.framework.html#pomdp_py.framework.basics.PolicyModel" title="pomdp_py.framework.basics.PolicyModel"><code class="xref py py-mod docutils literal notranslate"><span class="pre">PolicyModel</span></code></a>
-that supports a rollout policy based on preference-based action prior <a class="bibtex reference internal" href="examples.tiger.html#silver2010monte" id="id1">[1]</a>.
+that supports a rollout policy based on preference-based action prior <a class="bibtex reference internal" href="api/pomdp_py.algorithms.html#silver2010monte" id="id1">[1]</a>.
 The action prior is specified through the
 <a class="reference internal" href="api/pomdp_py.algorithms.html#pomdp_py.algorithms.po_uct.ActionPrior" title="pomdp_py.algorithms.po_uct.ActionPrior"><code class="xref py py-mod docutils literal notranslate"><span class="pre">ActionPrior</span></code></a> object,
 which returns a set of preferred actions given a state (and/or history).</p>
@@ -155,11 +155,16 @@ <h1>Preference-based Action Prior<a class="headerlink" href="#preference-based-a
 </pre></div>
 </div>
 <p>Note that the notion of “action prior” here is narrow; It
-follows the original POMCP paper <a class="bibtex reference internal" href="examples.tiger.html#silver2010monte" id="id2">[1]</a>.
+follows the original POMCP paper <a class="bibtex reference internal" href="api/pomdp_py.algorithms.html#silver2010monte" id="id2">[1]</a>.
 In general, you could express a prior over the action distribution
 explicitly through the <code class="code docutils literal notranslate"><span class="pre">sample</span></code> and <code class="code docutils literal notranslate"><span class="pre">rollout</span></code> function in
 <a class="reference internal" href="api/pomdp_py.framework.html#pomdp_py.framework.basics.PolicyModel" title="pomdp_py.framework.basics.PolicyModel"><code class="xref py py-mod docutils literal notranslate"><span class="pre">PolicyModel</span></code></a>. Refer to the <a class="reference external" href="https://h2r.github.io/pomdp-py/html/examples.tiger.html#:~:text=e.g.%20continuous).-,Next,-%2C%20we%20define%20the">Tiger</a>
 tutorial for more details (the paragraph on PolicyModel).</p>
+<p>As described in <a class="bibtex reference internal" href="api/pomdp_py.algorithms.html#silver2010monte" id="id3">[1]</a>, you can set an initial visit count and an initial value
+for a preferred action. To take these into account during POMDP planning with POUCT or POMCP,
+supply an <a class="reference internal" href="api/pomdp_py.algorithms.html#pomdp_py.algorithms.po_uct.ActionPrior" title="pomdp_py.algorithms.po_uct.ActionPrior"><code class="xref py py-mod docutils literal notranslate"><span class="pre">ActionPrior</span></code></a> object
+when you initialize the <a class="reference internal" href="api/pomdp_py.algorithms.html#pomdp_py.algorithms.po_uct.POUCT" title="pomdp_py.algorithms.po_uct.POUCT"><code class="xref py py-mod docutils literal notranslate"><span class="pre">POUCT</span></code></a>
+or <a class="reference internal" href="api/pomdp_py.algorithms.html#pomdp_py.algorithms.pomcp.POMCP" title="pomdp_py.algorithms.pomcp.POMCP"><code class="xref py py-mod docutils literal notranslate"><span class="pre">POMCP</span></code></a> object through the <code class="code docutils literal notranslate"><span class="pre">action_prior</span></code> argument.</p>
 </section>

docs/html/examples.tiger.html

Lines changed: 3 additions & 3 deletions
@@ -143,7 +143,7 @@ <h3 class="donation">Donate/support</h3>
 
 <section id="tiger">
 <h1>Tiger<a class="headerlink" href="#tiger" title="Permalink to this headline"></a></h1>
-<p>This is a classic POMDP problem, introduced in <a class="bibtex reference internal" href="index.html#kaelbling1998planning" id="id1">[7]</a>. The description of the tiger problem is as follows: (Quote from <a class="reference external" href="https://cran.r-project.org/web/packages/pomdp/vignettes/POMDP.pdf">POMDP:
+<p>This is a classic POMDP problem, introduced in <a class="bibtex reference internal" href="#kaelbling1998planning" id="id1">[1]</a>. The description of the tiger problem is as follows: (Quote from <a class="reference external" href="https://cran.r-project.org/web/packages/pomdp/vignettes/POMDP.pdf">POMDP:
 Introduction to Partially Observable Markov Decision Processes</a> by
 Kamalzadeh and Hahsler ):</p>
 <p><cite>A tiger is put with equal probability behind one
@@ -327,7 +327,7 @@ <h1>Tiger<a class="headerlink" href="#tiger" title="Permalink to this headline">
 provides a way to inject problem-specific action prior
 to POMDP planning; pomdp_py allows the user to do this through
 defining <a class="reference internal" href="api/pomdp_py.algorithms.html#pomdp_py.algorithms.po_uct.ActionPrior" title="pomdp_py.algorithms.po_uct.ActionPrior"><code class="xref py py-mod docutils literal notranslate"><span class="pre">ActionPrior</span></code></a>.
-See <a class="reference internal" href="examples.action_prior.html"><span class="doc">Preference-based Action Prior</span></a> for an example.</p>
+See <a class="reference internal" href="examples.action_prior.html"><span class="doc">Preference-based Action Prior</span></a> for details.</p>
 </div>
 <div class="admonition note">
 <p class="admonition-title">Note</p>
@@ -526,7 +526,7 @@ <h2>Define the POMDP<a class="headerlink" href="#define-the-pomdp" title="Permal
 <dt class="bibtex label" id="silver2010monte"><span class="brackets">1</span><span class="fn-backref">(<a href="#id6">1</a>,<a href="#id10">2</a>)</span></dt>
 <dd><p>David Silver and Joel Veness. Monte-carlo planning in large pomdps. In <em>Advances in neural information processing systems</em>, 2164–2172. 2010.</p>
 </dd>
-<dt class="bibtex label" id="kaelbling1998planning"><span class="brackets"><a class="fn-backref" href="#id1">7</a></span></dt>
+<dt class="bibtex label" id="kaelbling1998planning"><span class="brackets"><a class="fn-backref" href="#id1">1</a></span></dt>
 <dd><p>Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains. <em>Artificial intelligence</em>, 101(1-2):99–134, 1998.</p>
 </dd>
 <dt class="bibtex label" id="abel2015goal"><span class="brackets"><a class="fn-backref" href="#id7">3</a></span></dt>

docs/html/index.html

Lines changed: 2 additions & 2 deletions
@@ -138,7 +138,7 @@ <h1>pomdp_py Documentation<a class="headerlink" href="#pomdp-py-documentation" t
 <h2>Overview<a class="headerlink" href="#overview" title="Permalink to this headline"></a></h2>
 <p><a class="reference external" href="https://github.com/h2r/pomdp-py">pomdp_py</a> is a <strong>general purpose POMDP library</strong> written in Python and Cython. It features simple and comprehensive interfaces to describe POMDP or MDP problems. Originally written to support POMDP planning research, the interfaces also allow extensions to model-free or model-based learning in (PO)MDPs, multi-agent POMDP planning/learning, and task transfer or transfer learning.</p>
 <p><strong>Why pomdp_py?</strong> It provides a POMDP framework in Python with clean and intuitive interfaces. This makes POMDP-related research or projects accessible to more people. It also helps sharing code and developing a community.</p>
-<p>POMDP stands for <strong>P</strong>artially <strong>O</strong>bservable <strong>M</strong>arkov <strong>D</strong>ecision <strong>P</strong>rocess <a class="bibtex reference internal" href="api/pomdp_py.algorithms.html#kaelbling1998planning" id="id1">[2]</a>.</p>
+<p>POMDP stands for <strong>P</strong>artially <strong>O</strong>bservable <strong>M</strong>arkov <strong>D</strong>ecision <strong>P</strong>rocess <a class="bibtex reference internal" href="examples.tiger.html#kaelbling1998planning" id="id1">[1]</a>.</p>
 <p>The code is available <a class="reference external" href="https://github.com/h2r/pomdp-py">on github</a>. We welcome contributions to this library in:</p>
 <ol class="arabic simple">
 <li><p>Implementation of additional POMDP solvers (see <a class="reference internal" href="existing_solvers.html"><span class="doc">Existing POMDP Solvers</span></a>)</p></li>
@@ -228,7 +228,7 @@ <h2>Tools<a class="headerlink" href="#tools" title="Permalink to this headline">
 <li><p><a class="reference internal" href="search.html"><span class="std std-ref">Search Page</span></a></p></li>
 </ul>
 <p id="bibtex-bibliography-index-0"><dl class="bibtex citation">
-<dt class="bibtex label" id="kaelbling1998planning"><span class="brackets"><a class="fn-backref" href="#id1">2</a></span></dt>
+<dt class="bibtex label" id="kaelbling1998planning"><span class="brackets"><a class="fn-backref" href="#id1">1</a></span></dt>
 <dd><p>Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains. <em>Artificial intelligence</em>, 101(1-2):99–134, 1998.</p>
 </dd>
 </dl>

docs/html/objects.inv

Binary file not shown (2 Bytes changed).

docs/html/problems/pomdp_problems.tiger.cythonize.html

Lines changed: 1 addition & 1 deletion
@@ -347,7 +347,7 @@ <h2>pomdp_problems.tiger.cythonize.tiger_problem.cpython-38-x86_64-linux-gnu mod
 <dd><p>Bases: <a class="reference internal" href="../api/pomdp_py.framework.html#pomdp_py.framework.basics.POMDP" title="pomdp_py.framework.basics.POMDP"><code class="xref py py-class docutils literal notranslate"><span class="pre">pomdp_py.framework.basics.POMDP</span></code></a></p>
 <dl class="py attribute">
 <dt class="sig sig-object py" id="pomdp_problems.tiger.cythonize.tiger_problem.TigerProblem.ACTIONS">
-<span class="sig-name descname"><span class="pre">ACTIONS</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">{TigerAction(open-left),</span> <span class="pre">TigerAction(open-right),</span> <span class="pre">TigerAction(listen)}</span></em><a class="headerlink" href="#pomdp_problems.tiger.cythonize.tiger_problem.TigerProblem.ACTIONS" title="Permalink to this definition"></a></dt>
+<span class="sig-name descname"><span class="pre">ACTIONS</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">{TigerAction(open-right),</span> <span class="pre">TigerAction(open-left),</span> <span class="pre">TigerAction(listen)}</span></em><a class="headerlink" href="#pomdp_problems.tiger.cythonize.tiger_problem.TigerProblem.ACTIONS" title="Permalink to this definition"></a></dt>
 <dd></dd></dl>
 
 <dl class="py attribute">
