
Commit fe44afc

Committed by jax authors
Merge pull request #20161 from rajasekharporeddy:doc_typos
PiperOrigin-RevId: 614813533
2 parents 0bd7070 + 61c64c1

13 files changed, +21 -21 lines changed

docs/jax-101/06-parallelism.ipynb

Lines changed: 1 addition & 1 deletion
@@ -410,7 +410,7 @@
 "## Communication between devices\n",
 "\n",
 "The above is enough to perform simple parallel operations, e.g. batching a simple MLP forward pass across several devices. However, sometimes we need to pass information between the devices. For example, perhaps we are interested in normalizing the output of each device so they sum to 1.\n",
-"For that, we can use special [collective ops](https://jax.readthedocs.io/en/latest/jax.lax.html#parallel-operators) (such as the `jax.lax.p*` ops `psum`, `pmean`, `pmax`, ...). In order to use the collective ops we must specify the name of the `pmap`-ed axis through `axis_name` argument, and then refer to it when calling the op. Here's how to do that:"
+"For that, we can use special [collective ops](https://jax.readthedocs.io/en/latest/jax.lax.html#parallel-operators) (such as the `jax.lax.p*` ops `psum`, `pmean`, `pmax`, ...). In order to use the collective ops we must specify the name of the `pmap`-ed axis through the `axis_name` argument, and then refer to it when calling the op. Here's how to do that:"
 ]
 },
 {
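
For context, the `axis_name` + collective-op pattern described in the hunk above looks roughly like the following sketch; the array shape and the axis name `'devices'` are illustrative, not taken from the notebook.

```python
import jax
import jax.numpy as jnp

n_devices = jax.local_device_count()
# One row of data per device (illustrative shape).
x = jnp.arange(n_devices * 4.0).reshape(n_devices, 4)

def normalize(row):
    # psum sums across every device participating in the named axis,
    # so each device ends up holding row / (sum of rows over devices).
    return row / jax.lax.psum(row, axis_name='devices')

out = jax.pmap(normalize, axis_name='devices')(x)
print(out.sum(axis=0))  # each position sums to 1 across devices
```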

docs/jax-101/06-parallelism.md

Lines changed: 1 addition & 1 deletion
@@ -159,7 +159,7 @@ Keep in mind that when calling the transformed function, the size of the specifi
 ## Communication between devices
 
 The above is enough to perform simple parallel operations, e.g. batching a simple MLP forward pass across several devices. However, sometimes we need to pass information between the devices. For example, perhaps we are interested in normalizing the output of each device so they sum to 1.
-For that, we can use special [collective ops](https://jax.readthedocs.io/en/latest/jax.lax.html#parallel-operators) (such as the `jax.lax.p*` ops `psum`, `pmean`, `pmax`, ...). In order to use the collective ops we must specify the name of the `pmap`-ed axis through `axis_name` argument, and then refer to it when calling the op. Here's how to do that:
+For that, we can use special [collective ops](https://jax.readthedocs.io/en/latest/jax.lax.html#parallel-operators) (such as the `jax.lax.p*` ops `psum`, `pmean`, `pmax`, ...). In order to use the collective ops we must specify the name of the `pmap`-ed axis through the `axis_name` argument, and then refer to it when calling the op. Here's how to do that:
 
 ```{code-cell} ipython3
 :id: 0nCxGwqmtd3w

docs/jep/18137-numpy-scipy-scope.md

Lines changed: 2 additions & 2 deletions
@@ -140,7 +140,7 @@ incompatible with JAX’s computation model. We instead focus on {mod}`jax.rando
 which offers similar functionality using a counter-based PRNG.
 
 #### `numpy.ma` & `numpy.polynomial`
-The {mod}`numpy.ma` andd {mod}`numpy.polynomial` submodules are mostly concerned with
+The {mod}`numpy.ma` and {mod}`numpy.polynomial` submodules are mostly concerned with
 providing object-oriented interfaces to computations that can be expressed via other
 functional means (Axis 5); for this reason, we deem them out-of-scope for JAX.
 
@@ -187,7 +187,7 @@ evaluations. {func}`jax.experimental.ode.odeint` is related, but rather limited
 under any active development.
 
 JAX does currently include {func}`jax.scipy.integrate.trapezoid`, but this is only because
-{func}`numpy.trapz` was recently deprecated in favor of this. For any particular inputs,
+{func}`numpy.trapz` was recently deprecated in favor of this. For any particular input,
 its implementation could be replaced with one line of {mod}`jax.numpy` expressions, so
 it’s not a particularly useful API to provide.
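
As a concrete illustration of the "one line of {mod}`jax.numpy` expressions" mentioned in the second hunk, a trapezoidal-rule sketch with an illustrative integrand:

```python
import jax.numpy as jnp

x = jnp.linspace(0.0, jnp.pi, 101)
y = jnp.sin(x)

# Trapezoidal rule written directly with jax.numpy operations.
integral = jnp.sum(0.5 * (y[1:] + y[:-1]) * (x[1:] - x[:-1]))
print(integral)  # ~2.0, the same value jax.scipy.integrate.trapezoid(y, x) returns
```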

docs/pallas/quickstart.ipynb

Lines changed: 1 addition & 1 deletion
@@ -245,7 +245,7 @@
 "metadata": {},
 "source": [
 "With `grid` and `program_id` in mind, Pallas provides an abstraction that takes care of some common indexing patterns seen in a lot of kernels.\n",
-"To build intution, let's try to implement a matrix multiplication.\n",
+"To build intuition, let's try to implement a matrix multiplication.\n",
 "\n",
 "A simple strategy for implementing a matrix multiplication in Pallas is to implement it recursively. We know our underlying hardware has support for small matrix multiplications (using GPU and TPU tensorcores), so we just express a big matrix multiplication in terms of smaller ones.\n",
 "\n",

docs/pallas/quickstart.md

Lines changed: 1 addition & 1 deletion
@@ -143,7 +143,7 @@ On TPUs, programs are executed in a combination of parallel and sequential (depe
 +++
 
 With `grid` and `program_id` in mind, Pallas provides an abstraction that takes care of some common indexing patterns seen in a lot of kernels.
-To build intution, let's try to implement a matrix multiplication.
+To build intuition, let's try to implement a matrix multiplication.
 
 A simple strategy for implementing a matrix multiplication in Pallas is to implement it recursively. We know our underlying hardware has support for small matrix multiplications (using GPU and TPU tensorcores), so we just express a big matrix multiplication in terms of smaller ones.
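
To make the blocked-matmul strategy described above concrete, here is a rough sketch against the quickstart-era Pallas API; the `pl.BlockSpec(index_map, block_shape)` argument order, the 2x2 grid, and the shapes are assumptions for illustration, not part of this commit.

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def matmul_kernel(x_ref, y_ref, o_ref):
    # Each grid step multiplies one block-row of x by one block-column of y.
    o_ref[...] = x_ref[...] @ y_ref[...]

def matmul(x: jax.Array, y: jax.Array) -> jax.Array:
    m, k = x.shape
    _, n = y.shape
    return pl.pallas_call(
        matmul_kernel,
        out_shape=jax.ShapeDtypeStruct((m, n), x.dtype),
        grid=(2, 2),  # a 2x2 grid of output blocks
        in_specs=[
            pl.BlockSpec(lambda i, j: (i, 0), (m // 2, k)),
            pl.BlockSpec(lambda i, j: (0, j), (k, n // 2)),
        ],
        out_specs=pl.BlockSpec(lambda i, j: (i, j), (m // 2, n // 2)),
    )(x, y)

x = jnp.ones((1024, 1024), jnp.float32)
y = jnp.ones((1024, 1024), jnp.float32)
print(jnp.allclose(matmul(x, y), x @ y))  # True, up to float tolerance
```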

docs/pallas/tpu/details.rst

Lines changed: 2 additions & 2 deletions
@@ -266,7 +266,7 @@ Elementwise operations
 ^^^^^^^^^^^^^^^^^^^^^^
 
 Many elementwise operations are supported. It is worth noting that the hardware
-generally only supports elementwise compute using 32-bit types. When loading
+generally only supports elementwise computation using 32-bit types. When loading
 operands that use lower-precision types, they should generally be upcast to a
 32-bit type before applying elementwise ops.
 
@@ -344,5 +344,5 @@ However, loop primitives get fully unrolled during the compilation at the
 moment, so try to keep the loop trip count reasonably small.
 
 Overusing control flow can lead to significant regressions in low-level code
-generation, and it is recommended to try to squeeze as many computationaly
+generation, and it is recommended to try to squeeze as many computationally
 expensive operations into a single basic block as possible.
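
To illustrate the upcasting guidance in the first hunk, a hypothetical Pallas TPU kernel sketch; the bfloat16 input, the function names, and the whole-array (no grid) invocation are illustrative assumptions.

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def scaled_exp_kernel(x_ref, o_ref):
    # Upcast the lower-precision operand to 32 bits before elementwise math.
    x = x_ref[...].astype(jnp.float32)
    y = jnp.exp(x) * 2.0
    # Downcast only when storing the result back.
    o_ref[...] = y.astype(jnp.bfloat16)

def scaled_exp(x: jax.Array) -> jax.Array:
    return pl.pallas_call(
        scaled_exp_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, jnp.bfloat16),
    )(x)

x = jnp.ones((8, 128), dtype=jnp.bfloat16)
print(scaled_exp(x).dtype)  # bfloat16
```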

docs/pallas/tpu/pipelining.ipynb

Lines changed: 3 additions & 3 deletions
@@ -128,7 +128,7 @@
 "source": [
 "We've written two functions: `add_matrices_kernel` and `add_matrices`.\n",
 "\n",
-"`add_matrices_kernel` operates using `Ref`s that live in VMEM. Loading from a VMEM `Ref` produces a value that lives in VREGs. Values in VREGs behave like `jax.Array`s in that we can use `jnp` and `jax.lax` operations on then to produce new values that live in VREGs. When we produce the values we'd like to return, we store them in the output VMEM `Ref`.\n",
+"`add_matrices_kernel` operates using `Ref`s that live in VMEM. Loading from a VMEM `Ref` produces a value that lives in VREGs. Values in VREGs behave like `jax.Array`s in that we can use `jnp` and `jax.lax` operations on them to produce new values that live in VREGs. When we produce the values we'd like to return, we store them in the output VMEM `Ref`.\n",
 "\n",
 "The `add_matrices` function acts on `jax.Array`s and returns a `jax.Array`. Inside it, we pass `x` and `y` into `pallas_call`. `pallas_call` is responsible for copying `x` and `y` into VMEM and for allocating the VMEM buffers that the kernel operates on (including allocating `z_vmem_ref`, the output VMEM buffer). After the kernel function is finished running, `pallas_call` will also copy the value in `z_vmem_ref` to HBM, resulting in an output `jax.Array`."
 ]
@@ -497,7 +497,7 @@
 "id": "Kv9qJYJY4jbK"
 },
 "source": [
-"Notice how we've set up the `BlockSpec`s: we're loading the entirety of the `(512, 512)` dimension into VMEM (no pipelining there) but selecting the `i`-th dimension of `x` each iteration in the `index_map`. We are using a `None` for that dimension in the block shape, which indicates that we are selecting a singleton dimension from `x` that we would like squeeze away in the kernel. Therefore, `x_ref` is `(512, 512)`-shaped in VMEM as well.\n",
+"Notice how we've set up the `BlockSpec`s: we're loading the entirety of the `(512, 512)` dimension into VMEM (no pipelining there) but selecting the `i`-th dimension of `x` each iteration in the `index_map`. We are using a `None` for that dimension in the block shape, which indicates that we are selecting a singleton dimension from `x` that we would like to squeeze away in the kernel. Therefore, `x_ref` is `(512, 512)`-shaped in VMEM as well.\n",
 "\n",
 "`out_spec` uses `lambda i: (0, 0)` as its `index_map`, indicating that `o_ref` is unchanged over the course of the pipeline. This means that we can update its value each iteration by reading from and writing to it. Or can it? Actually there is one catch: *`o_ref` is initially garbage*, meaning we'll be accumulating into garbage. This will result in the overall function outputting the incorrect value!\n",
 "\n",
@@ -582,7 +582,7 @@
 "\n",
 "Conceptually, TPUs in Megacore behave like very simple GPUs, i.e. they have only two threads. How do we modify our kernels to utilize both TensorCores simultaneously?\n",
 "\n",
-"The basic idea is that if we have embarassingly parallel dimensions in our computation, we can split up those dimensions across the TensorCores. We can indicate which dimensions are parallelizable by providing an annotation to `pallas_call` called `dimension_semantics`."
+"The basic idea is that if we have embarrassingly parallel dimensions in our computation, we can split up those dimensions across the TensorCores. We can indicate which dimensions are parallelizable by providing an annotation to `pallas_call` called `dimension_semantics`."
 ]
 },
 {
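
As a companion to the "`o_ref` is initially garbage" discussion in the second hunk, a rough sketch of the zero-initialize-then-accumulate pattern; the shapes, names, and the quickstart-era `pl.BlockSpec(index_map, block_shape)` argument order are assumptions (on CPU this would need Pallas interpret mode).

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def sum_kernel(x_ref, o_ref):
    # o_ref starts out uninitialized, so zero it on the first grid step
    # before accumulating into it on every step.
    @pl.when(pl.program_id(axis=0) == 0)
    def _():
        o_ref[...] = jnp.zeros_like(o_ref[...])

    o_ref[...] += x_ref[...]

def blockwise_sum(x: jax.Array) -> jax.Array:
    # Sum over the leading axis, one (512, 512) block per grid step,
    # reusing the same output block every step (index_map -> (0, 0)).
    return pl.pallas_call(
        sum_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape[1:], x.dtype),
        grid=(x.shape[0],),
        in_specs=[pl.BlockSpec(lambda i: (i, 0, 0), (None, *x.shape[1:]))],
        out_specs=pl.BlockSpec(lambda i: (0, 0), x.shape[1:]),
    )(x)

x = jnp.ones((8, 512, 512), jnp.float32)
print(blockwise_sum(x)[0, 0])  # 8.0
```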

docs/pallas/tpu/pipelining.md

Lines changed: 3 additions & 3 deletions
@@ -88,7 +88,7 @@ add_matrices(x, y)
 
 We've written two functions: `add_matrices_kernel` and `add_matrices`.
 
-`add_matrices_kernel` operates using `Ref`s that live in VMEM. Loading from a VMEM `Ref` produces a value that lives in VREGs. Values in VREGs behave like `jax.Array`s in that we can use `jnp` and `jax.lax` operations on then to produce new values that live in VREGs. When we produce the values we'd like to return, we store them in the output VMEM `Ref`.
+`add_matrices_kernel` operates using `Ref`s that live in VMEM. Loading from a VMEM `Ref` produces a value that lives in VREGs. Values in VREGs behave like `jax.Array`s in that we can use `jnp` and `jax.lax` operations on them to produce new values that live in VREGs. When we produce the values we'd like to return, we store them in the output VMEM `Ref`.
 
 The `add_matrices` function acts on `jax.Array`s and returns a `jax.Array`. Inside it, we pass `x` and `y` into `pallas_call`. `pallas_call` is responsible for copying `x` and `y` into VMEM and for allocating the VMEM buffers that the kernel operates on (including allocating `z_vmem_ref`, the output VMEM buffer). After the kernel function is finished running, `pallas_call` will also copy the value in `z_vmem_ref` to HBM, resulting in an output `jax.Array`.
 
@@ -311,7 +311,7 @@ naive_sum(x)
 
 +++ {"id": "Kv9qJYJY4jbK"}
 
-Notice how we've set up the `BlockSpec`s: we're loading the entirety of the `(512, 512)` dimension into VMEM (no pipelining there) but selecting the `i`-th dimension of `x` each iteration in the `index_map`. We are using a `None` for that dimension in the block shape, which indicates that we are selecting a singleton dimension from `x` that we would like squeeze away in the kernel. Therefore, `x_ref` is `(512, 512)`-shaped in VMEM as well.
+Notice how we've set up the `BlockSpec`s: we're loading the entirety of the `(512, 512)` dimension into VMEM (no pipelining there) but selecting the `i`-th dimension of `x` each iteration in the `index_map`. We are using a `None` for that dimension in the block shape, which indicates that we are selecting a singleton dimension from `x` that we would like to squeeze away in the kernel. Therefore, `x_ref` is `(512, 512)`-shaped in VMEM as well.
 
 `out_spec` uses `lambda i: (0, 0)` as its `index_map`, indicating that `o_ref` is unchanged over the course of the pipeline. This means that we can update its value each iteration by reading from and writing to it. Or can it? Actually there is one catch: *`o_ref` is initially garbage*, meaning we'll be accumulating into garbage. This will result in the overall function outputting the incorrect value!
 
@@ -359,7 +359,7 @@ Some TPU chips have two TensorCores but appear as one device to JAX users. This
 
 Conceptually, TPUs in Megacore behave like very simple GPUs, i.e. they have only two threads. How do we modify our kernels to utilize both TensorCores simultaneously?
 
-The basic idea is that if we have embarassingly parallel dimensions in our computation, we can split up those dimensions across the TensorCores. We can indicate which dimensions are parallelizable by providing an annotation to `pallas_call` called `dimension_semantics`.
+The basic idea is that if we have embarrassingly parallel dimensions in our computation, we can split up those dimensions across the TensorCores. We can indicate which dimensions are parallelizable by providing an annotation to `pallas_call` called `dimension_semantics`.
 
 ```{code-cell}
 :id: nQNa8RaQ-TR1

docs/tutorials/advanced-autodiff.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ JAX's autodiff makes it easy to compute higher-order derivatives, because the fu
 
 The single-variable case was covered in the {ref}`automatic-differentiation` tutorial, where the example showed how to use {func}`jax.grad` to compute the the derivative of $f(x) = x^3 + 2x^2 - 3x + 1$.
 
-In the multi-variable case, higher-order derivatives are more complicated. The second-order derivative of a function is represented by its [Hessian matrix](https://en.wikipedia.org/wiki/Hessian_matrix), defined according to:
+In the multivariable case, higher-order derivatives are more complicated. The second-order derivative of a function is represented by its [Hessian matrix](https://en.wikipedia.org/wiki/Hessian_matrix), defined according to:
 
 $$(\mathbf{H}f)_{i,j} = \frac{\partial^2 f}{\partial_i\partial_j}.$$
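
For readers of this hunk, the usual JAX way to build that Hessian is to compose forward- and reverse-mode Jacobians; a minimal sketch with an illustrative function:

```python
import jax
import jax.numpy as jnp

def hessian(f):
    # Forward-over-reverse: jacfwd of grad is typically the efficient composition.
    return jax.jacfwd(jax.grad(f))

f = lambda x: jnp.dot(x, x) ** 2  # illustrative scalar-valued function
x = jnp.array([1.0, 2.0, 3.0])
print(hessian(f)(x))  # a symmetric (3, 3) matrix
```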

docs/tutorials/jax-primitives.md

Lines changed: 1 addition & 1 deletion
@@ -406,7 +406,7 @@ If you attempt now to use reverse differentiation, you'll notice that JAX starts
 When computing the reverse differentiation, JAX first performs an abstract evaluation of the forward differentiation code `multiply_add_value_and_jvp` to obtain a trace of primitives that compute the output tangent.
 
 - Observe that JAX performs this abstract evaluation with concrete values for the differentiation point, and abstract values for the tangents.
-- Notice that JAX uses the special abstract tangent value `Zero` for the tangent corresponding to the 3rd argument of `ma`. This reflects the fact that you do not differentiate w.r.t. the secibd argument to `square_add_prim`, which flows to the third argument to `multiply_add_prim`.
+- Notice that JAX uses the special abstract tangent value `Zero` for the tangent corresponding to the third argument of `ma`. This reflects the fact that you do not differentiate w.r.t. the second argument to `square_add_prim`, which flows to the third argument to `multiply_add_prim`.
 - Notice also that during the abstract evaluation of the tangent you pass the value `0.0` as the tangent for the third argument. This is because of the use of the `make_zero` function in the definition of `multiply_add_value_and_jvp`.
 
 ```{code-cell}
