diff --git a/lectures/_static/lecture_specific/about_py/pytorch_vs_matlab.png b/lectures/_static/lecture_specific/about_py/pytorch_vs_matlab.png new file mode 100644 index 00000000..7230ad09 Binary files /dev/null and b/lectures/_static/lecture_specific/about_py/pytorch_vs_matlab.png differ diff --git a/lectures/about_py.md b/lectures/about_py.md index 2c80dde1..8f5a4307 100644 --- a/lectures/about_py.md +++ b/lectures/about_py.md @@ -21,7 +21,7 @@ kernelspec: ```{index} single: python ``` -# About Python +# About These Lectures ```{epigraph} "Python has gotten sufficiently weapons grade that we don’t descend into R @@ -31,105 +31,266 @@ into R." -- Chris Wiggins ## Overview +This lecture series will teach you to use Python for scientific computing, with +a focus on economics and finance. + +The series is aimed at Python novices, although experienced users will also find useful content in later lectures. + In this lecture we will -* outline what Python is -* compare it to some other languages -* showcase some of its abilities. +* introduce Python, +* showcase some of its abilities, +* discuss the connection between Python and AI, +* explain why Python is our favorite language for scientific computing, and +* point you to the next steps. + +You do **not** need to understand everything you see in this lecture -- we will work through the details slowly later in the lecture series. + + +### Can't I Just Use ChatGPT? + +No! + +It's tempting to think that in the age of AI we don't need to learn how to code. + +And it's true that AIs like [ChatGPT](https://chatgpt.com/) and other LLMs are wonderful productivity tools for coders. + +In fact an AI can be a great companion for these lectures -- try copy-pasting some code from this series and ask the AI to explain it to you. + +AIs will certainly help you write pieces of code that you can combine. + +But AIs cannot completely and reliably solve a new problem that they haven't seen before! 
+ +You will need to be the supervisor -- and for that you need to be able to read, write, and understand computer code. -At this stage, it's **not** our intention that you try to replicate all you see. -We will work through what follows at a slow pace later in the lecture series. +### Isn't MATLAB Better? + +No, no, and one hundred times no. + +For almost all modern problems, Python's scientific libraries are now far in advance of MATLAB's capabilities. + +We will explain the benefits of Python's libraries throughout this lecture +series, as well as in our later series on [JAX](https://jax.quantecon.org/intro.html). + +We will also explain how Python's elegant design helps you write clean, efficient code. + +On top of these features, Python is more widely used, with a huge and helpful community, and free! -Our only objective for this lecture is to give you some feel of what Python is, and what it can do. ## What's Python? -[Python](https://www.python.org) is a general-purpose programming language conceived in 1989 by Dutch programmer [Guido van Rossum](https://en.wikipedia.org/wiki/Guido_van_Rossum). -Python is free and open source, with development coordinated through the [Python Software Foundation](https://www.python.org/psf/). +[Python](https://www.python.org) is a general-purpose programming language conceived in 1989 by [Guido van Rossum](https://en.wikipedia.org/wiki/Guido_van_Rossum). + +Python is free and [open source](https://en.wikipedia.org/wiki/Open_source), with development coordinated through the [Python Software Foundation](https://www.python.org/psf/). + +This is important because it + +* saves us money, +* means that Python is controlled by the community of users rather than a for-profit corporation, and +* encourages reproducibility and [open science](https://en.wikipedia.org/wiki/Open_science). 
+ -Python has experienced rapid adoption in the last decade and is now one of the [most popular programming languages](https://www.tiobe.com/tiobe-index/). ### Common Uses -{index}`Python ` is a general-purpose language used in almost all application domains such as +{index}`Python ` is a general-purpose language used in almost all application domains, including -* AI +* AI +* scientific computing * communication * web development * CGI and graphical user interfaces * game development * resource planning -* multimedia, data science, security, etc., etc., etc. +* multimedia +* etc. -Used and supported extensively by Internet services and high-tech companies including +It is used and supported extensively by tech firms including * [Google](https://www.google.com/) +* [OpenAI](https://openai.com/) * [Netflix](https://www.netflix.com/) * [Meta](https://opensource.fb.com/) * [Dropbox](https://www.dropbox.com/) * [Amazon](https://www.amazon.com/) * [Reddit](https://www.reddit.com/) +* etc. -For reasons we will discuss, Python is particularly popular within the scientific community - -Meanwhile, Python is also very beginner-friendly and is found to be suitable for -students learning programming and recommended to introduce computational methods -to students in fields other than computer science. -Python is also replacing familiar tools like Excel as an essential skill in the fields of finance and banking. ### Relative Popularity -The following chart, produced using Stack Overflow Trends, shows one measure of the relative popularity of Python +Python is, without doubt, one of the [most popular programming languages](https://www.tiobe.com/tiobe-index/). + +Python libraries like [pandas](https://pandas.pydata.org/) and [Polars](https://pola.rs/) are replacing familiar tools like Excel and VBA as an essential skill in the fields of finance and banking. 
+ +Moreover, Python is extremely popular within the scientific community -- especially in AI. -```{figure} /_static/lecture_specific/about_py/python_vs_matlab.png +The following chart, produced using Stack Overflow Trends, provides some evidence. + +It shows the popularity of a Python AI library called [PyTorch](https://pytorch.org/) relative to MATLAB. + +```{figure} /_static/lecture_specific/about_py/pytorch_vs_matlab.png ``` -The figure indicates not only that Python is widely used but also that adoption of Python has accelerated significantly since 2012. +The chart shows that MATLAB's popularity has faded, while PyTorch is growing rapidly. -This is driven at least in part by uptake in the scientific domain, particularly in rapidly growing fields like data science and AI. +Moreover, PyTorch is just one of the thousands of Python libraries available for scientific computing. ### Features -Python is a [high-level language](https://en.wikipedia.org/wiki/High-level_programming_language) suitable for rapid development. +Python is a [high-level language](https://en.wikipedia.org/wiki/High-level_programming_language), which means it is relatively easy to read, write and debug. + +It has a relatively small core language that is easy to learn. -It has a relatively small core language supported by many libraries. +This core is supported by many libraries, which you can learn to use as required. + +Python is very beginner-friendly: + +* suitable for students learning programming +* used in many undergraduate and graduate programs Other features of Python: * multiple programming styles are supported (procedural, object-oriented, functional, etc.) -* it is interpreted rather than compiled. +* [interpreted](https://en.wikipedia.org/wiki/Interpreter_(computing)) rather than [compiled](https://en.wikipedia.org/wiki/Compiler) ahead of time. 
+ + ### Syntax and Design ```{index} single: Python; syntax and design ``` -One nice feature of Python is its elegant syntax --- we'll see many examples later on. +One reason for Python's popularity is its simple and elegant design --- we'll see many examples later on. + +To get a feeling for this, let's look at an example. + +The code below is written in [Java](https://en.wikipedia.org/wiki/Java_(programming_language)) rather than Python. + +You do **not** need to read and understand this code! + + +```{code-block} java + +import java.io.BufferedReader; +import java.io.FileReader; +import java.io.IOException; + +public class CSVReader { + public static void main(String[] args) { + String filePath = "data.csv"; + String line; + String splitBy = ","; + int columnIndex = 1; + double sum = 0; + int count = 0; + + try (BufferedReader br = new BufferedReader(new FileReader(filePath))) { + while ((line = br.readLine()) != null) { + String[] values = line.split(splitBy); + if (values.length > columnIndex) { + try { + double value = Double.parseDouble( + values[columnIndex] + ); + sum += value; + count++; + } catch (NumberFormatException e) { + System.out.println( + "Skipping non-numeric value: " + + values[columnIndex] + ); + } + } + } + } catch (IOException e) { + e.printStackTrace(); + } + + if (count > 0) { + double average = sum / count; + System.out.println( + "Average of the second column: " + average + ); + } else { + System.out.println( + "No valid numeric data found in the second column." + ); + } + } +} + +``` + +This Java code opens an imaginary file called `data.csv` and computes the mean +of the values in the second column. + +Even without knowing Java, you can see that the program is long and complex. + +Here's Python code that does the same thing. + +Even if you don't yet know Python, you can see that the code is simpler and +easier to read. 
+ +```{code-cell} python3 +:tags: [skip-execution] + +import csv + +total, count = 0, 0 +with open('data.csv', mode='r') as file: + reader = csv.reader(file) + for row in reader: + try: + total += float(row[1]) + count += 1 + except (ValueError, IndexError): + pass +print(f"Average: {total / count if count else 'No valid data'}") + +``` + +The simplicity of Python and its neat design are a big factor in its popularity. + + +### The AI Connection + +Unless you have been living under a rock and avoiding all contact with the +modern world, you will know that AI is rapidly advancing. + +AI is already remarkably good at helping you write code, as discussed above. + +No doubt AI will take over many tasks currently performed by humans, +just like other forms of machinery have done over the past few centuries. -Elegant code might sound superfluous but in fact it's highly beneficial because it makes the syntax easy to read and easy to remember. +Python is playing a huge role in the advance of AI and machine learning. -Closely related to elegant syntax is an elegant design. +This means that tech firms are pouring money into development of extremely +powerful Python libraries. -Features like iterators, generators, decorators and list comprehensions make Python highly expressive, allowing you to get more done with less code. +Even if you don't plan to work on AI and machine learning, you can benefit from +learning to use some of these libraries for your own projects in economics, +finance and other fields of science. -[Namespaces](https://en.wikipedia.org/wiki/Namespace) improve productivity by cutting down on bugs and syntax errors. +These lectures will explain how. -## Scientific Programming +## Scientific Programming with Python ```{index} single: scientific programming ``` -Python has become one of the core languages of scientific computing. 
+We have already discussed the importance of Python for AI, machine learning and data science. -It's either the dominant player or a major player in +Let's take a look at the role of Python in other areas of scientific computing. + +Python is either the dominant player or a major player in -* AI, machine learning and data science * astronomy * chemistry * computational biology @@ -137,39 +298,87 @@ It's either the dominant player or a major player in * natural language processing * etc. -This section briefly showcases some examples of Python for scientific programming. +Use of Python is also rising in economics, finance, and adjacent fields like +operations research -- which were previously dominated by MATLAB / Excel / Stata / C / Fortran. + +This section briefly showcases some examples of Python for general scientific programming. -* All of these topics below will be covered in detail later on. -### Numerical Programming +### NumPy ```{index} single: scientific programming; numeric ``` -Fundamental matrix and array processing capabilities are provided by the excellent [NumPy](http://www.numpy.org/) library. +One of the most important parts of scientific computing is working with data. -NumPy provides the basic array data type plus some simple processing operations. +Data is often stored in matrices, vectors and arrays. -For example, let's build some arrays +We can create a simple array of numbers with pure Python as follows: + +```{code-cell} python3 +a = [-3.14, 0, 3.14] # A Python list +a +``` + +This array is very small so it's fine to work with pure Python. + +But when we want to work with larger arrays in real programs we need more efficiency and more tools. + +For this we need to use libraries for working with arrays. + +For Python, the most important matrix and array processing library is +[NumPy](http://www.numpy.org/). 
+ +For example, let's build a NumPy array with 100 elements ```{code-cell} python3 import numpy as np # Load the library a = np.linspace(-np.pi, np.pi, 100) # Create even grid from -π to π +a +``` + +Now let's transform this array by applying functions to it. + +```{code-cell} python3 b = np.cos(a) # Apply cosine to each element of a c = np.sin(a) # Apply sin to each element of a ``` -Now let's take the inner product +Now we can easily take the inner product of `b` and `c`. ```{code-cell} python3 b @ c ``` -The number you see here might vary slightly but it's essentially zero. +We can also do many other tasks, like + +* compute the mean and variance of arrays +* build matrices and solve linear systems +* generate random arrays for simulation, etc. + +We will discuss the details later in the lecture series, where we cover NumPy in depth. + -(For older versions of Python and NumPy you need to use the [np.dot](http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html) function) +### NumPy Alternatives + +While NumPy is still the king of array processing in Python, there are now +important competitors. + +Libraries such as [JAX](https://github.com/google/jax), [PyTorch](https://pytorch.org/), and [CuPy](https://cupy.dev/) also have +built-in array types and array operations that can be very fast and efficient. + +In fact these libraries are better at exploiting parallelization and fast hardware, as +we'll explain later in this series. + +However, you should still learn NumPy first because + +* NumPy is simpler and provides a strong foundation, and +* libraries like JAX directly extend NumPy functionality and hence are easier to + learn when you already know NumPy. + +### SciPy The [SciPy](http://www.scipy.org) library is built on top of NumPy and provides additional functionality. @@ -196,11 +405,16 @@ SciPy includes many of the standard routines used in See them all [here](http://docs.scipy.org/doc/scipy/reference/index.html). 
+Later we'll discuss SciPy in more detail. + + ### Graphics ```{index} single: Matplotlib ``` +A major strength of Python is data visualization. + The most popular and comprehensive Python library for creating figures and graphs is [Matplotlib](http://matplotlib.org/), with functionality including * plots, histograms, contour images, 3D graphs, bar charts etc. @@ -235,14 +449,28 @@ Other graphics libraries include You can visit the [Python Graph Gallery](https://www.python-graph-gallery.com/) for more example plots drawn using a variety of libraries. + ### Networks and Graphs -Python has many libraries for studying graphs. +The study of networks and graphs is becoming an important part of scientific work +in economics, finance and other fields. + +For example, we are interested in studying + +* production networks +* networks of banks and financial institutions +* friendship and social networks +* etc. + +(We have a [book on economic networks](https://networks.quantecon.org/) if you would like to learn more.) + +Python has many libraries for studying networks and graphs. ```{index} single: NetworkX ``` One well-known example is [NetworkX](http://networkx.github.io/). + Its features include, among many other things: * standard graph algorithms for analyzing networks @@ -278,28 +506,34 @@ nx.draw_networkx_nodes(g, plt.show() ``` + ### Other Scientific Libraries -Here's a short list of more important scientific libraries for Python. +As discussed above, there are literally thousands of scientific libraries for +Python. + +Some are small and do very specific tasks. + +Others are huge in terms of lines of code and investment from coders and tech +firms. + +Here's a short list of some important scientific libraries for Python not +mentioned above. 
* [SymPy](http://www.sympy.org/) for symbolic algebra, including limits, derivatives and integrals -* [pandas](http://pandas.pydata.org/) for data maniputation * [statsmodels](http://statsmodels.sourceforge.net/) for statistical routines * [scikit-learn](http://scikit-learn.org/) for machine learning -* [JAX](https://github.com/google/jax) for automatic differentiation, accelerated linear algebra and GPU computing -* [PyTorch](https://pytorch.org/) for deep learning * [Keras](https://keras.io/) for machine learning * [Pyro](https://pyro.ai/) and [PyStan](https://pystan.readthedocs.org/en/latest/) for Bayesian data analysis -* [lifelines](https://lifelines.readthedocs.io/en/latest/) for survival analysis * [GeoPandas](https://geopandas.org/en/stable/) for spatial data analysis * [Dask](https://docs.dask.org/en/stable/) for parallelization * [Numba](http://numba.pydata.org/) for making Python run at the same speed as native machine code * [CVXPY](https://www.cvxpy.org/) for convex optimization -* [PyTables](http://www.pytables.org) for managing large data sets * [scikit-image](https://scikit-image.org/) and [OpenCV](https://opencv.org/) for processing and analysing image data -* [FLAML](https://mlflow.org/docs/latest/index.html) for automated machine learning and hyperparameter tuning * [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) for extracting data from HTML and XML files In this lecture series we will learn how to use many of these libraries for scientific computing tasks in economics and finance. + +
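A note on the CSV example in the Syntax and Design hunk: that cell is tagged `skip-execution` because `data.csv` is an imaginary file. The following is a minimal self-contained sketch of the same idea that can actually be run. It assumes a temporary file with made-up contents; the helper name `column_mean` and the sample data are hypothetical, invented for this illustration and not part of the lecture.

```python
import csv
import os
import tempfile


def column_mean(path, column_index=1):
    """Return the mean of the numeric entries in one CSV column, or None."""
    total, count = 0.0, 0
    with open(path, mode='r', newline='') as file:
        for row in csv.reader(file):
            try:
                total += float(row[column_index])
                count += 1
            except (ValueError, IndexError):
                # Skip header cells, non-numeric cells and rows that are too short
                pass
    return total / count if count else None


# Hypothetical sample data: a header, one non-numeric cell and one short row
sample = "name,value\na,1.0\nb,2.0\nc,not_a_number\nd\ne,6.0\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, mode='w', newline='') as file:
        file.write(sample)
    average = column_mean(path)

print(f"Average: {average}")
```

Returning `None` when no numeric data is found, rather than a message string, is one possible design choice: it leaves the caller to decide how to report the empty case.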