(under construction)
- What is a Large Language Model (LLM)?
- What is the building block of an LLM?
- LLMs, self-attention mechanism: The self-attention mechanism is the core concept of transformer-based LLMs. Here, we review the formulae of this mechanism and implement self-attention from scratch in Python (a minimal sketch appears after this list).
- LLMs, the softmax in self-attention: We review the softmax function, which is widely used in neural networks, deep learning, and machine learning. The softmax function is implemented in Python with an example (see the sketch after this list).
- LLMs, layer normalization: Layer normalization is a critical component of Transformers and LLMs, ensuring stable and efficient training by normalizing activations across the feature dimension. It is particularly well suited to sequence-based tasks and deep architectures. Here, we implement layer normalization with NumPy and also provide the equivalent PyTorch code so that you can compare the results (see the sketch after this list).
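
As a quick illustration of the self-attention topic above, here is a minimal NumPy sketch of scaled dot-product self-attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k)·V. The projection matrices `W_q`, `W_k`, `W_v` and the toy dimensions are illustrative assumptions, not the repository's own implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise maximum for numerical stability before exponentiating
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention for a single sequence.

    X              : (seq_len, d_model) input embeddings
    W_q, W_k, W_v  : (d_model, d_k) projection matrices
    """
    Q = X @ W_q                          # queries
    K = X @ W_k                          # keys
    V = X @ W_v                          # values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # scaled dot-product scores
    weights = softmax(scores, axis=-1)   # attention weights, each row sums to 1
    return weights @ V                   # weighted sum of value vectors

# Tiny example: 4 tokens, model dimension 8, head dimension 4
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 4)
```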
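For the softmax topic, the function maps a score vector x to probabilities via softmax(x)_i = exp(x_i) / Σ_j exp(x_j). A small, hedged Python sketch with the usual max-shift trick for numerical stability:

```python
import numpy as np

def softmax(x):
    # Shift by the maximum so exp() never overflows; the output is unchanged
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # roughly [0.659 0.242 0.099]
print(probs.sum())  # 1.0
```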
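For the layer normalization topic, the sketch below normalizes over the last (feature) dimension and then applies a learnable scale and shift. It assumes PyTorch is installed so the NumPy result can be checked against `torch.nn.LayerNorm`; the toy shapes are illustrative.

```python
import numpy as np
import torch
import torch.nn as nn

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each token's feature vector to zero mean and unit variance,
    # then apply the learnable scale (gamma) and shift (beta)
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Toy input: batch of 2 sequences, 3 tokens each, feature dimension 4
x = np.random.default_rng(0).normal(size=(2, 3, 4)).astype(np.float32)
gamma = np.ones(4, dtype=np.float32)   # scale, initialised to 1
beta = np.zeros(4, dtype=np.float32)   # shift, initialised to 0

out_np = layer_norm(x, gamma, beta)

# Same operation with PyTorch's built-in LayerNorm for comparison
ln = nn.LayerNorm(4)
out_pt = ln(torch.from_numpy(x)).detach().numpy()

print(np.allclose(out_np, out_pt, atol=1e-5))  # True
```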