This repository contains materials for a workshop on using Large Language Models (LLMs) in social science research. The tutorial focuses on practical implementations using the langchain package and OpenAI's GPT-3.5-turbo model.
The tutorial covers:
- Introduction to generative LLMs and their applications in social science
- Implementation of two key methods:
  - Chat completion for text annotation
  - Retrieval-Augmented Generation (RAG)
- LLM_Tutorial.ipynb: Main Jupyter notebook containing the tutorial.
- LLM_Tutorial.html: An HTML file rendered from the Jupyter notebook.
- data/ (gitignored, but required to run the notebook locally): Data for this tutorial is downloaded from the Global Populism Dataset.
- Chat completion for text annotation:
  - Setting up the OpenAI API and LangChain
  - Handling text encoding and chunking
  - Prompt engineering
  - Chain creation and execution
  - Validation using Krippendorff's alpha
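
To make the validation step concrete, here is a minimal sketch of Krippendorff's alpha for nominal labels, written with the standard library only. It is not the notebook's code: the function name `krippendorff_alpha_nominal` and the human and LLM labels are hypothetical, and in practice the `krippendorff` package listed in the requirements would normally be used instead.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(ratings_by_unit):
    """Krippendorff's alpha for nominal data.

    ratings_by_unit: one sequence of labels per coded unit (e.g. per speech),
    with None marking a missing rating. Hypothetical helper for illustration.
    """
    coincidences = Counter()
    for values in ratings_by_unit:
        values = [v for v in values if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit rated by fewer than two coders carries no information
        # Every ordered pair of ratings within a unit, weighted by 1 / (m - 1).
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label, and the overall number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is ever used")
    return 1 - (n - 1) * observed / expected


# Made-up labels: 1 = populist, 0 = not populist.
human_labels = [1, 1, 0, 0]
llm_labels = [1, 1, 0, 1]
alpha = krippendorff_alpha_nominal(list(zip(human_labels, llm_labels)))
print(round(alpha, 3))
```

An alpha of 1 means perfect agreement between the human and LLM annotations, while values near 0 mean agreement no better than chance.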
- Retrieval-Augmented Generation (RAG):
  - Word embeddings
  - Vector stores
  - Document retrieval
  - Response generation
  - Performance evaluation
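
The RAG steps listed above (embed, store, retrieve, then generate) can be sketched without any API calls. The example below is a toy illustration under stated assumptions: word counts stand in for real embeddings, `ToyVectorStore` and `build_prompt` are hypothetical names rather than LangChain classes, and the three documents are invented. In the tutorial itself, the final prompt would be sent to the chat model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag of lowercase word counts (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector store: keeps (document, vector) pairs."""

    def __init__(self, documents):
        self.entries = [(doc, embed(doc)) for doc in documents]

    def retrieve(self, query, k=2):
        """Return the k documents most similar to the query."""
        query_vector = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vector, e[1]), reverse=True)
        return [doc for doc, _vector in ranked[:k]]


def build_prompt(question, passages):
    """Assemble the retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Invented example documents, for illustration only.
documents = [
    "The speech blames a corrupt elite for betraying ordinary people.",
    "The minister announced a new budget for road maintenance.",
    "The party leader praised the wisdom of the common people.",
]
question = "Which speech attacks a corrupt elite?"

store = ToyVectorStore(documents)
top = store.retrieve(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Swapping `embed` for a real embedding model and `ToyVectorStore` for a proper vector store changes the quality of retrieval but not the overall flow.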
- Python packages:
  - langchain
  - openai
  - pandas
  - scikit-learn
  - dotenv
  - tiktoken
  - krippendorff
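
Before opening the notebook, it can help to confirm that these packages are importable. The following is a small sketch using only the standard library; the helper name `find_missing` is hypothetical. Note that scikit-learn is imported as `sklearn`, and `dotenv` is the import name provided by the python-dotenv distribution.

```python
from importlib.util import find_spec

# Import names for the packages listed above.
REQUIRED_MODULES = [
    "langchain",
    "openai",
    "pandas",
    "sklearn",  # installed as scikit-learn
    "dotenv",  # installed as python-dotenv
    "tiktoken",
    "krippendorff",
]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```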