Speech and Language Processing (3rd ed. draft)
Dan Jurafsky and James H. Martin

Here's our August 24, 2025 release!

This release includes:

  • preference alignment with DPO in the post-training chapter (Chapter 9)
  • completely new ASR (Whisper) and TTS (EnCodec and VALL-E) material in Chapters 15 and 16
  • a restructuring of earlier chapters to fit how we are teaching now:
    • moving Naive Bayes to the appendix and using logistic regression to teach classification
    • moving PPMI to the appendix and covering tf-idf only in Chapter 11, to move more quickly through sparse vectors
  • the concept of LLMs, along with LLM sampling and training, introduced in Chapter 7, before the transformer internals are presented in Chapter 8
  • the RNN/LSTM chapter moved later, to Chapter 13, because students have asked to go directly to Transformers without first learning RNNs; the new structure allows either order (LSTM-then-Transformer or Transformer-then-LSTM)
  • a restructured Chapter 2 that focuses more on tokens and words and introduces Unicode
  • typo fixes (thanks again to all of you!)
  • some new slides
  • the dialogue and chatbot chapter divided up and folded into various other chapters, now that LLMs have replaced most earlier chatbot architectures: much of the introduction and the ethics section went into the LLM chapter, the summary of human conversational structure went to the new Chapter 25, "Conversation and its Structure", and the frame-based dialogue agents section is currently Appendix Chapter K, although that may change
Individual chapters and updated slides are below.

Here is a single pdf of the Aug 24, 2025 book!

  1. Feel free to use the draft chapters and slides in your classes, print them out, whatever; the feedback we get from you makes the book better!
  2. Typos and comments are very welcome (just email slp3edbugs@gmail.com and let us know the date on the draft)! (Don't bother reporting missing references due to cross-chapter cross-reference problems in the individual chapter pdfs; those are fixed in the full book draft.)
  3. Gratitude! We've put up a list here of the amazing people who have sent so many fantastic suggestions and bug fixes for improving the book. We are really grateful to all of you for your help; the book would not be possible without you!
  4. How to cite the book:

    Daniel Jurafsky and James H. Martin. 2025. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd edition. Online manuscript released August 24, 2025. https://web.stanford.edu/~jurafsky/slp3.

  5. A bib entry for the book is here.
    @Book{jm3,
      author  = "Daniel Jurafsky and James H. Martin",
      title   = "Speech and Language Processing: An Introduction to Natural Language Processing,
                 Computational Linguistics, and Speech Recognition with Language Models",
      year    = "2025",
      url     = "https://web.stanford.edu/~jurafsky/slp3/",
      note    = "Online manuscript released August 24, 2025",
      edition = "3rd",
    }
    
  6. When will the book be finished? Don't ask.
  7. If you need the previous Jan 2025 draft chapters, they are here; if you need the previous Aug 2024 draft chapters, they are here.
     
Volume I: Large Language Models
Chapter | Slides
1: Introduction
2: Words and Tokens | Words and Tokens [pptx] [pdf]; Edit Distance [pptx] [pdf]
3: N-gram Language Models | [pptx] [pdf]
4: Logistic Regression and Text Classification | [pptx] [pdf]
5: Embeddings | [pptx] [pdf]
6: Neural Networks | [pptx] [pdf]
7: Large Language Models | [pptx] [pdf]
8: Transformers | [pptx] [pdf]
9: Post-training: Instruction Tuning, Alignment, and Test-Time Compute
10: Masked Language Models | [pptx] [pdf]
11: Information Retrieval and Retrieval-Augmented Generation
12: Machine Translation
13: RNNs and LSTMs | [pptx] [pdf]
14: Phonetics and Speech Feature Extraction
15: Automatic Speech Recognition
16: Text-to-Speech
 
Volume II: Annotating Linguistic Structure
Chapter | Slides
17: Sequence Labeling for Parts of Speech and Named Entities | (Intro only) [pptx] [pdf]
18: Context-Free Grammars and Constituency Parsing
19: Dependency Parsing
20: Information Extraction: Relations, Events, and Time
21: Semantic Role Labeling and Argument Structure
22: Lexicons for Sentiment, Affect, and Connotation
23: Coreference Resolution and Entity Linking
24: Discourse Coherence
25: Conversation and its Structure
 
Appendix (will be on the web only)
A: Hidden Markov Models
B: Naive Bayes Classification | [pptx] [pdf]
C: Kneser-Ney Smoothing
D: Spelling Correction and the Noisy Channel
E: Statistical Constituency Parsing
F: Context-Free Grammars
G: Combinatory Categorial Grammar
H: Logical Representations of Sentence Meaning
I: Word Senses and WordNet
J: PPMI
K: Frame-based Dialogue Systems