Highly recommended video: Stanford CS229 I Machine Learning I Building Large Language Models (LLMs)

__00:10 Building Large Language Models overview
02:21 Focus on data, evaluation, and systems in industry over architecture
06:25 Autoregressive language models predict the next word in a sentence
08:26 Tokenizing text is crucial for language models
12:38 Training a large language model involves using a large corpus of text
14:49 Tokenization process considerations
18:40 Tokenization improvement in GPT-4 for code understanding
20:31 Perplexity measures model hesitation between tokens
24:18 Comparing outputs and model prompting
26:15 Evaluation of language models can yield different results
30:15 Challenges in training large language models
32:06 Challenges in building large language models
35:57 Collecting real-world data is crucial for large language models
37:53 Challenges in building large language models
41:38 Scaling laws predict performance improvement with more data and larger models
43:33 Relationship between data, parameters, and compute
47:21 Importance of scaling laws in model performance
49:12 Quality of data matters more than architecture and losses in scaling laws
52:54 Inference for large language models is very expensive
54:54 Training large language models is costly
59:12 Post-training aligns language models for AI assistant use
1:01:05 Supervised fine-tuning for large language models
1:04:50 Leveraging large language models for data generation and synthesis
1:06:49 Balancing data generation and human input for effective learning
1:10:23 Limitations of human abilities in generating large language models
1:12:12 Training language models to maximize human preference instead of cloning human behaviors
1:16:06 Training reward model using softmax logits for human preferences
1:18:02 Modeling optimization and challenges in large language models (LLMs)
1:21:49 Reinforcement learning models and potential benefits
1:23:44 Challenges with using humans for data annotation
1:27:21 LLMs are cost-effective and have better agreement with humans than humans themselves
1:29:12 Perplexity is not calibrated for large language models
1:33:00 Variance in performance of GPT-4 based on prompt specificity
1:34:51 Pre-training data plays a vital role in model initialization
1:38:32 Utilize GPUs efficiently with matrix multiplication
1:40:21 Utilizing 16 bits for faster training in deep learning
1:44:08 Building Large Language Models from scratch__