Highly recommended video: Stanford CS229 I Machine Learning I Building Large Language Models (LLMs)

00:10 Building Large Language Models overview
02:21 Focus on data, evaluation, and systems in industry over architecture
06:25 Autoregressive language models predict the next word in a sentence
08:26 Tokenizing text is crucial for language models
12:38 Training a large language model involves using a large corpus of text
14:49 Tokenization process considerations
18:40 Tokenization improvement in GPT-4 for code understanding
20:31 Perplexity measures model hesitation between tokens
24:18 Comparing outputs and model prompting
26:15 Evaluation of language models can yield different results
30:15 Challenges in training large language models
32:06 Challenges in building large language models
35:57 Collecting real-world data is crucial for large language models
37:53 Challenges in building large language models
41:38 Scaling laws predict performance improvement with more data and larger models
43:33 Relationship between data, parameters, and compute
47:21 Importance of scaling laws in model performance
49:12 Quality of data matters more than architecture and losses in scaling laws
52:54 Inference for large language models is very expensive
54:54 Training large language models is costly
59:12 Post-training aligns language models for AI assistant use
1:01:05 Supervised fine-tuning for large language models
1:04:50 Leveraging large language models for data generation and synthesis
1:06:49 Balancing data generation and human input for effective learning
1:10:23 Limitations of human abilities in generating large language models
1:12:12 Training language models to maximize human preference instead of cloning human behaviors
1:16:06 Training reward model using softmax logits for human preferences
1:18:02 Modeling optimization and challenges in large language models (LLMs)
1:21:49 Reinforcement learning models and potential benefits
1:23:44 Challenges with using humans for data annotation
1:27:21 LLMs are cost-effective and have better agreement with humans than humans themselves
1:29:12 Perplexity is not calibrated for large language models
1:33:00 Variance in performance of GPT-4 based on prompt specificity
1:34:51 Pre-training data plays a vital role in model initialization
1:38:32 Utilize GPUs efficiently with matrix multiplication
1:40:21 Utilizing 16 bits for faster training in deep learning
1:44:08 Building Large Language Models from scratch
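
The 20:31 chapter describes perplexity as how many tokens the model is hesitating between. A minimal sketch of that idea, assuming the standard definition (the exponential of the average negative log-probability the model assigns to the correct tokens). The function name and the sample probabilities are made up for illustration and are not taken from the lecture.

```python
import math

def perplexity(token_probs):
    """Exponential of the average negative log-probability of the true tokens.

    token_probs: probabilities the model assigned to each correct token
    (hypothetical values here, not real model outputs).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that spreads its probability evenly over 4 candidate tokens
# at every step is "hesitating between" 4 tokens.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])

# A model that is mostly sure of the right token scores close to 1.
confident = perplexity([0.9, 0.8, 0.95])

print(f"uniform over 4 tokens: {uniform:.2f}")
print(f"confident model: {confident:.2f}")
```

Lower is better: 1 would mean the model always puts all its probability on the correct token, while a value equal to the vocabulary size would mean it is guessing uniformly.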