I trained a 1.8M params model from scratch on a total of ~40M tokens. https://www.reddit.com/r/LocalLLaMA/comments/1qym566/i_trained_a_18m_params_model_from_scratch_on_a/ https://github.com/SrijanSriv211/Strawberry