I trained a 1.8M params model from scratch on a total of ~40M tokens.
https://www.reddit.com/r/LocalLLaMA/comments/1qym566/i_trained_a_18m_params_model_from_scratch_on_a/
https://github.com/SrijanSriv211/Strawberry