An interesting post from Anima Anandkumar:

"It is interesting that the new DeepSeek AI v3.1 is trained using the UE8M0 FP8 scale data format, which is nothing but the logarithmic number system (LNS), meaning it has only exponent and no mantissa. Our multiplicative weights update (Madam) for training in that format was done several years ago while at NVIDIA. It yields maximum hardware efficiency with no accuracy loss: https://arxiv.org/abs/2106.13914

Logarithmic number system achieves a higher computational efficiency by transforming expensive multiplication operations in the network layers to inexpensive additions in their logarithmic representations. In addition, it attains a wide dynamic range and can provide a good approximation. Also, logarithmic number system is biologically inspired, and there is evidence that our brains use such a format for storage. However, using standard SGD or Adam optimization for training in logarithmic format is challenging, and requires intermediate updates and optimization states to be stored in full precision (FP32). To overcome this, we proposed multiplicative weights update (Madam) that instead updates directly in the logarithmic format and leads to good training outcomes. When compared to training in FP32 and FP8 formats, LNS-Madam reduces the energy consumption by over 90% and 55%, respectively, while maintaining accuracy."
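To make the "multiplication becomes addition" point concrete, here is a minimal Python sketch of an exponent-only 8-bit format. It is an illustration, not DeepSeek's or NVIDIA's implementation: the bias of 127 and the reserved code 255 are assumptions borrowed from the E8M0 scale type in the OCP Microscaling spec, and the helper names `encode_ue8m0`, `decode_ue8m0` and `lns_multiply` are hypothetical.

```python
import math

BIAS = 127      # assumed exponent bias (as in the OCP MX E8M0 scale type)
MAX_CODE = 254  # assumption: code 255 is reserved (NaN), so it is never produced here


def encode_ue8m0(x: float) -> int:
    """Round a positive finite float to the nearest power of two (in the log domain)
    and return its 8-bit exponent code."""
    if not (x > 0 and math.isfinite(x)):
        raise ValueError("no sign bit and no zero: x must be positive and finite")
    exponent = round(math.log2(x))
    return max(0, min(MAX_CODE, exponent + BIAS))


def decode_ue8m0(code: int) -> float:
    """Value represented by an exponent code: 2 ** (code - BIAS)."""
    return 2.0 ** (code - BIAS)


def lns_multiply(code_a: int, code_b: int) -> int:
    """Multiply two encoded values using only integer addition of exponents."""
    return max(0, min(MAX_CODE, code_a + code_b - BIAS))


a = encode_ue8m0(8.0)    # exponent 3
b = encode_ue8m0(0.25)   # exponent -2
product = decode_ue8m0(lns_multiply(a, b))  # 2 ** (3 - 2)
```

With no mantissa, every representable value is a power of two, so a value such as 3.0 is rounded to 4.0 on encoding. That coarse rounding is the price paid for the wide dynamic range and the cheap multiply.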