Comprehensive Training Pipelines: Full support for Diffusion Language Models (DLMs) and Autoregressive LMs, from pre-training and SFT to RL, on both dense and MoE architectures. We strongly recommend ...