I noticed that in quantizer.py you have an init_embed_ function that uses data to initialize the embedding weights. In a distributed training environment, each rank has its own data, which leads to ...
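
To make the concern concrete, here is a minimal sketch in plain Python that simulates several ranks in one process. It assumes the issue is that data-dependent initialization gives each rank a different codebook; the post is cut off before it says so. The helper names `init_embed_from_data` and `make_shard` are hypothetical stand-ins and are not the code in quantizer.py. The "broadcast" is simulated by copying rank 0's codebook; in a real PyTorch job the corresponding call would be `torch.distributed.broadcast(tensor, src=0)`.

```python
import random


def init_embed_from_data(data, num_codes, seed):
    """Hypothetical data-dependent init: pick num_codes vectors from the local data."""
    rng = random.Random(seed)
    return [list(v) for v in rng.sample(data, num_codes)]


def make_shard(rank, n=32, dim=4):
    """Hypothetical per-rank data shard; each rank sees different samples."""
    rng = random.Random(1000 + rank)
    return [[rng.gauss(rank, 1.0) for _ in range(dim)] for _ in range(n)]


world_size = 4
shards = [make_shard(r) for r in range(world_size)]

# Each simulated rank initializes its codebook from its own shard.
codebooks = [
    init_embed_from_data(shards[r], num_codes=8, seed=0)
    for r in range(world_size)
]

# Even with the same seed, different local data gives different codebooks.
diverged = any(codebooks[r] != codebooks[0] for r in range(1, world_size))

# Simulated broadcast: every rank adopts a copy of rank 0's codebook.
synced = [[list(v) for v in codebooks[0]] for _ in range(world_size)]
all_equal = all(cb == synced[0] for cb in synced)

print("diverged before sync:", diverged)
print("identical after sync:", all_equal)
```

Broadcasting from one rank is only one possible choice; gathering samples from all ranks before initializing would be another, at the cost of extra communication.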