First of all, thank you for your great work! I noticed this sentence in the paper: "To enable joint training across images and videos, we dynamically disable context parallelism during image ...
I collected data from SLAM, navigation, and reconstruction ... and converted the trajectories into 7-dimensional actions ([x, y, z, r, p, y], with the last item always 1.0). I generated about 70,000 samples (each sample has 36 frame-12 ...