We provide visual reconstruction examples at 8x temporal compression (8x8x8) with 8 latent channels, and compare against MAGVIT-v2 at a resolution of 512x512. On high-motion videos, MAGVIT-v2 exhibits much more severe artifacts than our method.
[Video panels: Reference | MAGVIT-v2 | REGEN]
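
To make the compression setting concrete, the following is a minimal sketch of the arithmetic behind an 8x8x8 tokenizer with 8 latent channels. It assumes that 8x8x8 denotes the downsampling factor along time, height, and width, that the input is 3-channel RGB, and that the clip length is divisible by the temporal factor. The function names and the 16-frame clip length are hypothetical and chosen only for illustration; any special handling of the first frame that a causal tokenizer may apply is ignored.

```python
def latent_shape(frames, height, width, factors=(8, 8, 8), latent_channels=8):
    """Return (channels, frames, height, width) of the latent grid.

    Assumes plain integer downsampling by `factors` = (time, height, width).
    """
    ft, fh, fw = factors
    if frames % ft or height % fh or width % fw:
        raise ValueError("input dimensions must be divisible by the factors")
    return (latent_channels, frames // ft, height // fh, width // fw)


def compression_ratio(factors=(8, 8, 8), in_channels=3, latent_channels=8):
    """Ratio of input values to latent values per latent position."""
    ft, fh, fw = factors
    return in_channels * ft * fh * fw / latent_channels


# Hypothetical 16-frame clip at the 512x512 resolution used in the comparison.
print(latent_shape(16, 512, 512))  # (8, 2, 64, 64)
print(compression_ratio())         # 192.0
```

Under these assumptions, each 512x512 frame group maps to a 64x64 latent grid, and every latent position summarizes 1536 input values with 8 numbers.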