REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder

¹Adobe Research  ²Northeastern University


Gallery: reconstruction comparisons at 4×, 8×, 16×, and 32× temporal compression; text-to-video generation with a 32× latent.

Reconstruction Comparison (8× Temporal Compression, zdim=8)

We provide visual examples of reconstruction at 8× temporal compression (8×8×8 downsampling along time, height, and width) with 8 latent channels, and we compare against MAGVIT-v2 at 512×512 resolution. On high-motion videos, MAGVIT-v2 exhibits much more severe artifacts than our method.
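To make the 8×8×8, 8-channel setting concrete, the sketch below computes the latent tensor shape and the ratio of input pixel values to latent values. It is a minimal illustration, not the REGEN implementation: it assumes plain integer division by each downsampling factor and ignores any special handling the actual model may apply (for example, to the first frame). The names `LatentConfig`, `latent_shape`, and `compression_ratio`, and the 32-frame clip length, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentConfig:
    """Hypothetical settings mirroring the 8x8x8 / 8-channel setup described above."""
    t_factor: int = 8         # temporal downsampling factor
    s_factor: int = 8         # spatial downsampling factor (height and width)
    latent_channels: int = 8  # zdim


def latent_shape(frames, height, width, cfg=LatentConfig()):
    """Return (T', H', W', C), assuming plain integer division by each factor."""
    for name, size, factor in (("frames", frames, cfg.t_factor),
                               ("height", height, cfg.s_factor),
                               ("width", width, cfg.s_factor)):
        if size % factor:
            raise ValueError(f"{name}={size} is not divisible by {factor}")
    return (frames // cfg.t_factor, height // cfg.s_factor,
            width // cfg.s_factor, cfg.latent_channels)


def compression_ratio(frames, height, width, rgb_channels=3, cfg=LatentConfig()):
    """Number of input RGB values represented by each latent value."""
    t, h, w, c = latent_shape(frames, height, width, cfg)
    return (frames * height * width * rgb_channels) / (t * h * w * c)


if __name__ == "__main__":
    # 32 frames is an arbitrary illustrative clip length at 512x512.
    print(latent_shape(32, 512, 512))       # (4, 64, 64, 8)
    print(compression_ratio(32, 512, 512))  # 192.0
```

Under these assumptions the ratio does not depend on clip length: 8 · 8 · 8 · 3 / 8 = 192 input values per latent value.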

Video panels: Reference, MAGVIT-v2, REGEN