Will future large video (understanding) models use pixel loss or embedding loss?
Mini
7
Ṁ92
2028
35% Pixel loss
27% Embedding loss
38% Neither

Examples of models with pixel loss (see the sketch after this list):

  • MAE

  • iGPT

  • LVM
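
As a rough illustration (not any specific model's implementation), a pixel-loss objective regresses raw pixel values directly. The tensor shapes and the plain-MSE choice below are assumptions for the sketch:

```python
import torch
import torch.nn.functional as F

# Hypothetical batch of video clips: (batch, time, channels, height, width).
pred = torch.randn(2, 8, 3, 64, 64)   # model's reconstructed pixels
target = torch.rand(2, 8, 3, 64, 64)  # ground-truth pixels in [0, 1]

# Pixel loss: regression on raw pixels (MAE-style MSE, simplified here to
# cover all pixels rather than only masked patches).
pixel_loss = F.mse_loss(pred, target)
```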

Examples of models with embedding loss (see the sketch after this list):

  • I-JEPA
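
In contrast, an embedding loss compares predictions to the output of a target encoder, so the objective lives in latent space. A minimal sketch, assuming toy linear encoders (in I-JEPA the target encoder is an EMA copy of the context encoder and receives no gradients):

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the context/target encoders (assumed dims: 768 -> 256).
context_encoder = torch.nn.Linear(768, 256)
target_encoder = torch.nn.Linear(768, 256)

patches = torch.randn(2, 16, 768)         # (batch, num_patches, patch_dim)
with torch.no_grad():
    target_emb = target_encoder(patches)  # targets get no gradient

pred_emb = context_encoder(patches)       # predictor path (simplified)

# Embedding loss: regression in representation space, never touching pixels.
embedding_loss = F.mse_loss(pred_emb, target_emb)
```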

If people end up using a diffusion model (DDPM) to pretrain the large video understanding model, then this resolves to Pixel loss.
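
For reference, the DDPM objective is noise regression in pixel space, which is why diffusion pretraining counts as pixel-level here. A simplified continuous-time sketch (the schedule and shapes are assumptions, and the model call is a placeholder):

```python
import torch
import torch.nn.functional as F

x0 = torch.rand(2, 3, 64, 64)                 # clean frames in [0, 1]
noise = torch.randn_like(x0)
t = torch.rand(2, 1, 1, 1)                    # stand-in noise level in (0, 1)
x_t = (1 - t).sqrt() * x0 + t.sqrt() * noise  # simplified forward process

eps_pred = torch.randn_like(noise)            # placeholder for model(x_t, t)
ddpm_loss = F.mse_loss(eps_pred, noise)       # epsilon-prediction MSE on pixels
```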

This will resolve at EOY 2027 by consulting expert/public opinion. Among all factors that decide the resolution, the paradigm used by the SOTA video understanding model will be the most indicative.

Discrete cross-entropy loss (transformer + VQ-VAE) will resolve to Neither.
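
For completeness, a sketch of that Neither paradigm, with a hypothetical codebook size: the VQ-VAE turns frames into discrete codes, and a transformer is trained with cross-entropy over those codes:

```python
import torch
import torch.nn.functional as F

vocab_size = 1024                              # hypothetical codebook size
logits = torch.randn(2, 16, vocab_size)        # transformer predictions
codes = torch.randint(0, vocab_size, (2, 16))  # VQ-VAE token targets

# Discrete cross-entropy over token indices: neither pixel regression
# nor embedding regression.
ce_loss = F.cross_entropy(logits.reshape(-1, vocab_size), codes.reshape(-1))
```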


I thought Pixel loss would increase with DDPM.

Here is the ambiguous part: what if someone uses V-JEPA as the encoder, a diffusion model as the backbone, and something else as the decoder?