Meta's FAIR team released the code for Flowception, a CVPR 2026 paper presenting a non-autoregressive video generation framework that interleaves frame insertion with continuous denoising to reduce error accumulation and computational cost.
The author demonstrates a workflow that combines Codex, HyperFrames, and Remotion to produce a Chinese-language educational video about declassified UFO files, and introduces a Claude Code skills repository on GitHub that automates the organization and analysis of publicly released UAP/UFO government documents.
This is an aggregation of trending AI news from Digg, covering topics such as Neuralink brain implants, NVIDIA's performance fixes for Claude Code, Anthropic's policy stances, and the release of Flowception video modeling code.
BACH is presented as an advance in video generation, claiming strong character consistency across scenes without face morphing or identity drift.
This paper introduces DeScore, a video reward model that decouples reasoning and scoring processes to improve training efficiency and generalization. It addresses the limitations of existing discriminative and generative reward models by using a 'think-then-score' paradigm with multimodal large language models.
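A minimal sketch of the think-then-score idea, assuming a generic `query_mllm` callable and made-up prompt wording rather than DeScore's actual interface: the model is asked to reason about the video first and only then to emit a number.

```python
import re
from typing import Callable

def think_then_score(query_mllm: Callable[[str], str], video_caption: str, prompt: str) -> float:
    """Two-stage 'think-then-score' reward: reason first, score second.

    `query_mllm` stands in for any multimodal LLM call; for simplicity it
    receives only text here. All prompt wording is assumed, not DeScore's.
    """
    # Stage 1: ask the model to reason about quality before committing to a number.
    reasoning = query_mllm(
        f"Prompt: {prompt}\nVideo description: {video_caption}\n"
        "Analyze visual quality, motion, and prompt alignment step by step."
    )
    # Stage 2: condition the score on the reasoning the model just produced.
    answer = query_mllm(
        f"Reasoning: {reasoning}\nGive a single overall score from 0 to 10."
    )
    match = re.search(r"\d+(\.\d+)?", answer)
    return float(match.group()) if match else 0.0
```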
Fluent Frame is a new tool that allows users to ship polished product videos as quickly as they deploy software features.
Stream-T1 is a proposed framework for test-time scaling in streaming video generation, improving temporal consistency and quality through mechanisms like noise propagation and reward pruning. The paper addresses the high computational costs of existing diffusion-based methods by leveraging chunk-level synthesis.
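At its simplest, chunk-level test-time scaling reduces to a best-of-N loop: sample several candidate chunks, score them with a reward model, keep the winner, and continue. The sketch below assumes placeholder `sample_chunk` and `reward` callables and omits the paper's specific noise-propagation and pruning mechanisms.

```python
from typing import Callable, List

def stream_test_time_scaling(
    sample_chunk: Callable[[List, int], List],  # (context, seed) -> candidate chunk
    reward: Callable[[List, List], float],      # (context, chunk) -> score
    num_chunks: int,
    candidates_per_chunk: int = 4,
) -> List:
    """Greedy chunk-level test-time scaling: best-of-N pruning per chunk.

    `sample_chunk` and `reward` are placeholders for the generator and
    reward model; the selection rule here is plain best-of-N.
    """
    video: List = []
    for _ in range(num_chunks):
        # Sample several candidate continuations of the video so far.
        candidates = [sample_chunk(video, seed) for seed in range(candidates_per_chunk)]
        # Prune: commit only the highest-reward candidate, then continue.
        best = max(candidates, key=lambda chunk: reward(video, chunk))
        video.extend(best)
    return video
```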
Stream-R1 introduces a reliability-perplexity aware reward distillation framework for streaming video generation that adaptively weights supervision to improve visual and motion quality without additional computational overhead.
Sulphur-2-base is an uncensored video generation model based on LTX 2.3, supporting native text-to-video and image-to-video workflows.
The article covers the UniVidX paper, which introduces a unified multimodal framework for video generation built on diffusion priors, and examines its cross-modal coherence mechanisms.
YouTube talk by @sedielem offering a concise state-of-the-art overview of scaling generative image and video models, covering modeling, architecture, distillation and control.
Huashu Design launches a tool that lets users put together passable "80-point" promo videos in 30 minutes with any AI agent; the demo showcases a spot for Kimi K2.6.
CityRAG introduces a video generative model that produces long, physically grounded, 3D-consistent videos of real-world cities using geo-registered data, enabling realistic navigation and simulation for robotics and autonomous driving.
OSCBench is a new benchmark designed to evaluate text-to-video generation models' ability to accurately represent object state changes (transformations caused by actions like peeling or slicing). The paper reveals that current T2V models struggle with temporally consistent state changes, especially in novel and compositional scenarios, identifying this as a key bottleneck in video generation.
SDVG adapts speculative decoding to autoregressive video diffusion, using an image-quality router to achieve up to 2.09× speed-up with 95.7% quality retention on MovieGenVideoBench.
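The draft-then-verify pattern behind speculative decoding carries over to video roughly as follows; the threshold-based acceptance rule in this sketch is a stand-in for SDVG's learned image-quality router, not its actual implementation.

```python
from typing import Any, Callable, List

def speculative_video_decode(
    draft_frame: Callable[[List[Any]], Any],    # cheap draft model
    refine_frame: Callable[[List[Any]], Any],   # full-cost target model
    quality_router: Callable[[Any], float],     # scores a drafted frame
    num_frames: int,
    accept_threshold: float = 0.8,
) -> List[Any]:
    """Draft-then-verify generation gated by a quality router (illustrative).

    Frames whose drafts clear the router threshold are accepted as-is; the
    rest fall back to the expensive model. The speed-up comes from most
    drafts being acceptable.
    """
    frames: List[Any] = []
    for _ in range(num_frames):
        draft = draft_frame(frames)
        if quality_router(draft) >= accept_threshold:
            frames.append(draft)                  # accept the cheap draft
        else:
            frames.append(refine_frame(frames))   # fall back to the full model
    return frames
```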
Motif-Video 2B is a 2B-parameter text-to-video generation model that achieves 83.76% on VBench, surpassing Wan2.1 14B with 7x fewer parameters, and was trained on fewer than 10M clips using less than 100,000 H200 GPU hours. The model uses a specialized architecture with shared cross-attention and a three-part backbone to separate prompt alignment, temporal consistency, and detail refinement.
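A toy PyTorch sketch of the parameter-sharing idea behind shared cross-attention; the layer sizes, block count, and block split below are arbitrary choices for illustration, not Motif-Video's actual configuration.

```python
import torch
import torch.nn as nn

class SharedCrossAttentionBackbone(nn.Module):
    """Toy backbone where every block reuses a single cross-attention module."""

    def __init__(self, dim: int = 256, heads: int = 4, num_blocks: int = 6):
        super().__init__()
        # One cross-attention module shared by all blocks (prompt conditioning).
        self.shared_cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Per-block self-attention and MLPs remain independent.
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
             for _ in range(num_blocks)]
        )

    def forward(self, video_tokens: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        x = video_tokens
        for block in self.blocks:
            # Inject prompt information through the single shared module.
            attn_out, _ = self.shared_cross_attn(x, text_tokens, text_tokens)
            x = block(x + attn_out)
        return x

# video tokens: (batch, seq, dim), text tokens: (batch, seq, dim)
model = SharedCrossAttentionBackbone()
out = model(torch.randn(2, 128, 256), torch.randn(2, 16, 256))
```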
This paper presents a method for HDR video generation by leveraging pretrained generative models through logarithmic encoding alignment and camera-mimicking degradation training, enabling effective HDR synthesis without architectural redesign. The approach demonstrates that HDR generation can be achieved simply by adapting existing models to a representation naturally aligned with their learned priors.
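Logarithmic encoding alignment is straightforward to illustrate: linear HDR radiance is compressed into the [0, 1] range an SDR-trained model expects, and the curve is inverted after generation. The specific log curve and `max_nits` value below are assumptions for the sketch, not the paper's formula.

```python
import numpy as np

def hdr_to_log(hdr: np.ndarray, max_nits: float = 1000.0) -> np.ndarray:
    """Compress linear HDR radiance into [0, 1] with a log curve (illustrative)."""
    hdr = np.clip(hdr, 0.0, max_nits)
    return np.log1p(hdr) / np.log1p(max_nits)

def log_to_hdr(encoded: np.ndarray, max_nits: float = 1000.0) -> np.ndarray:
    """Invert the log encoding back to linear radiance."""
    return np.expm1(encoded * np.log1p(max_nits))

# Round-trip check on a synthetic frame.
frame = np.random.uniform(0.0, 1000.0, size=(64, 64, 3))
assert np.allclose(log_to_hdr(hdr_to_log(frame)), frame, atol=1e-3)
```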
LiconStudio releases a LoRA adapter for LTX-2.3 fine-tuned on the VBVR dataset to enhance video generation with improved prompt understanding, motion dynamics, and temporal consistency for complex video reasoning tasks.
Google Vids introduces free high-quality video generation using Veo 3.1 for all users, alongside new custom music creation via Lyria 3 and AI avatar features for premium subscribers.
Google releases Veo 3.1 Lite, a cost-effective video generation model available on the Gemini API with 50% lower cost than Veo 3.1 Fast while maintaining the same speed. The model supports text-to-video and image-to-video generation with flexible resolutions and aspect ratios.
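Calling a Veo model through the Gemini API follows the long-running-operation pattern of the google-genai Python SDK, sketched below; the `veo-3.1-lite` model identifier is an assumption and should be checked against the current Gemini API model list.

```python
import time
from google import genai

client = genai.Client()  # reads the API key from the environment

# NOTE: the model ID below is an assumption, not a confirmed identifier.
operation = client.models.generate_videos(
    model="veo-3.1-lite-generate-preview",
    prompt="A timelapse of clouds rolling over a mountain ridge at sunset",
)

# Video generation is a long-running operation; poll until it completes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("clip.mp4")
```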