@siddarthv66: I'm excited to announce something that's been cooking for a bit... We are organizing a COLM workshop in SF with a stacke…
Summary
Announcing the 'Context Beyond the Window' workshop at COLM in San Francisco, focused on LLM context length and diverse perspectives on extending it.
Cached at: 06/01/26, 05:10 AM
I’m excited to announce something that’s been cooking for a bit…
We are organizing a COLM workshop in SF with a stacked cast of speakers. It will feature diverse perspectives on LLM context length and how to increase it (or maybe why we shouldn’t!).
If you are working on
Dane Malenfant (@dvnxmvl_hdf5): 🚨Excited to announce our workshop Context Beyond the Window hosted at COLM in SF! 🚨
LLMs have finite context windows, yet real-world tasks demand absorbing, retaining, and acting on information that far exceeds any single prompt.
1/3
We’re looking for submissions across: