Tag
This paper formulates LLM inference budget allocation as a constrained optimization problem and proposes CLEAR, which reallocates resources from low-utility queries to queries near emergence thresholds, achieving up to a 3× accuracy improvement under tight budgets.
This paper proposes HADT, a transformer-based architecture for autonomous resource management in heterogeneous satellite clusters for Earth observation, using differential attention and relational tokenization. Experiments show significant improvements over baselines and strong adaptability to varying cluster sizes.
This paper introduces the Online Shared Supply Allocation problem and proposes a deterministic threshold-proportional policy (GPA) that achieves a 4/3-approximation to the offline optimum. It also includes a learning-augmented extension to handle imperfect forecasts and demonstrates superior performance in synthetic and real-world experiments.
This paper introduces RGAO, a retrieval-guided adaptive orchestration framework for multi-agent code generation that dynamically selects the agent topology based on code complexity. It provides a formal budget algebra with provable resource conservation, and the framework significantly reduces routing errors compared with baseline methods.
A Kaggle competition challenges participants to build a scheduler that decides whether to run a 2B-parameter model on MMLU questions, with the goal of minimizing a weighted cost that trades compute against accuracy.
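The scheduling decision in the Kaggle task above can be sketched as a per-question expected-cost comparison. This is a minimal sketch, not the competition's official scoring: it assumes the weighted cost is a linear combination of compute units and expected errors, that a per-question estimate of the 2B model's accuracy is available, and that skipping the model falls back to random guessing over MMLU's four answer options. The names `Question`, `expected_cost`, `schedule`, `compute_weight`, and `error_weight` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Question:
    qid: str
    # Hypothetical: estimated probability that the 2B model answers correctly.
    p_correct_model: float
    # Hypothetical fallback: random guessing on a four-option MMLU question.
    p_correct_fallback: float = 0.25


def expected_cost(p_correct: float, compute: float,
                  error_weight: float, compute_weight: float) -> float:
    """Assumed cost: weighted compute plus weighted expected error rate."""
    return compute_weight * compute + error_weight * (1.0 - p_correct)


def schedule(questions: list[Question], compute_per_call: float,
             error_weight: float = 1.0, compute_weight: float = 1.0) -> dict[str, bool]:
    """Return {qid: True if running the model has lower expected cost than skipping}."""
    decisions = {}
    for q in questions:
        run_cost = expected_cost(q.p_correct_model, compute_per_call,
                                 error_weight, compute_weight)
        # Skipping the model spends no compute but uses the fallback accuracy.
        skip_cost = expected_cost(q.p_correct_fallback, 0.0,
                                  error_weight, compute_weight)
        decisions[q.qid] = run_cost < skip_cost
    return decisions


if __name__ == "__main__":
    qs = [Question("q1", 0.9), Question("q2", 0.3)]
    print(schedule(qs, compute_per_call=0.3))  # {'q1': True, 'q2': False}
```

Under these assumptions the rule reduces to running the model exactly when error_weight × (p_model − p_fallback) exceeds compute_weight × compute_per_call, so raising the compute weight pushes more questions to the fallback.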