Debunks common myths about /dev/urandom and /dev/random, explaining that /dev/urandom is the preferred source of cryptographic randomness on Unix-like systems.
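A minimal Python sketch of the article's recommendation in practice: drawing cryptographic randomness from the OS CSPRNG, which on Unix-like systems is the same pool that backs /dev/urandom. These are standard-library calls, not code from the article.

```python
import os
import secrets

# 32 bytes from the OS CSPRNG (served via /dev/urandom or the
# getrandom() syscall on Linux) -- suitable for keys and nonces.
key = os.urandom(32)

# The secrets module wraps the same source with convenience helpers.
token = secrets.token_hex(16)

print(key.hex(), token)
```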
This paper investigates whether probabilistic calibration in language models can be improved through fine-tuning, comparing soft-target and hard-target methods across 12 models. The results show that calibration is a trainable capability, though the gains sometimes come at the cost of downstream arithmetic reasoning performance.
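A hedged sketch of what the two training objectives might look like; the function names and exact formulations are illustrative assumptions, not the paper's implementation. Soft targets train against a full probability distribution (e.g., empirical correctness rates), while hard targets use one-hot labels.

```python
import torch
import torch.nn.functional as F

def soft_target_loss(logits: torch.Tensor, target_probs: torch.Tensor) -> torch.Tensor:
    # Cross-entropy against a soft distribution over outcomes,
    # e.g. observed correctness frequencies rather than 0/1 labels.
    return -(target_probs * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

def hard_target_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Standard cross-entropy against one-hot (hard) labels.
    return F.cross_entropy(logits, labels)

# Toy usage: batch of 4 examples, 2 outcomes (correct / incorrect).
logits = torch.randn(4, 2)
soft = torch.tensor([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5], [0.9, 0.1]])
hard = torch.tensor([0, 1, 0, 0])
print(soft_target_loss(logits, soft), hard_target_loss(logits, hard))
```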
The paper introduces Diamond Attention, a method for multi-agent reinforcement learning that uses structured randomness to break symmetry and enable role differentiation among homogeneous agents, achieving perfect coordination in symmetric tasks like the XOR game.
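A toy illustration of the underlying symmetry-breaking idea in the two-player XOR game, where agents are rewarded only for choosing different actions. This is an assumption-laden sketch of correlated randomness assigning roles to identical policies, not the paper's Diamond Attention mechanism.

```python
import random

def play_round(rng: random.Random) -> int:
    # Structured (correlated) randomness: one shared draw assigns each
    # agent a distinct role via a random permutation of role indices.
    roles = [0, 1]
    rng.shuffle(roles)
    # Homogeneous agents: both run the identical mapping role -> action,
    # yet the shared draw guarantees their actions differ.
    actions = [role % 2 for role in roles]
    # XOR payoff: reward 1 iff the two agents pick different actions.
    return int(actions[0] != actions[1])

rng = random.Random(42)
print(sum(play_round(rng) for _ in range(1000)) / 1000)  # expect 1.0
```

With independent per-agent noise instead of a shared draw, identical policies would only match by chance; the correlated draw is what yields perfect coordination in this toy setting.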