PhD student Alisa Liu shares the study notes on LLMs and math that helped her land a job at OpenAI after 47 interviews and 4 offers.
This paper establishes foundational principles for deterministic encapsulation of generative models in traditional computational systems, defining four primitives and two anti-patterns to de-risk AI integration.
Charity Majors discusses how AI flipped the economics of code production: code generation is now cheap and instant, which turns code from a treasured asset into a disposable, regenerable resource.
A blog post exploring the NP-hard problem of partitioning songs for 8-track tapes. It humorously suggests that LLMs could replace the human engineers who once solved this problem manually, and it criticizes the use of Mechanical Turk workers for similar tasks.
This paper tackles code generation for no-resource programming languages by building benchmarks and proposing a method that combines further pre-training with weight difference transfer to create specialized instruction-following models at reduced cost.
A tweet thread lists 10 must-check GitHub repositories for AI engineers, covering hands-on AI engineering, LLMs, AI agents, ML deployment, and more.
This paper presents simple prompting strategies that help large language models better capture the full distribution of human judgments, improving alignment on moral scenarios and beliefs. The authors show that asking models to report standard deviations and response proportions, along with ensuring scenario clarity, yields better agreement with human responses.
This paper investigates the lack of standardized reporting on the computational and environmental costs of LLMs in Artificial Intelligence in Education (AIED) research. It reviews 396 AIED 2025 papers and proposes an open-source method to measure and report these impacts.
This position paper argues that integrating explicit memory, analogous to human hippocampal memory, is essential for advancing LLMs toward AGI. It draws on neuroscience to propose that higher-order cognitive functions require explicit memory beyond implicit statistical learning.
An opinion piece arguing that open-source LLMs, particularly those from China, have prevented US AI companies from monopolizing the technology, and advocating open source as an ethical duty.
A discussion on whether open-source LLMs are now 'just good enough' for most use cases, questioning the added value of proprietary models and the cost-benefit tradeoffs.
A software engineer with 10 years of experience in finance and payment systems reflects on how LLMs like ChatGPT and Claude are eroding the value of his domain-specific knowledge, as AI can now handle complex design tasks that previously required years of expertise.
This paper investigates how LLMs rely on morphological cues (affixes) to make pharmacological inferences, demonstrating that models can confidently generate plausible content for fictitious drug names based solely on affix heuristics, which poses a subtle safety risk.
This paper analyzes 35,361 GitHub code comments referencing AI use to develop a taxonomy of AI-assisted development activities. It finds that developers primarily use LLMs for code implementation and enhancement, followed by human refactoring and bug fixes, and that usage has shifted over time toward conceptual support rather than direct code generation.
This paper situates large language models within the broader history of computational approaches to concept analysis in the history, philosophy, and sociology of science (HPSS), reviewing methodological challenges and LLM-based case studies for lexical semantic change detection. It covers corpus construction, operationalization, and evaluation across both pre-LLM and LLM-era workflows.
A tweet curating foundational resources for understanding modern AI, covering topics from transformers to physical AI, including key papers and models.
This paper introduces a nested geometric decomposition framework to analyze how prompting reorganizes internal representations in large language and vision-language models. The authors show that affine transformations, particularly cross-dimensional linear mixing, are key to explaining prompt-induced behavioral changes.
A monthly briefing from Simon Willison summarizing important developments in LLMs, sponsored by Microsoft.
This paper presents the FETCH classifier, which uses an ensemble of LLMs to generate follow-up questions for automated legal intake, and evaluates question quality and cost trade-offs. It finds that high-cost models like GPT-5 are needed for effective plain-language questions and proposes a rubric for evaluating such questions.
This paper proves that learning by predicting latent representations (as in world models like JEPA and data2vec) requires exponentially less data than predicting tokens (as in LLMs) for hierarchical data with hidden structure.