A comprehensive, open-source GitHub repository providing structured learning roadmaps and curated resources for mastering AI, machine learning, deep learning, and large language models from beginner to advanced levels. Designed for students and professionals, it covers foundational concepts, programming frameworks, career tracks, and emerging AI topics.
llm-gemini 0.31 is a new release of the plugin for using Google's Gemini models with the LLM command-line tool.
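As a rough illustration of how the plugin is typically driven from Python after `llm install llm-gemini` (the model ID below is an assumption; run `llm models` to list what the installed version actually exposes):

```python
# Minimal sketch using the llm Python API with the llm-gemini plugin.
# The model ID "gemini-1.5-flash" is an assumption and may differ by
# plugin version; a Gemini API key must be configured first
# (e.g. via `llm keys set gemini`).
import llm

model = llm.get_model("gemini-1.5-flash")
response = model.prompt("Summarize the plot of Hamlet in two sentences.")
print(response.text())
```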
The article criticizes the reliance on large language models for generating bibliographic entries, highlighting issues with hallucinated citations and incorrect author lists in academic papers.
This article profiles MIT senior Olivia Honeycutt, highlighting her interdisciplinary research at the intersection of linguistics, computation, and cognition, with a focus on comparing human language processing with large language models.
Andrew Ng discusses how coding agents accelerate different types of software work at varying speeds, with frontend development benefiting most and research least.
Andrej Karpathy posted a 2-hour educational video that promises to significantly improve viewers' practical use of large language models.
A question seeking fundamental research or papers on whether AGI can or cannot be achieved through large language models, looking to move beyond opinion-based discussion.
CLewR introduces a curriculum learning strategy with restarts for improving machine translation performance in LLMs through preference optimization. The method addresses catastrophic forgetting by repeating the easy-to-hard curriculum multiple times, showing consistent gains across Gemma2, Qwen2.5, and Llama3.1 models.
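A minimal sketch of the restart idea under stated assumptions: `score_difficulty` and `dpo_step` are hypothetical placeholders for the paper's difficulty measure and preference-optimization update, not its actual implementation.

```python
# Sketch of an easy-to-hard curriculum with restarts: instead of one
# pass over the ordered data, the full curriculum is replayed several
# times so early (easy) examples are revisited rather than forgotten.
# `score_difficulty` and `dpo_step` are hypothetical placeholders.

def score_difficulty(pair):
    # Placeholder difficulty proxy; the paper's actual measure differs.
    return len(pair["prompt"])

def dpo_step(model, batch):
    # Placeholder for one preference-optimization (e.g. DPO) update.
    pass

def train_with_restarts(model, preference_pairs, num_restarts=3, batch_size=32):
    ordered = sorted(preference_pairs, key=score_difficulty)  # easiest first
    for _ in range(num_restarts):
        # Each restart replays the whole easy-to-hard schedule.
        for i in range(0, len(ordered), batch_size):
            dpo_step(model, ordered[i:i + batch_size])
    return model
```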
Disco-RAG proposes a discourse-aware retrieval-augmented generation framework that integrates discourse signals through intra-chunk discourse trees and inter-chunk rhetorical graphs to improve knowledge synthesis in LLMs. The method achieves state-of-the-art results on QA and summarization benchmarks without fine-tuning.
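A rough sketch of the inter-chunk half of the idea, assuming chunks are nodes in a rhetorical graph and retrieval expands each hit with its discourse neighbors; the graph construction and relation labels here are illustrative, not the paper's pipeline.

```python
# Illustrative sketch: expand retrieved chunks with their rhetorical
# neighbors so the generator sees discourse-connected context.
# Edge labels such as "elaboration" or "contrast" are assumed.
from collections import defaultdict

class RhetoricalGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # chunk_id -> [(neighbor_id, relation)]

    def add_edge(self, src, dst, relation):
        self.edges[src].append((dst, relation))

    def neighbors(self, chunk_id):
        return [dst for dst, _ in self.edges[chunk_id]]

def retrieve_with_discourse(query, retriever, graph, top_k=3):
    # Stage 1: ordinary similarity retrieval (retriever is any callable
    # returning a ranked list of chunk IDs).
    hits = retriever(query, top_k)
    # Stage 2: pull in chunks linked to the hits by rhetorical relations.
    expanded = list(hits)
    for chunk_id in hits:
        for neighbor in graph.neighbors(chunk_id):
            if neighbor not in expanded:
                expanded.append(neighbor)
    return expanded
```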
AtManRL is a method that uses differentiable attention manipulation and reinforcement learning to train LLMs to generate more faithful chain-of-thought reasoning by ensuring reasoning tokens causally influence final predictions. Experiments on GSM8K and MMLU with Llama-3.2-3B demonstrate the approach can identify influential reasoning tokens and improve reasoning transparency.
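A toy sketch of the underlying influence measurement, with embedding ablation standing in for the paper's differentiable attention manipulation; `model` here is assumed to be any callable mapping token embeddings to answer logits, and the masking strategy is an illustrative simplification.

```python
# Toy sketch: estimate how much one reasoning token causally influences
# the final prediction by ablating it and comparing answer distributions.
# Embedding zeroing is a crude stand-in for the paper's differentiable
# attention manipulation.
import torch
import torch.nn.functional as F

def token_influence(model, embeddings, token_idx):
    # embeddings: (batch, seq_len, dim); model returns answer logits.
    base_logits = model(embeddings)
    ablated = embeddings.clone()
    ablated[:, token_idx, :] = 0.0  # ablate one reasoning token
    ablated_logits = model(ablated)
    # KL divergence between the two answer distributions serves as the
    # influence score: large divergence = causally important token.
    return F.kl_div(
        F.log_softmax(ablated_logits, dim=-1),
        F.softmax(base_logits, dim=-1),
        reduction="batchmean",
    )
```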
Google DeepMind senior scientist Alexander Lerchner argues that large language models cannot achieve consciousness, dubbing the assumption that they can the 'Abstraction Fallacy' and suggesting this limitation will persist even over a century-long timeframe.
LlamaFactory is a unified framework that enables efficient fine-tuning of over 100 large language models via a web-based interface, eliminating the need for coding.
A commentary on how large language models and AI have made English an effective programming language, reflecting the shift toward natural language interfaces for coding tasks.