methodology

#methodology

Do you eval the whole harness or each of its parts?

Reddit r/AI_Agents ↗ · yesterday

A discussion question: should a machine learning harness be evaluated as a whole, or should its individual components be evaluated separately?

#methodology

Traditional SDLC vs Agentic SDLC

Reddit r/ArtificialInteligence ↗ · 6d ago

This article compares the traditional Software Development Life Cycle (SDLC) with the emerging 'agentic SDLC' approach, which incorporates AI agents into the software development process.

#methodology

@neil_xbt: https://x.com/neil_xbt/status/2067083332513395140

X AI KOLs Timeline ↗ · 2026-06-17 Cached

This article explains how to build a knowledge graph in Obsidian using Claude as an AI engine to find connections, arguing that note-taking systems become more valuable over time when notes are linked rather than isolated.

#methodology

@shao__meng: Seven Stages of AI-Driven Development 1. Grill 2. Research 3. Prototype 4. PRD 5. Issues 6. Implement 7. Review From Skills For Real Engineers by @…

X AI KOLs Timeline ↗ · 2026-06-16 Cached

A Twitter thread shares the seven stages of AI-driven development from the Skills For Real Engineers project by @mattpocockuk, including alignment techniques like grill sessions and tooling for coding agents.

#methodology

NeurIPS used uncalibrated AI detector for desk rejections [D]

Reddit r/MachineLearning ↗ · 2026-06-03

A submission was desk-rejected from NeurIPS based on an uncalibrated AI detector (Pangram), raising concerns about circularity in the review process and unvalidated false-positive rates on the target distribution.

#methodology

Gate AI: LLM Security Benchmark Evaluation Methodology and Results

arXiv cs.LG ↗ · 2026-06-03 Cached

This paper presents an evaluation methodology for LLM security detectors that addresses systematic weaknesses like per-dataset threshold tuning and undisclosed operating points. The framework uses cross-validation across 16 benchmarks, selects a single global operating point, and includes multiple diagnostics for generalization.

#methodology

@freeman1266: Software engineering methodology must shift from the traditional 'state perspective' to a dynamical system perspective. The core view advocates that 'attractor logic takes precedence over governance tools (Harness)', that is, first define the structural invariants that the system should converge to in the long term, rather than merely focusing on local constraints and verification. AI, as a high-frequency and directionless perturbing...

X AI KOLs Timeline ↗ · 2026-06-03 Cached

The article proposes that software engineering methodology should shift from a state perspective to a dynamical system perspective, emphasizing that attractor logic takes precedence over governance tools. In the AI era, it is necessary to explicitly model state space, attractors, trajectories, and controls to address architectural drift caused by AI as a high-frequency perturbation source.

#methodology

How much published AI research is wrong because of data leakage?

Reddit r/artificial ↗ · 2026-06-01

A Princeton study found data leakage in nearly 300 AI papers across 17 fields, causing overoptimistic results. The author highlights how easy it is to accidentally leak data and cautions against trusting impressive AI claims without checking for leakage.

#methodology

I designed a methodology for (autonomously) training transformer language models on a single consumer GPU.

Reddit r/openclaw ↗ · 2026-05-31

A methodology for autonomously training transformer language models on a single consumer GPU, structured in six stages with verification gates and AGENTS.md specs for orchestration frameworks like OpenClaw.

#methodology

@arcinstitute: Neuronal proteins originating in the brain drained to the dura, skull, and nose, while injected CSF-tracer gathered in …

X AI KOLs Timeline ↗ · 2026-05-29 Cached

Neuronal proteins from the brain drain to the dura, skull, and nose, whereas injected CSF-tracer accumulates in neck lymph nodes. The study highlights that the act of injection may perturb the system under investigation.

#methodology

@k_dense_ai: Introducing Science Superpowers — a complete computational-science methodology for AI research agents. It makes your ag…

X AI KOLs Timeline ↗ · 2026-05-28 Cached

Science Superpowers is an open-source computational-science methodology for AI research agents, enforcing pre-registration and reproducible workflows to prevent p-hacking and HARKing.

#methodology

Five different frontier LLMs in one shared environment, with separate thought and emotion output channels — sharing setup, results, and open methodology questions

Reddit r/AI_Agents ↗ · 2026-05-27

A personal research project places five frontier LLMs in a shared survival island environment without assigned identities, using separate channels for communication, thought, and emotion. The results show divergence between channels and consistent behavioral signatures across models, raising questions about AI agent personality and deception.

#methodology

LQS v3.1 — an open methodology for rating AI training data (multi-oracle consensus + signed certificates) [P]

Reddit r/MachineLearning ↗ · 2026-05-23

The author presents LQS v3.1, an open methodology for rating AI training data using multi-oracle consensus and signed certificates, with a published paper and public index. The approach aims to solve the bottleneck of independent quality evaluation in the AI training data market.

#methodology

Personality Engineering with AI Agents: A New Methodology for Negotiation Research

arXiv cs.AI ↗ · 2026-05-22 Cached

Introduces 'personality engineering,' a methodology using AI agents to parameterize, manipulate, and evaluate negotiator personality based on the interpersonal circumplex, enabling controlled experiments in negotiation theory.

#methodology

Twelve Ways to Be Wrong About AI-Assisted Coding

Lobsters Hottest ↗ · 2026-05-21 Cached

This article critiques common flawed methods for evaluating AI-assisted coding tools, such as counting lines of code, timing artificial tasks, and relying on developer self-reports, arguing for more rigorous research methods.

#methodology

METR evaluated an early version of Claude Mythos

Reddit r/singularity ↗ · 2026-05-09

METR evaluated an early version of Claude Mythos Preview in March 2026 using its time-horizons task suite and estimated a 50%-time-horizon of at least 16 hours. That places the model at the upper end of what current benchmarks can measure, with caveats about stability at longer time ranges.

#methodology

@jaynitx: https://x.com/jaynitx/status/2052734499319091384

X AI KOLs Timeline ↗ · 2026-05-08 Cached

A personal reflection on first principles thinking versus reasoning by analogy, using examples from Elon Musk's approach to reducing rocket costs at SpaceX, and the author's own startup failure.

#methodology

Advancing red teaming with people and AI

OpenAI Blog ↗ · 2024-11-21 Cached

OpenAI publishes a white paper detailing their approach to external red teaming for AI models, outlining methods for selecting diverse red team members, determining model access levels, providing testing infrastructure, and synthesizing feedback to improve AI safety and policy coverage.
