The article raises design and ethical questions about what information AI agents should disclose when recommending products or services, including business partnerships, ranking criteria, and affiliate relationships, drawing parallels with traditional online advertising transparency patterns.
This paper analyzes Canada's Federal AI Register (409 systems) and argues that such transparency artifacts configure accountability through ontological design rather than enabling genuine contestability, finding that 86% of registered systems focus on internal efficiency while human discretion is systematically obscured.
A comprehensive survey reviewing recent advances in intrinsic interpretability for Large Language Models, categorizing approaches into five design paradigms: functional transparency, concept alignment, representational decomposability, explicit modularization, and latent sparsity induction. The paper addresses the challenge of building transparency directly into model architectures rather than relying on post-hoc explanation methods.
Anthropic released Claude Opus 4.7 with notable system prompt changes including expanded child safety instructions, new tool integrations (Claude in PowerPoint, Chrome, Excel), and behavioral adjustments to reduce verbosity and improve task completion without unnecessary clarification.
OpenAI publishes details on its Model Spec, a formal framework defining how its AI models should behave across diverse use cases, emphasizing transparency, fairness, and safety as core principles for democratized AI development.
OpenAI announces a strengthened safety ecosystem through external third-party testing and evaluations of frontier AI models, including independent assessments, methodology reviews, and subject-matter expert probing. The company commits to transparency by publicly sharing third-party assessment results and supporting independent evaluations since GPT-4's launch.
OpenAI has released a major update to its Model Spec, a document defining desired AI model behavior, now publicly available under CC0 license. The update emphasizes customizability, transparency, and intellectual freedom while maintaining safety guardrails through a clear chain-of-command framework.
OpenAI reports disrupting five covert influence operations that attempted to misuse its AI models for deceptive campaigns, finding that safety measures built into the models prevented threat actors from generating the content they sought. The company is publishing trend analysis and collaborating with industry, civil society, and government to combat AI-enabled information manipulation.
OpenAI introduces the Model Spec, a document outlining how its models should behave in ChatGPT and the API, covering objectives, rules, and default behaviors. An updated version was released in February 2025, reinforcing commitments to customizability, transparency, and intellectual freedom while maintaining safety guardrails.
OpenAI publishes AI governance recommendations committing companies to internal and external red-teaming for safety risks, information sharing on emerging capabilities, and mechanisms for detecting AI-generated audio and visual content.
OpenAI publishes a report on mechanisms to improve verifiability in AI development, addressing how stakeholders can verify organizations' claims about AI system properties and safety practices.