Counterfactual Optimization of Baseball Pitch Sequences and Estimation of Its Impact on Season-Level Statistics

arXiv cs.LG 06/17/26, 04:00 AM Papers

baseball sports-analytics pitch-sequencing transformer machine-learning counterfactual optimization

Summary

This paper trains a Transformer-based model on MLB Statcast data and uses it to counterfactually optimize baseball pitch sequences. It estimates that optimizing both final and setup pitches could improve season-level statistics such as K/9 (strikeouts per nine innings) by more than 1.0.
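K/9 is strikeouts per nine innings pitched. The minimal sketch below shows what a 1.0 gain means in raw strikeouts. It counts innings as outs recorded divided by three, which avoids the "180.1" box-score notation. The function names are illustrative, and the 180-inning workload in the usage note is an assumed example, not a figure from the paper.

```python
def k_per_9(strikeouts: int, outs_recorded: int) -> float:
    """Strikeouts per nine innings, with innings pitched = outs recorded / 3."""
    if outs_recorded <= 0:
        raise ValueError("outs_recorded must be positive")
    innings_pitched = outs_recorded / 3
    return 9 * strikeouts / innings_pitched


def extra_strikeouts(delta_k9: float, innings_pitched: float) -> float:
    """Additional strikeouts implied by raising K/9 by delta_k9 over a workload."""
    return delta_k9 * innings_pitched / 9
```

For a pitcher who throws 180 innings, `extra_strikeouts(1.0, 180)` returns 20.0, so a K/9 gain of 1.0 corresponds to about 20 additional strikeouts over that season.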

arXiv:2606.17345v1 Announce Type: new

Abstract: Although pitch sequencing is a central topic in baseball analytics, previous studies have primarily focused on optimizing the final pitch within a single plate appearance, leaving the role of preceding setup pitches and their impact on long-term season-level performance insufficiently examined. To address these issues, this study conducted counterfactual analyses using MLB Statcast data. A Transformer-based machine-learning model was trained to predict whether a target pitch would result in an in-play outcome or a swing-out. Counterfactual pitch sequences were then generated by replacing either the final pitch or the preceding setup pitch with alternative pitch types and locations while keeping the surrounding contextual information fixed. Optimal counterfactual selections were defined as those that minimized the predicted in-play probability, and their expected effects on pitchers' seasonal statistics were estimated using regression models linking model outputs to season statistics. The results suggest that the optimization of both final and setup pitches may substantially influence season-level performance, including improvements of more than 1.0 in K/9. The analyses also provided several practical insights, including velocity-band-specific effective locations, the importance of pitch command, and the expansion of pitch-selection options through middle-velocity pitches. These findings quantitatively support the strategic importance of pitch sequencing in baseball.
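The abstract describes the pipeline only at a high level. The sketch below shows one way the counterfactual search and the season-level mapping could fit together. It treats the trained Transformer as a black box: `predict_in_play` is a hypothetical callable that maps fixed context plus a pitch sequence to the predicted in-play probability of the final pitch. The `Pitch` fields, the zone grid, and the use of a one-predictor linear regression from mean predicted in-play probability to K/9 are all assumptions, not the paper's specification.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterator, Sequence


@dataclass(frozen=True)
class Pitch:
    pitch_type: str  # Statcast-style pitch code, e.g. "FF" or "SL"
    zone: int        # location bucket; the grid used here is an assumption


# Hypothetical stand-in for the trained model: (fixed context, sequence)
# -> predicted probability that the final pitch is put in play.
InPlayScorer = Callable[[dict, tuple[Pitch, ...]], float]


def counterfactuals(seq: Sequence[Pitch], replace_idx: int,
                    pitch_types: Sequence[str],
                    zones: Sequence[int]) -> Iterator[tuple[Pitch, ...]]:
    """Yield copies of seq with one pitch swapped; all other pitches are kept."""
    for pitch_type, zone in product(pitch_types, zones):
        candidate = list(seq)
        candidate[replace_idx] = Pitch(pitch_type, zone)
        yield tuple(candidate)


def optimize_pitch(context: dict, seq: Sequence[Pitch], replace_idx: int,
                   predict_in_play: InPlayScorer, pitch_types: Sequence[str],
                   zones: Sequence[int]) -> tuple[tuple[Pitch, ...], float]:
    """Return the counterfactual sequence with the lowest predicted in-play probability.

    replace_idx=-1 swaps the final pitch; replace_idx=-2 swaps the setup pitch
    while the final pitch and the context stay fixed.
    """
    scored = ((predict_in_play(context, cf), cf)
              for cf in counterfactuals(seq, replace_idx, pitch_types, zones))
    # Compare on the probability only; Pitch objects are not orderable.
    best_p, best_seq = min(scored, key=lambda pair: pair[0])
    return best_seq, best_p


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimated_k9_change(slope: float, baseline_p: float, optimized_p: float) -> float:
    """Change in fitted K/9 when a pitcher's mean predicted in-play probability moves."""
    return slope * (optimized_p - baseline_p)
```

In this sketch, `fit_line` would be fit across pitchers (x = mean predicted in-play probability, y = season K/9). With a negative slope, lowering the mean probability through optimized selections yields a positive estimated K/9 change.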


Cached at: 06/17/26, 05:37 AM

# Counterfactual Optimization of Baseball Pitch Sequences and Estimation of Its Impact on Season-Level Statistics
Source: [https://arxiv.org/abs/2606.17345](https://arxiv.org/abs/2606.17345)
[View PDF](https://arxiv.org/pdf/2606.17345)


## Submission history

From: Ryota Takamido

**[v1]** Mon, 15 Jun 2026 22:47:06 UTC (2,531 KB)


Similar Articles

Conditional Attribute Estimation with Autoregressive Sequence Models

How Faithful Is Trajectory-Based Data Attribution? Error Sources, Remedies, and Practical Guidelines

SkillOpt treats markdown skill files as trainable parameters with proper optimization machinery

Plan Before You Trade: Inference-Time Optimization for RL Trading Agents

@Yif_Yang: Introducing SkillOpt — an optimizer for agent skills. Instead of finetuning model weights, we treat a natural-language …
