This article describes a constant-space linear-time algorithm using a min-priority queue to delete all but the 10 most recent files in a directory, avoiding the O(n) space of sorting all entries.
<p>Say you have a directory full of files, and you want to delete all but the 10 most recent files. Is there a way to tell <code>FindFirstFile</code> to enumerate the files in date order?</p>
<p>No, there is no way to tell <code>FindFirstFile</code> to enumerate the files in date order. The files enumerated by <code>FindFirstFile</code> are produced in whatever order the file system driver wants. For example, FAT typically enumerates them in the order the files appear in the directory listing, which could be in order of creation if the files were added sequentially, or some mishmash order if there were renames or deletions mixed in.</p>
<p>Since you can’t control the order in which the files are enumerated, you’ll have to do the sorting yourself. The naïve solution is to read in all the entries, sort them by last-modified date, and then delete all but the last ten. This is <var>O</var>(<var>n</var>) space and <var>O</var>(<var>n</var> log <var>n</var>) running time.</p>
<p>But you can do better.</p>
<p>This job calls for a priority queue. A priority queue is a data structure that supports these operations, where <var>n</var> is the number of items in the priority queue.</p>
<ul>
<li>Add sorted: <var>O</var>(log <var>n</var>)</li>
<li>Find largest: <var>O</var>(1)</li>
<li>Remove largest: <var>O</var>(log <var>n</var>)</li>
</ul>
<p>The above description is for a max-priority queue. There is also a min-priority queue where the final two operations are “find smallest” and “remove smallest”. The two versions are equivalent because you can just use a reverse-sense comparison to switch from one to the other.</p>
<p>What we can do is enumerate all the files and add them one by one to a min-priority queue sorted by modified date. The priority queue holds the newest items. If the priority queue size exceeds 10, then we delete the file corresponding to the “smallest” (earliest) entry in the priority queue, and the remove that entry from the priority queue.</p>
<p>Since the priority queue size has a fixed cap, all of the operations run in <var>O</var>(1) time because the value of <var>n</var> is bounded by a predetermined constant. (Of course, the larger the cap, the larger the constant in <var>O</var>(1).) The overall algorithm then runs in <var>O</var>(<var>n</var>) times, where <var>n</var> is the number of files in the directory.</p>
<p>Here’s a sketch of a solution. To get a min-priority heap, we have to reverse the sense of the comparison in <code>dateAscending</code>.</p>
<pre>constexpr int files_to_keep = 10;
auto dateAscending = [](const WIN32_FIND_DATA& a, const WIN32_FIND_DATA& b) {
return CompareFileTime(&a.ftLastWriteTime, &b.ftLastWriteTime) > 0;
};
std::priority_queue<WIN32_FIND_DATA,
std::vector<WIN32_FIND_DATA>, decltype(dateAscending)>
names(dateAscending);
WIN32_FIND_DATA wfd;
wil::unique_hfind findHandle( FindFirstFileW(L"*.*", &wfd));
if (findHandle.is_valid())
{
do
{
if (wfd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
// Skip directories
continue;
}
names.push(wfd);
if (names.size() > files_to_keep) {
DeleteFileW(names.top().cFileName);
names.pop();
}
} while (FindNextFileW(findHandle.get(), &wfd));
}
</pre>
<p>It’s unfortunate that <code>std::<wbr />priority_<wbr />queue</code> doesn’t have a deduction guide that deduces the <code>Comparator</code>. We have to specify it explicitly, and since it comes after the <code>Container</code>, we have to write out the container type manually instead of allowing it to be deduced.</p>
<p>It’s also unfortunate that it’s hard to call <code>reserve()</code> on the vector hiding inside the <code>priority_<wbr />queue</code>. This means that the <code>names.push()</code> could throw an exception. At least we use an RAII type (<code>wil::<wbr />unique_<wbr />hfind</code>) to ensure that the find handle is not leaked.</p>
<p>If you have access to <code>std::<wbr />inplace_<wbr />vector</code>, you could use a</p>
<pre>std::priority_queue<WIN32_FIND_DATA,
std::inplace_vector<WIN32_FIND_DATA, files_to_keep + 1>,
decltype(dateAscending)> names(dateAscending);
</pre>
<p>to avoid memory allocations entirely. (It also makes it clearer that the algorithm is constant-space.)</p>
<p>This is an example of a so-called <a href="https://en.wikipedia.org/wiki/Online_algorithm">online algorithm</a>, an algorithm that does its work incrementally rather than requiring all of the input before it can start working.</p>
<p><b>Exercise</b>: What if the task was to delete the 10 oldest files?</p>
<p>The post <a href="https://devblogs.microsoft.com/oldnewthing/20260514-00/?p=112322">A constant-space linear-time algorithm for deleting all but the 10 most recent files in a directory</a> appeared first on <a href="https://devblogs.microsoft.com/oldnewthing">The Old New Thing</a>.</p>
# A constant-space linear-time algorithm for deleting all but the 10 most recent files in a directory - The Old New Thing
Source: [https://devblogs.microsoft.com/oldnewthing/20260514-00?p=112322](https://devblogs.microsoft.com/oldnewthing/20260514-00?p=112322)
Say you have a directory full of files, and you want to delete all but the 10 most recent files\. Is there a way to tell`FindFirstFile`to enumerate the files in date order?
No, there is no way to tell`FindFirstFile`to enumerate the files in date order\. The files enumerated by`FindFirstFile`are produced in whatever order the file system driver wants\. For example, FAT typically enumerates them in the order the files appear in the directory listing, which could be in order of creation if the files were added sequentially, or some mishmash order if there were renames or deletions mixed in\.
Since you can’t control the order in which the files are enumerated, you’ll have to do the sorting yourself\. The naïve solution is to read in all the entries, sort them by last\-modified date, and then delete all but the last ten\. This isO\(n\) space andO\(nlogn\) running time\.
But you can do better\.
This job calls for a priority queue\. A priority queue is a data structure that supports these operations, wherenis the number of items in the priority queue\.
- Add sorted:O\(logn\)
- Find largest:O\(1\)
- Remove largest:O\(logn\)
The above description is for a max\-priority queue\. There is also a min\-priority queue where the final two operations are “find smallest” and “remove smallest”\. The two versions are equivalent because you can just use a reverse\-sense comparison to switch from one to the other\.
What we can do is enumerate all the files and add them one by one to a min\-priority queue sorted by modified date\. The priority queue holds the newest items\. If the priority queue size exceeds 10, then we delete the file corresponding to the “smallest” \(earliest\) entry in the priority queue, and the remove that entry from the priority queue\.
Since the priority queue size has a fixed cap, all of the operations run inO\(1\) time because the value ofnis bounded by a predetermined constant\. \(Of course, the larger the cap, the larger the constant inO\(1\)\.\) The overall algorithm then runs inO\(n\) times, wherenis the number of files in the directory\.
Here’s a sketch of a solution\. To get a min\-priority heap, we have to reverse the sense of the comparison in`dateAscending`\.
```
constexpr int files_to_keep = 10;
auto dateAscending = [](const WIN32_FIND_DATA& a, const WIN32_FIND_DATA& b) {
return CompareFileTime(&a.ftLastWriteTime, &b.ftLastWriteTime) > 0;
};
std::priority_queue<WIN32_FIND_DATA,
std::vector<WIN32_FIND_DATA>, decltype(dateAscending)>
names(dateAscending);
WIN32_FIND_DATA wfd;
wil::unique_hfind findHandle( FindFirstFileW(L"*.*", &wfd));
if (findHandle.is_valid())
{
do
{
if (wfd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
// Skip directories
continue;
}
names.push(wfd);
if (names.size() > files_to_keep) {
DeleteFileW(names.top().cFileName);
names.pop();
}
} while (FindNextFileW(findHandle.get(), &wfd));
}
```
It’s unfortunate that`std::priority\_queue`doesn’t have a deduction guide that deduces the`Comparator`\. We have to specify it explicitly, and since it comes after the`Container`, we have to write out the container type manually instead of allowing it to be deduced\.
It’s also unfortunate that it’s hard to call`reserve\(\)`on the vector hiding inside the`priority\_queue`\. This means that the`names\.push\(\)`could throw an exception\. At least we use an RAII type \(`wil::unique\_hfind`\) to ensure that the find handle is not leaked\.
If you have access to`std::inplace\_vector`, you could use a
```
std::priority_queue<WIN32_FIND_DATA,
std::inplace_vector<WIN32_FIND_DATA, files_to_keep + 1>,
decltype(dateAscending)> names(dateAscending);
```
to avoid memory allocations entirely\. \(It also makes it clearer that the algorithm is constant\-space\.\)
This is an example of a so\-called[online algorithm](https://en.wikipedia.org/wiki/Online_algorithm), an algorithm that does its work incrementally rather than requiring all of the input before it can start working\.
**Exercise**: What if the task was to delete the 10 oldest files?
### Category
### Topics
## Author

Raymond has been involved in the evolution of Windows for more than 30 years\. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie\-jeebies\. The Web site spawned a book, coincidentally also titled The Old New Thing \(Addison Wesley 2007\)\. He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information\.
microsandbox replaced its slow user-space FUSE filesystem with a kernel-mounted EROFS disk image, achieving a 47× geometric mean speedup across filesystem operations and eliminating the VM/host round-trip bottleneck.
Raymond Chen revisits a unidirectional rotation algorithm for swapping adjacent memory blocks, explaining its recursive approach and performance characteristics.
fff is a fast, typo-resistant file search toolkit with frecency ranking and an MCP server for AI agents, providing efficient file and content search with git awareness.
A new branchless Quicksort implementation (blqsort) using sorting networks outperforms std::sort and pdqsort on Apple M1 and AMD Ryzen systems, available as single-header C and C++ libraries. It achieves speedups through branchless partitioning, median-of-medians pivot selection, and custom sorting networks for small arrays.
Shares tips on deeply cleaning Mac storage using the open-source tool Mole alongside Codex. Mole is a command-line tool that combines cleanup, uninstallation, and monitoring. This session freed up over 100 GB of space.