A constant-space linear-time algorithm for deleting all but the 10 most recent files in a directory

The Old New Thing (Raymond Chen) 05/14/26, 02:00 PM Tools

algorithm file-management priority-queue constant-space windows-api programming

Summary

This article describes a constant-space linear-time algorithm using a min-priority queue to delete all but the 10 most recent files in a directory, avoiding the O(n) space of sorting all entries.

Say you have a directory full of files, and you want to delete all but the 10 most recent files. Is there a way to tell <code>FindFirstFile</code> to enumerate the files in date order? No, there is no way to tell <code>FindFirstFile</code> to enumerate the files in date order. The files enumerated by <code>FindFirstFile</code> are produced in whatever order the file system driver wants. For example, FAT typically enumerates them in the order the files appear in the directory listing, which could be in order of creation if the files were added sequentially, or some mishmash order if there were renames or deletions mixed in. Since you can’t control the order in which the files are enumerated, you’ll have to do the sorting yourself. The naïve solution is to read in all the entries, sort them by last-modified date, and then delete all but the last ten. This is <var>O</var>(<var>n</var>) space and <var>O</var>(<var>n</var> log <var>n</var>) running time. But you can do better. This job calls for a priority queue. A priority queue is a data structure that supports these operations, where <var>n</var> is the number of items in the priority queue. <ul> <li>Add sorted: <var>O</var>(log <var>n</var>)</li> <li>Find largest: <var>O</var>(1)</li> <li>Remove largest: <var>O</var>(log <var>n</var>)</li> </ul> The above description is for a max-priority queue. There is also a min-priority queue where the final two operations are “find smallest” and “remove smallest”. The two versions are equivalent because you can just use a reverse-sense comparison to switch from one to the other. What we can do is enumerate all the files and add them one by one to a min-priority queue sorted by modified date. The priority queue holds the newest items. If the priority queue size exceeds 10, then we delete the file corresponding to the “smallest” (earliest) entry in the priority queue, and the remove that entry from the priority queue. Since the priority queue size has a fixed cap, all of the operations run in <var>O</var>(1) time because the value of <var>n</var> is bounded by a predetermined constant. (Of course, the larger the cap, the larger the constant in <var>O</var>(1).) The overall algorithm then runs in <var>O</var>(<var>n</var>) times, where <var>n</var> is the number of files in the directory. Here’s a sketch of a solution. To get a min-priority heap, we have to reverse the sense of the comparison in <code>dateAscending</code>. <pre>constexpr int files_to_keep = 10; auto dateAscending = [](const WIN32_FIND_DATA& a, const WIN32_FIND_DATA& b) { return CompareFileTime(&a.ftLastWriteTime, &b.ftLastWriteTime) > 0; }; std::priority_queue<WIN32_FIND_DATA, std::vector<WIN32_FIND_DATA>, decltype(dateAscending)> names(dateAscending); WIN32_FIND_DATA wfd; wil::unique_hfind findHandle( FindFirstFileW(L"*.*", &wfd)); if (findHandle.is_valid()) { do { if (wfd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) { // Skip directories continue; } names.push(wfd); if (names.size() > files_to_keep) { DeleteFileW(names.top().cFileName); names.pop(); } } while (FindNextFileW(findHandle.get(), &wfd)); } </pre> It’s unfortunate that <code>std::priority_queue</code> doesn’t have a deduction guide that deduces the <code>Comparator</code>. We have to specify it explicitly, and since it comes after the <code>Container</code>, we have to write out the container type manually instead of allowing it to be deduced. It’s also unfortunate that it’s hard to call <code>reserve()</code> on the vector hiding inside the <code>priority_queue</code>. This means that the <code>names.push()</code> could throw an exception. At least we use an RAII type (<code>wil::unique_hfind</code>) to ensure that the find handle is not leaked. If you have access to <code>std::inplace_vector</code>, you could use a <pre>std::priority_queue<WIN32_FIND_DATA, std::inplace_vector<WIN32_FIND_DATA, files_to_keep + 1>, decltype(dateAscending)> names(dateAscending); </pre> to avoid memory allocations entirely. (It also makes it clearer that the algorithm is constant-space.) This is an example of a so-called <a href="https://en.wikipedia.org/wiki/Online_algorithm">online algorithm</a>, an algorithm that does its work incrementally rather than requiring all of the input before it can start working. Exercise: What if the task was to delete the 10 oldest files? The post <a href="https://devblogs.microsoft.com/oldnewthing/20260514-00/?p=112322">A constant-space linear-time algorithm for deleting all but the 10 most recent files in a directory</a> appeared first on <a href="https://devblogs.microsoft.com/oldnewthing">The Old New Thing</a>.

Original Article

View Cached Full Text

Cached at: 05/16/26, 03:31 AM

# A constant-space linear-time algorithm for deleting all but the 10 most recent files in a directory - The Old New Thing Source: [https://devblogs.microsoft.com/oldnewthing/20260514-00?p=112322](https://devblogs.microsoft.com/oldnewthing/20260514-00?p=112322) Say you have a directory full of files, and you want to delete all but the 10 most recent files\. Is there a way to tell`FindFirstFile`to enumerate the files in date order? No, there is no way to tell`FindFirstFile`to enumerate the files in date order\. The files enumerated by`FindFirstFile`are produced in whatever order the file system driver wants\. For example, FAT typically enumerates them in the order the files appear in the directory listing, which could be in order of creation if the files were added sequentially, or some mishmash order if there were renames or deletions mixed in\. Since you can’t control the order in which the files are enumerated, you’ll have to do the sorting yourself\. The naïve solution is to read in all the entries, sort them by last\-modified date, and then delete all but the last ten\. This isO\(n\) space andO\(nlogn\) running time\. But you can do better\. This job calls for a priority queue\. A priority queue is a data structure that supports these operations, wherenis the number of items in the priority queue\. - Add sorted:O\(logn\) - Find largest:O\(1\) - Remove largest:O\(logn\) The above description is for a max\-priority queue\. There is also a min\-priority queue where the final two operations are “find smallest” and “remove smallest”\. The two versions are equivalent because you can just use a reverse\-sense comparison to switch from one to the other\. What we can do is enumerate all the files and add them one by one to a min\-priority queue sorted by modified date\. The priority queue holds the newest items\. If the priority queue size exceeds 10, then we delete the file corresponding to the “smallest” \(earliest\) entry in the priority queue, and the remove that entry from the priority queue\. Since the priority queue size has a fixed cap, all of the operations run inO\(1\) time because the value ofnis bounded by a predetermined constant\. \(Of course, the larger the cap, the larger the constant inO\(1\)\.\) The overall algorithm then runs inO\(n\) times, wherenis the number of files in the directory\. Here’s a sketch of a solution\. To get a min\-priority heap, we have to reverse the sense of the comparison in`dateAscending`\. ``` constexpr int files_to_keep = 10; auto dateAscending = [](const WIN32_FIND_DATA& a, const WIN32_FIND_DATA& b) { return CompareFileTime(&a.ftLastWriteTime, &b.ftLastWriteTime) > 0; }; std::priority_queue<WIN32_FIND_DATA, std::vector<WIN32_FIND_DATA>, decltype(dateAscending)> names(dateAscending); WIN32_FIND_DATA wfd; wil::unique_hfind findHandle( FindFirstFileW(L"*.*", &wfd)); if (findHandle.is_valid()) { do { if (wfd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) { // Skip directories continue; } names.push(wfd); if (names.size() > files_to_keep) { DeleteFileW(names.top().cFileName); names.pop(); } } while (FindNextFileW(findHandle.get(), &wfd)); } ``` It’s unfortunate that`std::priority\_queue`doesn’t have a deduction guide that deduces the`Comparator`\. We have to specify it explicitly, and since it comes after the`Container`, we have to write out the container type manually instead of allowing it to be deduced\. It’s also unfortunate that it’s hard to call`reserve\(\)`on the vector hiding inside the`priority\_queue`\. This means that the`names\.push\(\)`could throw an exception\. At least we use an RAII type \(`wil::unique\_hfind`\) to ensure that the find handle is not leaked\. If you have access to`std::inplace\_vector`, you could use a ``` std::priority_queue<WIN32_FIND_DATA, std::inplace_vector<WIN32_FIND_DATA, files_to_keep + 1>, decltype(dateAscending)> names(dateAscending); ``` to avoid memory allocations entirely\. \(It also makes it clearer that the algorithm is constant\-space\.\) This is an example of a so\-called[online algorithm](https://en.wikipedia.org/wiki/Online_algorithm), an algorithm that does its work incrementally rather than requiring all of the input before it can start working\. **Exercise**: What if the task was to delete the 10 oldest files? ### Category ### Topics ## Author ![Raymond Chen](https://devblogs.microsoft.com/oldnewthing/wp-content/uploads/sites/38/2019/02/RaymondChen_5in-150x150.jpg) Raymond has been involved in the evolution of Windows for more than 30 years\. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie\-jeebies\. The Web site spawned a book, coincidentally also titled The Old New Thing \(Addison Wesley 2007\)\. He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information\.

A constant-space linear-time algorithm for deleting all but the 10 most recent files in a directory

Similar Articles

We made our filesystem 47× faster by deleting it

Rotation revisited: Another unidirectional algorithm

dmtrKovalenko/fff

Branchless Quicksort faster than std:sort and pdqsort with C and C++ API

@gkxspace: Whenever my Mac storage gets close to full, I run a cleanup using Mole + Codex. This time, I cleared another 100+ GB. 1. First, a single Mole command: mo clean https://github.com/tw93/Mole 2. Then let Codex …

Submit Feedback

Similar Articles

We made our filesystem 47× faster by deleting it

Rotation revisited: Another unidirectional algorithm

Branchless Quicksort faster than std:sort and pdqsort with C and C++ API

@gkxspace: Whenever my Mac storage gets close to full, I run a cleanup using Mole + Codex. This time, I cleared another 100+ GB. 1. First, a single Mole command: mo clean https://github.com/tw93/Mole 2. Then let Codex …