Question about NTFS and fragmentation of MFT records

  • Question

  • In short, my question is about fragmentation of MFT records (NOT the data in the file, the records in the NTFS MFT that POINT to the data in the file).  I'm seeing some odd behavior that has both performance and size implications that I can't figure out how to clean up (more detail below).

    In trying to figure out where to ask this question, I'm struggling with the fact that this is pretty esoteric stuff.  The number of people in the world who understand NTFS to this level is going to be relatively small.  And those who might know how to fix something like this?  Even smaller.

    Help me find one?


    In a small file stored on an NTFS partition, the locations of the clusters in use by the file are stored in the same MFT record as the rest of the information about the file (name, size, last modified, etc).  However, that list of locations can grow to be very large if the file becomes very fragmented.  For example: instead of storing 1 "data run" that starts at LCN #300,000 and runs for 1,000 clusters (which NTFS can store very efficiently), you could (theoretically) have 1,000 data runs, each 1 cluster long, positioned all over the disk.
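    To make the "very efficiently" part concrete, here is a minimal Python sketch of the variable-length run encoding as it is commonly described: a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, followed by a little-endian cluster count and a signed LCN delta relative to the previous run. The function names are hypothetical, and treating both fields as signed is an assumption of this sketch, not something confirmed by NTFS documentation.

```python
def _min_signed_bytes(value: int) -> int:
    """Smallest byte count that holds `value` as a signed little-endian integer."""
    n = 1
    while not -(1 << (8 * n - 1)) <= value < (1 << (8 * n - 1)):
        n += 1
    return n


def encode_data_runs(runs):
    """Encode (start_lcn, cluster_count) pairs as a run list (assumed layout)."""
    out = bytearray()
    prev_lcn = 0
    for start_lcn, length in runs:
        delta = start_lcn - prev_lcn          # offsets are relative to the previous run
        len_size = _min_signed_bytes(length)
        off_size = _min_signed_bytes(delta)
        out.append((off_size << 4) | len_size)  # header: high nibble offset size, low nibble length size
        out += length.to_bytes(len_size, "little", signed=True)
        out += delta.to_bytes(off_size, "little", signed=True)
        prev_lcn = start_lcn
    out.append(0x00)                          # a zero header byte ends the list
    return bytes(out)


def decode_data_runs(buf: bytes):
    """Decode a run list back into (start_lcn, cluster_count) pairs."""
    runs, pos, lcn = [], 0, 0
    while pos < len(buf) and buf[pos] != 0x00:
        len_size = buf[pos] & 0x0F
        off_size = buf[pos] >> 4
        pos += 1
        length = int.from_bytes(buf[pos:pos + len_size], "little", signed=True)
        pos += len_size
        if off_size == 0:
            runs.append((None, length))       # no offset field: a sparse run with no clusters on disk
        else:
            lcn += int.from_bytes(buf[pos:pos + off_size], "little", signed=True)
            pos += off_size
            runs.append((lcn, length))
    return runs


one_run = encode_data_runs([(300_000, 1_000)])
scattered = encode_data_runs([(300_000 + 2 * i, 1) for i in range(1_000)])
print(one_run.hex(), len(one_run))  # 32e803e0930400 7
print(len(scattered))               # 3003
```

    Under these assumptions, the single contiguous run costs 7 bytes, while 1,000 one-cluster runs spaced two clusters apart cost about 3 KB, which is already more than a 1 KB record can hold.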

    Now, as the number of data runs climbs, eventually they won't all fit in that record with the rest of the data.  In that case, NTFS allocates a second record in the MFT and has the "base" record point to it (via an ATTRIBUTE_LIST).  As the file grows or becomes more fragmented, one extra record might not be enough, so NTFS will allocate even more records.  With the default MFT record size (1 KB), I'm seeing a max of ~200 data runs per record.
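    As a rough illustration of how the base record points at the extra records, the sketch below walks an attribute list and collects the MFT record number behind each piece of the unnamed $DATA attribute. The 26-byte fixed entry header (type, entry length, name length, name offset, lowest VCN, segment reference, instance) follows the commonly documented ATTRIBUTE_LIST_ENTRY layout; treat the offsets as an assumption to check against your own dumps. `data_extent_records` and `make_entry` are hypothetical helpers, and the input here is synthetic rather than read from a real volume.

```python
import struct

ATTR_TYPE_DATA = 0x80            # $DATA attribute type code
ENTRY_HEADER = "<IHBBQQH"        # assumed layout of the fixed part of each entry (26 bytes)
ENTRY_HEADER_SIZE = struct.calcsize(ENTRY_HEADER)


def data_extent_records(attr_list: bytes):
    """Return (lowest_vcn, mft_record_number) for each unnamed $DATA entry."""
    found = []
    pos = 0
    while pos + ENTRY_HEADER_SIZE <= len(attr_list):
        (attr_type, entry_len, name_len, _name_off,
         lowest_vcn, segment_ref, _instance) = struct.unpack_from(ENTRY_HEADER, attr_list, pos)
        if entry_len < ENTRY_HEADER_SIZE:
            break                # malformed entry; stop rather than loop forever
        if attr_type == ATTR_TYPE_DATA and name_len == 0:
            # Low 48 bits are the record number, high 16 bits the sequence number.
            found.append((lowest_vcn, segment_ref & 0xFFFFFFFFFFFF))
        pos += entry_len
    return found


def make_entry(attr_type, lowest_vcn, record_number, sequence=1):
    """Build one synthetic 32-byte entry for experimenting (not real on-disk data)."""
    fixed = struct.pack(ENTRY_HEADER, attr_type, 32, 0, ENTRY_HEADER_SIZE,
                        lowest_vcn, (sequence << 48) | record_number, 0)
    return fixed.ljust(32, b"\x00")


# Synthetic example: base record 5000 plus three extension records,
# each holding a single one-cluster piece of $DATA.
sample = (make_entry(0x10, 0, 5000) + make_entry(0x30, 0, 5000) +
          b"".join(make_entry(ATTR_TYPE_DATA, vcn, 5001 + vcn) for vcn in range(3)))
extents = data_extent_records(sample)
print(extents)
print("distinct records holding $DATA:", len({rec for _, rec in extents}))
```

    Counting the distinct record numbers this way gives the number of MFT records a file is spending on its data runs.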

    With that background in mind, here's my problem:

    I'm seeing files that have multiple records allocated to hold data runs.  But each of these MFT records only holds a single data run (instead of the ~200 I'm expecting).  In an extreme example, I've got a file with 637 MFT records allocated, each holding exactly 1 data run.  So instead of taking up 4 records in the MFT, it's using 637.  That means that when I walk the file, not only will I be reading each page of the file's data, NTFS will also have to do an additional 637 reads just to find out where that data is.  Ouch.
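    The arithmetic behind "4 instead of 637" can be spelled out in a few lines of Python. The ~200 runs per record figure is the rough observation from above, not a documented limit, and `records_needed` is just a hypothetical helper.

```python
import math

RUNS_PER_RECORD = 200      # approximate capacity observed for 1 KB records (an estimate)
RECORD_SIZE_KB = 1         # default MFT record size


def records_needed(data_runs: int, runs_per_record: int = RUNS_PER_RECORD) -> int:
    """Records required if every record were packed as full as possible."""
    return math.ceil(data_runs / runs_per_record)


data_runs = 637
observed_records = 637     # one data run per record, as seen on the affected file
expected_records = records_needed(data_runs)
wasted_records = observed_records - expected_records

print("expected records:", expected_records)
print("extra records:", wasted_records)
print("extra MFT space (KB):", wasted_records * RECORD_SIZE_KB)
```

    With these numbers the file should need 4 records, so 633 records (and roughly 633 KB of MFT space) are overhead.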

    Which brings me to my questions:

    1. What causes this to happen to some files and not others?
    2. (More importantly) What API can I use to "defrag" these 637 records back down to the 4 they should take?

    Things that don't work:

    • Using FSCTL_MOVE_FILE to defrag the file will move the data clusters next to each other.  But it will NOT cause the MFT records to coalesce.  Fragging then defragging the file data doesn't work either.
    • "fsutil repair initiate" on an affected file does not cause the records to coalesce.  Presumably the associated DeviceIoControl won't help either.
    • Presumably copying the file, deleting the original, and renaming the copy would work.  But this is not a practical solution.  I need to be able to tell NTFS to clean up the file's records without copying gigabytes of data around.
    • FSCTL_INITIATE_FILE_METADATA_OPTIMIZATION sounds (from its name) like it might do what I need.  But it is only supported on Windows 10 and is totally undocumented, which limits its usefulness.  I need a solution that works on Windows 7 and up, preferably a documented one.


    Other observations:

    • I'm seeing this behavior on two Windows 7 machines and one Windows 8 machine.
    • The more use the computer has seen, the more affected files there are.
    • Oddly, c:\Windows\inf\setupapi.dev.log shows the problem on all three machines.
    • One of the machines has an SSD, the others do not.
    • The files are neither compressed nor sparse.

    Friday, May 18, 2018 12:20 AM


All replies