
Let's say an MLC SSD has lasted a very long time and its first cell has hit its last erase cycle and refuses to erase.

What happens after that? Does the controller detect that as a bad block, move on to the next one, and try to erase that instead? Would the total capacity of the drive just slowly decrease over time?

EDIT

And of course, we can set wear leveling aside. Yes, it extends the life of a drive, but I am not asking about that. Eventually a cell will hit its last erase cycle.

3 Answers

8

The NAND flash chips have built-in mechanisms to detect failures on write and erase operations, and will alert the controller if one fails. In that case, the controller can either try again or treat the block as bad and map it out of its wear-leveling algorithm.

Each page in the NAND device also has a spare area alongside the main data area, intended for metadata such as ECC and other forms of fault detection and tolerance. The controller can decide on its own fault-tolerance scheme using the spare area. Hamming codes are one common scheme, though there are several, including simple parity bits and Reed-Solomon codes.

If the ECC check fails on a read operation, the controller is again free to do as it pleases. Ideally, it would also map those blocks out of the wear-leveling algorithm, and you would just lose capacity little by little until "too many" blocks fail, where "too many" depends on the algorithms and hardware structure sizes within the controller. Many first-cut controller designs simply declare an error to the operating system.
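
To make the "map it out" step concrete, here is a minimal sketch of the retry-then-retire logic described above. All the names (nand_erase, MAX_RETRIES), the retry count, and the simulated failure rate are invented for illustration; this is not taken from any real controller firmware.

```python
import random

NUM_BLOCKS = 1024     # total physical erase blocks on the device
MAX_RETRIES = 3       # erase attempts before the block is retired

good_blocks = set(range(NUM_BLOCKS))  # pool available to wear leveling
bad_blocks = set()                    # blocks mapped out after failures

def nand_erase(block: int) -> bool:
    """Stand-in for the chip's erase command; real hardware reports
    success or failure in a status register. Failure is simulated here."""
    return random.random() > 0.001

def erase_block(block: int) -> bool:
    for _ in range(MAX_RETRIES):
        if nand_erase(block):
            return True
    # Persistent failure: retire the block so wear leveling skips it.
    good_blocks.discard(block)
    bad_blocks.add(block)
    return False

# Usable capacity shrinks as bad_blocks grows; a controller would declare
# the whole drive failed once "too many" blocks have been retired.
```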

Note that this is not an MLC-specific issue. Though MLC cells may be more prone to read errors, since there is necessarily a smaller margin between their voltage levels, SLC cells fail through mostly the same mechanisms and can be dealt with by the controller in the same way.

2

Just like with hard disks, it's up to the implementation in the operating system. The controller would simply report that the write (an erase is handled like a write operation at this level) failed, and it's up to the device driver in the operating system kernel to decide what to do. From what I've seen so far, the Microsoft and Linux implementations simply return the error code to the calling application, so it produces an I/O error.

In short: You simply get a "broken" device at some point.
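
As an illustration of what that looks like from the application side, here is a small sketch, assuming a Unix-style raw block device. The function name and the device path you would pass in are placeholders; EIO is the generic I/O error code this answer describes.

```python
import errno
import os

def write_sector(path: str, offset: int, data: bytes) -> None:
    # path would be something like a raw device node; placeholder here.
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, data, offset)
    except OSError as e:
        if e.errno == errno.EIO:
            # The failed flash operation surfaces as a generic I/O error;
            # the application learns nothing SSD-specific.
            print("I/O error: the drive rejected the write")
        raise
    finally:
        os.close(fd)
```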

  • Well, that sucks. Not a very good abstraction then... Jun 9, 2009 at 18:31
  • And wrong. Primarily this is handled in the SSD itself, not the device driver, because this is normal operation. Wear leveling will record the sector as failed and remap it.
    – TomTom
    Jan 8, 2014 at 8:05
1

SSDs use something called "wear leveling": the drive keeps statistics about sector usage, and at some point, or when it detects problems, it will move the sector to a reserve one, just as happens with regular hard drives.
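
A rough sketch of the bookkeeping this answer describes, with per-sector erase counters and a pool of reserve sectors. The limit, the table layout, and the sector numbers are invented for illustration, not taken from any real drive.

```python
ERASE_LIMIT = 10_000                    # illustrative rated erase cycles
erase_counts: dict[int, int] = {}       # sector -> erase count so far
remap_table: dict[int, int] = {}        # worn sector -> reserve sector
reserve_pool = list(range(1000, 1010))  # spare sectors held in reserve

def record_erase(sector: int) -> None:
    erase_counts[sector] = erase_counts.get(sector, 0) + 1
    if erase_counts[sector] >= ERASE_LIMIT:
        remap(sector)

def remap(sector: int) -> None:
    """Move a worn-out sector to a reserve one, if any remain."""
    if reserve_pool:
        remap_table[sector] = reserve_pool.pop()
        erase_counts[sector] = 0   # the fresh reserve sector starts new
    else:
        # No reserves left: this is the point where writes start failing,
        # which is what the comments below ask about.
        raise OSError(f"sector {sector} worn out and no reserves remain")
```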

  • Yes, but I see mechanical hard drives failing mechanically before they run out of reserved sectors. An SSD, however, might not. What happens when a drive runs out of reserved sectors? Jun 9, 2009 at 18:28
  • Write errors. What else would you expect?
    – TomTom
    Jan 8, 2014 at 8:05
