Special File that causes I/O error












11















I want to automatically test whether a piece of software reacts as expected when an essential SQLite DB file cannot be read (causing an I/O error). Exactly that happened a few days ago at a client's site. We fixed it manually, but now I want to write code that fixes it automatically, and I need access to a broken file to test that.



Since everything in Unix is a file, I suspected that there might be a special file that always causes I/O errors when one tries to read it (e.g. in /dev).



Some similar files (imo) would be:





  • /dev/full, which always says "No space left on device" if you try to write to it


  • /dev/null and /dev/zero


so I assumed there just has to be a file like that (but haven't found one yet).



Does anyone know of such a file, or any other method for me to get the desired result (an intentionally faulty partition image, a wrapper around open() using LD_PRELOAD, ...)?

What's the best way to go here?
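As a quick check of the /dev/full analogy mentioned above, here is a minimal shell sketch. It assumes a Linux system with dd installed; the function name write_to_full is purely illustrative. Note that it demonstrates a failing write (ENOSPC), not the failing read (EIO) that the question is really after.

```sh
#!/bin/sh
# Try to write a single byte to /dev/full. The kernel rejects the write
# with ENOSPC, so dd exits with a non-zero status.
# (write_to_full is a hypothetical helper name.)
write_to_full() {
    dd if=/dev/zero of=/dev/full bs=1 count=1 2>/dev/null
}

if write_to_full; then
    echo "write unexpectedly succeeded"
else
    echo "write failed as expected"
fi
```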



































  • As far as I know, there is no special file on Linux that gives SIGIO when you read from it. The last time I got a SIGIO was because of a USB stick that declared a capacity much bigger than the real, physical one. Maybe that could be a possibility?

    – lgeorget
    May 29 '13 at 12:25











  • hmmm, I might be able to try that with a small partition image that I'll crop somewhere in the middle...

    – mreithub
    May 29 '13 at 13:04











  • SIGIO doesn't mean there has been an error; it is a way for a program to request notification that non-blocking I/O is now possible, instead of calling select() or poll().

    – psusi
    May 29 '13 at 13:13











  • Oops, yes, you're right, of course. I wrote SIGIO but was thinking of the EIO error code. But maybe the OP did too? Why would a failure to read give a SIGIO?

    – lgeorget
    May 29 '13 at 13:17











  • oh, I made the same mistake in the question... Edited it...

    – mreithub
    May 29 '13 at 14:08
















linux devices io testing














asked May 29 '13 at 11:57 by mreithub (edited May 29 '13 at 14:03)























5 Answers


















8














You can use dmsetup to create a device-mapper device using either the error or flakey targets to simulate failures.



dmsetup create test --table '0 123 flakey /dev/loop0 0 5 5'


Where 123 is the length of the device in sectors, /dev/loop0 is the original device on which you want to simulate errors, 0 is the offset into that device, and 5 5 means the device works for 5 seconds and then fails for 5 seconds, repeatedly. For the error target you don't need the subsequent arguments (the table is just '0 123 error'), as it always returns an error.
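The table strings for both targets can be generated rather than typed by hand. The following is a minimal sketch, not a definitive recipe: the helper names dm_error_table and dm_flakey_table are hypothetical, and the flakey argument order (backing device, offset, up interval, down interval) follows the kernel's dm-flakey documentation. The helpers only print text, so they need no privileges.

```sh
#!/bin/sh
# Print a one-line device-mapper table in which every sector returns an error.
# $1 = device length in 512-byte sectors.
dm_error_table() {
    printf '0 %s error\n' "$1"
}

# Print a flakey table.
# $1 = length in sectors, $2 = backing device,
# $3 = seconds the device works, $4 = seconds it fails.
dm_flakey_table() {
    printf '0 %s flakey %s 0 %s %s\n' "$1" "$2" "$3" "$4"
}

dm_error_table 123
dm_flakey_table 123 /dev/loop0 5 5
```

To actually create the device, the output can be piped to dmsetup as root, for example `dm_error_table 123 | dmsetup create test`; dmsetup reads the table from standard input when no --table option is given.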



























  • I find at least two errors in that command: the missing device name, the quoting typo, and what is "1 0 /dev/null" supposed to mean?

    – Hauke Laging
    May 29 '13 at 13:40











  • @HaukeLaging, ahh, yes, I left out the name and somehow hit the wrong quote. The 1 0 /dev/null means 1 target, starting at offset 0, backed by device /dev/null. It is needed for flakey, but apparently is optional for error.

    – psusi
    May 29 '13 at 13:44











  • It seems to me that it's not "optional" but simply ignored. You may check with dmsetup table test. You can even write foo bar behind error; it just doesn't care (and thus should be deleted).

    – Hauke Laging
    May 29 '13 at 13:48











  • @HaukeLaging, edited.

    – psusi
    May 29 '13 at 13:54











  • Thanks for the answer, I think that's the way I'll go for now. The only minor issue I have with this is that it requires root access, but I guess you'll need that anyway for such low-level stuff... (I'll dig into the LD_PRELOAD idea when I have time).

    – mreithub
    May 29 '13 at 15:06



















12














There's a great set of answers to this on Stack Overflow and Server Fault already but some techniques were missing. To make life easier here's a (not so) short list of Linux disk and filesystem I/O fault injection mechanisms:




  • Use Device Mapper's error/flakey/delay devices to return errors/corruption from, or delay/split IO to a synthesised block device (kernel, requires kernel to have been built with device mapper support, appropriate additional device mapper modules and to have device mapper userspace bits).

  • Use md's faulty personality to perform periodic fault injection on a synthesised block device. See the --layout option of the mdadm man page for how to configure it (kernel and mdadm userspace bits).

  • Use libfiu to perform fault injection on POSIX API calls (userspace, can be used with LD_PRELOAD).

  • Use the Linux kernel's fault injector to inject an error into the underlying block device (kernel, requires kernel to have been built with CONFIG_FAIL_MAKE_REQUEST=y).

  • Use SystemTap to do fault injection (kernel, requires a kernel to have been built with lots of stuff).


  • Inject filesystem faults using CharybdeFS or PetardFS (userspace via FUSE).

  • Create a synthesised block device using the Linux scsi_debug driver that performs fault injection (kernel).

  • Run your system within QEMU and use QEMU to inject block device errors using the blkdebug driver (VM).

  • Create a synthesised block device via the null_blk device's options to inject faults (kernel >= 4.14, but options like timeout probabilities didn't arrive until 4.17 and require the kernel to have been built with CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION=y).

  • Create a synthesised Network Block Device which is served to the host via nbdkit filters such as delay or error, and then attach a block device to it via nbd-client (kernel + NBD userspace bits; kernel >= 4.18 built with NBD support, nbd-client >= 3.18 and nbdkit >= 1.8.1 recommended; see the nbdkit demo video around the 20 min mark).
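Several of the mechanisms above depend on kernel build options, so it can save time to check them first. This is a rough sketch under assumptions: the helper name has_kconfig is hypothetical, and it assumes the distribution ships an uncompressed config file as /boot/config-$(uname -r), which not all do (some expose /proc/config.gz instead).

```sh
#!/bin/sh
# Succeed if a kernel config file enables an option as built-in (=y) or module (=m).
# $1 = path to an uncompressed config file, $2 = option name without the CONFIG_ prefix.
has_kconfig() {
    grep -Eq "^CONFIG_$2=(y|m)\$" "$1"
}

# Check a few options relevant to the list above against the running kernel's config.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    for opt in FAIL_MAKE_REQUEST BLK_DEV_NULL_BLK_FAULT_INJECTION DM_FLAKEY BLK_DEV_NBD; do
        if has_kconfig "$cfg" "$opt"; then
            echo "$opt: enabled"
        else
            echo "$opt: not enabled"
        fi
    done
else
    echo "no readable kernel config at $cfg"
fi
```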


Bonus fact: SQLite has a VFS driver for simulating errors so it can get good test coverage.



Related:




  • How can I simulate a failed disk during testing?

  • Simulate a faulty block device with read errors?

  • Generate a read error

  • Intentionally cause an I/O error in Linux?







































    5














    You want a fault injection mechanism for I/O.



    On Linux, here's a method that doesn't require any prior setup and generates an unusual error (not EIO “Input/output error” but ESRCH “No such process”):



    cat /proc/1234/mem


    where 1234 is the PID of a process running as the same user as the process you're testing, but not that process itself. Credits to rubasov for thinking of /proc/$pid/mem.



    If you use the PID of the process itself, you get EIO, but only if you're reading from an area that isn't mapped in the process's memory. The first page is never mapped, so it's ok if you read the file sequentially, but not suitable for a database process that seeks directly to the middle of the file.
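The self-read case can be tried from a shell without any setup. This is a minimal sketch that assumes a Linux system with /proc mounted; read_unmapped is a hypothetical helper name, and the exact error reported may differ between kernel versions, so the sketch only checks that the read fails.

```sh
#!/bin/sh
# Try to read one byte at offset 0 of the reading process's own memory.
# Here "self" is the dd process itself; address 0 is normally unmapped,
# so the read is expected to fail.
read_unmapped() {
    dd if=/proc/self/mem bs=1 count=1 >/dev/null 2>&1
}

if read_unmapped; then
    echo "read unexpectedly succeeded"
else
    echo "read failed"
fi
```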



    With some more setup as root, you can leverage the device mapper to create files with valid sectors and bad sectors.



    Another approach would be to implement a small FUSE filesystem. EIO is the default error code when your userspace filesystem driver does something wrong, so it's easy to achieve. Both the Perl and Python bindings come with examples to get started, so you can quickly write a filesystem that mostly mirrors existing files but injects an EIO in carefully chosen places. One such filesystem already exists: petardfs (article); I don't know how well it works out of the box.



    Yet another method is an LD_PRELOAD wrapper. An existing one is libfiu (fault injection in userspace). It works by preloading a library that overloads the POSIX API calls. You can write simple directives or arbitrary C code to override the normal behavior.
































    • libfiu looks really promising (and it's in the Debian repos). Great answer, thanks, +1

      – mreithub
      May 30 '13 at 22:21



















    1














    The solution is a lot easier if it's OK to use a device file as "file with I/O errors". My proposal is for those cases where a regular file shall have such errors.



    > dd if=/dev/zero of=/path/to/ext2.img bs=10M count=10
    > losetup /dev/loop0 /path/to/ext2.img
    > blockdev --getsz /dev/loop0
    204800
    > echo "0 204800 linear /dev/loop0 0" | dmsetup create sane_dev
    > mke2fs /dev/mapper/sane_dev # ext2 is sufficient
    > mount -t ext2 /dev/mapper/sane_dev /some/where
    > dd if=/dev/zero of=/some/where/unreadable_file bs=512 count=4
    > hdparm --fibmap /some/where/unreadable_file
    /mnt/tmp/unreadable_file:
    filesystem blocksize 1024, begins at LBA 0; assuming 512 byte sectors.
    byte_offset begin_LBA end_LBA sectors
    0 2050 2053 4
    > umount /dev/mapper/sane_dev
    > dmsetup remove sane_dev
    > remaining_sectors=$((204800-2053-1)) # length of the segment after the bad sector
    > echo $'0 2053 linear /dev/loop0 0\n2053 1 error\n2054 '"${remaining_sectors} linear /dev/loop0 2054" |
    > dmsetup create error_dev
    > mount -t ext2 /dev/mapper/error_dev /some/where
    > cat /some/where/unreadable_file # 4th (last) sector of the file, LBA 2053, is unreadable
    cat: /some/where/unreadable_file: Input/output error


    I must admit that I am a bit confused because I haven't managed to read single sectors from that file without an error (with dd .. seek=...). Maybe that is a read-ahead problem.
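The three-segment table used in this approach (good sectors, one bad sector, good sectors) can be computed from the device size and the LBA reported by hdparm. The sketch below is illustrative only: one_bad_sector_table is a hypothetical helper, it merely prints the table, and it assumes the bad sector is neither the first nor the last sector of the device.

```sh
#!/bin/sh
# Print a device-mapper table that maps a device linearly except for one
# sector, which is replaced by the error target.
# $1 = total sectors, $2 = LBA of the sector to make unreadable, $3 = backing device.
# Assumes 0 < $2 < $1 - 1 so that both linear segments are non-empty.
one_bad_sector_table() {
    total=$1
    bad=$2
    dev=$3
    after=$((bad + 1))
    printf '0 %s linear %s 0\n' "$bad" "$dev"
    printf '%s 1 error\n' "$bad"
    printf '%s %s linear %s %s\n' "$after" "$((total - after))" "$dev" "$after"
}

one_bad_sector_table 204800 2053 /dev/loop0
```

With the numbers from the transcript above this prints the same three lines that are piped into `dmsetup create error_dev`.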
































    • Your filesystem's blocks are at least 4096 bytes in size so they will span multiple sectors even if the file is small.

      – Anon
      Sep 11 '17 at 6:38



















    1














    You could use CharybdeFS, which was made for exactly this purpose.



    It's a passthrough FUSE filesystem like PetardFS but much more configurable.



    See the CharybdeFS cookbook here: http://www.scylladb.com/2016/05/02/fault-injection-filesystem-cookbook/



    It's advanced enough to test a database.

























    • You must disclose your affiliation in your answers.

      – Michael Mrozek
      May 2 '16 at 19:56











    The dmsetup answer above was posted by psusi: answered May 29 '13 at 13:22, edited May 29 '13 at 13:53.








        There's a great set of answers to this on Stack Overflow and Server Fault already but some techniques were missing. To make life easier here's a (not so) short list of Linux disk and filesystem I/O fault injection mechanisms:




        • Use Device Mapper's error/flakey/delay devices to return errors/corruption from, or delay/split IO to a synthesised block device (kernel, requires kernel to have been built with device mapper support, appropriate additional device mapper modules and to have device mapper userspace bits).

        • Use md's faulty personality to perform periodic fault injection on a synthesised block device. See the --layout option of the mdadm man page for how to configure it (kernel and mdadm userspace bits).

        • Use libfiu to perform fault injection on POSIX API calls (userspace, can be used with LD_PRELOAD).

        • Use the Linux kernel's fault injector to inject an error into the underlying block device (kernel, requires kernel to have been built with FAIL_MAKE_REQUEST=y)

        • Using SystemTap to do fault injection (kernel, requires a kernel to have been built with lots of stuff).


        • Inject filesystem faults using CharybdeFS or PetardFS (userspace via FUSE).

        • Create a synthesised block device using the Linux scsi_debug driver that performs fault injection (kernel).

        • Run your system within QEMU and use QEMU to inject block device errors using the blkdebug driver (VM).

        • Create a synthesised block device via the null_blk device's options to inject faults (kernel >= 4.14 but options like timeout probabilities didn't arrive until 4.17 and require the kernel to have been built with BLK_DEV_NULL_BLK_FAULT_INJECTION=y)

        • Create a synthesised Network Block Device which is served to the host via NBDkit filters such as delay or error and then attach a block device to it via nbd-client (kernel + NBD userspace bits, kernel >= 4.18 built with NBD support, nbdclient >= 3.18 and nbdkit >= 1.8.1 recommended - see NBDKit demo video around 20 min mark)


        Bonus fact: SQLite has a VFS driver for simulating errors so it can get good test coverage.



        Related:




        • How can I simulate a failed disk during testing?

        • Simulate a faulty block device with read errors?

        • Generate a read error

        • Intentionally cause an I/O error in Linux?






        share|improve this answer















        There's a great set of answers to this on Stack Overflow and Server Fault already but some techniques were missing. To make life easier here's a (not so) short list of Linux disk and filesystem I/O fault injection mechanisms:




        • Use Device Mapper's error/flakey/delay devices to return errors/corruption from, or delay/split IO to a synthesised block device (kernel, requires kernel to have been built with device mapper support, appropriate additional device mapper modules and to have device mapper userspace bits).

        • Use md's faulty personality to perform periodic fault injection on a synthesised block device. See the --layout option of the mdadm man page for how to configure it (kernel and mdadm userspace bits).

        • Use libfiu to perform fault injection on POSIX API calls (userspace, can be used with LD_PRELOAD).

        • Use the Linux kernel's fault injector to inject an error into the underlying block device (kernel, requires kernel to have been built with FAIL_MAKE_REQUEST=y)

        • Using SystemTap to do fault injection (kernel, requires a kernel to have been built with lots of stuff).


        • Inject filesystem faults using CharybdeFS or PetardFS (userspace via FUSE).

        • Create a synthesised block device using the Linux scsi_debug driver that performs fault injection (kernel).

        • Run your system within QEMU and use QEMU to inject block device errors using the blkdebug driver (VM).

        • Create a synthesised block device via the null_blk device's options to inject faults (kernel >= 4.14 but options like timeout probabilities didn't arrive until 4.17 and require the kernel to have been built with BLK_DEV_NULL_BLK_FAULT_INJECTION=y)

        • Create a synthesised Network Block Device which is served to the host via NBDkit filters such as delay or error and then attach a block device to it via nbd-client (kernel + NBD userspace bits, kernel >= 4.18 built with NBD support, nbdclient >= 3.18 and nbdkit >= 1.8.1 recommended - see NBDKit demo video around 20 min mark)


        Bonus fact: SQLite has a VFS driver for simulating errors so it can get good test coverage.



        Related:




        • How can I simulate a failed disk during testing?

        • Simulate a faulty block device with read errors?

        • Generate a read error

        • Intentionally cause an I/O error in Linux?







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited 25 mins ago

























        answered Jul 12 '14 at 17:16









        Anon























            5














            You want a fault injection mechanism for I/O.



            On Linux, here's a method that doesn't require any prior setup and generates an unusual error (not EIO “Input/output error” but ESRCH “No such process”):



            cat /proc/1234/mem


            where 1234 is the PID of a process running as the same user as the process you're testing, but not that process itself. Credits to rubasov for thinking of /proc/$pid/mem.



            If you use the PID of the process itself, you get EIO, but only if you're reading from an area that isn't mapped in the process's memory. The first page is never mapped, so it's ok if you read the file sequentially, but not suitable for a database process that seeks directly to the middle of the file.
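
            A minimal sketch of the self-read case, assuming a Linux system with /proc mounted: cat opens its own /proc/self/mem and reads from offset 0, which is normally unmapped, so the read should fail (with EIO on the kernels described here). The function name read_own_mem is made up for illustration. What happens with another process's PID can depend on kernel version and ptrace restrictions, so the sketch sticks to the process's own memory.

```shell
#!/bin/sh
# Try to read the start of the reading process's own memory file.
# Offset 0 is normally unmapped, so this is expected to fail, not return data.
read_own_mem() {
    # LC_ALL=C keeps the error text untranslated; 2>&1 captures stderr too.
    LC_ALL=C cat /proc/self/mem 2>&1
}

message=$(read_own_mem)
status=$?
printf 'exit status: %s\nmessage: %s\n' "$status" "$message"
```

            Because the failure happens on the very first read, this suits programs that read a file sequentially from the start, as noted above.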



            With some more setup as root, you can leverage the device mapper to create files with valid sectors and bad sectors.



            Another approach would be to implement a small FUSE filesystem. EIO is the default error code when your userspace filesystem driver does something wrong, so it's easy to achieve. Both the Perl and Python bindings come with examples to get started; you can quickly write a filesystem that mostly mirrors existing files but injects an EIO in carefully chosen places. One such filesystem already exists: petardfs (article). I don't know how well it works out of the box.



            Yet another method is an LD_PRELOAD wrapper. An existing one is Libfiu (fault injection in userspace). It works by preloading a library that overloads the POSIX API calls. You can write simple directives or arbitrary C code to override the normal behavior.
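
            As a hedged sketch of the libfiu route, the wrapper below uses the fiu-run utility that ships with libfiu, with its -x option (load the POSIX preload library) and a -c control command that enables the posix/io/rw/read failure point. The function name run_with_read_faults and the target file are assumptions for illustration, and the exact errno the program sees is chosen by libfiu, so no particular output is promised here.

```shell
#!/bin/sh
# run_with_read_faults COMMAND [ARGS...]
# Runs COMMAND under libfiu's preload library and asks libfiu to make the
# wrapped read() call fail.
run_with_read_faults() {
    # -x loads the POSIX wrappers; -c passes a control command to libfiu.
    fiu-run -x -c 'enable name=posix/io/rw/read' "$@"
}

if command -v fiu-run >/dev/null 2>&1; then
    run_with_read_faults cat /etc/passwd
    echo "cat exited with status $?"
else
    echo "fiu-run not found; install the libfiu utilities to try this" >&2
fi
```

            Since the fault is injected in the library call rather than in the kernel, this needs no root access, but it only affects programs that go through the wrapped functions.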
































            • Libfiu looks really promising (and it's in the debian repos). Great answer, thanks, +1

              – mreithub
              May 30 '13 at 22:21
















            edited May 23 '17 at 11:33









            Community











            answered May 29 '13 at 21:28









            Gilles
























            1














            The solution is a lot easier if it's OK to use a device file as the "file with I/O errors". My proposal is for cases where a regular file has to produce such errors.



            > dd if=/dev/zero of=/path/to/ext2.img bs=10M count=10
            > losetup /dev/loop0 /path/to/ext2.img
            > blockdev --getsz /dev/loop0
            204800
            > echo "0 204800 linear /dev/loop0 0" | dmsetup create sane_dev
            > mke2fs /dev/mapper/sane_dev # ext2 is sufficient
            > mount -t ext2 /dev/mapper/sane_dev /some/where
            > dd if=/dev/zero of=/some/where/unreadable_file bs=512 count=4
            > hdparm --fibmap /some/where/unreadable_file
            /some/where/unreadable_file:
            filesystem blocksize 1024, begins at LBA 0; assuming 512 byte sectors.
            byte_offset begin_LBA end_LBA sectors
            0 2050 2053 4
            > umount /dev/mapper/sane_dev
            > dmsetup remove sane_dev
            > start_sector=$((204800-2053-1)) # number of sectors after bad sector 2053
            > echo $'0 2053 linear /dev/loop0 0\n2053 1 error\n2054 '"${start_sector} linear /dev/loop0 2054" | dmsetup create error_dev
            > mount -t ext2 /dev/mapper/error_dev /some/where
            > cat /some/where/unreadable_file # sector 2053, the file's last sector, is unreadable
            cat: /some/where/unreadable_file: Input/output error


            I must admit that I am a bit confused, because I haven't managed to read individual sectors from that file without an error (with dd ... seek=...). That may be a read-ahead problem.
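
            The three-segment table in the transcript above can be generated rather than typed by hand. The sketch below is one way to do that; the function name error_table is hypothetical, and it only prints the table so that it can be inspected before being piped to dmsetup create as root.

```shell
#!/bin/sh
# error_table TOTAL_SECTORS BAD_SECTOR DEVICE
# Prints a device-mapper table that maps DEVICE linearly except for one
# 512-byte sector at BAD_SECTOR, which is mapped to the "error" target.
error_table() {
    total=$1 bad=$2 device=$3
    after=$((bad + 1))
    # Healthy region before the bad sector (skipped if the bad sector is 0).
    if [ "$bad" -gt 0 ]; then
        printf '0 %s linear %s 0\n' "$bad" "$device"
    fi
    printf '%s 1 error\n' "$bad"
    # Healthy region after the bad sector (skipped if it is the last one).
    if [ "$after" -lt "$total" ]; then
        printf '%s %s linear %s %s\n' "$after" "$((total - after))" "$device" "$after"
    fi
}

# Values from the transcript: 204800 sectors in total, bad sector 2053.
error_table 204800 2053 /dev/loop0

# To load it (as root): error_table 204800 2053 /dev/loop0 | dmsetup create error_dev
```

            With the transcript's numbers this yields the same three lines as the echo command there, including the 202746-sector tail segment.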
































            • Your filesystem's blocks are at least 4096 bytes in size so they will span multiple sectors even if the file is small.

              – Anon
              Sep 11 '17 at 6:38
















            edited May 29 '13 at 13:49

























            answered May 29 '13 at 13:27









            Hauke Laging
























            1














            You could use CharybdeFS, which was made for exactly this purpose.



            It's a passthrough FUSE filesystem like PetardFS, but much more configurable.



            See the CharybdeFS cookbook here: http://www.scylladb.com/2016/05/02/fault-injection-filesystem-cookbook/



            It's advanced enough to test a database.

























            • You must disclose your affiliation in your answers.

              – Michael Mrozek
              May 2 '16 at 19:56
















            answered May 2 '16 at 17:17









            Benoît Canet

























