Remove duplicate lines from a file but leave 1 occurrence



























I'm looking to remove duplicate lines from a file but leave 1 occurrence in the file.



Example of the file:



this is a string
test line
test line 2
this is a string


From the above example, I would want to remove 1 occurrence of "this is a string".



Best way to do this?










linux

asked May 19 '18 at 13:09 – Tom Bailey (edited May 19 '18 at 18:33)




















  • With such questions you should always provide example input and output. – Hauke Laging, May 19 '18 at 13:12

  • Possibly related: Remove duplicate lines while keeping the order of the lines – steeldriver, May 19 '18 at 13:12

  • Are the duplicated lines adjacent to one another? Is the output to remain in the same order or would it be ok to sort the data? – Kusalananda, May 19 '18 at 13:14

  • Keep one occurrence of a duplicate (ie two identical lines per match) or simply "remove all duplicate lines, leaving only one line per set of duplicates"? Does the final order matter? – roaima, May 19 '18 at 13:17

  • If it is not a problem for you that the lines end up sorted, then sort file | uniq will do what you want. – peterh, May 19 '18 at 19:03
















2 Answers
































This leaves the first occurrence:



awk '! a[$0]++' inputfile

start cmd:> echo 'this is a string
cont. cmd:> test line
cont. cmd:> test line 2
cont. cmd:> this is a string' | awk '! a[$0]++'
this is a string
test line
test line 2
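
Note (not part of the original answer): awk only prints the deduplicated lines to standard output, so the file itself is left unchanged. A minimal sketch of how to update the file, assuming a temporary file is acceptable (the name inputfile.tmp is arbitrary):

awk '! a[$0]++' inputfile > inputfile.tmp && mv inputfile.tmp inputfile

With GNU awk 4.1 or later, awk -i inplace '! a[$0]++' inputfile should achieve the same in one step.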





answered May 19 '18 at 13:16, edited May 19 '18 at 20:16 – Hauke Laging


























  • It seems to just print the output and not actually make any changes in the file. – Tom Bailey, May 19 '18 at 15:49

  • @TomBailey That's why I told you to provide example input and output. I did test it and it works fine for me. – Hauke Laging, May 19 '18 at 16:49

  • I have edited it now. – Tom Bailey, May 19 '18 at 19:29

  • @TomBailey works fine for me. – Hauke Laging, May 19 '18 at 20:16

































Demo file stuff.txt contains:



one
two
three
one
two
four
five


Remove duplicate lines from a file, assuming you don't mind that the lines end up sorted:



$ sort -u stuff.txt 
five
four
one
three
two


Explanation: the -u flag tells sort to sort the lines of the file and output only one copy of each distinct line.
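
The approach mentioned in the comments, piping sorted output through uniq, gives the same result for whole-line comparisons (a quick sketch):

sort stuff.txt | uniq

sort -u is effectively the short form of that pipeline.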



Remove duplicate lines from a file, preserve original ordering, keep the first:



$ cat -n stuff.txt | sort -uk2 | sort -nk1 | cut -f2-
one
two
three
four
five


Explanation: cat -n prepends a line number and a tab to every line; the first sort deduplicates using field 2 onward (the original line content) as the key; the second sort sorts numerically on the line numbers to restore the original ordering; finally cut -f2- strips the line-number field again.



Remove duplicate lines from a file, preserve original ordering, keep the last:



tac stuff.txt > stuff2.txt; cat -n stuff2.txt | sort -uk2 | sort -nk1 | cut -f2- > stuff3.txt; tac stuff3.txt > stuff4.txt; cat stuff4.txt
three
one
two
four
five


Explanation: same as before, but tac reverses the file before and after the pipeline, so the last occurrence of each duplicate is the one that is kept.
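
If the intermediate files are not wanted, a sketch of the same keep-last result as a single pipeline, reusing the awk one-liner from the other answer:

tac stuff.txt | awk '! a[$0]++' | tac

tac feeds the lines in reverse order, awk keeps the first occurrence it sees (which is the last one in the original file), and the final tac restores the original ordering.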






answered 13 mins ago – Eric Leschinski






















