How to remove the last two "-delimited strings from each line in a large file












I have numerous 2 GB space-delimited files from a source system. Each row in each file contains the same number of strings, each surrounded by " as a text qualifier.

I need to remove the last two strings, along with their text qualifiers, from every row in each file (like removing the last two columns from a columnar report). With smaller files, I can import into Excel, split on the delimiter, delete the columns, and save as tab-delimited (much more useful than spaces).

In any case, these files are too large and have too many rows for Excel. So, sed?

"text1" "text2" "text3" "text4" "text5" "text6"

Every row has the same number of strings. How do I drop "text5" "text6" from every row?







text-processing sed text delete














edited May 18 '17 at 1:30 by Stephen Rauch

asked May 18 '17 at 1:19 by user231894













  • awk '{$5=$6=""}1' file...

    – jasonwryan
    May 18 '17 at 1:41











  • @jasonwryan: Or just awk 'NF=4'

    – Thor
    May 18 '17 at 5:04











  • @Thor better...

    – jasonwryan
    May 18 '17 at 5:07





























4 Answers


















This sed command removes the last two space-separated, quoted strings from the end of each line of infile and writes the result to outfile:

sed 's/ *"[^"]*" *"[^"]*" *$//' < infile > outfile






answered May 18 '17 at 1:38 by Stephen Rauch
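To apply the sed command from the answer above to many files, it could be wrapped in a small shell function and run in a loop. The following is only a sketch: the function names `trim_last_two` and `trim_files` and the `.trimmed` output suffix are made up for illustration and do not come from the question or the answer.

```shell
# Hypothetical helper: remove the last two quoted strings from each line of stdin.
trim_last_two() {
    sed 's/ *"[^"]*" *"[^"]*" *$//'
}

# Hypothetical helper: write a trimmed copy of each file given as an argument.
# The ".trimmed" suffix is an assumption; the originals are left untouched.
trim_files() {
    for f in "$@"; do
        trim_last_two < "$f" > "$f.trimmed"
    done
}

# Quick check on one line; the pattern tolerates spaces inside the quotes.
printf '%s\n' '"text1" "text 2" "text3" "text4" "text5" "text 6"' | trim_last_two
# prints: "text1" "text 2" "text3" "text4"
```

It might then be called as `trim_files ./*.txt`, where the glob is a placeholder for wherever the source files live. sed processes its input line by line, so the size of each file should not matter for memory use.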

























If you know that you always want to delete the last two columns, this idiom can be used:

awk 'NF-=2' file

I noticed that this does not work with nawk; I'm not sure why. The portable way is to force the record to be rebuilt with $1=$1:

awk '{NF-=2} $1=$1' file

Output:

"text1" "text2" "text3" "text4"






edited May 18 '17 at 5:18

answered May 18 '17 at 5:08 by Thor
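The question mentions that tab-delimited output would be more useful than spaces. One possible variation on the awk idiom above sets the output field separator to a tab while dropping the last two fields. This is a sketch under two assumptions that are not stated in the thread: no quoted string contains whitespace (awk splits fields on it by default), and the awk in use honours assignments to NF, which the answer notes is not the case for every implementation. The function name is hypothetical.

```shell
# Hypothetical helper: drop the last two whitespace-separated fields of each
# line and join the remaining fields with tabs.
# Assumption: no quoted string contains spaces or tabs.
drop_last_two_tsv() {
    # Lines with fewer than two fields are passed through unchanged.
    awk -v OFS='\t' 'NF >= 2 { NF -= 2; $1 = $1 } 1'
}

# Prints the first four quoted strings, separated by tabs.
printf '%s\n' '"text1" "text2" "text3" "text4" "text5" "text6"' | drop_last_two_tsv
```

If the quoted strings can contain spaces, the sed approach from the other answer is likely the safer choice, since it matches on the quotes and not on whitespace.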























awk '{$(NF-1)=$NF=""}1' inp

perl -pale '$_ = "@F[0..@F-3]"' inp

sed -ne '
s/" "/"\
"/g
:a
s/\n/ /
/\n.*\n.*\n/ba
P
' inp

Explanation:

• The awk code nulls out the last and second-last fields and prints the line.

• In Perl, the fields are stored in the @F array; the slice from the 0th to the third-last element is selected and stored in the current line, $_. The double quotes convert the array slice to a string, joining the elements with the $" special variable, whose default value is a space. The -p option then prints the value of $_ to stdout.

• In sed, we first turn every " " into "\n" (a newline between the quotes), then enter a loop that turns the newlines back into spaces, one at a time from the left, until only two are left. At that point we use the P (uppercase p) command to print the first portion of the pattern space, up to the first newline.







answered May 18 '17 at 3:53 by user218374
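One detail of the awk variant above: assigning empty strings to the last two fields keeps their separators, so each output line ends in two spaces. If that matters, the trailing blanks can be stripped in the same awk program. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: blank the last two fields, then remove the trailing
# spaces left behind by the two now-empty fields.
# Assumption: every line has at least two fields.
null_last_two() {
    awk '{ $(NF-1) = $NF = ""; sub(/ +$/, "") } 1'
}

line='"text1" "text2" "text3" "text4" "text5" "text6"'

# Brackets make the line ends visible.
printf '%s\n' "$line" | awk '{ $(NF-1) = $NF = "" } 1' | sed 's/.*/[&]/'
# prints: ["text1" "text2" "text3" "text4"  ]
printf '%s\n' "$line" | null_last_two | sed 's/.*/[&]/'
# prints: ["text1" "text2" "text3" "text4"]
```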






























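echo '"text1" "text2" "text3" "text4" "text5" "text6"' | awk '{
# print every field except the last two, separated by OFS; end the line with ORS
for (i = 1; i <= NF - 2; i++) printf "%s%s", $i, (i < NF - 2 ? OFS : ORS)
}'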





answered 2 mins ago by Deepika Reddy Billuri



