Split file lines by regex delimiter



























I want to split each line of an input file on the non-alphanumeric regex \W and print all the resulting chunks to an output file, like so:



Input file:



www.wifi.in.ua
YI-HondBrychka


Output file:



www
wifi
in
ua
YI
HondBrychka










































      regular-expression
















      asked 7 hours ago









      dizcza






















          2 Answers














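          Try grep's -o flag, which prints only the matching parts of each line, one match per output line (-P, a GNU grep option, enables Perl-compatible regular expressions), e.g.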



          $ cat <<HEREDOC | grep -Po '\w+'
          www.wifi.in.ua
          YI-HondBrychka
          HEREDOC

          www
          wifi
          in
          ua
          YI
          HondBrychka
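Since the question asks for an output file, here is a minimal sketch of the same idea reading from and writing to files. The file names are hypothetical, and `[[:alnum:]_]+` is used as an ERE stand-in for `\w+` (equivalent for ASCII input) so that `-P` is not required; `-o` itself is not in POSIX but is supported by GNU and BSD grep.

```shell
# Hypothetical file names in a scratch directory; adjust to your own paths.
workdir=$(mktemp -d)
input="$workdir/input.txt"
output="$workdir/output.txt"

# Recreate the sample input from the question.
printf '%s\n' 'www.wifi.in.ua' 'YI-HondBrychka' > "$input"

# -o prints each match on its own line; [[:alnum:]_]+ matches runs of
# letters, digits and underscores, like \w+ does for ASCII text.
grep -oE '[[:alnum:]_]+' "$input" > "$output"

cat "$output"
```

Redirecting with `>` overwrites the output file on each run; use `>>` instead if the chunks should be appended.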





          answered 7 hours ago
          igal
























          • Did you mean grep -Po '\w+' HEREDOC? This works, thank you.

            – dizcza
            7 hours ago











          • No, I meant what I wrote there - it works for me if I copy-paste it. In practice, I imagine you would use a file instead of a heredoc string, and it would look like grep -Po '\w+' /path/to/file.

            – igal
            6 hours ago











          • Replacing HEREDOC with the path to my file, cat <<HEREDOC | grep -Po '\w+' doesn't work for me, while cat HEREDOC | grep -Po '\w+' (or better, grep -Po '\w+' HEREDOC) works fine.

            – dizcza
            6 hours ago











          • It sounds to me like you're confused about what a heredoc is - it's not supposed to represent a file path. You can read more about "here documents" in the Bash manual: gnu.org/software/bash/manual/bashref.html#Here-Documents

            – igal
            6 hours ago











          • Oh, I see, I didn't know that. Thank you.

            – dizcza
            6 hours ago
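The here-document point from the comments above can be sketched as follows. This is only an illustration: the temporary file is a hypothetical stand-in for a real input file, and `grep -oE '[[:alnum:]_]+'` is used in place of `grep -Po '\w+'` so the sketch does not depend on `-P`.

```shell
# Form 1: a here-document passes the lines between the two HEREDOC
# markers to the command's standard input. HEREDOC is only a delimiter
# word here, not a file name.
from_heredoc=$(grep -oE '[[:alnum:]_]+' <<HEREDOC
www.wifi.in.ua
YI-HondBrychka
HEREDOC
)

# Form 2: the same lines stored in a real file (hypothetical temp file),
# with the path given to grep as an argument.
sample=$(mktemp)
printf '%s\n' 'www.wifi.in.ua' 'YI-HondBrychka' > "$sample"
from_file=$(grep -oE '[[:alnum:]_]+' "$sample")

# Both forms yield the same list of chunks.
[ "$from_heredoc" = "$from_file" ] && echo "same output"
rm -f "$sample"
```

Writing `cat <<path/to/file` therefore does not read that file: the shell treats the word after `<<` as the end marker and waits for a line consisting of exactly that word.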

































          Replacing all matches of \W with newlines, using Perl (from which the \W expression originated):



          $ perl -pe '$_ =~ s/\W/\n/g' <file
          www
          wifi
          in
          ua
          YI
          HondBrychka


          Or, more in line with the actual wording of the question:



          $ perl -pe '$_ = join("\n", split(/\W/)) . "\n"' <file
          www
          wifi
          in
          ua
          YI
          HondBrychka


          Approximating the PCRE \W with the ERE [^[:alnum:]] (which, unlike \W, also treats the underscore as a delimiter) and using GNU awk:



          awk -v RS='[^[:alnum:]]' 1 file


          Setting RS makes any single non-alphanumeric character an input record separator, and the pattern 1 is short for '{ print }', so each record is printed on its own line.



          Or with GNU sed:



          sed 's/[^[:alnum:]]/\n/g' file


          With tr, it becomes



          $ tr -c '[:alnum:]' '\n' <file
          www
          wifi
          in
          ua
          YI
          HondBrychka


          where -c makes it replace each character that is not an [:alnum:] with a newline.
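To see what difference the underscore makes in the character set, here is a small sketch using tr. The sample string is borrowed from the comment thread; the variable names are illustrative only.

```shell
line='Zhenek_Lebed98'

# Underscore treated as a delimiter (the set used in the commands above).
split_on_underscore=$(printf '%s\n' "$line" | tr -c '[:alnum:]' '\n')

# Underscore kept as a word character, mirroring what PCRE \w matches.
keep_underscore=$(printf '%s\n' "$line" | tr -c '[:alnum:]_' '\n')

printf 'without _ in the set:\n%s\n' "$split_on_underscore"
printf 'with _ in the set:\n%s\n' "$keep_underscore"
```

Which variant is right depends on whether identifiers such as `Zhenek_Lebed98` should stay in one piece; the same `_` can be added inside the bracket expression of the awk and sed commands.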






          edited 6 hours ago
          answered 7 hours ago
          Kusalananda


























          • The last solution produces empty lines as an artifact. Compare tr -c '[:alpha:]' '\n' < HEREDOC with grep -Po '[a-zA-Z]+' HEREDOC after adding "Zhenek_Lebed98" to the input file HEREDOC.

            – dizcza
            6 hours ago











          • @dizcza Add -s to the command line. It will make it squeeze multiple newlines into one. But on the other hand, you do have an empty word between 9 and 8.

            – Kusalananda
            6 hours ago













          • Now it works as well.

            – dizcza
            6 hours ago
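The effect of -s discussed in these comments can be checked with a short sketch. It counts output lines rather than comparing captured text, because command substitution strips trailing newlines and would hide the difference; the variable names are illustrative only.

```shell
line='Zhenek_Lebed98'

# Without -s: "_", "9", "8" and the trailing newline each become a
# newline, so empty lines appear in the output.
plain_lines=$(printf '%s\n' "$line" | tr -c '[:alpha:]' '\n' | wc -l | tr -d ' ')

# With -s: each run of consecutive newlines is squeezed into one.
squeezed_lines=$(printf '%s\n' "$line" | tr -cs '[:alpha:]' '\n' | wc -l | tr -d ' ')

printf 'lines without -s: %s\n' "$plain_lines"
printf 'lines with -s:    %s\n' "$squeezed_lines"
```

With this input the first pipeline emits `Zhenek`, `Lebed` and two empty lines, while the second emits only `Zhenek` and `Lebed`.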






















































