split file lines by regex delimiter

I want to split each line of an input file on the non-alphanumeric regex \W and print all the resulting chunks to an output file, like so:

Input file:

www.wifi.in.ua

YI-HondBrychka

Output file:

www

wifi

in

ua

YI

HondBrychka

asked 7 hours ago

dizcza

2 Answers
Try using grep's -o flag to print only the matching strings, e.g.

$ cat <<HEREDOC | grep -Po '\w+'

www.wifi.in.ua

YI-HondBrychka

HEREDOC



www

wifi

in

ua

YI

HondBrychka

answered 7 hours ago

igal


Did you mean grep -Po '\w+' HEREDOC? This works, thank you.

– dizcza
7 hours ago

No, I meant what I wrote there - it works for me if I copy-paste it. In practice, I imagine you would use a file instead of a heredoc string, and it would look like grep -Po '\w+' /path/to/file.

– igal
6 hours ago

Substituting the path to my file for HEREDOC, cat <<HEREDOC | grep -Po '\w+' doesn't work for me, while cat HEREDOC | grep -Po '\w+' (or better, grep -Po '\w+' HEREDOC) works fine.

– dizcza
6 hours ago

It sounds to me like you're confused about what a heredoc is - it's not supposed to represent a file path. You can read more about "here documents" in the Bash manual: gnu.org/software/bash/manual/bashref.html#Here-Documents

– igal
6 hours ago

Oh, I see, I didn't know that. Thank you.

– dizcza
6 hours ago
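The difference discussed in these comments can be sketched as follows. This is only an illustration under stated assumptions: the marker word EOF and the temporary file names are made up, and \w is spelled out as the bracket expression [[:alnum:]_] with -E so that the sketch does not depend on grep having been built with -P support.

```shell
# A here-document feeds the lines between the two markers to the command's
# standard input. The marker word (EOF here) is arbitrary and names no file.
from_heredoc=$(grep -oE '[[:alnum:]_]+' <<'EOF'
www.wifi.in.ua
YI-HondBrychka
EOF
)

# A real file is passed as an ordinary argument instead.
# The paths below are hypothetical and live in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' 'www.wifi.in.ua' 'YI-HondBrychka' > "$workdir/input.txt"
grep -oE '[[:alnum:]_]+' "$workdir/input.txt" > "$workdir/output.txt"
from_file=$(cat "$workdir/output.txt")
rm -rf "$workdir"

printf '%s\n' "$from_file"
```

Both variables end up holding the same six chunks; only the way the input reaches grep differs.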


Replacing all matches of \W with newlines, using Perl (from which the \W expression originated):

$ perl -pe '$_ =~ s/\W/\n/g' <file

www

wifi

in

ua

YI

HondBrychka

Or, more in line with the actual wording of the question:

$ perl -pe '$_ = join("\n", split(/\W/)) . "\n"' <file

www

wifi

in

ua

YI

HondBrychka

Expressing the PCRE \W as the ERE [^[:alnum:]] (strictly, \W does not match the underscore, whereas [^[:alnum:]] does) and using GNU awk:

awk -v RS='[^[:alnum:]]' 1 file

The 1 is short for '{ print }', and RS sets the input record separator to any single non-alphanumeric character. The records are then printed on individual lines.
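One caveat with a single-character separator is that two delimiters in a row produce an empty record. A minimal sketch of one way around that, assuming an awk that accepts a regular expression as RS (GNU awk does; POSIX leaves this unspecified) and using a made-up input line:

```shell
# Made-up sample input with consecutive delimiters.
sample='www..wifi--in.ua'

# The trailing + makes a whole run of non-alphanumerics act as one separator;
# the length($0) > 0 pattern drops any empty record that still remains.
chunks=$(printf '%s\n' "$sample" | awk -v RS='[^[:alnum:]]+' 'length($0) > 0')

printf '%s\n' "$chunks"
```

The length test is a safety net: it only matters when the input starts with a delimiter, which would otherwise yield an empty first record.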

Or with GNU sed:

sed 's/[^[:alnum:]]/\n/g' file

With tr, it becomes

$ tr -c '[:alnum:]' '\n' <file

www

wifi

in

ua

YI

HondBrychka

where -c makes it replace each character that is not an [:alnum:] with a newline.

edited 6 hours ago

answered 7 hours ago

Kusalananda


The last solution leaves empty lines as an artifact. Compare tr -c '[:alpha:]' '\n' < HEREDOC with grep -Po '[a-zA-Z]+' HEREDOC if we add "Zhenek_Lebed98" to the input file HEREDOC.

– dizcza
6 hours ago

@dizcza Add -s to the command line. It will make it squeeze multiple newlines into one. But on the other hand, you do have an empty word between 9 and 8.

– Kusalananda
6 hours ago

Now it works as well.

– dizcza
6 hours ago
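The effect of -s can be sketched like this, using a made-up two-line input in place of the file from the comments:

```shell
# Made-up sample input: "98" plus the line break, and "--", are runs of non-letters.
sample=$(printf '%s\n' 'Zhenek_Lebed98' 'YI--HondBrychka')

# Without -s every non-letter becomes its own newline, leaving empty lines.
unsqueezed=$(printf '%s\n' "$sample" | tr -c '[:alpha:]' '\n')

# With -s, repeated newlines in the output are squeezed into one.
squeezed=$(printf '%s\n' "$sample" | tr -cs '[:alpha:]' '\n')

printf '%s\n' "$squeezed"
```

The squeezed result is the four chunks Zhenek, Lebed, YI and HondBrychka, one per line, while the unsqueezed one has blank lines where the digits and the doubled hyphen were.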
