Split file lines by regex delimiter
I want to split each line of an input file on the non-alphanumeric regex `\W` and print all the resulting chunks to an output file, like so:
Input file:
www.wifi.in.ua
YI-HondBrychka
Output file:
www
wifi
in
ua
YI
HondBrychka
regular-expression
asked 7 hours ago by dizcza
2 Answers
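Try using grep's `-o` flag to print only the matching parts of each line (`-P` enables Perl-compatible regular expressions, which is what makes `\w` available), e.g.:
$ cat <<HEREDOC | grep -Po '\w+'
www.wifi.in.ua
YI-HondBrychka
HEREDOC
www
wifi
in
ua
YI
HondBrychka
answered 7 hours ago by igal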
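Since `-P` is a GNU extension that may be unavailable depending on how grep was built, here is a minimal sketch of a variant that avoids it. It assumes a grep supporting `-o` (not required by POSIX, but common), approximates the PCRE `\w+` with the ERE `[[:alnum:]_]+`, and writes the chunks to an output file as the question asks. The file names `input.txt` and `output.txt` are hypothetical.

```shell
#!/bin/sh
# Hypothetical file names, for illustration only.
infile=input.txt
outfile=output.txt

# Sample input from the question.
printf '%s\n' 'www.wifi.in.ua' 'YI-HondBrychka' > "$infile"

# [[:alnum:]_]+ is an ERE approximation of the PCRE \w+
# (letters, digits and underscore); -o prints each match on its own line.
grep -oE '[[:alnum:]_]+' "$infile" > "$outfile"

cat "$outfile"
```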
Did you mean `grep -Po '\w+' HEREDOC`? This works, thank you.
– dizcza
7 hours ago
No, I meant what I wrote there; it works for me if I copy-paste it. In practice, I imagine you would use a file instead of a heredoc string, and it would look like `grep -Po '\w+' /path/to/file`.
– igal
6 hours ago
Using HEREDOC as the path to a file, `cat <<HEREDOC | grep -Po '\w+'` doesn't work for me, while `cat HEREDOC | grep -Po '\w+'` (or better, `grep -Po '\w+' HEREDOC`) works fine.
– dizcza
6 hours ago
It sounds to me like you're confused about what a heredoc is: it's not supposed to represent a file path. You can read more about "here documents" in the Bash manual: gnu.org/software/bash/manual/bashref.html#Here-Documents
– igal
6 hours ago
Oh, I see, I didn't know that. Thank you.
– dizcza
6 hours ago
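To make the distinction from this exchange concrete: a here-document feeds inline text to a command's standard input, whereas a file is named by its path as an ordinary argument. This sketch produces the same output both ways. The delimiter name (`EOF` here, `HEREDOC` in the answer) is arbitrary, the file name `words.txt` is hypothetical, and `grep -oE '[[:alnum:]_]+'` stands in for `grep -Po '\w+'` so the example does not depend on PCRE support.

```shell
#!/bin/sh
# Hypothetical file name, for illustration only.
printf '%s\n' 'www.wifi.in.ua' 'YI-HondBrychka' > words.txt

# 1. Inline text: the lines between <<'EOF' and EOF are sent to grep's
#    standard input; quoting 'EOF' stops the shell expanding $ or `
#    inside them. No file named EOF is read.
grep -oE '[[:alnum:]_]+' <<'EOF'
www.wifi.in.ua
YI-HondBrychka
EOF

# 2. A real file: its path is passed to grep as an argument.
grep -oE '[[:alnum:]_]+' words.txt
```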
Replacing all matches of `\W` with newlines, using Perl (from which the `\W` expression originated):
$ perl -pe '$_ =~ s/\W/\n/g' <file
www
wifi
in
ua
YI
HondBrychka
Or, more in line with the actual wording of the question (split each line on `\W`, then join the pieces with newlines):
$ perl -pe '$_ = join("\n", split(/\W/)) . "\n"' <file
www
wifi
in
ua
YI
HondBrychka
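Expressing the PCRE `\W` as the ERE `[^[:alnum:]]` (strictly, `\w` also includes the underscore, so the closer equivalent of `\W` is `[^[:alnum:]_]`) and using GNU awk:
awk -v RS='[^[:alnum:]]' 1 file
The `1` is an always-true condition whose default action is `{ print }`, and `-v RS='[^[:alnum:]]'` sets the input record separator to any non-alphanumeric character. Each record (chunk) is then printed on its own line.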
Or with GNU sed:
sed 's/[^[:alnum:]]/\n/g' file
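Treating `\n` in the replacement as a newline is a GNU sed extension. POSIX sed instead expects a backslash followed by a literal newline in the replacement; a sketch of that portable form (the file name `input.txt` is hypothetical):

```shell
#!/bin/sh
# Hypothetical input file name, for illustration only.
printf '%s\n' 'www.wifi.in.ua' 'YI-HondBrychka' > input.txt

# The backslash at the end of the first line escapes the newline that
# follows it, so each non-alphanumeric character becomes a line break.
sed 's/[^[:alnum:]]/\
/g' input.txt
```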
With tr, it becomes
$ tr -c '[:alnum:]' '\n' <file
www
wifi
in
ua
YI
HondBrychka
where `-c` (complement) makes tr replace each character that is not in `[:alnum:]` with a newline.
edited 6 hours ago, answered 7 hours ago by Kusalananda
The last solution leaves empty lines as an artifact. Compare `tr -c '[:alpha:]' '\n' < HEREDOC` with `grep -Po '[a-zA-Z]+' HEREDOC` after adding "Zhenek_Lebed98" to the input file HEREDOC.
– dizcza
6 hours ago
@dizcza Add `-s` to the command line; it squeezes runs of repeated newlines into one. On the other hand, you do have an empty word between `9` and `8`.
– Kusalananda
6 hours ago
Now it works as well.
– dizcza
6 hours ago
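Putting the tr answer and this thread together: with `-c` and `-s`, each run of delimiters collapses into a single newline, so the chunks land one per line in an output file, as the question asks. A sketch with hypothetical file names (`input.txt`, `output.txt`); note that a line beginning with a delimiter would still yield one empty line.

```shell
#!/bin/sh
# Hypothetical file names, for illustration only.
printf '%s\n' 'www.wifi.in.ua' 'YI-HondBrychka' 'Zhenek_Lebed98' > input.txt

# -c: act on every character NOT in [:alnum:]; each one becomes "\n".
# -s: squeeze runs of repeated "\n" in the output into a single "\n".
tr -cs '[:alnum:]' '\n' < input.txt > output.txt

cat output.txt
```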