Grep doesn't work when trying to match a word from a file in a second file
I've got 2 files, both with numerous lines that only contain one number. I'm trying to see if any number from file1 matches a number in file2. This is what I tried, and for some reason it doesn't work:
    for i in $(cat file1); do grep ${i} file2; done
For reference, here is the data from file1 and file2:
    file1  file2
    2134   1251
    2135   5626
    5342   4327
    6456   8453
    3413   4537
    4525   3533
    2347   5738
    1235   1235
    7453   3462
So shouldn't this command take each line from file 1 and grep it against the whole of file2? In that case, shouldn't a match be printed on screen?
bash grep
asked 7 hours ago by user323587, edited 7 hours ago by Rui F Ribeiro
It should - but you would probably be advised to use something like grep -Fwf file1 file2 instead. – steeldriver, 7 hours ago
It should, but it consistently doesn't. Even if I run it against the same file, like for i in $(cat file1); do grep ${i} file1; done, it still doesn't work. I'll try your advice. – user323587, 7 hours ago
Note: I just tried your code and it works for me. Is there any chance that file1 contains hidden characters, maybe \n or \r or tabs? – Juan, 7 hours ago
If what you really want is to compare the files, you might want to use sort, uniq and diff (or kompare or k3diff or any other file comparison tool). – Juan, 7 hours ago
With those columns of numbers in two files, and that command, I get 1235 as output; it seems to be the lone duplicate. In other words, I can't see an issue with the result here. Of course, if the data is broken, like CRLF line endings in file1 but not in file2, then you'd have problems. – ilkkachu, 5 hours ago
2 Answers
Given two ordinary Unix text files, your shell loop prints
    1235
since this is the line that occurs in both files. If it does not, then one of your files may be a DOS text file. You can convert DOS text files into Unix text files with the dos2unix utility.
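One way to check for the DOS line endings mentioned above is to count the lines that contain a carriage return. This is only a sketch: the demo files are made up and live in a temporary directory, and tr is used as a stand-in for dos2unix (an assumption, in case dos2unix is not installed).

```shell
# Hypothetical demo files in a scratch directory (not the asker's real data).
tmp=$(mktemp -d)
printf '2134\r\n1235\r\n' > "$tmp/file1"   # DOS line endings (CRLF)
printf '1251\n1235\n'     > "$tmp/file2"   # Unix line endings (LF)

# Count the lines that contain a carriage return; non-zero suggests a DOS file.
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr" "$tmp/file1")

# With the carriage returns in place, the exact-line match finds nothing.
before=$(grep -xF -f "$tmp/file1" "$tmp/file2" || true)

# Strip the carriage returns (tr standing in for dos2unix) and try again.
tr -d '\r' < "$tmp/file1" > "$tmp/file1.unix"
after=$(grep -xF -f "$tmp/file1.unix" "$tmp/file2" || true)

printf 'CR lines: %s, before: [%s], after: [%s]\n' "$crlf_lines" "$before" "$after"
rm -rf "$tmp"
```

If the count is zero for both files, line endings are probably not the cause and other hidden characters would be worth checking.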
There is nothing majorly wrong with your loop given the type of data that you have, apart from the fact that it calls grep once for every line in file1. It would also match substrings, for example 100 in 1001, and if any line in file1 contained spaces or tabs, it would split that line into multiple words (because the $(cat ...) in for i in $(cat ...) is unquoted).
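Both pitfalls can be reproduced with a few throwaway lines. The file names and contents below are invented for illustration, and the default IFS of a POSIX-style shell is assumed.

```shell
# Hypothetical demo data in a scratch directory.
tmp=$(mktemp -d)
printf '1001\n100\n' > "$tmp/data"

# A plain pattern matches anywhere in the line, so 100 also hits 1001.
substring_hits=$(grep -c 100 "$tmp/data")
# -x restricts the match to whole lines.
whole_line_hits=$(grep -c -x 100 "$tmp/data")

# An unquoted $(cat ...) is split on whitespace: one line becomes two words.
printf '12 34\n' > "$tmp/spaced"
words=0
for i in $(cat "$tmp/spaced"); do
    words=$((words + 1))
done

printf 'substring: %s, whole line: %s, words: %s\n' \
    "$substring_hits" "$whole_line_hits" "$words"
rm -rf "$tmp"
```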
If you want to solve your issue this way (with a loop), you would be better off using
    while IFS= read -r word; do
        grep -xF -e "$word" file2
    done <file1
The -x and -F options are explained later in this answer, and -e signifies that the next argument is the pattern to match (otherwise, it may be taken as a command-line option if it starts with a dash, -).
This would still execute grep once for each line in file1, but it would do so correctly.
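To see why -e matters, here is the same loop run against made-up data where one line starts with a dash. The file contents are assumptions for the demo only.

```shell
# Hypothetical demo data; "-5" would look like an option without -e.
tmp=$(mktemp -d)
printf '%s\n' -5 1235 > "$tmp/file1"
printf '%s\n' 1235 -5 42 > "$tmp/file2"

matches=$(
    while IFS= read -r word; do
        # -e marks "$word" as the pattern even when it begins with a dash.
        grep -xF -e "$word" "$tmp/file2"
    done < "$tmp/file1"
)

printf '%s\n' "$matches"
rm -rf "$tmp"
```

The matches come out in the order of the lines in file1, since each grep call handles one pattern.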
To extract the lines in file2 that exactly correspond to lines in file1, without using a shell loop, you would use
    $ grep -xF -f file1 file2
    1235
This assumes that file1 contains a reasonable number of lines, but not too many ("too many" depends on the amount of memory that you have).
The command uses grep with -x, which forces matches across full lines only (no substring matches), and with -F, which makes grep do string comparisons rather than regular expression matches.
The -f file1 option instructs grep to read the patterns (the strings to match) from file1.
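The effect of -x and -F is easiest to see on data that is less tidy than the question's. In this sketch the pattern file deliberately contains a regular-expression metacharacter and a shorter number; both files are invented for the demo.

```shell
# Hypothetical pattern and data files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '1.35' 234 > "$tmp/patterns"
printf '%s\n' 1235 1234 234 > "$tmp/data"

# Without -x and -F: "1.35" is a regex that matches 1235,
# and 234 matches as a substring of 1234.
loose=$(grep -c -f "$tmp/patterns" "$tmp/data")

# With -x and -F: only the line that is literally "234" matches.
strict=$(grep -c -xF -f "$tmp/patterns" "$tmp/data")

printf 'loose: %s, strict: %s\n' "$loose" "$strict"
rm -rf "$tmp"
```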
For really massive amounts of data, it would be hugely inefficient to use grep, though. Instead, for this task and with this type of data (single words on individual lines), it would be better to do a relational join operation between the files:
    $ join file1 file2
    1235
This would, assuming that both files are lexicographically sorted, return the numbers that are the same between both files.
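The numbers as listed in the question are not in sorted order, so a sorting step would be needed before join. A minimal sketch, assuming the question's data is recreated in a temporary directory and that writing sorted copies (the .sorted names are made up) is acceptable:

```shell
# Recreate the question's sample data in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 2134 2135 5342 6456 3413 4525 2347 1235 7453 > "$tmp/file1"
printf '%s\n' 1251 5626 4327 8453 4537 3533 5738 1235 3462 > "$tmp/file2"

# join expects both inputs sorted on the join field.
sort "$tmp/file1" > "$tmp/file1.sorted"
sort "$tmp/file2" > "$tmp/file2.sorted"

common=$(join "$tmp/file1.sorted" "$tmp/file2.sorted")

printf '%s\n' "$common"
rm -rf "$tmp"
```

Sorting and joining should run under the same locale, so that both tools agree on the ordering.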
Using comm:
    $ comm -1 -2 file1 file2
    1235
comm also compares sorted files and can easily handle very large datasets. It prints three columns by default:
- lines that occur in the first file only
- lines that occur in the second file only
- lines that occur in both files
With -1 we turn off the output of the first column, and with -2 we disable the second column, leaving comm to output only the lines that are the same in both files.
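The column switches can be combined, which also makes it easy to ask the opposite question. A small sketch with two invented, already sorted files (-12 is the combined form of -1 -2):

```shell
# Hypothetical, already sorted demo files.
tmp=$(mktemp -d)
printf '%s\n' 1235 2134 7453 > "$tmp/a"
printf '%s\n' 1235 3462 7453 > "$tmp/b"

# Suppress columns 1 and 2: keep only the lines common to both files.
both=$(comm -12 "$tmp/a" "$tmp/b")

# Suppress columns 2 and 3: keep only the lines unique to the first file.
only_a=$(comm -23 "$tmp/a" "$tmp/b")

printf 'both:\n%s\nonly in a:\n%s\n' "$both" "$only_a"
rm -rf "$tmp"
```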
answered 4 hours ago by Kusalananda, edited 4 hours ago
You simply need to use
    grep -f file1 file2
or you may also use
    cat file1 | grep -f /dev/stdin file2
answered 7 hours ago by user335735, edited 5 hours ago by Philippos
Thank you for contributing. Please note that a) a good answer explains what you do, so that others can not just use it but also learn from it, and b) without specifying -x or -w to grep, you can get unwanted results if not all numbers are four-digit numbers like in the example (for instance, 234 in file1 would match 1234 in file2). That was probably the reason someone downvoted your answer (sadly, without leaving a comment). – Philippos, 5 hours ago