case-insensitive search of duplicate file-names
Is there a way to find all files in a directory with duplicate filenames, regardless of casing (uppercase and/or lowercase)?
find uniq case-sensitivity duplicate-files
asked Oct 18 '11 at 19:02 by lamcro; edited 9 mins ago by Jeff Schaller
6 Answers
If you have GNU utilities (or at least a set that can deal with zero-terminated lines) available, another answer has a great method:
find . -maxdepth 1 -print0 | sort -fz | uniq -diz
Note: the output will have zero-terminated strings; the tool you use to further process it should be able to handle that.
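For example, a minimal sketch (assuming bash) that reads the NUL-separated output back safely:
find . -maxdepth 1 -print0 | sort -fz | uniq -diz |
while IFS= read -r -d '' name; do
    printf 'duplicate (case-insensitive): %s\n' "$name"
done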
In the absence of tools that deal with zero-terminated lines, or if you want to make sure your code works in environments where such tools are not available, you need a small script:
#!/bin/sh
for f in *; do
    find . -maxdepth 1 -iname "$f" -exec echo \; | wc -l | while read count; do
        [ "$count" -gt 1 ] && echo "$f"
    done
done
What is this madness?
See this answer for an explanation of the techniques that make this safe for crazy filenames.
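If you want to run it (a question that comes up in the comments below): save the script to a file, make it executable, and run it from the directory you want to check. A minimal sketch, with an arbitrary file name:
chmod +x dupnames.sh   # dupnames.sh is a placeholder name for the saved script
./dupnames.sh          # run from inside the directory you want to check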
answered Oct 18 '11 at 19:26 by Shawn J. Goff
I was just going to post a similar... But worse answer :)
– rozcietrzewiacz
Oct 18 '11 at 19:28
Do you really need the -mindepth's?
– rozcietrzewiacz
Oct 18 '11 at 19:30
I'm using Solaris. Is /usr/bin/find the one you are talking about? I tried using it and it gave me many errors.
– lamcro
Oct 18 '11 at 20:27
@lamcro No, Solaris doesn't use GNU's find; I've edited the answer to include a non-GNU solution.
– Shawn J. Goff
Oct 18 '11 at 21:38
Ok. Do I just paste it in a text file and give it execution rights?
– lamcro
Oct 18 '11 at 22:02
There are many complicated answers above; this seems simpler and quicker than all of them:
find . -maxdepth 1 | sort -f | uniq -di
If you want to find duplicate file names in subdirectories then you need to compare just the file name, not the whole path:
find . -maxdepth 2 -printf "%f\n" | sort -f | uniq -di
Edit: Shawn J. Goff has pointed out that this will fail if you have filenames with newline characters. If you're using GNU utilities, you can make these work too:
find . -maxdepth 1 -print0 | sort -fz | uniq -diz
The -print0 option (for find) and the -z option (for sort and uniq) make them work on NUL-terminated strings instead of newline-terminated strings. Since file names cannot contain NUL, this works for all file names.
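A quick demonstration with hypothetical file names (the trailing tr only makes the NUL-separated output readable on a terminal):
touch Foo.txt foo.TXT unique.txt   # hypothetical test files
find . -maxdepth 1 -print0 | sort -fz | uniq -diz | tr '\0' '\n'
# prints one name per case-insensitive duplicate set, e.g. ./Foo.txt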
answered Oct 26 '12 at 12:08 by Jamie Kitson; edited Oct 26 '12 at 17:45 by derobert
But see my comment on Shawn J. Goff's answer, you can add the -print0 option to find, and the -z option to uniq and sort. Also, you want -f on sort as well. Then it works. (I'm going to edit this into your answer, feel free to revert if you don't approve)
– derobert
Oct 26 '12 at 17:41
The last command is giving me output without carriage returns (result is all in one line). I'm using Red Hat Linux to run the command. The first command line works best for me.
– Sun
Aug 26 '15 at 16:42
Sort the list of file names in a case-insensitive way and print duplicates. sort has an option for case-insensitive sorting. So does GNU uniq (but not other implementations), and all you can do with uniq is print a single element from each set of duplicates, the first one that's encountered. With GNU tools, assuming that no file name contains a newline, there's an easy way to print one representative of each set of duplicates:
for x in *; do printf "%s\n" "$x"; done |
sort -f |
uniq -id
Portably, to print all elements in each set of duplicates, assuming that no file name contains a newline:
for x in *; do printf "%s\n" "$x"; done |
sort -f |
awk '
tolower($0) == tolower(prev) {
print prev;
while (tolower($0) == tolower(prev)) {print; getline}
}
1 { prev = $0 }'
If you need to accommodate file names containing newlines, go for Perl or Python. Note that you may need to tweak the output, or better do your further processing in the same language, as the sample code below uses newlines to separate names in its own output.
perl -e '
foreach (glob("*")) {push @{$f{lc($_)}}, $_}
foreach (keys %f) {@names = @{$f{$_}}; if (@names > 1) {print "$_\n" foreach @names}}
'
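For the Python route, a minimal sketch along the same lines as the Perl one-liner (with the same caveat: it separates names with newlines in its own output):
python3 - <<'EOF'
import os

groups = {}
# group names case-insensitively; unlike glob("*") in the Perl version,
# os.listdir also returns dot files
for name in sorted(os.listdir('.')):
    groups.setdefault(name.lower(), []).append(name)

for names in groups.values():
    if len(names) > 1:
        for name in names:
            print(name)
EOF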
Here's a pure zsh solution. It's a bit verbose, as there's no built-in way to keep the duplicate elements in an array or glob result.
a=(*(N)); a=("${(@io)a}")
[[ $#a -le 1 ]] ||
for i in {2..$#a}; do
if [[ ${(L)a[$i]} == ${(L)a[$((i-1))]} ]]; then
[[ ${(L)a[$i-2]} == ${(L)a[$((i-1))]} ]] || print -r $a[$((i-1))]
print -r $a[$i]
fi
done
answered Oct 19 '11 at 9:40 by Gilles
Without GNU find:
LANG=en_US ls | tr '[A-Z]' '[a-z]' | uniq -c | awk '$1 >= 2 {print $2}'
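One caveat: awk's $2 is only the first whitespace-separated word, so names containing spaces get truncated. A variant that keeps the whole name (the byte-oriented tr caveat raised in the comments below still applies):
LANG=en_US ls | tr '[A-Z]' '[a-z]' | uniq -c | awk '$1 >= 2 { sub(/^ *[0-9]+ /, ""); print }'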
answered Oct 19 '11 at 14:17 by Rudolf Adamkovic
tr is very likely to wreak havoc on any character set which uses more than a single byte per character. Only the first 256 characters of UTF-8 are safe when using tr. From Wikipedia's tr (Unix): "Most versions of tr, including GNU tr and classic Unix tr, operate on single bytes and are not Unicode compliant."
– Peter.O
Oct 19 '11 at 15:24
Update to my previous comment.. only the first 128 characters of UTF-8 are safe. All UTF-8 characters above the ordinal range 0..127 are all multi-byte and can have individual byte values in other characters. Only the bytes in the range 0..127 have a one-to-one association to a unique character.
– Peter.O
Aug 28 '12 at 0:08
Plus uniq has a case-insensitive flag, -i.
– Jamie Kitson
Oct 26 '12 at 12:06
I finally managed it this way:
find . | tr '[:upper:]' '[:lower:]' | sort | uniq -d
I used find instead of ls because I needed the full path (a lot of subdirectories) included. I did not find a way to do this with ls.
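As the comment below notes, sort and uniq can do the case folding themselves, which also keeps the original casing in the output; a sketch:
find . | sort -f | uniq -di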
answered Oct 19 '11 at 19:17 by lamcro
Both sort and uniq have ignore-case flags, -f and -i respectively.
– Jamie Kitson
Oct 26 '12 at 12:03
For anyone else who then wants to rename etc. one of the files:
find . -maxdepth 1 | sort -f | uniq -di | while IFS= read -r f; do echo mv "$f" "${f/.txt/_.txt}"; done
(The ${f/.txt/_.txt} substitution is bash-specific, and the leading echo makes this a dry run; remove the echo to perform the renames.)
answered 32 mins ago by user3342930 (new contributor)