Getting a match count of objects in a file
I have a large file that has entries that look like this:
entry-id: 1
sn: John
cn: Smith
empType: A
ADID: 123456

entry-id: 2
sn: James
cn: Smith
empType: B
ADID: 123456

entry-id: 3
sn: Jobu
cn: Smith
empType: A
ADID: 123456

entry-id: 4
sn: Jobu
cn: Smith
empType: A
ADID:
Each entry is separated by a blank line. I need a count of the entries that have an empType of A and MUST ALSO have a value after ADID (a total of 2 in this example). I've tried awk, grep and egrep, but I'm still having no luck. Any ideas?
linux text-processing command-line
asked Dec 14 '17 at 3:29 by King of NES (edited by PRY)
What exactly did you try in awk? I would think something like
awk -vRS= '/empType: A/ && /ADID: [0-9]+/ {n++} END {print n}' file
should work.
– steeldriver
Dec 14 '17 at 3:39
Running your command, I got "awk: record `smapsHistory: [NDSEn...' too long record number 213244". There are only about 100 records with an employeeType of C, and it's going crazy....
– King of NES
Dec 14 '17 at 3:54
You did include the correct filename to read as input?
– bu5hman
Dec 14 '17 at 4:15
It was the correct file.
– King of NES
Dec 14 '17 at 4:18
6 Answers
Awk solution:
awk '/empType: /{ f=($2=="A"? 1:0) }f && /ADID: [0-9]+/{ c++ }END{ print c+0 }' file
f - flag indicating that an empType: A section is being processed
c - count of empType: A entries with a filled ADID key
The output:
2
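To see the flag logic outside awk, here is a minimal Python sketch of the same two rules applied line by line. It assumes the input looks like the sample in the question; the helper name count_flagged and the embedded SAMPLE text are illustrative only, not part of the answer.

```python
import re

# Illustrative sample copied from the question (entries separated by blank lines).
SAMPLE = """entry-id: 1
sn: John
cn: Smith
empType: A
ADID: 123456

entry-id: 2
sn: James
cn: Smith
empType: B
ADID: 123456

entry-id: 3
sn: Jobu
cn: Smith
empType: A
ADID: 123456

entry-id: 4
sn: Jobu
cn: Smith
empType: A
ADID:
"""


def count_flagged(lines):
    """Line-by-line port of the awk one-liner (hypothetical helper)."""
    flag = False
    count = 0
    for line in lines:
        # Rule 1: /empType: /{ f = ($2 == "A") }  ($2 = 2nd whitespace field)
        if "empType: " in line:
            fields = line.split()
            flag = len(fields) > 1 and fields[1] == "A"
        # Rule 2: f && /ADID: [0-9]+/{ c++ }
        if flag and re.search(r"ADID: [0-9]+", line):
            count += 1
    return count


print(count_flagged(SAMPLE.splitlines()))
```

Like the awk version, the flag is only reset by the next empType line, not by entry-id, so an empType: A entry with no ADID line at all could let a later entry's ADID be counted if that later entry has no empType line. To run it on a real file, pass the open file object instead of SAMPLE.splitlines().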
answered Dec 14 '17 at 6:00 by RomanPerekhrest (edited Dec 14 '17 at 6:12)
Here is an alternative awk solution that uses a blank line ("") as the record separator RS and a newline (\n) as the field separator FS. It relies on empType and ADID always being the 4th and 5th lines of each entry:
BEGIN {RS=""; FS="\n"}
{
    split($4,a,": ")
    split($5,b,": ")
}
a[2]=="A" && b[2]!="" {c++}
END {print c+0}
The script, saved as main.awk, can be executed with
awk -f main.awk file
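A minimal Python sketch of the same record/field split, assuming entries are separated by one or more empty lines as with awk's RS="". The helper count_by_position is hypothetical; it reads the 4th and 5th lines of each entry just like $4 and $5, which is why an entry whose lines come in a different order is not counted.

```python
import re

# Illustrative sample copied from the question (entries separated by blank lines).
SAMPLE = """entry-id: 1
sn: John
cn: Smith
empType: A
ADID: 123456

entry-id: 2
sn: James
cn: Smith
empType: B
ADID: 123456

entry-id: 3
sn: Jobu
cn: Smith
empType: A
ADID: 123456

entry-id: 4
sn: Jobu
cn: Smith
empType: A
ADID:
"""


def count_by_position(text):
    """Hypothetical port of the RS=""/FS="\\n" awk script."""
    count = 0
    # RS="": records are separated by one or more empty lines
    for record in re.split(r"\n{2,}", text.strip("\n")):
        # FS="\n": each line of the record is one field
        fields = record.split("\n")
        if len(fields) < 5:
            continue  # no $4/$5, so awk's a[2] and b[2] would be empty
        a = fields[3].split(": ")  # split($4, a, ": ")
        b = fields[4].split(": ")  # split($5, b, ": ")
        if len(a) > 1 and a[1] == "A" and len(b) > 1 and b[1] != "":
            count += 1
    return count


print(count_by_position(SAMPLE))
```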
answered Dec 14 '17 at 6:39 by etopylight
A simple two-grep method (it relies on the ADID line coming directly after the empType line), where data is the input file:
grep -A1 'empType: A' data | grep -c 'ADID: .\+'
Output:
2
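The pipeline only works because ADID is the line right after empType. This minimal Python emulation of grep -A1 PATTERN | grep -c PATTERN makes that dependency explicit; the function grep_after_count and its parameters are illustrative, not a real grep API. Note that Python's .+ means "one or more", which in grep's default basic regex syntax is written .\+ (GNU grep), or .+ with grep -E.

```python
import re

# Illustrative sample copied from the question (entries separated by blank lines).
SAMPLE = """entry-id: 1
sn: John
cn: Smith
empType: A
ADID: 123456

entry-id: 2
sn: James
cn: Smith
empType: B
ADID: 123456

entry-id: 3
sn: Jobu
cn: Smith
empType: A
ADID: 123456

entry-id: 4
sn: Jobu
cn: Smith
empType: A
ADID:
"""


def grep_after_count(lines, first=r"empType: A", second=r"ADID: .+", after=1):
    """Hypothetical emulation of `grep -A<after> first | grep -c second`."""
    # Indices grep -A would print: each matching line plus `after` lines after it
    selected = set()
    for i, line in enumerate(lines):
        if re.search(first, line):
            selected.update(range(i, min(i + after + 1, len(lines))))
    # grep -c counts matching lines; the set ensures each line is counted once
    return sum(1 for i in selected if re.search(second, lines[i]))


print(grep_after_count(SAMPLE.splitlines()))
```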
answered Dec 14 '17 at 7:09 by agc (edited Dec 14 '17 at 7:15)
I like the idea of getting the records that satisfy your requirements (better for e.g. testing) and then counting them with wc -l. So here is an awk script that does just that:
#!/usr/bin/awk -f
# getids.awk
BEGIN{
    RS="";
    FS="\n"
}
/ADID: [0-9]/ && /empType: A/{print $1}
And here it is in action:
user@host:~$ awk -f getids.awk data.txt
entry-id: 1
entry-id: 3
user@host:~$ awk -f getids.awk data.txt | wc -l
2
Of course, if you just want the count, we can do that too:
#!/usr/bin/awk -f
# count.awk
BEGIN {
    RS="";
    FS="\n";
    count=0;
}
/ADID: [0-9]/ && /empType: A/{count++}
END {
    print count
}
And because I love Python, here is a Python script that does the same thing:
#!/usr/bin/env python2
# -*- coding: ascii -*-
"""getids.py"""
import sys
# Create a list to store all of the parsed records
records = []
# Iterate over the lines of the input file
with open(sys.argv[1]) as data:
    for line in data:
        # When an "entry-id" is reached, create a new record
        if line.startswith('entry-id'):
            entry_id = line.split(':')[1].strip()
            records.append({'entry-id': entry_id})
        # For other non-blank lines, update the current record
        elif line.strip():
            key = line.partition(':')[0].strip()
            value = line.partition(':')[2].strip()
            records[-1][key] = value
# Extract the list of records meeting the desired criteria
# (.get avoids a KeyError if a record lacks one of the keys)
matches = [record for record in records if record.get('empType') == 'A' and record.get('ADID')]
# Print out the entry-ids for all of the matches
for match in matches:
    print('entry-id: ' + match['entry-id'])
And here's the Python script in action:
user@host:~$ python getids.py data.txt
entry-id: 1
entry-id: 3
user@host:~$ python getids.py data.txt | wc -l
2
And if we really do just want the counts:
#!/usr/bin/env python2
# -*- coding: ascii -*-
"""count.py"""
import sys
# Keep a count of the number of matches
count = 0
# Use flags to keep track of the current record
emptype_flag = False
adid_flag = False
# Iterate over the lines of the input file
with open(sys.argv[1]) as data:
    for line in data:
        # When an "entry-id" is reached, reset the flags
        if line.startswith('entry-id'):
            emptype_flag = False
            adid_flag = False
        elif line.strip() == "empType: A":
            emptype_flag = True
        # partition avoids an IndexError on an "ADID" line without a colon
        elif line.startswith("ADID") and line.partition(':')[2].strip():
            adid_flag = True
        # If both conditions hold, increment the counter
        # and reset the flags
        if emptype_flag and adid_flag:
            count = count + 1
            emptype_flag = False
            adid_flag = False
# Print the number of matches
print(count)
And, while we're at it, how about a pure Bash script? Here's one:
#!/usr/bin/env bash
# getids.bash
while IFS= read -r line; do
    if [[ "${line}" =~ "entry-id:" ]]; then
        entry_id="${line}"
        emptype=false
        adid=false
    elif [[ "${line}" =~ "empType: A" ]]; then
        emptype=true
    # The space in the regex must be escaped inside [[ ]]
    elif [[ "${line}" =~ ADID:\ [0-9] ]]; then
        adid=true
    fi
    if [[ "${emptype}" == true && "${adid}" == true ]]; then
        echo "${entry_id}"
        emptype=false
        adid=false
    fi
done < "$1"
And running the bash script:
user@host:~$ bash getids.bash data.txt
entry-id: 1
entry-id: 3
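count.awk, unlike the Python and Bash scripts, tests both regexes against a whole blank-line-separated record at once rather than walking line by line. A minimal Python sketch of that whole-record test, assuming entries are separated by empty lines; count_whole_record and SAMPLE are illustrative names.

```python
import re

# Illustrative sample copied from the question (entries separated by blank lines).
SAMPLE = """entry-id: 1
sn: John
cn: Smith
empType: A
ADID: 123456

entry-id: 2
sn: James
cn: Smith
empType: B
ADID: 123456

entry-id: 3
sn: Jobu
cn: Smith
empType: A
ADID: 123456

entry-id: 4
sn: Jobu
cn: Smith
empType: A
ADID:
"""


def count_whole_record(text):
    """Hypothetical port of count.awk: both regexes are searched anywhere
    in the record, so the order of lines inside an entry doesn't matter."""
    records = re.split(r"\n{2,}", text.strip("\n"))
    return sum(
        1
        for rec in records
        if re.search(r"ADID: [0-9]", rec) and re.search(r"empType: A", rec)
    )


print(count_whole_record(SAMPLE))
```

Because the regexes are unanchored, empType: A would also match a value such as AB; a pattern like r"(?m)^empType: A$" would be stricter.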
And finally, here's something using just grep and wc:
user@host:~$ cat data.txt | grep -A1 'empType: A' | grep "ADID: \S" | wc -l
2
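The \S in the last grep (one non-whitespace character, supported by GNU grep and by Python's re) is what makes the pattern require an actual value after "ADID: ". A tiny Python check of the same pattern on a few illustrative lines; unlike [0-9], it also accepts alphanumeric IDs.

```python
import re

# "ADID: \S" needs at least one non-whitespace character after "ADID: "
ADID_WITH_VALUE = re.compile(r"ADID: \S")


def has_adid_value(line):
    """Return True if the line carries a non-empty ADID value."""
    return ADID_WITH_VALUE.search(line) is not None


for sample in ["ADID: 123456", "ADID: A1B2", "ADID:", "ADID: "]:
    print(repr(sample), has_adid_value(sample))
```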
answered Dec 14 '17 at 5:39 by igal (edited Dec 14 '17 at 13:36)
With perl, that could be:
perl -l -00ne '
    my %f = /(.*?):\s*(.*)/g;
    ++$n if $f{empType} eq "A" && $f{ADID} ne "";
    END {print 0+$n}' < file
- -n causes the code given to -e to be applied to each input record.
- -00 makes the records paragraphs.
- We build a %f associative array where keys and values are mapped to each (key):spaces(value) in the record.
- We increment $n where the conditions are met.
- We print $n in the END block (adding 0 to make sure we get 0 and not an empty string if there's no match).
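A minimal Python sketch of the same idea, building one dict per paragraph. One deliberate swap: it uses [ \t]* instead of Perl's \s* after the colon, because \s also matches a newline, so an empty field that is not the last line of its entry (for example ADID: followed by another line) would otherwise take the next line as its value. count_with_dict and SAMPLE are illustrative names.

```python
import re

# Illustrative sample copied from the question (entries separated by blank lines).
SAMPLE = """entry-id: 1
sn: John
cn: Smith
empType: A
ADID: 123456

entry-id: 2
sn: James
cn: Smith
empType: B
ADID: 123456

entry-id: 3
sn: Jobu
cn: Smith
empType: A
ADID: 123456

entry-id: 4
sn: Jobu
cn: Smith
empType: A
ADID:
"""

# One (key, value) pair per line; [ \t]* cannot cross a newline
PAIR = re.compile(r"^(.*?):[ \t]*(.*)$", re.MULTILINE)


def count_with_dict(text):
    """Hypothetical port of the perl -00 answer."""
    count = 0
    for para in re.split(r"\n{2,}", text.strip("\n")):
        fields = dict(PAIR.findall(para))  # like my %f = /.../g
        if fields.get("empType") == "A" and fields.get("ADID", "") != "":
            count += 1
    return count


print(count_with_dict(SAMPLE))
```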
answered Dec 14 '17 at 14:14 by Stéphane Chazelas (edited Dec 14 '17 at 14:57)
I wasn't able to get anything working with grep -A, and the other answers returned a "label too long" error or some other error.
What I found that worked was:
perl -000 -ne 'print if /empType: A/' file.ldif | grep -i -c "^ADID: [0-9A-Za-z]"
I didn't know what perl -000 does at first: it turns on paragraph mode, so each record perl reads is a whole blank-line-separated entry rather than a single line. The rest of the command:
- -n wraps the code in a loop over the input records, without printing them automatically
- -e gives the one-line program
- print if /empType: A/ prints the paragraph if it contains empType: A
- the matched paragraphs are piped to grep -i -c "^ADID: [0-9A-Za-z]", which ignores case and counts the ADID lines that have a value.
I'm not sure whether the other commands failed because of my Linux version, but the above command worked well. I'm not sure how to make the empType match case-insensitive, though.
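On the case question: in Perl, adding the i modifier (print if /empType: A/i) makes that match case-insensitive, the same way -i does for grep. A minimal Python sketch of the same two-stage filter with re.IGNORECASE on both patterns, assuming blank-line-separated entries; count_case_insensitive and SAMPLE are illustrative names.

```python
import re

# Illustrative sample copied from the question (entries separated by blank lines).
SAMPLE = """entry-id: 1
sn: John
cn: Smith
empType: A
ADID: 123456

entry-id: 2
sn: James
cn: Smith
empType: B
ADID: 123456

entry-id: 3
sn: Jobu
cn: Smith
empType: A
ADID: 123456

entry-id: 4
sn: Jobu
cn: Smith
empType: A
ADID:
"""

# Both patterns ignore case, like perl's /.../i and grep -i
EMP_A = re.compile(r"empType: A", re.IGNORECASE)
ADID_VALUE = re.compile(r"^ADID: [0-9A-Za-z]", re.IGNORECASE | re.MULTILINE)


def count_case_insensitive(text):
    """Keep paragraphs mentioning empType: A in any case, then count the
    ADID lines in them that have a value, as grep -i -c would."""
    count = 0
    for para in re.split(r"\n{2,}", text.strip("\n")):
        if EMP_A.search(para):
            count += len(ADID_VALUE.findall(para))
    return count


print(count_case_insensitive(SAMPLE))
print(count_case_insensitive(SAMPLE.replace("empType", "EMPTYPE")))
```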
answered Dec 14 '17 at 16:13 by King of NES