Python 3 pandas.groupby.filter
I am trying to perform a groupby filter that is very similar to the example in this documentation: pandas groupby filter
>>> df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
... 'foo', 'bar'],
... 'B' : [1, 2, 3, 4, 5, 6],
... 'C' : [2.0, 5., 8., 1., 2., 9.]})
>>> grouped = df.groupby('A')
>>> grouped.filter(lambda x: x['B'].mean() > 3.)
A B C
1 bar 2 5.0
3 bar 4 1.0
5 bar 6 9.0
I am trying to return a DataFrame that has all 3 columns, but only 2 rows. Those 2 rows contain the minimum values of column B, after grouping by column A. I tried the following line of code:
grouped.filter(lambda x: x['B'] == x['B'].min())
but it doesn't work and I get this error:
TypeError: filter function returned a Series, but expected a scalar bool
The DataFrame I am trying to return should look like this:
A B C
0 foo 1 2.0
1 bar 2 5.0
I would appreciate any help you can provide. Thank you in advance.
python pandas dataframe
asked 2 hours ago by FinProg, edited 2 hours ago by ALollz
The docstring wording can seem a bit ambiguous: "Return a copy of a DataFrame excluding elements from groups that do not satisfy..." You aren't excluding elements from within groups; you are excluding from the DataFrame every element of the groups that do not satisfy the single condition.
– ALollz
2 hours ago
5 Answers
df.groupby('A').apply(lambda x: x.loc[x['B'].idxmin(), ['B','C']]).reset_index()
answered 2 hours ago by kudeh
No need for groupby :-)
df.sort_values('B').drop_duplicates('A')
Out[288]:
A B C
0 foo 1 2.0
1 bar 2 5.0
answered 1 hour ago by Wen-Ben
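A minimal sketch of why the sort-then-deduplicate approach above works, assuming the df from the question: sort_values('B') puts each group's smallest B first, and drop_duplicates('A') keeps the first row it meets for each value of A. The names min_rows and max_rows are illustrative only, and the keep='last' variant for the maximum is an addition not taken from the answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})

# Smallest B per value of A: sort ascending, keep the first row seen per A.
min_rows = df.sort_values('B').drop_duplicates('A')

# Largest B per value of A: same sort, but keep the last row seen per A.
max_rows = df.sort_values('B').drop_duplicates('A', keep='last')

print(min_rows)
print(max_rows)
```

Like head(1), this keeps exactly one row per value of A, so tied minima are not all returned.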
There's a fundamental difference: in the documentation example, there is a single Boolean value per group. That is, you return the entire group if the mean is greater than 3. In your example, you want to filter specific rows within a group.
For your task the usual trick is to sort values and use .head or .tail to filter to the row with the smallest or largest value respectively:
df.sort_values('B').groupby('A').head(1)
# A B C
#0 foo 1 2.0
#1 bar 2 5.0
For more complicated queries you can use .transform or .apply to create a Boolean Series to slice with. This is also the safer choice if multiple rows share the minimum and you need all of them:
df[df.groupby('A').B.transform(lambda x: x == x.min())]
# A B C
#0 foo 1 2.0
#1 bar 2 5.0
answered 2 hours ago by ALollz, edited 1 hour ago
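To illustrate the point about ties, here is a small sketch on made-up data (the tied frame is hypothetical, not from the question). It uses transform('min'), the built-in aggregation name, rather than the lambda from the answer; both produce a Boolean mask aligned with the original index.

```python
import pandas as pd

# Hypothetical data with a tie: 'foo' has two rows with B == 1.
tied = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar'],
                     'B': [1, 1, 2, 4],
                     'C': [2.0, 8.0, 5.0, 1.0]})

# head(1) keeps exactly one row per group, even when the minimum is tied.
one_per_group = tied.sort_values('B').groupby('A').head(1)

# transform('min') broadcasts each group's minimum back onto its rows,
# so the comparison gives one True/False per original row.
mask = tied['B'] == tied.groupby('A')['B'].transform('min')
all_minima = tied[mask]

print(one_per_group)
print(all_minima)
```

Here one_per_group has two rows, while all_minima has three because both tied 'foo' rows survive the mask.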
The short answer:
grouped.apply(lambda x: x[x['B'] == x['B'].min()])
... and the longer one:
Your grouped object has 2 groups:
In[25]: for item in grouped:
   ...:     print(item)
...:
('bar',
A B C
1 bar 2 5.0
3 bar 4 1.0
5 bar 6 9.0)
('foo',
A B C
0 foo 1 2.0
2 foo 3 8.0
4 foo 5 2.0)
The filter() method of a GroupBy object filters groups as whole entities, NOT their individual rows. So with filter() you can obtain only 4 possible results:
- an empty DataFrame (0 rows),
- rows of the group 'bar' (3 rows),
- rows of the group 'foo' (3 rows),
- rows of both groups (6 rows)
Nothing else, regardless of the Boolean function you pass to filter().
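The whole-group behaviour described above can be checked with a short sketch, assuming the df from the question. The condition g['B'].min() < 2 is an arbitrary illustration chosen so that only 'foo' passes, and the names kept and raised are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})
grouped = df.groupby('A')

# One bool per group: 'foo' has min B == 1 (kept), 'bar' has min B == 2 (dropped).
kept = grouped.filter(lambda g: g['B'].min() < 2)
print(kept)  # all three 'foo' rows, not just the row with B == 1

# A function that returns one bool per row is rejected by filter().
try:
    grouped.filter(lambda g: g['B'] == g['B'].min())
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```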
So you have to use some other method. An appropriate one is the very flexible apply() method, which lets you apply an arbitrary function that
- takes a DataFrame (one group of the GroupBy object) as its only parameter,
- returns either a pandas object or a scalar.
In your case that function should return, for each of your 2 groups, the 1-row DataFrame holding the minimal value in column 'B', so we will use the Boolean mask group['B'] == group['B'].min() to select that row (or possibly several rows, if the minimum is tied):
In[26]: def select_min_b(group):
   ...:     return group[group['B'] == group['B'].min()]
Now, passing this function to the apply() method of the GroupBy object grouped, we obtain
In[27]: grouped.apply(select_min_b)
Out[27]:
A B C
A
bar 1 bar 2 5.0
foo 0 foo 1 2.0
Note: the same as a single command, using a lambda function:
grouped.apply(lambda group: group[group['B'] == group['B'].min()])
answered 1 hour ago by MarianD, edited 40 mins ago
>>> df.loc[df.groupby('A')['B'].idxmin()]
A B C
1 bar 2 5.0
0 foo 1 2.0
answered 14 mins ago by BallpointBen
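A sketch of what the idxmin approach above does step by step, assuming the df from the question and that its index labels are unique (otherwise .loc could return more than one row per label). The names labels and result are illustrative, and the trailing sort_index() is an optional addition to restore the original row order.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})

# For each group, idxmin() gives the index label of the first row
# that holds the group's minimum B.
labels = df.groupby('A')['B'].idxmin()
print(labels)

# .loc with those labels returns full rows in group-key order (bar, then foo);
# sort_index() puts them back in the original row order.
result = df.loc[labels].sort_index()
print(result)
```

Because idxmin() returns a single label per group, this keeps one row per group even when the minimum is tied.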