Friday 15 March 2013

python 3.x - Querying with custom columns along with normal columns on Pandas DataFrame -




This is an example data frame:

df

index,customer_mailid,event_quantity,amount_final,channel,week_name,venue_name,event_genre1
1,aa@hotmail.com,2,172,web,mon-to-thu,tivoli cinema: extreem,comedy
2,bb@gmail.com,2,234,web,mon-to-thu,cinemax: pacific mall subhash nagar,action
3,cc@yahoo.com,3,502,mobile,mon-to-thu,dt city centre: shalimar bagh,action
4,dr.d@gmail.com,4,1402,web,sunday,rajiv gandhi cricket stadium: hyderabad,sports
5,dd@hotmail.com,4,6449,web,saturday,subrata roy sahara stadium: gahunje,sports
6,deep.d@gmail.com2,1,82,mobile,mon-to-thu,tivoli cinema: hyderabad,action
7,r@yahoo.co.in,1,219,web,mon-to-thu,inox:jp nagar -central mantri junction,action
8,nnd@gmail.com,2,384,web,mon-to-thu,wave: city emporium mall,action
9,v90@gmail.com,4,1402,web,sunday,rajiv gandhi cricket stadium: hyderabad,sports

I want to execute the following kind of query on it:

Select a set of columns (or all columns) of the data frame where ((sum(amount) >= 1000) && (event_quantity < 5)), and so on, adding any number of conditions combined with & and |. The problem I am facing is that there is no column called sum(amount) in the original data frame. Is there a generic solution for querying a pandas data frame in such scenarios?

In your example data, every customer_mailid is used only once. I presume that in the real data there are multiple rows per customer, so that sum(amount_final) != amount_final. If that presumption is correct, one solution is to create a column that carries the per-customer sum of amount_final and use it in the subset.

Something like this:

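import pandas as pd

# Sum amount_final per customer, then turn the grouped result back into a regular frame
totalamount = pd.DataFrame(df.groupby('customer_mailid')['amount_final'].sum()).reset_index()
totalamount.columns = ['customer_mailid', 'total_amount_final']
# Join the totals back onto the original rows (merges on the shared customer_mailid column)
df = df.merge(totalamount)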

At this point you'll have a new column called total_amount_final that you can use in the subset like this:

df[(df.total_amount_final >= 1000) & (df.event_quantity < 5)]
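As a possible alternative to the merge step above, the per-customer total can be attached directly with groupby(...).transform('sum'), which returns a result aligned to the original rows. The following is only a minimal sketch: the sample rows are hypothetical (made up so that one customer appears twice) and are not taken from the data frame in the question.

```python
import pandas as pd

# Hypothetical sample data: one customer appears twice, so the
# per-customer sum differs from the individual row amounts.
df = pd.DataFrame({
    "customer_mailid": ["aa@hotmail.com", "aa@hotmail.com",
                        "bb@gmail.com", "cc@yahoo.com"],
    "event_quantity": [2, 4, 2, 6],
    "amount_final": [600, 500, 234, 1500],
})

# transform('sum') computes the group total and broadcasts it back
# to every row of that group, so no separate merge is needed.
df["total_amount_final"] = (
    df.groupby("customer_mailid")["amount_final"].transform("sum")
)

# Boolean-mask form: combine any number of conditions with & and |.
result = df[(df["total_amount_final"] >= 1000) & (df["event_quantity"] < 5)]

# Equivalent string form using DataFrame.query.
same = df.query("total_amount_final >= 1000 and event_quantity < 5")

print(result)
```

Once the aggregate exists as an ordinary column, further conditions can be mixed in the same way as for any other column, which is what makes this pattern fairly generic.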

python-3.x pandas
