R - Sort a dataframe column by the frequency of occurrence
I have a data frame in R called `df` with three columns, let's say:
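| region | id  | salary |
|--------|-----|--------|
| 1      | a1  | 100    |
| 1      | a2  | 1001   |
| 1      | a3  | 2000   |
| 1      | a4  | 2431   |
| 1      | a5  | 1001   |
| ...    | ... | ...    |
| 2      | a6  | 1002   |
| 2      | a7  | 1002   |
| 2      | a8  | 1002   |
| 3      | a9  | 3001   |
| 3      | a10 | 3001   |
| 3      | a11 | 4001   |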
Now I want to sort the `salary` column by how often each value occurs within its region. I'd like to use a frequency table or something similar to get the probability of occurrence per region, and then sort by it. Please assume the dataset is reasonably large (about 1000 rows).
P.S.: Feel free to suggest any method. Please refer to columns by name in your answers, since the real table has other columns in between.
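A minimal base-R sketch of the per-region frequency and probability described above, using only the sample rows shown in the question (the elided `...` rows are left out) and referring to columns by name. The `freq` and `prob` column names are chosen for illustration. `ave()` is just one option; `table()` or `aggregate()` would work as well.

```r
# Sample rows from the question (the elided "..." rows are omitted)
df <- data.frame(
  region = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  id     = paste0("a", 1:11),
  salary = c(100, 1001, 2000, 2431, 1001, 1002, 1002, 1002, 3001, 3001, 4001),
  stringsAsFactors = FALSE
)

# Number of times each salary value occurs within its own region
df$freq <- ave(df$salary, df$region, df$salary, FUN = length)

# Probability of that salary within the region: freq / number of rows in the region
df$prob <- df$freq / ave(df$salary, df$region, FUN = length)

# Sort by region, then most frequent salary first (ties broken by salary)
df_sorted <- df[order(df$region, -df$freq, df$salary), ]
df_sorted
```

With these rows, the two `1001` salaries in region 1 come first, each with `prob` 0.4.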
Thanks in advance.
**Edit 1**
I don't think I was clear enough, so I'm adding more detail here. Sincere apologies for not being clear:
With the current dataset, I need to create a frequency table, for example:
| region | salary (bin) | count |
|--------|--------------|-------|
| 1      | 1k           | 6     |
| 1      | 5k           | 3     |
| 1      | 2k           | 2     |
| 1      | 15k          | 2     |
| 1      | 0.5k         | 2     |
| 1      | 24k          | 1     |
| 1      | 0k           | 0     |
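A sketch of building such a frequency table in base R. It assumes the bins come from `hist()`'s default breaks, as in the `cut()` call used later in the question. The exact 0.5k/1k/2k/... buckets shown above are not reproduced because their edges are not stated. `table()` also lists empty region/bin combinations with a count of 0, like the `0k` row. The name `freq_tab` is chosen for illustration, and the data are the sample rows from the question.

```r
# Sample rows from the question (the elided "..." rows are omitted)
df <- data.frame(
  region = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  id     = paste0("a", 1:11),
  salary = c(100, 1001, 2000, 2431, 1001, 1002, 1002, 1002, 3001, 3001, 4001),
  stringsAsFactors = FALSE
)

# Bin salaries with the breaks hist() would choose (plot = FALSE suppresses the plot);
# include.lowest = TRUE keeps the minimum salary from falling outside the first bin
brks <- hist(df$salary, plot = FALSE)$breaks
df$bin <- cut(df$salary, breaks = brks, include.lowest = TRUE)

# Frequency table: one row per (region, bin) combination, including zero counts
freq_tab <- as.data.frame(table(region = df$region, bin = df$bin),
                          responseName = "count")

# Within each region, list the most populated bins first
freq_tab <- freq_tab[order(freq_tab$region, -freq_tab$count), ]
freq_tab
```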
Using this table, I can classify each row and add new columns to the data frame `df`: `bin` (the histogram bucket) and its `count`:
| region | id  | salary | bin | count |
|--------|-----|--------|-----|-------|
| 1      | a1  | 100    | 1k  | 6     |
| 1      | a2  | 1001   | 2k  | 2     |
| 1      | a3  | 2000   | 2k  | 2     |
| 1      | a4  | 2431   | 5k  | 3     |
| ...    | ... | ...    | ... | ...   |

...and so on.
We can create the `bin` column using:
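```r
# Bin salaries with hist()'s breaks; plot = FALSE avoids drawing a histogram, and
# include.lowest = TRUE matches hist()'s binning so the minimum salary is not NA
df$bin <- cut(df$salary, breaks = hist(df$salary, plot = FALSE)$breaks, include.lowest = TRUE)
```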
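The `cut()` line only adds `bin`. Here is a sketch of how you might also attach the per-region `count` to every row. It uses `ave()` with `region` and `bin` as grouping variables, so each row gets the number of rows sharing its region and bin. The sample rows from the question stand in for the real data.

```r
# Sample rows from the question (the elided "..." rows are omitted)
df <- data.frame(
  region = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  id     = paste0("a", 1:11),
  salary = c(100, 1001, 2000, 2431, 1001, 1002, 1002, 1002, 3001, 3001, 4001),
  stringsAsFactors = FALSE
)

# Same binning as in the question, without drawing the histogram
df$bin <- cut(df$salary, breaks = hist(df$salary, plot = FALSE)$breaks,
              include.lowest = TRUE)

# For each row, count how many rows share its region and bin
df$count <- ave(df$salary, df$region, df$bin, FUN = length)
df
```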
After sorting by region, count, and salary, I get:
| region | id  | salary | bin | count |
|--------|-----|--------|-----|-------|
| 1      | a1  | 100    | 1k  | 6     |
| 1      | a4  | 2431   | 5k  | 3     |
| 1      | a3  | 2000   | 2k  | 2     |
| 1      | a2  | 1001   | 2k  | 2     |
As you can see, I need to create a frequency table for each region and sort by it. I did the above in Tableau, but I want to automate it in R.
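One way to automate the whole bin, count, and sort sequence in R is to wrap it in a function that takes column names as strings. That way it keeps working when the real table has other columns in between. `sort_by_bin_freq` is a hypothetical helper name. Breaking ties on descending salary is an assumption read off the sorted example above. The bins again come from `hist()`'s default breaks rather than the Tableau buckets.

```r
# Hypothetical helper: bin a numeric column, count rows per (group, bin),
# then sort by group, count (descending) and value (descending)
sort_by_bin_freq <- function(data, group_col, value_col) {
  vals <- data[[value_col]]
  brks <- hist(vals, plot = FALSE)$breaks
  data$bin <- cut(vals, breaks = brks, include.lowest = TRUE)
  # Number of rows sharing each row's group and bin
  data$count <- ave(vals, data[[group_col]], data$bin, FUN = length)
  data[order(data[[group_col]], -data$count, -vals), ]
}

# Sample rows from the question (the elided "..." rows are omitted)
df <- data.frame(
  region = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  id     = paste0("a", 1:11),
  salary = c(100, 1001, 2000, 2431, 1001, 1002, 1002, 1002, 3001, 3001, 4001),
  stringsAsFactors = FALSE
)

out <- sort_by_bin_freq(df, group_col = "region", value_col = "salary")
out
```

Because the columns are looked up by name with `[[`, the helper does not depend on where `region` and `salary` sit in the table.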
I hope this is clear.
One possible approach is to use `data.table` to add a `freq` column and then sort the data accordingly:
```r
library(data.table)

# Add a freq column: number of rows for each (region, salary) combination
setDT(df)[, freq := .N, by = c("region", "salary")]
# sort
df[order(freq, decreasing = TRUE), ]

# one-liner (thx @Jaap)
setDT(df)[, freq := .N, by = .(region, salary)][order(-freq)]
```
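The one-liner above sorts by `-freq` across the whole table, so rows from different regions can end up interleaved. The variant below is a sketch and not part of the original answer. It keeps regions together and also computes the per-region probability the question mentions. The `prob` column name is chosen for illustration, and the sample rows from the question stand in for the real data.

```r
library(data.table)

# Sample rows from the question (the elided "..." rows are omitted)
df <- data.frame(
  region = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  id     = paste0("a", 1:11),
  salary = c(100, 1001, 2000, 2431, 1001, 1002, 1002, 1002, 3001, 3001, 4001),
  stringsAsFactors = FALSE
)

# Occurrences of each salary within its region (added by reference)
setDT(df)[, freq := .N, by = .(region, salary)]
# Probability of that salary within its region: freq / rows in the region
df[, prob := freq / .N, by = region]
# Keep regions together, most frequent salaries first within each region
df_sorted <- df[order(region, -freq)]
df_sorted
```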