R - Sort a dataframe column by the frequency of occurrence


I have a data frame called df with three columns, say:

region  id   salary
1       a1   100
1       a2   1001
1       a3   2000
1       a4   2431
1       a5   1001
..............
..............
2       a6   1002
2       a7   1002
2       a8   1002
3       a9   3001
3       a10  3001
3       a11  4001

Now I want to sort the salary column by how often each value occurs within its region, using a frequency table or something similar (the probability of occurrence per region), and then sort on that. Please assume the dataset is large enough (1000 rows).

P.S. Can you suggest some methods? Please use column names in your answers, since the real table has other columns in between.

Thanks in advance.

**Edit 1**

I think I was not clear enough for those who replied, and I sincerely apologise for that.

With the current dataset I need to create a frequency table, say:

region  salary(bin)  count
1       1k           6
1       5k           3
1       2k           2
1       15k          2
1       0.5k         2
1       24k          1
1       0k           0

Using this we can classify each row and add new columns to our data frame df: bin (the histogram bucket) and count:

region  id  salary  (bin)  count
1       a1  100     1k     6
1       a2  1001    2k     2
1       a3  2000    2k     2
1       a4  2431    5k     3

... and so on ...

We can do the above using:

df$bin <- cut(df$salary, breaks=hist(df$salary)$breaks) 

After sorting by region, count and salary we get:

region  id  salary  (bin)  count
1       a1  100     1k     6
1       a4  2431    5k     3
1       a3  2000    2k     2
1       a2  1001    2k     2

As you can see, I need to create a frequency table for each region and then sort by it. I did the above using Tableau, but I want to automate it in R.

I hope this is clear.

One possible approach is to use data.table to add a freq column and then sort the data accordingly:

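library(data.table)

# add the number of rows per region/salary combination as a new column
setDT(df)[, freq := .N, by = c("region", "salary")]

# sort by descending frequency
df[order(freq, decreasing = TRUE), ]

# one-liner (thx @jaap)
setDT(df)[, freq := .N, by = .(region, salary)][order(-freq)]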
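
For readers without data.table, the same idea can be sketched in base R. This is only an illustration, not the answer's method: it swaps in ave() for the per-group count, rebuilds the sample rows shown in the question (the elided rows are left out), and assumes the result should be ordered by region first, then by descending frequency, then by salary. The name `sorted` is made up for the example; `freq` follows the answer.

```r
# Sample rows copied from the question (the "......" rows are omitted)
df <- data.frame(
  region = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  id     = paste0("a", 1:11),
  salary = c(100, 1001, 2000, 2431, 1001, 1002, 1002, 1002, 3001, 3001, 4001),
  stringsAsFactors = FALSE
)

# Count how often each salary value occurs within its region.
# ave() applies FUN to every region/salary group and writes the
# result back to each row of that group.
df$freq <- ave(df$salary, df$region, df$salary, FUN = length)

# Assumed ordering: region ascending, most frequent salary first,
# then salary ascending as a tie-breaker
sorted <- df[order(df$region, -df$freq, df$salary), ]
print(sorted)
```

To count per bin rather than per exact salary, it should be enough to pass the bin column from the question's cut() call as the second grouping argument instead of df$salary, provided none of the bins are NA.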
