Awk: What's wrong with CJK characters? #Korean

Given .txt files of space-separated words, such as:

but esope holly bastard 생 지 옥 이 군 지 옥 이 지 옥 지 我 是 你 的 爸 爸 ！ 爸 爸 ！ ！ ！ 你 不 會 的 ！

and the following pipeline:

cat /pathway/to/your/file.txt | tr ' ' '\n' | sort | uniq -c | awk '{print $2" "$1}'

I get the following output in the console. The Korean words are counted wrongly, while the English and Chinese space-separated words are counted correctly:

생 16 bastard 1 2 esope 1 holly 1 2 1 2 不 1 你 2 我 1 是 1 會 1 爸 4 的 2

How can I make this work for Korean words? Note: I have 300,000 lines, nearly 2 million words.
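One likely cause, assuming GNU sort and uniq running in a UTF-8 locale such as en_US.UTF-8, is locale collation: these tools can compare lines using the locale's collation rules, and if those rules give different Hangul syllables the same weight, uniq treats them as duplicates and merges them. That would explain why distinct Korean syllables collapse into a single line. A minimal workaround sketch is to force byte-wise comparison with LC_ALL=C. The helper name freq_bytewise and the file sample.txt are hypothetical, used only for illustration.

```shell
# Hypothetical sample input: a shortened version of the text above.
printf '%s\n' 'but esope 생 지 옥 이 지 옥 我 你 你' > sample.txt

# freq_bytewise is a hypothetical helper name.
# Assumption: the miscounting comes from locale collation in sort/uniq.
# LC_ALL=C makes sort and uniq compare raw bytes, so two different
# Hangul syllables are never treated as equal.
freq_bytewise() {
    tr ' ' '\n' < "$1" | LC_ALL=C sort | LC_ALL=C uniq -c | awk '{print $2" "$1}'
}

freq_bytewise sample.txt
```

With this input it should print one line per distinct word in byte order: but 1, esope 1, 你 2, 我 1, 생 1, 옥 2, 이 1, 지 2.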

Edit: I used this answer:

$ awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" myfile.txt | sort > myfileout.txt

A single awk script can handle this, and it is far more efficient than the current pipeline:

$ awk '{a[$1]++}END{for(k in a)print k,a[k]}' RS=" |\n" file
옥 3 bastard 1 ！ 5 爸 4 군 1 지 4 2 會 1 你 2 1 是 1 不 1 이 2 esope 1 的 2 holly 1 2 생 1 我 1 2
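The one-liner relies on a regular-expression RS, which gawk and mawk support but POSIX awk leaves unspecified for multi-character values. With that RS, consecutive spaces or a trailing space also produce empty records, which end up counted under an empty key. A sketch of a more portable variant, assuming plain POSIX awk, loops over the fields of each line instead. Array keys are compared exactly, so locale collation cannot merge different words. The helper name count_words and the file sample.txt are hypothetical.

```shell
# Hypothetical sample input; note the doubled space and the trailing space.
printf '%s\n' 'but esope 생 지 옥 이  지 옥 我 你 你 ' > sample.txt

# count_words is a hypothetical helper name.
# Default field splitting skips runs of blanks, so no empty words appear.
count_words() {
    awk '
    {
        for (i = 1; i <= NF; i++)   # visit every field on the line
            count[$i]++             # tally the word; keys are exact strings
    }
    END {
        for (w in count)            # iteration order is unspecified
            print w, count[w]
    }' "$@"
}

count_words sample.txt
```

The output order of for (w in count) is unspecified, so pipe the result through sort if a stable order matters.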

If you want to store the results in a file, you can use redirection:

$ awk '{a[$1]++}END{for(k in a)print k,a[k]}' RS=" |\n" file > outfile
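If you also want the file ordered by frequency rather than in awk's unspecified order, one option is to sort before redirecting. This sketch assumes a sort that accepts POSIX -k keys. It sorts numerically on the count (field 2, descending) and breaks ties on the word with LC_ALL=C, so ordering stays byte-wise and locale-independent. The names sample.txt and outfile.txt are hypothetical.

```shell
# Hypothetical sample input.
printf '%s\n' 'but esope 생 지 옥 이 지 옥 지 我 你 你' > sample.txt

# Count words, then sort by count (numeric, descending) and by word
# (byte order) to break ties; the result goes to a hypothetical outfile.txt.
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
     END { for (w in count) print w, count[w] }' sample.txt |
    LC_ALL=C sort -k2,2nr -k1,1 > outfile.txt
```

Here outfile.txt should start with 지 3, followed by the words seen twice.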
