hadoop - Replace multiple spaces with a single space, remove " and leading 0s using Apache Pig
I need help with this Pig script:
a = LOAD 'input.txt' AS (line:chararray);
b = FOREACH a GENERATE FLATTEN(TOBAG(*));
c = FOREACH b GENERATE REPLACE($0, '\\s+', ' ');
On the last line I need to replace multiple spaces with a single space, remove " (double quotes), and strip leading zeros, all using Apache Pig.
Note: the approach should not be field-specific, because there are more than 70 fields. Basically, I am expecting a REPLACE, SUBSTRING, or regex function that can perform the mentioned operations on the whole line.
input.txt:
00595, ab 000cdef california "state, 00usa
00733, 0ds ds "arizona 00state, usa
Expected output:
595, ab cdef california state, usa
733, ds ds arizona state, usa
You can use the REPLACE function in Pig for the cleaning, and loading the first field as int will remove the leading zeros from the number.
a = LOAD '/usr/pigfiles/pigo.txt' USING PigStorage(',') AS (value:int, state:chararray, country:chararray);
b = FOREACH a GENERATE value, REPLACE(REPLACE(state, '\\s+', ' '), '\\"', ''), country;
DUMP b;
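Since the question asks for something that is not tied to individual fields, a line-level variant may be closer to what is needed. The following is only a sketch under a few assumptions: each record is read as a single chararray with TextLoader, the aliases lines and cleaned are made up for illustration, and the leading-zero pattern is my own guess at the intent (it strips zeros at the start of any token, leaves a lone 0 intact, and skips zeros that follow a decimal point). Pig's REPLACE takes a Java regular expression, so backslashes are doubled inside the Pig string literal.

```pig
-- Read every record as one string so no per-field schema is required.
lines = LOAD 'input.txt' USING TextLoader() AS (line:chararray);

-- Innermost call: drop double quotes.
-- Middle call: strip zeros that start a token (assumed pattern, see above).
-- Outermost call: collapse runs of whitespace into a single space.
cleaned = FOREACH lines GENERATE
    REPLACE(
        REPLACE(
            REPLACE(line, '"', ''),
            '(?<![\\w.])0+(?=\\w)', ''),
        '\\s+', ' ') AS line;

DUMP cleaned;
```

Because the three replacements run on the whole line, the number of fields does not matter. If the fields are needed separately afterwards, the cleaned line could be split with STRSPLIT(line, ',').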