hive - Reading HDFS extended attributes in HiveQL
I am working on a use case where I add metadata (e.g. load time, data source...) to raw files in HDFS as extended attributes (xattrs). I am wondering if there is a way for HiveQL to retrieve such metadata in queries, as part of the result set. That would let me avoid storing the metadata in each record within the raw files. Would a custom Hive SerDe be a way to make such xattrs available? Otherwise, do you see another way to make this possible?
I am still a relative novice at this, so bear with me if I have misused terms.
Thanks
There may be other ways to implement it, but after I discovered the Hive virtual column INPUT__FILE__NAME, which contains the URL of the source HDFS file, I created a user-defined function (UDF) in Java to read the extended attributes. The function can be used in a Hive query as:
    XAttrSimpleUDF(INPUT__FILE__NAME, 'user.my_key')
The (quick and dirty) Java source code of the UDF looks like this:
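    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class XAttrSimpleUDF extends UDF {

        // Returns the value of extended attribute 'attr' for the file at 'uri',
        // or null if either argument is null or the lookup fails.
        public Text evaluate(Text uri, Text attr) {
            if (uri == null || attr == null) return null;

            Text xAttrTxt = null;
            try {
                Configuration myConf = new Configuration();

                // Creating filesystem using uri
                URI myURI = URI.create(uri.toString());
                FileSystem fs = FileSystem.get(myURI, myConf);

                // Retrieve value of extended attribute (returned as raw bytes)
                xAttrTxt = new Text(fs.getXAttr(new Path(myURI), attr.toString()));
            } catch (IOException e) {
                e.printStackTrace();
            } catch (Exception e) {
                e.printStackTrace();
            }
            return xAttrTxt;
        }
    }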
I didn't test the performance of this when querying large data sets. I wish extended attributes could be retrieved directly as a virtual column, in a way similar to using the virtual column INPUT__FILE__NAME.
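On the performance concern: every row read from the same file carries the same INPUT__FILE__NAME, so one possible mitigation (untested, just a sketch) is to remember the value per (file, attribute) pair instead of asking the file system again for each row. The class below is hypothetical and is not part of the UDF above. The name XAttrCache and the pluggable loader are assumptions made for illustration. In a real UDF the loader would wrap the fs.getXAttr(...) call, while here it is a plain function so the sketch runs without a Hadoop cluster.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical helper: memoizes xattr lookups per (file URI, attribute name). */
class XAttrCache {
    // Assumed loader; in a real UDF this would call FileSystem.getXAttr(...).
    private final BiFunction<String, String, String> loader;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    XAttrCache(BiFunction<String, String, String> loader) {
        this.loader = loader;
    }

    /** Calls the loader only on the first request for a given pair. */
    String get(String fileUri, String attrName) {
        // A newline separates the two parts of the cache key.
        String key = fileUri + "\n" + attrName;
        // If the loader returns null, nothing is cached and null is returned.
        return cache.computeIfAbsent(key, k -> loader.apply(fileUri, attrName));
    }

    public static void main(String[] args) {
        int[] calls = {0};
        XAttrCache cache = new XAttrCache((uri, attr) -> {
            calls[0]++; // counts simulated file system round trips
            return "value-of-" + attr;
        });
        // Simulate three rows read from the same source file.
        for (int row = 0; row < 3; row++) {
            cache.get("hdfs://namenode/data/file1.csv", "user.my_key");
        }
        System.out.println("lookups performed: " + calls[0]); // prints 1
    }
}
```

Whether this helps in practice depends on how Hive instantiates and reuses the UDF object within a task, which I have not measured. The cache also never expires, so an attribute changed while a query is running would not be picked up.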