hive - Reading HDFS extended attributes in HiveQL


I am working on a use case where we add metadata (e.g. load time, data source...) to raw files in HDFS as extended attributes (xattrs). I am wondering whether there is a way for HiveQL to retrieve such metadata in queries and include it in the result set. This would avoid storing the metadata in each record within the raw files. Would a custom Hive SerDe be a way to make such xattrs available? Otherwise, do you see another way to make this possible?

I am still relatively new to this, so bear with me if I have misused any terms.

Thanks

There may be other ways to implement this, but after I discovered the Hive virtual column INPUT__FILE__NAME, which contains the URL of the source HDFS file, I created a user-defined function in Java to read the extended attributes. The function can be used in a Hive query as:

    XAttrSimpleUDF(INPUT__FILE__NAME, 'user.my_key')

The (quick and dirty) Java source code of the UDF looks like:

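    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class XAttrSimpleUDF extends UDF {

      public Text evaluate(Text uri, Text attr) {
        if (uri == null || attr == null) return null;

        Text xAttrTxt = null;
        try {
          Configuration myConf = new Configuration();

          // Creating the FileSystem using the URI
          URI myURI = URI.create(uri.toString());
          FileSystem fs = FileSystem.get(myURI, myConf);

          // Retrieve the value of the extended attribute (returned as raw bytes)
          xAttrTxt = new Text(fs.getXAttr(new Path(myURI), attr.toString()));
        } catch (IOException e) {
          e.printStackTrace();
        } catch (Exception e) {
          // Any other failure (e.g. a missing attribute) yields a NULL result
          e.printStackTrace();
        }
        return xAttrTxt;
      }
    }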

I didn't test the performance of this when querying large data sets. I wish extended attributes could be retrieved directly as a virtual column, in a way similar to using the virtual column INPUT__FILE__NAME.
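
Since the UDF is evaluated once per row while every row of a file shares the same file URL, one possible way to reduce the cost is to remember the value already looked up for each (file, attribute) pair. The following is only a sketch under assumptions: the class name XAttrCache and the loader parameter are hypothetical, the loader stands in for the fs.getXAttr call above, and it assumes the attributes do not change while the query runs. It has not been measured against a real cluster.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical helper (not part of Hive or Hadoop): memoizes xattr lookups
// so the underlying lookup runs once per (file URI, attribute name) pair.
final class XAttrCache {
    private final Map<String, String> cache = new HashMap<>();
    // Stand-in for the real lookup, e.g. a wrapper around fs.getXAttr(...)
    private final BiFunction<String, String, String> loader;

    XAttrCache(BiFunction<String, String, String> loader) {
        this.loader = loader;
    }

    String get(String uri, String attr) {
        if (uri == null || attr == null) return null;
        // A newline cannot appear in the attribute names used here,
        // so it serves as a simple key separator.
        String key = uri + "\n" + attr;
        if (!cache.containsKey(key)) {
            // Null results are cached too, so a missing attribute
            // is not looked up again for every row.
            cache.put(key, loader.apply(uri, attr));
        }
        return cache.get(key);
    }

    public static void main(String[] args) {
        // Fake loader used only for demonstration; it counts its invocations.
        int[] calls = {0};
        XAttrCache cache = new XAttrCache((u, a) -> {
            calls[0]++;
            return "value-of-" + a;
        });
        cache.get("hdfs://namenode/data/file1", "user.my_key");
        cache.get("hdfs://namenode/data/file1", "user.my_key");
        System.out.println("loader calls: " + calls[0]);
    }
}
```

In the UDF, an instance of such a cache could be held in a field and consulted inside evaluate, with the loader performing the FileSystem lookup. The plain HashMap assumes evaluate is not called concurrently on the same UDF instance.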

