sql server - best practice to load data from multiple clients into Hadoop
We are building a PoC on the Hadoop framework (Cloudera CDH). We want to load the data of multiple clients into Hive tables.
As of now, we have a separate database for each client on SQL Server. The OLTP infrastructure will remain as it is; Hadoop will be used for OLAP. The primary dimension tables are the same for each client, and every client database has exactly the same schema, which means these tables contain the same primary key values. Until now that has been fine, since each client had its own database. Now we are trying to load the data of multiple clients into the same data container (Hive tables). If we load the data directly into Hive from the multiple SQL Server databases through Sqoop jobs, we will end up with multiple rows having the same primary key value. We are thinking of using a surrogate key in the Hive tables; Hive does not support auto-increment, but that can be achieved with a UDF.
we don't want modify sql server data it's running production data.
a. Is there a standard/generic way or solution to load the data of multiple clients into the Hadoop ecosystem?
b. How can the primary key of a SQL Server database table be mapped to a Hadoop Hive table?
c. How can we ensure that one client is never able to see another client's data? (See the sketch after this list.)
Thanks.
@praveen: Use multiple mappers to cut down the transfer time when moving each client's data to the Hadoop servers; the client data keeps its primary keys in this case. Make the best use of partitions: partition by client and, within each client, by date. Also implement a TDE zone on the HDFS file location before starting the Sqoop import. (*TDE: Transparent Data Encryption zone, a best practice for keeping client data in a secured zone.)