Minimum 3 years of experience as Hadoop admin in a large (multi-Petabyte) production environment
- Experience supporting data science teams, super users and analytics teams on complex code deployment, debugging and performance optimization problems
- Experience with Hadoop, HDFS, Hive, Spark, R, Python, Java and UNIX
- System monitoring and controls
- Experience managing CPU, memory, storage resources for a large Hadoop cluster with 100s of users
Responsibilities:
- Maintain the 160 node production Hadoop cluster along with smaller stage cluster
- Install software on Hadoop cluster to support data science and analytics users. This includes but not lmited to R, Python packages, Spark, H2O and other Big Data Open source components/services
- Maintain user accounts/system security and cluster resource management to optimize user query execution time and data loading time
- Help with migration of Hadoop to cloud based distribution and help in deciding and implementing the new configuration
- Help with data migration and issue resolution while migrating the existing Hadoop/Java applications from on-premise cluster to cloud distribution