Posted 12 months ago
- Build end-to-end ETL pipelines to enable training and operationalization of machine learning models.
- Build code for ingesting data from relational databases, NoSQL databases, flat files, and message queues into big data solutions.
- Integrate code produced by data scientists into data pipelines.
- Build code or configuration to push data from big data solutions into reporting tools and other software.
- Diagnose and mitigate performance issues.
- Communicate with customer IT personnel to clarify technical details.
- Document the implementation.
- Assist with the installation and setup of big data solutions and DataRobot.
Citizenship and Clearance Requirements:
- 2+ years of production experience building MapReduce jobs, Spark scripts, Oozie workflows, or other Hadoop-based applications.
- Good understanding of distributed data processing.
- Experience creating Spark scripts in either Python or Scala.
- Experience diagnosing and mitigating performance issues in Spark scripts.
- Experience setting up and querying Hive, Presto, or Impala databases.
- Experience diagnosing and mitigating performance issues in Hive, Presto, or Impala queries.
- Nice to have: experience creating streaming solutions and reporting tools.
- Must be a US citizen.
- Must have at least an active TS clearance.
- An active TS/SCI full-scope clearance is preferred.
Individuals seeking employment at DataRobot are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation.