Description:
· 9+ years of software programming experience
· 7+ years of SQL scripting experience
· 2+ years of Anaconda Python development experience focused on data management
· 5+ years of Unix cron experience
· Must have the ability to work in a dynamic, fast-paced environment
· Strong communication skills to interact with Agile team members
Data Sources: Cloud-based as well as on-premise; examples: SAP CRM, ECC, PeopleSoft, PLM, CallidusCloud
High-level architecture:
· The Amazon S3 bucket has folders for the specific source files delivered by Fusion middleware. Data quality checks are not handled by the sources.
· Anaconda Python programs are in place to clean any abnormalities in the source files and rewrite them back to their respective folders; this avoids load failures.
· Most known errors are handled by this mechanism; going forward, you will be required to enhance these scripts to include fixes for any new errors.
· The mechanism currently processes one file at a time; the target is to move it to parallel processing.
· Next is a Python data-pipeline program based on the AWS and Snowflake APIs, which writes the files to the Snowflake EDW.
· There are three layers on the Snowflake side: pre-processing, staging, and base tables.
· Unix cron is used for scheduling purposes.
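
The cron scheduling mentioned above might look like the crontab entry below; the schedule, script path, and log path are all illustrative assumptions.

```
# Run the cleanup-and-load pipeline hourly at minute 15 (illustrative values)
15 * * * * /usr/bin/python3 /opt/pipeline/run_pipeline.py >> /var/log/pipeline.log 2>&1
```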
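
The file-cleaning step described above could be sketched as follows. This is a minimal illustration, not the actual scripts: the expected column count and the two "known abnormalities" handled here (blank lines and a stray trailing delimiter) are assumptions standing in for whatever errors the real sources produce.

```python
import csv
from pathlib import Path

# Hypothetical column count for one source file type (an assumption).
EXPECTED_COLUMNS = 5

def clean_source_file(path: Path, expected_columns: int = EXPECTED_COLUMNS) -> int:
    """Clean one source file in place and return the number of rows fixed.

    The fixes below are illustrative stand-ins for the "known errors"
    mentioned in the architecture: blank lines and trailing delimiters.
    """
    fixed = 0
    rows = []
    with path.open(newline="") as f:
        for row in csv.reader(f):
            if not row or all(cell.strip() == "" for cell in row):
                fixed += 1  # drop blank or all-empty lines
                continue
            if len(row) > expected_columns and row[-1] == "":
                row = row[:expected_columns]  # trim a stray trailing delimiter
                fixed += 1
            rows.append(row)
    # Rewrite the cleaned file back to its original location, as the
    # architecture describes, so the downstream load does not fail.
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return fixed
```

In the real system the files live in S3 folders rather than on local disk, so a download/upload step (e.g. via boto3) would wrap this logic.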
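
Moving from one-file-at-a-time to parallel processing could be sketched with the standard library's `concurrent.futures`. A thread pool is used here on the assumption that the work is I/O-bound (reading and writing S3 objects); a `ProcessPoolExecutor` would be the drop-in alternative if the cleaning itself were CPU-bound. The `clean_file` worker is a hypothetical stand-in for the real cleaning routine.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def clean_file(name: str) -> str:
    """Stand-in for the real per-file cleaning routine (an assumption)."""
    return f"cleaned:{name}"

def clean_all(names, max_workers: int = 4) -> dict:
    """Clean many source files concurrently instead of one at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(clean_file, n): n for n in names}
        for fut in as_completed(futures):
            # fut.result() re-raises any exception from the worker, so a
            # failure in one file surfaces instead of being silently lost.
            results[futures[fut]] = fut.result()
    return results
```

One design note: collecting results via `as_completed` keeps the pool busy regardless of which file finishes first, which matters when source files vary widely in size.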
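
The pipeline step that writes files to the Snowflake EDW is typically a `COPY INTO` from an S3-backed external stage. The sketch below only builds the SQL statement so it stays self-contained; actually running it would require a Snowflake client connection, and every object name (table, stage, file pattern) here is an illustrative assumption, not the real schema.

```python
def build_copy_statement(table: str, stage: str, pattern: str) -> str:
    """Build a Snowflake COPY INTO statement for files landed on a stage.

    Table, stage, and pattern values are caller-supplied assumptions;
    the CSV file format with a header row is also an assumption.
    """
    return (
        f"COPY INTO {table} "
        f"FROM @{stage} "
        f"PATTERN = '{pattern}' "
        f"FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )
```

A statement like this would load into the pre-processing layer first, with separate SQL moving data on to the staging and base tables.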