AWS Databricks Engineer
About the Job
Job Description
This is a remote position.
Requirements
● Strong experience as an AWS Data Engineer; AWS Databricks experience is a must.
● Expert proficiency in Spark Scala, Python, and PySpark is a plus
● Must have experience migrating data from on-premises to the cloud
● Hands-on experience with Kinesis for processing and analyzing streaming data, and with AWS DynamoDB
● In-depth understanding of the AWS cloud and AWS data lake and analytics solutions.
● Expert-level hands-on experience designing and developing applications on Databricks, Databricks Workflows, AWS Managed Airflow, and Apache Airflow is required.
● Extensive hands-on experience implementing data migration and data processing using AWS services: VPC/Security Groups, EC2, S3, Auto Scaling, CloudFormation, Lake Formation, DMS, Kinesis, Kafka, NiFi, CDC processing, EMR, Redshift, Athena, Snowflake, RDS, Aurora, Neptune, DynamoDB, CloudTrail, CloudWatch, Docker, Lambda, Spark, Glue, SageMaker, AI/ML, API Gateway, etc.
● Hands-on experience with the technology stack available in the industry for data management, ingestion, capture, processing, and curation: Kafka, StreamSets, Attunity, GoldenGate, MapReduce, Hadoop, Hive, HBase, Cassandra, Spark, Flume, Impala, etc.
● Knowledge of different programming and scripting languages
● Good working knowledge of code versioning tools (such as Git, Bitbucket, or SVN)
● Hands-on experience using Spark SQL with various data sources such as JSON, Parquet, and key-value pairs (see the PySpark sketch after this list)
● Experience preparing data for Data Science and Machine Learning.
● Experience preparing data for use in SageMaker and AWS Databricks.
● Demonstrated experience preparing data and automating and building data pipelines for AI use cases (text, voice, image, IoT data, etc.).
● Good to have: programming language experience with .NET or Spark/Scala
● Experience creating tables, partitioning, bucketing, loading, and aggregating data using Spark Scala and Spark SQL/PySpark
● Knowledge of AWS/Azure DevOps processes like CI/CD as well as Agile tools and processes including Git, Jenkins, Jira, and Confluence
● Working experience with Visual Studio, PowerShell Scripting, and ARM templates.
● Strong understanding of data modeling and defining conceptual, logical, and physical data models.
● Big Data/analytics/information analysis/database management in the cloud
● IoT/event-driven/microservices in the cloud
● Experience with private and public cloud architectures, their pros/cons, and migration considerations.
● Ability to remain up to date with industry standards and technological advancements that will enhance data quality and reliability to advance strategic initiatives
● Basic experience with or knowledge of agile methodologies
● Working knowledge of RESTful APIs, the OAuth2 authorization framework, and security best practices for API gateways
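For illustration of the Spark SQL items above (reading JSON and Parquet sources, then writing a partitioned, bucketed table), here is a minimal PySpark sketch. The S3 paths, column names, and table name are hypothetical placeholders, not details from this role.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-sources-sketch").getOrCreate()

# Read JSON and Parquet sources and register them for Spark SQL (paths are placeholders).
orders = spark.read.json("s3://example-bucket/raw/orders/")
customers = spark.read.parquet("s3://example-bucket/curated/customers/")
orders.createOrReplaceTempView("orders")
customers.createOrReplaceTempView("customers")

# Aggregate with Spark SQL (column names are placeholders).
daily_revenue = spark.sql("""
    SELECT c.region,
           to_date(o.order_ts) AS order_date,
           SUM(o.amount)       AS revenue
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.region, to_date(o.order_ts)
""")

# Persist as a partitioned, bucketed managed table;
# bucketBy can only be written via saveAsTable.
(daily_revenue.write
    .mode("overwrite")
    .partitionBy("order_date")
    .bucketBy(8, "region")
    .sortBy("region")
    .saveAsTable("analytics.daily_revenue"))
```

Because Spark only supports bucketing when writing through saveAsTable, the aggregate is persisted as a managed table rather than as plain Parquet files.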
Responsibilities:
● Work closely with team members to lead and drive enterprise solutions, advising on key decision points regarding trade-offs, best practices, and risk mitigation.
● Manage data-related requests, analyze issues, and provide efficient resolutions. Design all program specifications and perform required tests.
● Design and develop data ingestion using Glue, AWS Managed Airflow, and Apache Airflow, and the processing layer using Databricks (a minimal orchestration sketch follows this list).
● Work with SMEs to implement data strategies and build data flows.
● Prepare code for all modules according to the required specifications.
● Monitor all production issues and inquiries and provide efficient resolutions.
● Evaluate all functional requirements and mapping documents, and troubleshoot all development processes.
● Document all technical specifications and associated project deliverables.
● Design all test cases to support all systems and perform unit tests.
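As a rough sketch of the ingestion/processing split described above (Glue and Airflow for ingestion, Databricks for the processing layer), the following Apache Airflow DAG starts a Glue job and then triggers a pre-defined Databricks job. The DAG id, Glue job name, Databricks job_id, and connection id are placeholders, and the Databricks operator assumes the apache-airflow-providers-databricks package is installed and a Databricks connection is configured.

```python
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator


def start_glue_ingestion(**_):
    # Kick off a hypothetical Glue job that lands raw data in S3.
    glue = boto3.client("glue")
    glue.start_job_run(JobName="raw_ingestion_job")  # placeholder job name


with DAG(
    dag_id="ingest_and_process",          # placeholder DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(
        task_id="start_glue_ingestion",
        python_callable=start_glue_ingestion,
    )

    # Trigger a pre-defined Databricks job for the processing layer;
    # job_id and the connection id are placeholders.
    process = DatabricksRunNowOperator(
        task_id="run_databricks_processing",
        databricks_conn_id="databricks_default",
        job_id=1234,
    )

    ingest >> process
```

The same flow could equally be expressed with AWS Managed Airflow (MWAA) or Databricks Workflows; this is only one possible orchestration layout.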
Qualifications:
● 2+ years of hands-on experience designing and implementing multi-tenant solutions using AWS Databricks for data governance, data pipelines for a near-real-time data warehouse, and machine learning solutions.
● 5+ years’ experience in a software development, data engineering, or data analytics field using Python, PySpark, Scala, Spark, Java, or equivalent technologies.
● Bachelor’s or Master’s degree in Big Data, Computer Science, Engineering, Mathematics, or similar area of study or equivalent work experience
● Strong written and verbal communication skills
● Ability to manage competing priorities in a fast-paced environment
● Ability to resolve issues
● Self-Motivated and ability to work independently
● Nice to have:
- AWS Certified Solutions Architect - Professional
- Databricks Certified Associate Developer for Apache Spark
About the company
Industry
Human Resources
Company Size
51-200 Employees
Headquarters
Pune