
Data Engineer

About the Job
Data Engineer (5+ Years Experience) – Heavy Data Analytics Project
Tech Stack: Spark, PySpark, Scala, Python, SQL, Databricks, Data Lake, Data Warehouse, Snowflake, Azure (ADF/Synapse/ADLS)
About the Role
We are hiring a Data Engineer with strong hands-on experience building high-performance data pipelines for a heavy data analytics project. The candidate must excel at writing complex aggregations, understanding business processes and analytical requirements, and designing scalable data lake and data warehouse solutions. Experience across multiple data platforms (Databricks, Snowflake, Azure Data Factory, Synapse, etc.) is a strong advantage.
Key Responsibilities
1. Data Pipeline & ETL/ELT Development
• Develop, optimize, and productionize Spark (PySpark/Scala) pipelines.
• Ingest, transform, cleanse, and aggregate large datasets from varied sources.
• Implement scalable ETL/ELT logic for batch and near-real-time pipelines.
• Apply best practices in partitioning, caching, Delta Lake optimization, and performance tuning.
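To ground the bullets above, here is a minimal PySpark sketch of such a pipeline: ingest, cleanse, aggregate, and write Delta output partitioned for downstream pruning. The paths, dataset, and column names are hypothetical, not part of this posting.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-etl").getOrCreate()

# Ingest raw JSON from a hypothetical landing zone.
raw = spark.read.json("/mnt/raw/events/")

# Cleanse: drop duplicates, require a timestamp, derive a date column.
clean = (
    raw.dropDuplicates(["event_id"])
       .filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)
clean.cache()  # worthwhile if several aggregates read the cleansed frame

# Aggregate to daily grain.
daily = clean.groupBy("event_date", "event_type").agg(
    F.count("*").alias("event_count"),
    F.countDistinct("user_id").alias("unique_users"),
)

# Write as Delta, partitioned by date so downstream reads can prune.
(daily.write.format("delta")
      .mode("overwrite")
      .partitionBy("event_date")
      .save("/mnt/silver/daily_events/"))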
2. Heavy Data Analytics & Business Understanding
• Write complex aggregation logic (window functions, rollups, grouping sets, analytical functions).
• Understand business KPIs, metrics, and analytical use cases.
• Translate business needs into technical transformations and data models.
• Validate data outputs against business logic and analytics expectations.
• Collaborate with analysts on calculations: weekly/monthly aggregates, trend lines, performance metrics, dimensional rollups (a minimal sketch follows this list).
• Ensure accuracy, consistency, and traceability of business-critical metrics.
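As a concrete illustration of this kind of aggregation work, the sketch below computes weekly revenue per region and a four-week moving average with a window function; the table and column names are assumptions for illustration only.

from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.getOrCreate()
sales = spark.table("silver.sales")  # hypothetical Silver-layer table

# Weekly aggregate per region.
weekly = (
    sales.withColumn("week_start", F.date_trunc("week", "sale_ts"))
         .groupBy("week_start", "region")
         .agg(F.sum("amount").alias("weekly_revenue"))
)

# Trend line: 4-week moving average within each region.
w = Window.partitionBy("region").orderBy("week_start").rowsBetween(-3, 0)
trend = weekly.withColumn("revenue_4wk_avg", F.avg("weekly_revenue").over(w))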
3. Data Lake Engineering
• Build and maintain multi-layer Data Lake architectures (Bronze/Silver/Gold).
• Work with Parquet, Delta Lake, ORC, and columnar storage formats.
• Implement schema evolution, auditing, and metadata strategies.
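A minimal sketch of a Bronze-to-Silver promotion with Delta schema evolution and a simple audit column follows; paths and column names are placeholders.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
bronze = spark.read.format("delta").load("/mnt/bronze/orders/")

silver = (
    bronze.filter(F.col("order_id").isNotNull())             # basic cleansing
          .withColumn("_ingested_at", F.current_timestamp())  # audit metadata
)

# mergeSchema lets new upstream columns evolve into the Silver table.
(silver.write.format("delta")
       .mode("append")
       .option("mergeSchema", "true")
       .save("/mnt/silver/orders/"))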
4. Data Warehouse Engineering
• Design dimensional models: Star Schema and Snowflake Schema.
• Build fact and dimension tables supporting analytics and reporting (see the sketch after this list).
• Optimize table structures, keys, and partitioning strategies.
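For illustration, a hypothetical star-schema load that resolves surrogate keys from two dimension tables to populate a fact table (all table and column names assumed):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Join the transactional source to dimensions to pick up surrogate keys.
fact = (
    spark.table("silver.orders").alias("o")
         .join(spark.table("gold.dim_customer").alias("c"),
               F.col("o.customer_id") == F.col("c.customer_id"))
         .join(spark.table("gold.dim_date").alias("d"),
               F.col("o.order_date") == F.col("d.calendar_date"))
         .select(F.col("c.customer_sk"),
                 F.col("d.date_sk"),
                 F.col("o.amount").alias("order_amount"))
)
fact.write.format("delta").mode("append").saveAsTable("gold.fact_orders")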
5. Databricks (Added Advantage)
• Develop notebooks/jobs using PySpark/Scala.
• Manage clusters, workflows, and Delta Live Tables (a DLT sketch follows this list).
• Implement best practices for performance and cost efficiency.
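A minimal Delta Live Tables sketch is shown below. Note this uses the Databricks-only dlt module and runs inside a DLT pipeline, not a plain Spark session; the table names and expectation are hypothetical.

import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Cleansed orders promoted from the raw feed")
@dlt.expect_or_drop("valid_order", "order_id IS NOT NULL")
def silver_orders():
    # Read the upstream DLT dataset and derive a date column.
    return (
        dlt.read("bronze_orders")
           .withColumn("order_date", F.to_date("order_ts"))
    )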
6. SQL Engineering
• Strong command of SQL for aggregations, analytical functions, joins, profiling, and validation.
• Write and optimize complex queries supporting dashboards, metrics, and reports.
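The flavor of SQL this calls for, expressed through Spark SQL so it stays in the same stack (tables and columns are hypothetical): a ranked aggregate using a window function, then subtotals and a grand total in one pass via GROUPING SETS.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Rank products by revenue within each region.
report = spark.sql("""
    SELECT region,
           product,
           SUM(amount) AS revenue,
           RANK() OVER (PARTITION BY region
                        ORDER BY SUM(amount) DESC) AS rank_in_region
    FROM   silver.sales
    GROUP  BY region, product
""")

# Per-pair, per-region, and grand totals in a single query.
totals = spark.sql("""
    SELECT region, product, SUM(amount) AS revenue
    FROM   silver.sales
    GROUP  BY GROUPING SETS ((region, product), (region), ())
""")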
7. Cloud Data Platforms
• Azure: Data Factory, Synapse Analytics, ADLS Gen2, Azure Functions (optional).
• Snowflake: Virtual Warehouses, Snowpipe, Streams & Tasks, performance tuning.
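As one hedged example of the Snowflake side, the sketch below wires a Stream to a scheduled Task through the snowflake-connector-python package; the connection parameters, tables, and warehouse are placeholders.

import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",   # placeholders
    warehouse="ANALYTICS_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()

# Capture row changes on a landing table.
cur.execute("CREATE OR REPLACE STREAM orders_stream ON TABLE raw_orders")

# Apply the captured changes every five minutes, but only when the
# stream actually has data.
cur.execute("""
    CREATE OR REPLACE TASK load_orders
      WAREHOUSE = ANALYTICS_WH
      SCHEDULE  = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
    AS INSERT INTO orders SELECT * FROM orders_stream
""")
cur.execute("ALTER TASK load_orders RESUME")  # tasks start suspended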
8. Data Quality & Documentation
• Validate transformation logic against business rules (see the validation sketch after this list).
• Document data flows, transformation rules, aggregation logic, and data dictionary/metadata.
• Work with QA and analysts to ensure outputs match business expectations.
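A sketch of what such validation can look like in practice appears below; the tables, rules, and tolerance are illustrative assumptions.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
fact = spark.table("gold.fact_orders")

# Rule 1: no negative order amounts.
bad_rows = fact.filter(F.col("order_amount") < 0).count()
assert bad_rows == 0, f"{bad_rows} rows violate the non-negative rule"

# Rule 2: Gold totals must reconcile with the Silver source.
gold_total = fact.agg(F.sum("order_amount")).first()[0]
silver_total = spark.table("silver.orders").agg(F.sum("amount")).first()[0]
assert abs(gold_total - silver_total) < 0.01, "Gold/Silver totals diverge"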
Required Qualifications
• 5+ years of hands-on data engineering experience.
• Strong programming skills: Spark, Scala, Python.
• Strong SQL skills (aggregations, analytical functions, large joins).
• Experience with Data Lake and Data Warehouse concepts.
• Experience with Spark-based processing (Delta Lake optimization, shuffle tuning, partitioning).
• Experience with at least one cloud data ecosystem (Azure/AWS/GCP).
Preferred Skills
• Experience with Databricks (highly desirable).
• Experience with Snowflake or modern cloud DWH.
• Experience with ADF/Synapse/Airflow/dbt for orchestration.
• Knowledge of CI/CD for data pipelines.
• Experience with large-scale data analytics environments.
Soft Skills
• Strong understanding of business logic behind analytics outputs.
• Ability to translate business metrics into technical transformations.
• Strong problem-solving and debugging skills.
• Good communication and cross-team collaboration.
About the Company
Industry: Consumer Services
Company Size: 10,001+ Employees
Headquarters: Texas, USA
