3650 IT & Software Developer jobs in the US

Big Data Engineer - Hadoop/Spark
$92,000 - 118,000
Software Guidance & Assistance
220 White Plains Road, Rockville, MD
Requirements
Must:
- Bachelor's degree in Computer Science, Information Systems, or a related field with a minimum of five years of pertinent experience, or equivalent training and/or experience; a Master's degree and previous experience in the Financial Services sector are preferred.
- Proven technical proficiency in Object-Oriented and database technologies, leading to the delivery of enterprise-grade solutions.
- Comprehensive understanding of industry-standard software engineering practices, including Test Automation, Build Automation, and Configuration Management frameworks.
- Excellent written and verbal communication skills in a technical context.
- Proven capability in fostering effective working relationships to enhance the quality of deliverables.
- Ability to maintain focus and swiftly acquire new skills in a fast-paced environment.
- Practical experience with AI development tools (e.g., GitHub Copilot, Q Developer, ChatGPT, Claude, etc.).
- Familiarity with Big Data technologies such as Hadoop, Spark, Hive, and Trino.
- Knowledge of common challenges such as data skew, managing massive data volumes at petabyte scale, and troubleshooting job failures due to resource constraints or data issues.
- Hands-on experience with debugging and mitigation techniques.
- Profound understanding of Kubernetes architecture, including management of pods, services, and deployments.
- Experience with deploying Apache Spark workloads on Amazon EMR through EKS (Kubernetes).
- Familiarity with Kubernetes resource management, scheduling, and auto-scaling practices.
- Knowledge of Helm charts for application deployment and management on Kubernetes.
- Understanding of Kubernetes networking, storage (PVs, PVCs), and security best practices.
- Experience using kubectl and working with Kubernetes YAML manifests.
- Capability to troubleshoot Kubernetes cluster issues, pod failures, and resource limitations.
- Experience integrating Spark with Kubernetes operators and dynamic allocation techniques.
- Expertise in prompt engineering for AI coding assistants and analysis tools.
- Proficiency in designing AI workflows to enhance development processes.
- Ability to interpret AI-generated insights for actionable team improvements.
- Experience in leading teams through AI adoption and workflow changes.
- Deep understanding of Spark's architecture, including executors, tasks, stages, and the Directed Acyclic Graph (DAG).
- Proven skills in Spark performance tuning methods such as partitioning, caching, and broadcast joins (a brief sketch follows this list).
- Experience troubleshooting performance issues in Spark jobs.
- Proven ability to optimize Spark jobs using large datasets.
- Hands-on experience running Spark on Kubernetes, with knowledge of Spark-on-K8s architecture.
- Familiarity with AWS services like S3, EMR, EMR on EKS, Glue, Lambda, and Athena.
- Practical experience using S3 with Spark, including dealing with file formats and consistency challenges.
- Strong understanding of Amazon EKS architecture and best practices.
- Familiarity with AWS IAM roles for service accounts (IRSA) related to Kubernetes workloads.
- Knowledge of AWS networking principles for EKS (VPC, subnets, security groups).
- Experience with AWS monitoring and logging services (CloudWatch, CloudTrail) for Kubernetes workloads.
- Basic understanding of serverless technologies (Lambda, Fargate).
- Ability to write clean, modular, and efficient code in Python or Scala.
- Experience with functional programming concepts such as immutability and higher-order functions.
- Proven track record of implementing scalable data processing solutions.
- Strong grasp of collections, concurrency, and memory management techniques.
- Proficiency in SQL window functions, multi-table joins, and aggregation queries.
- Ability to write and optimize complex SQL statements, addressing edge cases like NULLs and duplicates.
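For context, here is a minimal PySpark sketch of a few of the techniques this list calls out: a broadcast join, repartitioning and caching, and a window function that handles duplicates. The S3 paths, table names, and columns are placeholders, not details from this posting.

```python
# A minimal PySpark sketch of techniques named in the requirements: broadcast joins,
# repartitioning, caching, and a window function used to drop duplicate rows.
# All paths, table names, and columns below are placeholders, not part of the posting.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/orders/")        # large fact table (placeholder)
customers = spark.read.parquet("s3://example-bucket/customers/")  # small dimension table (placeholder)

# Broadcast join: ship the small table to every executor and avoid shuffling the large one.
enriched = orders.join(F.broadcast(customers), on="customer_id", how="left")

# Repartition on a well-distributed key to reduce skew, then cache because the result
# feeds more than one downstream aggregation.
enriched = enriched.repartition(200, "customer_id").cache()

# Window function: keep only the latest row per order_id, sorting NULL timestamps last.
w = Window.partitionBy("order_id").orderBy(F.col("updated_at").desc_nulls_last())
deduped = enriched.withColumn("rn", F.row_number().over(w)).filter("rn = 1").drop("rn")

deduped.groupBy("order_date").agg(F.sum("amount").alias("total_amount")).show()
```

For heavily skewed joins, salting the join key or enabling adaptive query execution (spark.sql.adaptive.enabled) are the usual next steps beyond what is shown here.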
Responsibilities
- Design, develop, and maintain large-scale data processing pipelines utilizing Big Data technologies (e.g., Hadoop, Spark, Python, Scala).
- Architect and deploy containerized big data workloads using Amazon EMR on EKS (Elastic Kubernetes Service); a job submission sketch follows this list.
- Create and set up Kubernetes-based infrastructure for executing Spark applications at scale.
- Develop scalable, efficient, and reliable solutions for data ingestion, storage, transformation, and analysis.
- Keep abreast of industry trends and new Big Data technologies to enhance data architecture continuously.
- Collaborate with cross-functional teams to comprehend business needs and convert them into technical solutions.
- Optimize and improve existing data pipelines for enhanced performance, scalability, and reliability.
- Develop automated testing frameworks and implement continuous testing protocols for data quality assurance.
- Execute unit, integration, and system testing to ensure data pipeline accuracy and robustness.
- Assist data scientists and analysts in facilitating data-driven decision-making throughout the organization.
- Create and maintain automated unit, integration, and end-to-end tests.
- Monitor and resolve issues in production data pipelines through troubleshooting efforts.
- Manage Kubernetes clusters, including pods, services, and deployments for big data workloads.
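As a rough illustration of the EMR on EKS deployment model these responsibilities describe, the sketch below submits a PySpark job through the emr-containers API with boto3. The virtual cluster ID, IAM role, release label, S3 entry point, and log group are all placeholders, not details from this role.

```python
# A sketch (assumed, not SGA's actual setup) of submitting a PySpark job to
# Amazon EMR on EKS using boto3's emr-containers API. Every identifier below is a
# placeholder: virtual cluster ID, IAM role ARN, S3 entry point, release label, log group.
import boto3

emr = boto3.client("emr-containers", region_name="us-east-1")  # region is an assumption

response = emr.start_job_run(
    name="daily-ingest-sketch",
    virtualClusterId="YOUR_VIRTUAL_CLUSTER_ID",                           # placeholder
    executionRoleArn="arn:aws:iam::123456789012:role/emr-eks-job-role",   # placeholder
    releaseLabel="emr-6.15.0-latest",                                     # placeholder EMR release
    jobDriver={
        "sparkSubmitJobDriver": {
            "entryPoint": "s3://example-bucket/jobs/daily_ingest.py",     # placeholder script
            "sparkSubmitParameters": (
                "--conf spark.executor.instances=4 "
                "--conf spark.executor.memory=4g "
                "--conf spark.dynamicAllocation.enabled=true"
            ),
        }
    },
    configurationOverrides={
        "monitoringConfiguration": {
            "cloudWatchMonitoringConfiguration": {
                "logGroupName": "/emr-on-eks/jobs",                       # placeholder log group
                "logStreamNamePrefix": "daily-ingest",
            }
        }
    },
)
print(response["id"])  # job run ID, usable with describe_job_run to poll status
```

In practice a submission like this is usually wrapped in CI/CD or an orchestrator, while kubectl is still used to inspect the driver and executor pods the job spawns on the EKS cluster.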
Description
We are Software Guidance & Assistance, Inc. (SGA), a technology and resource solutions provider dedicated to standing out in our field. As a women-owned business, our mission is to solve significant IT challenges using a personal, boutique approach. Each year, we connect skilled consultants to over 1,000 engagements. When we say "let's work better together," we truly mean it. You will be joining a diverse team grounded in core values such as customer service, employee development, and integrity. We encourage you to be yourself, embrace your passions, and find fulfillment in your work with us. This role will be conducted onsite in Rockville, MD.
How many Data Engineer jobs are in the United States?
Currently, there are 3650 Data Engineer openings. See also: Spark jobs, Snowflake jobs, Kafka jobs, Hadoop jobs - all with salary brackets.
Is the US a good place for Data Engineers?
The US is one of the best countries to work in as a Data Engineer. It has a vibrant startup community, growing tech hubs and, most importantly, plenty of interesting jobs for people who work in tech.
Which companies are hiring for Data Engineer jobs in the United States?
Tactable, Decisive Dividend Corporation, TEN X TORONTO, D3 Security Management Systems, GINGER Telecom, Gatestone & Co. Inc, and ID Cosmetic clinic, among others, are currently hiring for Data Engineer roles in the United States.
The company with the most openings is Judge Group, Inc., which is hiring for 266 different Data Engineer jobs in the United States. They are probably quite committed to finding good Data Engineers.