Citi, the leading global bank, has approximately 200 million customer accounts and does business in more than 160 countries and jurisdictions. Our core activities are safeguarding assets, lending money, making payments and accessing the capital markets on behalf of our clients. Citi’s Mission and Value Proposition explain what we do, and our Strategy explains how we do it. Our mission is to serve as a trusted partner to our clients by responsibly providing financial services that enable growth and economic progress. We strive to earn and maintain our clients’ and the public’s trust by constantly adhering to the highest ethical standards and making a positive impact on the communities we serve.

Our Data Platform Engineering team is on the cutting edge. We research, adapt, and deploy the latest open-source data platforms to meet Citi's unique needs. We're a collaborative group that thrives on technical challenges and the satisfaction of building highly performant systems.

We're seeking a passionate and highly skilled Java Data Engineer to guide and mentor a talented team of engineers in building and maintaining Citi's next-generation data platform. If you're a natural leader with a deep understanding of Java, distributed systems, and a passion for pushing the boundaries of Big Data technology, we want to hear from you.

Responsibilities:
Interact and collaborate effectively with the development team.
Work with onshore, offshore, and matrix development teams to implement business solutions.
Communicate development progress effectively to the tech lead.
Implement ad hoc changes as requested by the business or technology.
Be a committed technologist focused on delivering high-quality products on time, following TDD and meeting aggressive timelines.

Qualifications:
5+ years of hands-on experience developing high-performance Java applications (Java 11+ preferred) with a strong foundation in core Java concepts, OOP, and OOAD.
Proven experience building and maintaining data pipelines using technologies like Kafka and Apache Spark.
Familiarity with event-driven architectures and experience in developing real-time, low-latency applications is essential.
Object-oriented analysis and design using common design patterns.
Deep insight into Java and JEE internals (class loading, memory management, transaction management, etc.).
Excellent knowledge of relational databases, SQL, and ORM technologies (JPA2, Hibernate).
Experience with the Spring Framework.
Understanding of Spark's core concepts: RDDs (Resilient Distributed Datasets), DataFrames, Datasets, transformations (map, filter), and actions (reduce, collect, count).
Proficiency in writing Spark applications using the Java API.
Knowledge of Spark's execution model and cluster management.
Experience with using Spark SQL for data manipulation and querying.
Familiarity with Spark SQL's data types and functions.
Ability to write SQL queries within Spark applications.
Understanding of real-time data processing concepts.
Experience with the Spark Streaming API for processing data streams.
Knowledge of different input sources and output sinks for streaming data.
Familiarity with basic machine learning concepts.
Experience with using Spark MLlib for building and deploying machine learning models.
Knowledge of different machine learning algorithms available in MLlib.
Understanding of data serialization frameworks and formats such as Kryo and Avro.
Ability to optimize data serialization for performance improvements.
Knowledge of Spark's performance tuning parameters.
Ability to identify and address performance bottlenecks in Spark applications.
Keen interest in gaining financial knowledge.
Strong written, interpersonal, and verbal communication skills are essential.

Desired Skills:
Familiarity with other big data tools like Hadoop, Hive, Kafka, and HBase.
Experience with cloud platforms like AWS, Azure, or Google Cloud Platform, especially their managed Spark services (EMR, Databricks, HDInsight).
Experience with Docker and Kubernetes for deploying and managing Spark applications.
Experience with continuous integration and continuous deployment pipelines for automated testing and deployment.
While Java is perfectly suitable, learning Scala can be beneficial, as many Spark libraries and examples are written in Scala.
Proficiency in using the oc command-line tool to manage OpenShift resources.
Strong understanding of Kubernetes concepts like pods, deployments, services, namespaces, and configmaps, as OpenShift is built on Kubernetes.

This job description provides a high-level review of the types of work performed. Other job-related duties may be assigned as required.

------------------------------------------------------
Job Family Group: Technology
------------------------------------------------------
Job Family: Applications Development
------------------------------------------------------
Time Type: Full time
------------------------------------------------------
Most Relevant Skills: Please see the requirements listed above.
------------------------------------------------------
Other Relevant Skills: For complementary skills, please see above and/or contact the recruiter.
------------------------------------------------------

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.

If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.

View Citi’s EEO Policy Statement and the Know Your Rights poster.