Microsoft

Member of Technical Staff, AI Data

On site

London, United Kingdom

Full Time

12-03-2025

Job Specifications

Help build the world’s most advanced multimodal dataset at Microsoft AI.

We are on a mission to create the largest and most advanced multimodal dataset in the world. This dataset, spanning all modalities from across the web and beyond, will power the training of the world’s most capable AI frontier models, pushing the boundaries of scale, performance, and product deployment.

The AI Data team at Microsoft AI is responsible for all aspects of data preparation to support our model pre-training operations, including collecting data from the source, extracting and transforming the most useful data, and understanding the impact of changes to data by training and evaluating new models. We are an interdisciplinary team of engineers and scientists, learning from each other, and collaborating to create the best models and products. We work closely with the teams that transform pre-trained models into the models that power the consumer Copilot experience.

About

We are looking for outstanding individuals excited about contributing to the next generation of systems that will transform the field. In particular, we are looking for candidates who:

Are passionate about the role of data in large-scale AI model training
Will thrive in a highly collaborative, fast-paced environment
Have a high degree of craftsmanship and pay close attention to details
Demonstrate a proactive attitude and enthusiasm for exploring new methods and technologies
Effectively manage multiple responsibilities and can adjust to shifting priorities.

Responsibilities

Design and develop data pipelines that ingest enormous amounts of multi-modal training data (text, audio, images, video).
Build and maintain cutting-edge infrastructure that can store and process the petabytes of data needed to power models.
Partner with the pretraining and post-training teams to improve our data recipe through rigorous and careful experimentation.
Collaborate with the product team and other engineers and researchers across Microsoft AI to identify gaps in the current generation of models.
Embody our culture and values.

Qualifications

Required/Minimum Qualifications

Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, data modeling or data engineering work
OR Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, or data engineering work
OR equivalent experience.

#Copilot #MicrosoftAI

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

About the Company

Every company has a mission. What's ours? To empower every person and every organization to achieve more. We believe technology can and should be a force for good and that meaningful innovation contributes to a brighter world in the future and today. Our culture doesn’t just encourage curiosity; it embraces it. Each day we make progress together by showing up as our authentic selves. We show up with a learn-it-all mentality. We show up cheering on others, knowing their success doesn't diminish our own. We show up every day o...

Related Jobs

Company Name
Capgemini
Job Title
AWS Data Engineer
Job Description
About The Job You're Considering
The Cloud Data Platforms team is part of the Insights and Data Global Practice and has seen strong growth and continued success across a variety of projects and sectors. Cloud Data Platforms is the home of the Data Engineers, Platform Engineers, Solutions Architects and Business Analysts who are focused on driving our customers' digital and data transformation journey using the modern cloud platforms. We specialise in using the latest frameworks, reference architectures and technologies using AWS, Azure and GCP.
Hybrid working: The places that you work from day to day will vary according to your role, your needs, and those of the business; it will be a blend of Company offices, client sites, and your home; noting that you will be unable to work at home 100% of the time.
If you are successfully offered this position, you will go through a series of pre-employment checks, including: identity, nationality (single or dual) or immigration status, employment history going back 3 continuous years, and unspent criminal record check (known as Disclosure and Barring Service).
Your Role
We are looking for strong AWS Data Engineers who are passionate about Cloud technology. Your work will be to:
Design and Develop Data Pipelines: Create robust pipelines to ingest, process, and transform data, ensuring it is ready for analytics and reporting.
Implement ETL/ELT Processes: Develop Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) workflows to seamlessly move data from source systems to Data Warehouses, Data Lakes, and Lake Houses using Open Source and AWS tools.
Adopt DevOps Practices: Utilize DevOps methodologies and tools for continuous integration and deployment (CI/CD), infrastructure as code (IaC), and automation to streamline and enhance our data engineering processes.
Design Data Solutions: Leverage your analytical skills to design innovative data solutions that address complex business requirements and drive decision-making.
Your Skills And Experience
Proficiency with AWS Tools: Demonstrable experience using AWS Glue, AWS Lambda, Amazon Kinesis, Amazon EMR, Amazon Athena, Amazon DynamoDB, Amazon CloudWatch, Amazon SNS and AWS Step Functions.
Programming Skills: Strong experience with modern programming languages such as Python, Java, and Scala.
Expertise in Data Storage Technologies: In-depth knowledge of Data Warehouse, Database technologies, and Big Data Eco-system technologies such as AWS Redshift, AWS RDS, and Hadoop.
Experience with AWS Data Lakes: Proven experience working with AWS data lakes on AWS S3 to store and process both structured and unstructured data sets.
Your Security Clearance
To be successfully appointed to this role, it is a requirement to obtain Security Check (SC) clearance. To obtain SC clearance, the successful applicant must have resided continuously within the United Kingdom for the last 5 years, along with other criteria and requirements. Throughout the recruitment process, you will be asked questions about your security clearance eligibility such as, but not limited to, country of residence and nationality. Some posts are restricted to sole UK Nationals for security reasons; therefore, you may be asked about your citizenship in the application process.
What Does 'Get The Future You Want' Mean To You?
You will be encouraged to have a positive work-life balance. Our hybrid-first way of working means we embed hybrid working in all that we do and make flexible working arrangements the day-to-day reality for our people. All UK employees are eligible to request flexible working arrangements. You will be empowered to explore, innovate, and progress. You will benefit from Capgemini’s ‘learning for life’ mindset, meaning you will have countless training and development opportunities from thinktanks to hackathons, and access to 250,000 courses with numerous external certifications from AWS, Microsoft, Harvard ManageMentor, Cybersecurity qualifications and much more.
Why You Should Consider Capgemini
Growing clients’ businesses while building a more sustainable, more inclusive future is a tough ask. But when you join Capgemini, you join a thriving company and become part of a diverse collective of free-thinkers, entrepreneurs and industry experts. A powerful source of energy that drives us all to find new ways technology can help us reimagine what’s possible. It’s why, together, we seek out opportunities that will transform the world’s leading businesses. And it’s how you’ll gain the experiences and connections you need to shape your future. By learning from each other every day, sharing knowledge and always pushing yourself to do better, you’ll build the skills you want. And you’ll use them to help our clients leverage technology to grow their business and give innovation that human touch the world needs. So, it might not always be easy, but making the world a better place rarely is.
About Capgemini
Capgemini is a global business and technology transformation partner, helping organisations to accelerate their dual transition to a digital and sustainable world, while creating tangible impact for enterprises and society. It is a responsible and diverse group of 340,000 team members in more than 50 countries. With its strong over 55-year heritage, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions leveraging strengths from strategy and design to engineering, all fuelled by its market leading capabilities in AI, generative AI, cloud and data, combined with its deep industry expertise and partner ecosystem. The Group reported 2024 global revenues of €22.1 billion.
Get The Future You Want | www.capgemini.com
London, United Kingdom
Hybrid
Full Time
13-03-2025
Company Name
scrumconnect ltd
Job Title
Software Engineer (SC Cleared) - Azure Data & DevOps
Job Description
About the Role
Scrumconnect Consulting is looking for a Software Engineer to work on a Strategic Data Platform with expertise in Azure Data Factory (ADF), Python, PySpark, Java, Terraform, and Azure DevOps. The ideal candidate will have experience in cloud-based data engineering, automation, and infrastructure provisioning while working in an Azure environment. You will be responsible for developing scalable data pipelines, integrating cloud services, automating deployments, and supporting DevOps workflows.
Key Responsibilities:
1. Software Engineer
Develop, test, and deploy scalable data pipelines using Azure Data Factory (ADF), Python, PySpark, and Java.
Implement data transformation, ETL/ELT workflows, and data integration solutions.
Optimize data flow and performance for cloud-based data processing.
2. Cloud & Infrastructure Automation
Use Terraform to provision and manage Azure resources.
Implement infrastructure as code (IaC) best practices for automated cloud deployments.
Ensure efficient resource scaling and cost optimization.
3. DevOps & CI/CD Automation
Collaborate with DevOps teams to build automated CI/CD pipelines in Azure DevOps.
Deploy and manage containerized applications using Docker & Kubernetes.
Monitor and troubleshoot build, deployment, and infrastructure issues.
4. Performance Optimisation & Security
Optimise data pipeline performance in Azure.
Implement cloud security best practices and ensure compliance with data governance policies.
Troubleshoot data and infrastructure-related performance issues.
5. Collaboration & Documentation
Work closely with data engineers, cloud architects, and DevOps teams to design solutions.
Participate in agile ceremonies, sprint planning, and technical discussions.
Maintain technical documentation and best practices.
Required Skills & Experience:
Software Development & Data Engineering
Strong experience in: Python, PySpark, Java, Azure Data Factory (ADF).
Data Processing & Pipelines: ETL/ELT development, big data frameworks.
Cloud Services: Hands-on experience with Azure Data Lake, Synapse, and Azure Functions.
Infrastructure as Code (IaC) & Automation
Terraform expertise for Azure resource provisioning.
Experience in cloud infrastructure automation and DevOps workflows.
DevOps & CI/CD
Experience with Azure DevOps, Git, YAML pipelines.
Ability to work with Docker, Kubernetes, and containerized applications.
Other Skills
Strong problem-solving and debugging skills.
Experience working in Agile/Scrum environments.
Excellent communication and collaboration skills.
Nice to Have:
Experience with Databricks, Apache Spark, or ML workloads.
Knowledge of security best practices in cloud environments.
Azure, Terraform, or DevOps-related certifications.
London, United Kingdom
Hybrid
Full Time
12-03-2025
Company Name
Gain Theory
Job Title
Principal Data Analyst
Job Description
Principal Data Analyst required to drive our analytics strategy and ensure our data insights align with business objectives. In this role, you will lead complex analytical projects, mentor a team of analysts, and collaborate across departments to deliver actionable recommendations. You will work as part of a client team supporting a Data Communications Lead on the collection, ingestion, and processing of marketing data, delivering cleaned data to the modeling team for analysis.
The Principal Data Analyst is expected to be proficient in data processing techniques including SQL, ETL, and Python. You will leverage these skills for data interrogation, manipulation, and cleaning, as well as building and maintaining data pipelines. You will utilize our internal data automation tools to ensure efficient execution of these pipelines. The role also includes updating and creating new data processes and pipelines as required. Additionally, you will work with the Data Communications Lead to coordinate with client and agency contacts regarding the continued flow of data from relevant sources.
You will interact with the wider data community within Gain Theory, especially with members of the Data Centre of Excellence (DCOE), to share best practices and provide and receive support. The Principal Data Analyst will also mentor and support junior analysts on their projects, helping them learn processes, best practices, and specific tools used by Gain Theory.
Responsibilities:
Data Management & Analysis: Manage data extraction, manipulation, validation, and interrogation using SQL, Python, and other relevant tools. Build data insights relevant to the project. Ensure all data is systematically checked and passes all QA steps.
Data Architecture: Execute and update data pipelines. Build data ingestion and transformation pipelines using available tools, including Python scripting for data processing and automation. Work with fellow data analysts to build scalable solutions using ETL/ELT pipelines.
Python Development: Develop and maintain Python scripts for data interrogation, cleaning, processing, and automation. Contribute to the development and improvement of our internal data processing tools and libraries.
Research & Development: Propose better approaches to improve internal procedures, including new methodologies. Share techniques and ideas with the wider data community.
Meetings: Organize and participate in internal project meetings, ensuring agendas are set and action points are shared. Lead internal meetings as required.
Mentorship: Guide and support junior analysts, providing training on tools and best practices.
Experience:
Comfortable working with large amounts of data in a cloud ecosystem.
Proficient with SQL and ETL processes, and experience driving robust QA processes.
Experience with data interrogation, cleaning, and processing using Python.
Snowflake experience a plus.
Experience with data manipulation/visualization tools (e.g., Excel, Tableau).
R experience a plus.
Extreme attention to detail a must.
Understanding or experience of business marketing and media a plus.
Strong interpersonal and communication (written and oral) skills.
Team-oriented attitude.
Capacity to learn new skills and master new tools.
Ability and desire to lead junior team members through mentorship and example setting.
Qualifications
Background (3-4 years+) in Computer Science, Data Science, Data Engineering, Information Science, or related quantitative field
In depth experience with all things data including ability to work with a variety of datasets from multiple sources, familiar with standard data processing tools/concepts (e.g. SQL, NoSQL, ETL), and experience driving robust QA processes
In depth experience of the advertising ecosystem (e.g. ad trafficking, Ad servers, DSPs, Media Strategy and Activation, etc.) and a working knowledge of appropriate metrics, measurement, and reporting
Required skillsets: Snowflake, Python, GIT, AWS/Azure
Can lead requirements gathering, project planning, and implementation of projects developed with DCOE leads
Has project management skills including planning tasks and deliverables, managing timelines and risks, managing team resource allocation, and overseeing multiple simultaneous projects
Ability to manage and motivate Gain Theory team members and to teach concepts or technologies that are developed
Organized, detail-oriented, QA-focused
Experience with DBT is highly valuable
Excellent written, verbal, and presentation skills
Values and Behaviors I Demonstrate
Joining Gain Theory means joining a group of people who live, breathe and behave by our values:
Be Curious: continuously asking, understanding, learning, and developing.
Be Positive: approaching everything we do with a positive mindset and making positive impact on each other.
Act with Consideration: seeing things from someone else’s perspective; respecting and embracing diverse thinking.
Make it Better: continuous improvement and stretching our abilities, being honest with ourselves and each other.
Gain Theory is committed to actively building a diverse, equitable and inclusive workplace where everyone feels welcomed, valued and heard, and is treated with dignity and respect. As leaders and creative partners across industries, it is our responsibility to cultivate an environment reflective of our greatest asset: our people. We believe that this commitment inspires growth and delivers equitable outcomes for everyone as well as the clients and communities we serve.
Gain Theory is a WPP-owned consultancy. For more information, please visit our website and follow Gain Theory on our social channels via LinkedIn and Twitter.
London, United Kingdom
Hybrid
Full Time
12-03-2025
Company Name
Kantar
Job Title
Data Analyst
Job Description
We’re the world’s leading data, insights, and consulting company; we shape the brands of tomorrow by better understanding people everywhere. Kantar’s Profiles division is home to the world’s largest audience network. With access to 170m+ people in over 100 global markets, we offer unrivalled global reach with local relevancy. Validated by industry leading anti-fraud technology, Kantar’s Profiles Audience Network delivers the most meaningful data with consistency, accuracy, and accountability – all at speed and scale.
Job Details
Join our Data Science team at Kantar Profiles as a Mid-Level Data Analyst! If you are enthusiastic about transforming data into actionable insights and have a strong programming background (Python and SQL), we encourage you to apply. This role offers a distinctive chance to collaborate with our Senior Data Scientists and contribute significantly to our innovative projects.
What You’ll Do...
Lead all aspects of alerts to ensure the ecosystem's functionality, working with existing models for day-to-day operations and performance.
Analyze user acquisition and retention data, identifying weaknesses and bugs in existing models for resolution.
Run analytics to extract statistics, patterns, and design machine learning models to improve existing technologies.
Contribute to data/statistics tasks for improving user engagement, working closely with the development team.
Work together with different team members to offer valuable insights and analyses that contribute to decision-making processes.
Communicate technical information with both technical and non-technical team members and collaborators.
Transform data using ETL tools like DBT to make it more accessible to the broader business.
Build visualizations, monitor trends, and identify patterns using time series graphing services like Grafana.
Define critical metrics to drive improvements in user acquisition and engagement performance.
What You’ll Bring...
Experience with database queries, programming, data mining/wrangling, analysis, and reporting.
Strong proficiency in SQL, with the ability to read, write, and query optimally.
A keen curiosity about data, statistics, machine learning, and data science.
Strong problem-solving skills with an emphasis on product development, logical thinking, and critical analysis.
Experience with statistical computer languages such as Python, Scala, R, MATLAB.
Knowledge of statistical techniques and concepts, including regression, properties of distributions, statistical tests, and accurate usage.
Experience using web services and languages, including AWS, EC2, S3, Redshift, DigitalOcean, etc.
Meticulous and committed, with a good work ethic and the capacity to collaborate effectively with diverse teams.
Experience in Excel and Power BI is a plus.
Benefits include 25 days annual leave (increasing with tenure), private medical health cash plan, income protection, life assurance, enhanced employer pension contribution; plus award-winning voluntary flexible benefits (lifestyle, health, wealth, wellbeing). We offer a hybrid working arrangement with an office presence of at least 2 days a week in Reading.
Why join Kantar?
We shape the brands of tomorrow by better understanding people everywhere. By understanding people, we can understand what drives their decisions, actions, and aspirations on a global scale. And by amplifying our in-depth expertise of human understanding alongside ground-breaking technology, we can help brands find concrete insights that will help them succeed in our fast-paced, ever shifting world. And because we know people, we like to make sure our own people are being looked after as well. Equality of opportunity for everyone is our highest priority and we support our colleagues to work in a way that supports their health and wellbeing. While we encourage teams to spend part of their working week in the office, we understand no one size fits all; our approach is flexible to ensure everybody feels included, accepted, and that we can win together. We’re dedicated to creating an inclusive culture and value the diversity of our people, clients, suppliers and communities, and we encourage applications from all backgrounds and sections of society. Even if you feel like you’re not an exact match, we’d love to receive your application and talk to you about this job or others at Kantar.
Reading, United Kingdom
Hybrid
Full Time
12-03-2025