These questions have been curated from interviews with various data professionals in the industry, as presented in the IBM Data Engineering Specialization. They will be helpful to anyone attending interviews for DE roles, and they also serve as a quick checklist for brushing up on skills.
Data Engineering Domain:
- Data integration
- Data Pipeline
- Data lake
- Data Warehouse
- Distributed systems
- Data
What employers look for in a Data Engineer
- Exposure to a breadth of data-related technologies
- Requirements can vary between jobs and roles
- Exposure to data sources such as relational DBs, NoSQL DBs, in-memory DBs, and key-value stores
- Experience in data movement processes (see the sketch after this list), such as:
- Moving data from RDBMS to NoSQL
- Pulling data from social media using APIs
- Loading data into analytical databases such as Hadoop
- Employers look for good analytical and problem-solving skills.
- Somebody who is inquisitive
- Asks additional questions to figure out the direction to take
- Can communicate really well
- Has a strong work ethic and owns what they do.
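
As a concrete illustration of the data movement point above, here is a minimal sketch of moving rows from a relational database into document form, the shape document-oriented NoSQL stores (e.g., MongoDB) expect. It uses only Python's standard library; the table and columns are hypothetical, and in a real pipeline the documents would go to a driver such as `pymongo` rather than being printed.

```python
import json
import sqlite3

# Source: a relational table (hypothetical schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "New York")],
)

# Extract rows and reshape each one into a JSON document,
# the unit of storage in document-oriented NoSQL databases.
cursor = conn.execute("SELECT id, name, city FROM customers")
columns = [desc[0] for desc in cursor.description]
for row in cursor:
    document = dict(zip(columns, row))
    # A real pipeline would call e.g. collection.insert_one(document)
    # through a driver such as pymongo; here we just print it.
    print(json.dumps(document))

conn.close()
```
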
Key skills:
- SQL
- Data Modelling
- ETL Technologies
- Programming (Python)
- Skills with RDBMSs
- Expertise in schema design
- Ability to work on ETL and ELT processes (a minimal ETL sketch follows this list)
- Ability to handle streaming data (see the streaming sketch below)
- Ability to handle multiple data formats and file formats
- Ability to work with web APIs and Web Scraping
- Basic data analytics skills
- Automation of routine work.
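
Since ETL and file-format handling come up together, below is a minimal ETL sketch in Python: extract records from CSV, transform them (normalise a field and add a derived one), and load the result as JSON. The field names and the 10% tax rate are hypothetical; a real pipeline would replace the load step with a warehouse or database write.

```python
import csv
import io
import json

# Extract: read CSV records (an in-memory file stands in for a real source).
raw = io.StringIO("name,amount\nada,10\ngrace,20\n")
rows = list(csv.DictReader(raw))

# Transform: normalise names, cast amounts, and add a derived field.
transformed = [
    {
        "name": r["name"].title(),
        "amount": float(r["amount"]),
        "amount_with_tax": round(float(r["amount"]) * 1.1, 2),  # hypothetical 10% tax
    }
    for r in rows
]

# Load: emit the records as JSON (a warehouse write in a real pipeline).
print(json.dumps(transformed, indent=2))
```
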
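For the streaming point, a common first step is learning to process records incrementally rather than loading everything into memory. The generator-based sketch below simulates a stream and computes a running average; in production the source would be a consumer for a system like Apache Kafka.

```python
from typing import Iterator

def event_stream() -> Iterator[float]:
    """Simulated unbounded stream of readings (stand-in for a Kafka consumer)."""
    for reading in [21.0, 22.5, 19.8, 20.1]:
        yield reading

def running_average(stream: Iterator[float]) -> Iterator[float]:
    """Emit the average of all readings seen so far, one value per event."""
    total, count = 0.0, 0
    for value in stream:
        total += value
        count += 1
        yield total / count

for avg in running_average(event_stream()):
    print(f"running average: {avg:.2f}")
```
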
Work on:
- Building strong foundations
- SQL, Python, data modelling, and ETL methodologies.
- Pay attention to hands-on experience.
- Leverage open source tools, and build hands-on projects.
- Come up with your own project
- Build a database
- Get involved with other people who are working in that area to learn from them.
- Learn DB internals.
- Learn a procedural language such as shell scripting, PL/SQL, or Perl.
- Practice OOP in Python.
- Learn a functional programming language such as Scala.
- Master at least one NoSQL DB: MongoDB, Cassandra, or Neo4j.
- Understand web scraping.
- Understand how APIs work (a small Python sketch follows this list).
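
Tying the OOP and API points together, here is a hedged sketch of a small Python class that wraps a JSON API, using only the standard library. The base URL and endpoint are placeholders (`api.example.com` is not a real service); the pattern, not the target, is the point.

```python
import json
from urllib.request import Request, urlopen

class ApiClient:
    """Minimal JSON API client; the URLs used with it below are placeholders."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def get_json(self, path: str) -> dict:
        """Fetch a path relative to the base URL and decode the JSON body."""
        request = Request(
            f"{self.base_url}/{path.lstrip('/')}",
            headers={"Accept": "application/json"},
        )
        with urlopen(request) as response:
            return json.load(response)

# Usage (commented out because it would make a real HTTP call):
# client = ApiClient("https://api.example.com")
# data = client.get_json("v1/users")
```
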
Tools and technologies:
- Cloud computing and cloud platforms: Amazon Web Services (AWS), Microsoft Azure, Spring Cloud, Google Cloud Storage (GCS)
- Data warehouse tools: Snowflake, Databricks, BigQuery, Redshift, Db2
- Data pipeline tools: Apache Kafka, Apache Airflow, Luigi (a minimal Airflow DAG sketch follows this list)
- Big data tools: Apache Hadoop, Apache Spark, Apache Hive
- Operating systems: UNIX, Linux
- Programming languages: SQL, Bash, Python, R, Java, C++
- Databases: Cassandra, Microsoft SQL Server, MySQL, PostgreSQL, Amazon DynamoDB, Apache Solr, IBM Db2, MongoDB, Neo4j, Oracle PL/SQL
- Metadata management software: CA Erwin Data Modeler, Oracle Warehouse Builder, SAS Data Integration Server, Talend Data Fabric, Alation Data Catalog, SAP Information Steward, Azure Data Catalog, IBM Watson Knowledge Catalog, Oracle Enterprise Metadata Management (OEMM), Adaptive Metadata Manager, Unifi Data Catalog, data.world, and Informatica Enterprise Data Catalog
- Agile software development methodologies
- Version control: Git
- Modelling and API development
- Business intelligence and data analysis software: IBM Cognos Impromptu, MicroStrategy, Microsoft Power BI, Google Analytics, InsightSquared, Oracle Business Intelligence Enterprise Edition, Qlik Tech QlikView, Sisense, Tableau, Dundas BI, SAS Analytics, Domo, SAP Lumira
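
As an example of the pipeline tools above (referenced from the Apache Airflow item), here is a minimal Airflow DAG sketch, assuming Airflow 2.x is installed. The DAG id and task bodies are hypothetical; what it shows is the extract >> transform >> load dependency pattern Airflow uses.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical task bodies; real ones would call out to databases, APIs, etc.
def extract() -> None:
    print("pull source data")

def transform() -> None:
    print("clean and reshape")

def load() -> None:
    print("write to the warehouse")

with DAG(
    dag_id="minimal_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # Airflow 2.x; newer releases prefer `schedule`
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Define execution order: extract, then transform, then load.
    extract_task >> transform_task >> load_task
```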