Python Data Engineer Learning Repository

Welcome to the Python Data Engineer learning repository! This repo contains a structured, practical set of Jupyter notebooks and example projects for learning core Python concepts with a focus on data engineering.

Note: This summary is based on the top-level files; for a full list of all tutorials and scripts, check the GitHub repository contents.


📚 Topics Covered

  • Overview: Introduction to Python, variables, data types, and basic operations.
  • Key Concepts:
    • Printing and string manipulation
    • Variable assignment and naming
    • Numeric, string, and boolean data types
    • Type conversion, built-in functions, and string methods
    • List basics and common list operations
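
As a rough illustration of these basics, the sketch below converts a few string values to proper types, applies common string methods, and runs simple list operations. The variable names and sample values are made up for this example and are not taken from the notebooks.

```python
# Hypothetical raw values, as they might arrive from a CSV row (all strings).
raw_id = "42"
raw_price = " 19.99 "
raw_active = "True"

# Type conversion with built-in constructors.
record_id = int(raw_id)
price = float(raw_price.strip())
is_active = raw_active.strip().lower() == "true"

# Common string methods: strip whitespace, then capitalize each word.
city = "  new york ".strip().title()

# List basics: create, append, sort, and slice.
prices = [price, 5.0, 12.5]
prices.append(7.25)
sorted_prices = sorted(prices)
cheapest_two = sorted_prices[:2]

print(record_id, price, is_active, city, cheapest_two)
```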

  • Overview: Mastering conditional statements for decision making.
  • Key Concepts: if, elif, else, comparison and logical operators, nested conditions.
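
A minimal sketch of conditional logic in a data-loading context. The function names and thresholds (`classify_row_count`, `should_retry`, `expected_min`, `max_attempts`) are hypothetical and chosen only for illustration.

```python
def classify_row_count(row_count, expected_min=100):
    """Return a status label for a loaded batch (thresholds are illustrative)."""
    if row_count < 0:
        raise ValueError("row_count cannot be negative")
    if row_count == 0:
        return "empty"
    elif row_count < expected_min:
        return "partial"
    else:
        return "ok"


def should_retry(status_code, attempt, max_attempts=3):
    """Combine comparisons with logical operators (and / or)."""
    is_transient = status_code >= 500 or status_code == 429
    return is_transient and attempt < max_attempts


print(classify_row_count(50), should_retry(503, attempt=1))
```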

  • Overview: Using loops to automate repetitive tasks.
  • Key Concepts: for and while loops, loop control (break, continue, pass), iterating collections.
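
The sketch below shows a `for` loop with `continue` and `break`, plus a small `while` loop. The input list and the `"STOP"` sentinel are assumptions made for this example.

```python
# Hypothetical input: numeric strings mixed with blanks, bad values, and a sentinel.
records = ["10", "", "x7", "25", "STOP", "99"]

total = 0
skipped = []
for value in records:
    if value == "STOP":
        break              # stop processing entirely
    if not value:
        continue           # skip blank entries
    if not value.isdigit():
        skipped.append(value)
        continue           # skip values that are not whole numbers
    total += int(value)

# while loop: halve a batch size until it is no larger than the limit.
batch_size = 1000
while batch_size > 100:
    batch_size //= 2

print(total, skipped, batch_size)
```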

  • Overview: Writing reusable blocks of code with functions.
  • Key Concepts: Defining and calling functions, parameters, return values, scope, lambda functions.
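
A short sketch of the function features listed above: default parameters, multiple return values, and a lambda. All function names here are hypothetical.

```python
def normalize_name(name, default="unknown"):
    """Strip and lowercase a name; fall back to a default if it is empty."""
    cleaned = name.strip().lower()   # 'cleaned' is local to this function
    return cleaned or default


def summarize(values):
    """Return (minimum, maximum, mean) as a tuple."""
    return min(values), max(values), sum(values) / len(values)


# A lambda is an anonymous, single-expression function.
to_celsius = lambda fahrenheit: round((fahrenheit - 32) * 5 / 9, 1)

low, high, average = summarize([1, 2, 3])
print(normalize_name("  Alice "), low, high, average, to_celsius(212))
```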

  • Overview: Using operators to manipulate data.
  • Key Concepts: Arithmetic, assignment, comparison, logical, bitwise, and membership operators.
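
For a quick feel of each operator family, here is a small sketch with arbitrary sample values.

```python
a, b = 17, 5

# Arithmetic: floor division, modulo, exponentiation.
quotient = a // b
remainder = a % b
squared = b ** 2

# Assignment: augmented assignment updates a variable in place.
count = 10
count += 5

# Comparison and logical: chained comparison combined with 'and'.
in_range = 0 < remainder <= 2 and quotient != 0

# Bitwise: AND, OR, and left shift on integers.
masked = 0b0101 & 0b0110
merged = 0b0101 | 0b0110
shifted = 1 << 3

# Membership: 'in' checks whether a value is present in a collection.
has_id_column = "id" in ["id", "name"]

print(quotient, remainder, squared, count, in_range, masked, merged, shifted, has_id_column)
```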

  • Overview: Mastering data structures for efficient storage and retrieval.
  • Key Concepts: Lists, tuples, sets, dictionaries, and real-world examples.
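
The following sketch touches each of the four built-in structures and ends with a small group-and-sum over a dictionary. The sample data is invented for illustration.

```python
# List: ordered and mutable.
columns = ["id", "name", "city"]

# Tuple: ordered and immutable, handy for fixed-size records.
coordinates = (40.7128, -74.0060)

# Set: unordered, duplicates are dropped automatically.
seen_ids = {1, 2, 2, 3}

# Dictionary: key/value lookup.
row = {"id": 1, "name": "Ada"}
row["city"] = "London"
country = row.get("country", "n/a")   # default when the key is missing

# Group and sum order amounts per customer using a dictionary.
orders = [("alice", 30), ("bob", 20), ("alice", 5)]
totals = {}
for customer, amount in orders:
    totals[customer] = totals.get(customer, 0) + amount

print(len(columns), coordinates[0], seen_ids, row, country, totals)
```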

  • Overview: Organizing and reusing code with modules and packages.
  • Key Concepts:
    • Difference between modules, packages, and libraries
    • Importing and using built-in and external libraries (e.g., Pandas, NumPy, Matplotlib, Requests)
    • Creating custom modules and packages
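
To make the idea of a custom module concrete, this sketch writes a tiny module to a temporary directory and imports it. The module name `etl_helpers` and its `double` function are hypothetical; in a real project the file would simply live next to your script or inside a package folder.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A module is just a .py file; this one defines a single function.
    Path(tmp, "etl_helpers.py").write_text(
        "def double(x):\n    return x * 2\n", encoding="utf-8"
    )

    # Make the directory importable for the duration of the demo.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        etl_helpers = importlib.import_module("etl_helpers")
        result = etl_helpers.double(21)
    finally:
        sys.path.remove(tmp)

print(result)
```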

  • Overview: File handling (text & CSV) and JSON management for configuration and data exchange. (JSON content previously listed separately has been merged into this section.)
  • Key Concepts:
    • Reading and writing text and CSV files using built-in modules and pandas
    • Using os and shutil for file and directory operations
    • Reading, writing, parsing, and serializing JSON with Python’s json module
    • Data extraction and ingestion from files and JSON API responses
    • Error handling and path management
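
A minimal round trip through CSV and JSON using only the standard library (`csv`, `json`, `pathlib`); pandas is left out so the sketch runs without extra installs. File names and sample rows are made up for this example.

```python
import csv
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    csv_path = base / "users.csv"
    json_path = base / "users.json"

    # Write a small CSV file with a header row.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name"])
        writer.writeheader()
        writer.writerows([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}])

    # Read it back; csv returns strings, so convert 'id' explicitly.
    with csv_path.open(newline="", encoding="utf-8") as f:
        rows = [{"id": int(r["id"]), "name": r["name"]} for r in csv.DictReader(f)]

    # Serialize to JSON, then parse it again.
    json_path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    loaded = json.loads(json_path.read_text(encoding="utf-8"))

    # Error handling: a missing file raises FileNotFoundError.
    try:
        (base / "missing.csv").read_text(encoding="utf-8")
        missing_handled = False
    except FileNotFoundError:
        missing_handled = True

print(loaded, missing_handled)
```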

  • Overview: Object-oriented programming in Python.
  • Key Concepts:
    • Defining classes and creating objects
    • Constructors (__init__)
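
A small sketch of a class with a constructor and two methods. The `Pipeline` class is a hypothetical example, not a class from this repository.

```python
class Pipeline:
    """A named sequence of processing steps."""

    def __init__(self, name, steps=None):
        # __init__ runs when an object is created and sets its attributes.
        self.name = name
        self.steps = list(steps) if steps else []

    def add_step(self, step):
        self.steps.append(step)
        return self   # returning self allows chained calls

    def describe(self):
        return f"{self.name}: " + " -> ".join(self.steps)


pipeline = Pipeline("daily_sales")
pipeline.add_step("extract").add_step("transform").add_step("load")
print(pipeline.describe())
```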

  • Overview: Working with randomness: generating random numbers and synthetic data for testing and simulations.
  • Key Concepts: the random module, the Faker library, random sampling, and anonymization.
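
The sketch below uses only the standard-library `random` module with a fixed seed so that runs are reproducible; Faker is not shown because it requires a separate install. The seed value and ranges are arbitrary.

```python
import random

# A dedicated generator with a fixed seed gives repeatable test data.
rng = random.Random(42)

ages = [rng.randint(18, 90) for _ in range(5)]       # inclusive bounds
sampled_ids = rng.sample(range(1000), k=3)           # unique picks, no repeats
segment = rng.choice(["bronze", "silver", "gold"])   # one random element

shuffled = list(range(5))
rng.shuffle(shuffled)                                # in-place shuffle

print(ages, sampled_ids, segment, shuffled)
```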

  • Overview: Reusable scripts and code blocks for modular data engineering workflows.
  • Key Concepts: Encapsulating logic in functions and scripts, templates for batch processing.
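
One possible shape for a batch-processing template, assuming the data fits in a list. The helper names `chunked` and `run_batches` are hypothetical.

```python
def chunked(items, size):
    """Yield consecutive slices of 'items' with at most 'size' elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_batches(items, process, batch_size=2):
    """Apply 'process' to each batch and collect the results."""
    return [process(batch) for batch in chunked(items, batch_size)]


def main():
    # Here each batch is simply summed; swap in any callable that takes a list.
    print(run_batches([1, 2, 3, 4, 5], sum, batch_size=2))


if __name__ == "__main__":
    main()
```

Keeping the work inside functions and calling them from `main()` lets the same file be run as a script or imported from a notebook.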

  • Overview: Logging and monitoring data engineering processes.
  • Key Concepts: Python’s logging module, log formats, levels, handlers, and best practices.
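
A minimal logging setup showing a logger, a handler, a format, and level filtering. The logger name `etl.demo` is made up, and the in-memory stream is used only so the sketch can inspect its own output; a real script would more likely log to the console or a file.

```python
import io
import logging

stream = io.StringIO()   # in-memory target, for demonstration only

logger = logging.getLogger("etl.demo")
logger.setLevel(logging.DEBUG)   # the logger itself accepts everything
logger.propagate = False         # keep messages out of the root logger

handler = logging.StreamHandler(stream)
handler.setLevel(logging.INFO)   # the handler drops DEBUG messages
handler.setFormatter(logging.Formatter("%(levelname)s | %(name)s | %(message)s"))
logger.addHandler(handler)

logger.debug("not emitted by this handler")
logger.info("Loaded %d rows", 120)
logger.warning("Skipped %d malformed rows", 3)

output = stream.getvalue()
print(output)
```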

  • Overview: Common Python interview questions and concise answers, focusing on practical explanations.

  • Overview: Example Streamlit apps and instructions to run them.
  • Key Concepts: Installing Streamlit, building and running simple apps, basic visualization with Plotly.

  • Overview: End-to-end projects to apply learned concepts.
  • Contents: Project ideas, example implementations, and deployment notes.

📎 How to Use This Repo

  1. Browse Notebooks: Start with the Jupyter notebooks in the main directory for a structured learning path.
  2. Explore Directories: Check out the additional folders for sample scripts, data, and projects.
  3. Try the Code: Run the notebooks locally or in an online Jupyter environment.
  4. Contribute: Pull requests to add new topics or improve examples are welcome!

🔗 Explore More