End of Chapter Exercises

Introduction to Jupyter Notebooks (Google Colab) and Data Acquisition

Objective:

Familiarize yourself with Jupyter Notebook (Google Colab), practice data acquisition techniques, and begin using GitHub for assignment submissions.

Instructions:

1. Setting Up Google Colab:

  • Create a new notebook in Google Colab.
  • Familiarize yourself with the Colab interface: explore the menu options, insert code and text cells, and use the file browser.

2. Data Acquisition:

  • Find and download at least 2 different CSV datasets (each smaller than 10 MB) that do not contain any PHI/PII (protected health information or personally identifiable information).
  • Similarly, find and download 3 XLSX files and 3 JSON files that meet the same criteria.
  • Use pandas to load each of these datasets into separate data frames in your Colab notebook.
  • Display the first 10 rows of each dataset in separate code cells.
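
A minimal sketch of the loading step, assuming your downloaded files sit somewhere the notebook can read them. The helper name load_dataset and the READERS table are hypothetical, introduced only for illustration; the pandas readers themselves (read_csv, read_excel, read_json) are standard. Note that read_excel needs the openpyxl package for .xlsx files; if it is missing in your environment, install it with pip.

```python
from pathlib import Path

import pandas as pd

# Map each supported file extension to the pandas function that reads it.
# (READERS and load_dataset are illustrative names, not part of pandas.)
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,  # requires the openpyxl package
    ".json": pd.read_json,
}


def load_dataset(path):
    """Load one CSV, XLSX, or JSON file into a DataFrame, chosen by extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    return READERS[suffix](path)
```

In a notebook, you might call it once per file, for example df_csv_1 = load_dataset("data/my_first_dataset.csv") (a placeholder file name), and then put df_csv_1.head(10) on the last line of its own code cell so Colab renders the first 10 rows. Nested JSON files may not map cleanly onto a table with read_json alone, so treat this as a starting point.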

3. Markdown Practice:

  • Add a markdown cell and list the features (columns) of each dataset.
  • For each feature, give a brief description of its content/purpose.
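
One way to get started on the feature list is to let pandas produce a skeleton that you then complete by hand. The sketch below assumes a DataFrame is already loaded; columns_to_markdown is a hypothetical helper, and the "TODO" text is a placeholder for your own description of each feature.

```python
import pandas as pd


def columns_to_markdown(df, title):
    """Build a markdown outline with one bullet per column of df.

    Each bullet shows the column name and its pandas dtype, followed by a
    TODO placeholder that should be replaced with a written description.
    """
    lines = [f"### {title}", ""]
    for name, dtype in df.dtypes.items():
        lines.append(f"- **{name}** ({dtype}): TODO describe this feature")
    return "\n".join(lines)
```

Printing the result, for example print(columns_to_markdown(df_csv_1, "Dataset 1")) with a DataFrame variable of your own, gives text you can paste into a markdown cell and edit. The descriptions still have to come from you, typically from the dataset's documentation or data dictionary.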

4. Submission:

  • Create a new GitHub repository named datasci_1_loading in your GitHub account.
  • Within the repository, create a folder named "data" and store your acquired datasets in it.
  • Save your Colab notebook to the repository.
  • Submit the link to your GitHub repository.

Resources:


Tip: Ensure that the datasets you choose are public and do not violate any copyrights. Be mindful of the size constraints and avoid datasets containing sensitive information.
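
The size constraint can be checked before you commit anything. The following is a rough sketch, assuming your files live in a single folder and that "10MB" means 10 × 1024 × 1024 bytes (the exercise does not say which definition it uses). The function name oversized_files is hypothetical.

```python
from pathlib import Path


def oversized_files(folder, limit_mb=10):
    """Return the names of files directly inside folder larger than limit_mb.

    Assumes 1 MB = 1024 * 1024 bytes; subfolders are not searched.
    """
    limit_bytes = limit_mb * 1024 * 1024
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.stat().st_size > limit_bytes
    )
```

Calling oversized_files("data") on your own folder returns an empty list when every file is within the limit. This checks size only; whether a dataset is public, free of copyright problems, and free of sensitive information still has to be verified by reading its source and license.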