End of Chapter Exercises
Introduction to Jupyter Notebooks (Google Colab) and Data Acquisition
Objective:
Familiarize yourself with Jupyter Notebook (Google Colab), practice data acquisition techniques, and begin using GitHub for assignment submissions.
Instructions:
1. Setting Up Google Colab:
- Create a new notebook in Google Colab.
- Familiarize yourself with the Colab interface: explore the menu options, insert code and text cells, and use the file browser.
2. Data Acquisition:
- Find and download 2-3 different CSV datasets (each smaller than 10 MB) that contain no PHI/PII (protected health information or personally identifiable information).
- Likewise, find and download 3 XLSX files and 3 JSON files that meet the same criteria.
- Use pandas to load each of these datasets into separate data frames in your Colab notebook.
- Display the first 10 rows of each dataset in separate code cells.
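The loading step can be sketched as follows. This is a minimal illustration, not a required solution: the helper name load_dataset, the READERS table, and the demo files are all hypothetical, and the tiny files written to a temporary folder only stand in for the datasets you download. Reading XLSX files with pandas typically relies on the openpyxl package, which is assumed to be installed.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Map each file extension to the pandas reader that handles it.
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,  # assumption: openpyxl is available for XLSX files
    ".json": pd.read_json,
}


def load_dataset(path):
    """Load one CSV, XLSX, or JSON file into a DataFrame (hypothetical helper)."""
    path = Path(path)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unsupported file type: {path.suffix}")
    return reader(path)


# Stand-in files so the sketch runs anywhere; replace them with your downloads.
demo_dir = Path(tempfile.mkdtemp())
csv_path = demo_dir / "demo.csv"
csv_path.write_text("city,population\nOslo,700000\nBergen,290000\n")
json_path = demo_dir / "demo.json"
json_path.write_text(json.dumps([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]))

csv_df = load_dataset(csv_path)
json_df = load_dataset(json_path)

# head(10) returns at most the first 10 rows of each data frame.
print(csv_df.head(10))
print(json_df.head(10))
```

In a Colab notebook, placing `csv_df.head(10)` on the last line of its own code cell renders the rows as a table without print. JSON files that are deeply nested may not load cleanly with `pd.read_json`; `pd.json_normalize` is one option for flattening them.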
3. Markdown Practice:
- Add a markdown cell and list the features (columns) of each dataset.
- For each feature, give a brief description of its content/purpose.
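As a starting point for the markdown cell, the column names can be pulled from each data frame and pasted into the cell. The sketch below assumes a hypothetical helper, columns_as_markdown, and a small made-up data frame; the descriptions themselves still have to be written by hand.

```python
import pandas as pd


def columns_as_markdown(name, df):
    """Return a Markdown bullet list with one line per column (hypothetical helper)."""
    lines = [f"**{name}**"]
    for column, dtype in df.dtypes.items():
        # The dtype is only a hint; replace the TODO with your own description.
        lines.append(f"- {column} ({dtype}): TODO describe this feature")
    return "\n".join(lines)


# Made-up data frame standing in for one of your loaded datasets.
demo_df = pd.DataFrame({"city": ["Oslo", "Bergen"], "population": [700000, 290000]})
print(columns_as_markdown("demo.csv", demo_df))
```

Copy the printed text into a text cell and replace each TODO with a brief description of the feature.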
4. Submission:
- Create a new GitHub repository named datasci_1_loading in your GitHub account.
- Organize your GitHub repository with the following:
- Within your repository, create a folder named "data" to store your acquired datasets.
- Save your Colab notebook to your GitHub repository.
- Submit the link to your GitHub repository.
Resources:
Tip: Ensure that the datasets you choose are public and do not violate any copyrights. Be mindful of the size constraints and avoid datasets containing sensitive information.
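One way to check the size constraint before committing files is sketched below. The helper name oversized_files and the temporary folder are hypothetical, and the limit assumes 1 MB means 1024 * 1024 bytes; point the function at your own "data" folder instead.

```python
import tempfile
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # 10 MB limit, assuming 1 MB = 1024 * 1024 bytes


def oversized_files(folder, limit=MAX_BYTES):
    """Return sorted (name, size in bytes) pairs for files at or above the limit."""
    return sorted(
        (p.name, p.stat().st_size)
        for p in Path(folder).iterdir()
        if p.is_file() and p.stat().st_size >= limit
    )


# Temporary folder standing in for the repository's "data" folder.
data_dir = Path(tempfile.mkdtemp())
(data_dir / "small.csv").write_bytes(b"a,b\n1,2\n")

# An empty list means every file is under the limit.
print(oversized_files(data_dir))
```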