What is Colab?
Colaboratory, or Colab for short, is a browser-based Python interpreter built by Google. It extends the open source Jupyter Notebook project.
With additional niceties such as Markdown previews, free GPU access, and the ability to leave comments just as you would in Google Docs, Colab is quickly becoming the Python environment of choice among developers in the machine learning, data science, and AI research spaces.
Here are a few practical tips and tricks to help you get the most out of Colab.
Loading Data from Google Drive
Because runtime resources are provided for free, file storage is not persistent: files are lost when the runtime is recycled. Datasets and other files must be reuploaded every time or hosted elsewhere.
If you don't have your own file hosting solution and want to avoid reuploading every time, use the PyDrive library to access and load data directly from your Google Drive.
# Import PyDrive and google auth libraries.
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Download a file based on its file ID.
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
file_id = 'REPLACE_WITH_YOUR_FILE_ID'
downloaded = drive.CreateFile({'id': file_id})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))
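Two small helpers can round this out. The sketch below pulls a file ID out of a Drive share link and parses downloaded text as CSV using only the standard library. The function names `extract_file_id` and `parse_csv_text` are illustrative, and the two URL shapes handled (`/d/<id>/` and `?id=<id>`) are assumptions about common share links, not an exhaustive list.

```python
import csv
import io
import re

# Assumed share-link shapes: .../d/<id>/... and ...?id=<id>
_ID_PATTERNS = [
    re.compile(r'/d/([A-Za-z0-9_-]+)'),
    re.compile(r'[?&]id=([A-Za-z0-9_-]+)'),
]


def extract_file_id(url):
    """Return the file ID found in a Drive share link, or None if none matches."""
    for pattern in _ID_PATTERNS:
        match = pattern.search(url)
        if match:
            return match.group(1)
    return None


def parse_csv_text(text):
    """Parse CSV text (for example from GetContentString()) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


# Placeholder link; substitute a real share link from your own Drive.
file_id = extract_file_id(
    'https://drive.google.com/file/d/FILE_ID_HERE/view?usp=sharing')
print(file_id)

# Stand-in for downloaded.GetContentString() on a small CSV file.
rows = parse_csv_text('name,score\nada,90\nlin,85\n')
print(rows[0]['name'], rows[0]['score'])
```

In the Colab cell above, `parse_csv_text(downloaded.GetContentString())` would give one dict per row. If you prefer a DataFrame, `pandas.read_csv(io.StringIO(...))` on the same string is an alternative.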
Enable GPU / TPU in environment
By default, no GPU is enabled. To enable one, open the Runtime dropdown menu and choose "Change runtime type", then select GPU or TPU as the accelerator for your environment. Run the following in a cell to confirm that a GPU is attached:
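# Run nvidia-smi through the notebook's shell escape and capture its output lines.
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
# nvidia-smi reports a failure message when no GPU is attached.
if gpu_info.find('failed') >= 0:
    print('Select the Runtime → "Change runtime type" menu to enable a GPU accelerator, ')
    print('and then re-execute this cell.')
else:
    print(gpu_info)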
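The `!nvidia-smi` line relies on IPython's shell escape, so it only works inside a notebook cell. As a sketch, the same check can be written in plain Python for use in a script or an imported module. The function name `get_gpu_info` is illustrative, and treating a missing or failing `nvidia-smi` as "no GPU" is an assumption.

```python
import shutil
import subprocess


def get_gpu_info():
    """Return the output of nvidia-smi as a string, or None if no GPU is visible."""
    # If the binary is not on PATH, assume no NVIDIA driver is installed.
    if shutil.which('nvidia-smi') is None:
        return None
    try:
        result = subprocess.run(
            ['nvidia-smi'], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    # A non-zero exit code is treated here as "no usable GPU".
    if result.returncode != 0:
        return None
    return result.stdout


info = get_gpu_info()
if info is None:
    print('No GPU detected. Enable one under Runtime → "Change runtime type".')
else:
    print(info)
```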
Display Pandas Dataframes as Interactive Tables
By default, a pandas DataFrame is rendered as a static table with no interactivity. To change that, load the Colab Data Table extension by running
%load_ext google.colab.data_table
in a cell to display rich, interactive tabular data. You can then quickly view minimum and maximum values, sort columns, and filter by range and index.
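Loading the extension switches on the interactive view for every DataFrame in the notebook. For finer control, a single DataFrame can be wrapped instead. The sketch below assumes the `google.colab.data_table.DataTable` class with its `include_index` and `num_rows_per_page` arguments, and it returns the object unchanged when run outside Colab. The helper name `as_interactive` is made up for illustration.

```python
def as_interactive(df, rows_per_page=10):
    """Wrap df in Colab's interactive table, or return it unchanged elsewhere."""
    try:
        # This module is only importable inside a Colab runtime.
        from google.colab import data_table
    except ImportError:
        return df
    # Assumed arguments: hide the index column and paginate the rows.
    return data_table.DataTable(
        df, include_index=False, num_rows_per_page=rows_per_page)
```

In a Colab cell, end with `as_interactive(df)` as the last expression so the table is displayed. The fallback keeps the same notebook usable in a local Jupyter session.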
Add an "Open in Colab" badge to your project
The code snippets above are available in a Colab notebook, linked from an "Open in Colab" badge.
The badge is generated by this Markdown:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1D37io66ccgkrOUV_FA_J1S8TXll_Dzpb)
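If you maintain several notebooks, the badge Markdown can be generated rather than typed by hand. This is a minimal sketch: the helper names `colab_badge` and `github_notebook_url` are illustrative, and the GitHub form assumes Colab's `/github/<user>/<repo>/blob/<branch>/<path>` URL scheme for notebooks stored in a public repository.

```python
BADGE_SVG = 'https://colab.research.google.com/assets/colab-badge.svg'


def colab_badge(notebook_url):
    """Return the Markdown for an "Open In Colab" badge linking to notebook_url."""
    return '[![Open In Colab]({})]({})'.format(BADGE_SVG, notebook_url)


def github_notebook_url(user, repo, branch, path):
    """Build a Colab link for a notebook stored on GitHub (assumed URL scheme)."""
    return 'https://colab.research.google.com/github/{}/{}/blob/{}/{}'.format(
        user, repo, branch, path)


# Reproduces the badge shown above for the notebook hosted on Drive.
print(colab_badge(
    'https://colab.research.google.com/drive/1D37io66ccgkrOUV_FA_J1S8TXll_Dzpb'))

# Hypothetical repository and path, for illustration only.
print(colab_badge(
    github_notebook_url('your-user', 'your-repo', 'main', 'notebooks/demo.ipynb')))
```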
Conclusion
Colab is a powerful tool and ecosystem capable of both handling large datasets and training powerful machine learning models.
Hopefully this will give you a head start in using Google Colab for data science, machine learning, research or personal projects.
And don't forget to check out these additional resources:
- Intro to Colab is a short video made by Google about using Colab with TensorFlow.
- Jupyter4edu is an open source project about teaching and learning with the Jupyter Notebook ecosystem.
- Pandas is a popular, production-grade data analysis library for Python.
Please feel free to share your own Colab tips, tricks, and projects in the comments.
About the Author:
Wanjun Zhang is a technical training consultant at Inly. She has a background in SaaS consulting and technology training.