System Design Notes

Dropbox / Google Drive / OneDrive Design

Google Drive is a file hosting service from Google. It offers cloud file storage and synchronization, allowing users to store their data on remote servers. Besides storing files on these servers, Google Drive also synchronizes them across all of a user's devices and lets the user share them with other users on request. Dropbox, OneDrive and Google Photos are similar applications that handle storage and sharing of massive numbers of files for millions of users.

Based on the design of Google Drive, let's create a basic file storage and sharing service that can scale to millions of users and handle petabytes of data.

Requirements Of The System

It's important to clarify the requirements of the design. The actual design of such applications can cover several features and involve complexities that are beyond the scope of a system design interview. Narrow down the requirements to a few core components before building the system.

Functional Requirements

  1. Users can upload and download files from any device on which they are logged in.
  2. Users can share files with other users.
  3. The service should automatically synchronize files across all devices.

Non-Functional Requirements

  1. The system should support storage of large files of up to 1 GB each.
  2. The system should be able to scale to an enormous number of users (Google Drive has over 1 billion active users as of July 2018).
  3. The system should be able to handle a high number of reads and writes (around 100 million requests per day). Reads and writes occur in comparable volumes in this case.
  4. File synchronization should use as little network bandwidth as possible.
  5. File transfers should have minimal latency.
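To get a feel for the load figure above, here is a minimal back-of-the-envelope sketch. It converts 100 million requests per day into an average per-second rate. The 50/50 read/write split is an assumption (one reading of "comparable"), and the function names are hypothetical. Real traffic also has peaks, which this average ignores.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rps(requests_per_day: int) -> float:
    """Average requests per second for a given daily request volume."""
    return requests_per_day / SECONDS_PER_DAY


def split_by_ratio(total_rps: float, read_fraction: float = 0.5) -> tuple[float, float]:
    """Split a total rate into (reads, writes) per second.

    read_fraction=0.5 is an assumed even split, not a figure from the requirements.
    """
    reads = total_rps * read_fraction
    return reads, total_rps - reads


total = average_rps(100_000_000)
reads, writes = split_by_ratio(total)
print(f"{total:.0f} req/s total, ~{reads:.0f} reads/s, ~{writes:.0f} writes/s")
```

On average this works out to roughly 1,157 requests per second, so each of the read and write paths has to sustain several hundred requests per second before any peak factor is applied.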

Uploading Files In Chunks

The last two non-functional requirements, minimum bandwidth and minimum latency, are both very important, and they are exactly why Google Drive and similar services upload files in chunks rather than as a single large file.
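As a rough illustration of the idea, the sketch below splits a file into fixed-size chunks, hashes each chunk, and compares two versions to find which chunks would need to be uploaded again. The 4 MiB chunk size, the use of SHA-256, and all function names are assumptions made for this example. They are not details of how Google Drive actually works.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB; an assumed value chosen for illustration


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the data into consecutive chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """SHA-256 hex digest of every chunk, in order."""
    return [hashlib.sha256(c).hexdigest() for c in split_into_chunks(data, chunk_size)]


def changed_chunks(old: list[str], new: list[str]) -> list[int]:
    """Indexes of chunks in `new` that differ from `old` or did not exist before."""
    return [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]


# A 10 MiB file in which a single byte is edited.
original = bytes(10 * 1024 * 1024)
edited = bytearray(original)
edited[5 * 1024 * 1024] = 1  # this offset falls inside the second chunk (index 1)

to_upload = changed_chunks(chunk_hashes(original), chunk_hashes(bytes(edited)))
print(to_upload)
```

With these assumed sizes the file splits into three chunks, and only the one containing the edit has to travel over the network again. Note that fixed-size chunking suits in-place edits; inserting bytes near the start of a file shifts every later chunk boundary, so every later chunk would look changed.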

Why Is Uploading A Complete File Not Practical?

If Google Drive were to upload a large file of, say, 10 MB to the cloud storage as a single file, the upload would involve high latency and bandwidth utilization. If the upload fails, the entire file has to be uploaded again from scratch, costing further time, bandwidth and money. Likewise, every update to the file means uploading and storing a fresh 10 MB copy on the cloud storage, as shown in the diagram below.

There are two drawbacks to this technique:

  1. Each upload sends the full 10 MB over the network.
  2. Each update to the file uses up another 10 MB of space on the cloud.

We could overwrite the original file on the cloud storage and save all this space. However, a file storage service typically keeps track of the update history, so we need a model that can store all the modifications while using minimum space on the cloud.
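One possible model, sketched here under stated assumptions, stores each distinct chunk once and records every version as an ordered list of chunk hashes. The class `ChunkStore`, its methods, the 1 MiB chunk size, and the in-memory dictionaries are all hypothetical. A real service would keep chunks in object storage and version lists in a metadata database.

```python
import hashlib

MB = 1024 * 1024


class ChunkStore:
    """Toy in-memory store: chunks keyed by SHA-256, versions kept as hash lists."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}   # hash -> chunk bytes, stored only once
        self.versions: list[list[str]] = []  # one ordered hash list per version

    def save_version(self, data: bytes) -> int:
        """Record a new version and return how many bytes were newly stored."""
        written = 0
        manifest = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only unseen chunks cost space
                self.chunks[digest] = chunk
                written += len(chunk)
            manifest.append(digest)
        self.versions.append(manifest)
        return written

    def read_version(self, index: int) -> bytes:
        """Rebuild any past version from its list of chunk hashes."""
        return b"".join(self.chunks[h] for h in self.versions[index])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


# A 10 MiB file made of ten distinct 1 MiB chunks, then a one-byte edit.
store = ChunkStore(chunk_size=MB)
v0 = b"".join(bytes([i]) * MB for i in range(10))
v1 = bytearray(v0)
v1[3 * MB] = 255  # changes only the chunk at index 3

first = store.save_version(v0)
second = store.save_version(bytes(v1))
print(first // MB, second // MB, store.stored_bytes() // MB)
```

In this toy run the first version costs 10 MiB and the edited version only 1 MiB more, for 11 MiB in total, while both versions stay fully readable. Storing two whole copies would have taken 20 MiB.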
