Google Drive is a file hosting service developed by Google. It offers cloud file storage and synchronization, allowing users to store their data on remote servers. Besides storing files on these servers, Google Drive synchronizes each user's files across all of their devices and shares them with other users on request. Dropbox, OneDrive, and Google Photos are similar applications that store and share massive numbers of files for millions of users.
Based on the design of Google Drive, let's create a basic file storage and sharing service that can scale to millions of users and handle petabytes of data.
It's important to clarify the requirements of the design. The actual design of such applications can cover several features and involve complexities that are beyond the scope of a system design interview. Narrow down the requirements to a few core components before building the system.
The last two non-functional requirements, minimum bandwidth and minimum latency, are both very important, and they are exactly why Google Drive and similar services upload files in chunks rather than as a single large file.
If Google Drive uploaded a large file of, say, 10 MB to cloud storage as a single file, the upload would incur high latency and heavy bandwidth usage. If the upload failed, the entire file would have to be uploaded again from the start, costing more time, bandwidth, and money. Likewise, every update to the file would make the service upload and store the entire 10 MB file again. With this approach, each update means uploading and storing a fresh 10 MB copy on the cloud storage, as shown in the diagram below.
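To make the savings from chunking concrete, here is a minimal Python sketch of chunked change detection. It assumes a fixed 4 MB chunk size and SHA-256 fingerprints; neither value comes from this design, and the function names (`split_into_chunks`, `chunk_hashes`, `chunks_to_upload`) are hypothetical. A 10 MB file splits into chunks of 4, 4, and 2 MB, so a one-byte edit only requires re-uploading a single 4 MB chunk instead of the whole file.

```python
import hashlib

# Assumed chunk size of 4 MB (illustrative; not specified by this design).
CHUNK_SIZE = 4 * 1024 * 1024


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a file's bytes into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Return a SHA-256 fingerprint for every chunk, in order."""
    return [hashlib.sha256(c).hexdigest() for c in split_into_chunks(data, chunk_size)]


def chunks_to_upload(old: bytes, new: bytes, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Indices of chunks in `new` that differ from the chunk at the same index in `old`."""
    old_h = chunk_hashes(old, chunk_size)
    new_h = chunk_hashes(new, chunk_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]


if __name__ == "__main__":
    MB = 1024 * 1024
    original = bytes(10 * MB)      # a 10 MB file of zero bytes
    edited = bytearray(original)
    edited[5 * MB] = 1             # change one byte, which falls in the second chunk
    changed = chunks_to_upload(original, bytes(edited))
    new_chunks = split_into_chunks(bytes(edited))
    uploaded = sum(len(new_chunks[i]) for i in changed)
    print(changed, uploaded // MB)  # [1] 4
```

The same bookkeeping helps with failures: if one chunk's upload fails, the client can retry that chunk alone rather than restarting the full 10 MB transfer. Fixed-size chunks keep the sketch short, but an insertion near the start of a file shifts every later chunk boundary; some systems use content-defined chunking to avoid that.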
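There are two drawbacks to this technique: every upload or update transfers the full 10 MB, wasting bandwidth and time, and every update stores another full 10 MB copy, wasting storage space.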
We could overwrite the original file on the cloud storage and avoid this extra space usage. However, a file storage service typically keeps track of each file's update history, so we need a model that stores every modification while using minimum space on the cloud.
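One way to keep every version while storing only what changed is to record each version as an ordered list of chunk fingerprints and store each distinct chunk only once (content-addressed deduplication). The Python sketch below is an illustrative assumption, not the model this design settles on; the `ChunkStore` class, its methods, the file name `report.pdf`, and the 4 MB chunk size are all hypothetical.

```python
import hashlib
import os
from dataclasses import dataclass, field

# Assumed chunk size of 4 MB (illustrative; not specified by this design).
CHUNK_SIZE = 4 * 1024 * 1024


@dataclass
class ChunkStore:
    """Hypothetical content-addressed store: each distinct chunk is kept once, keyed by SHA-256."""
    blobs: dict[str, bytes] = field(default_factory=dict)
    # For each file name, a list of versions; each version is an ordered list of chunk hashes.
    versions: dict[str, list[list[str]]] = field(default_factory=dict)

    def save(self, name: str, data: bytes) -> int:
        """Record a new version of `name`; return how many bytes of new chunk data were stored."""
        manifest, stored = [], 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.blobs:  # only previously unseen chunks cost storage
                self.blobs[key] = chunk
                stored += len(chunk)
            manifest.append(key)
        self.versions.setdefault(name, []).append(manifest)
        return stored

    def load(self, name: str, version: int = -1) -> bytes:
        """Rebuild any recorded version by concatenating its chunks in order."""
        return b"".join(self.blobs[k] for k in self.versions[name][version])

    def total_bytes(self) -> int:
        """Total bytes of chunk data held across all versions of all files."""
        return sum(len(b) for b in self.blobs.values())


if __name__ == "__main__":
    MB = 1024 * 1024
    store = ChunkStore()
    v1 = os.urandom(10 * MB)          # random content, so all chunks are distinct
    v2 = bytearray(v1)
    v2[5 * MB] ^= 0xFF                # flip one byte in the second chunk
    print(store.save("report.pdf", v1) // MB)         # 10
    print(store.save("report.pdf", bytes(v2)) // MB)  # 4
    print(store.total_bytes() // MB)                  # 14
    assert store.load("report.pdf", 0) == v1
```

Because both versions reference the unchanged chunks by hash, keeping the full history of the 10 MB file costs 14 MB in this sketch rather than 20 MB for two complete copies, and either version can still be rebuilt on demand.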