Intro to Data Storage

Welcome to the Data Storage section of the course.

This section is divided into two parts:

  • Data Lakes
  • Data Warehouses

What is a Data Lake?

  • A data lake is a centralized repository designed to store, process, and secure large amounts of structured, semi-structured, and unstructured data. It can store data in its native format and process any variety of it, regardless of size.

In this course, we will use Google Cloud Storage as our example of data lake storage.

Introduction to Google Cloud Storage

  • Google Cloud Storage (GCS) is a fully managed, highly scalable, and durable cloud storage service provided by Google Cloud Platform for storing unstructured data.
  • Files are stored as objects in GCS. An object is an immutable piece of data consisting of a file of any format.
  • You store objects in containers called buckets. All buckets are associated with a project, and you can group your projects under an organization.
  • After you create a project, you can create Cloud Storage buckets, upload objects to your buckets, and download objects from your buckets.
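To make the project, bucket, and object hierarchy concrete, here is a minimal in-memory sketch in Python. The classes `Project`, `Bucket`, and `StoredObject` are hypothetical illustrations, not part of any Google library; they only mimic the ideas above: buckets belong to a project, objects live in buckets, and an object is never edited in place (uploading to an existing name replaces the whole object).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredObject:
    """Hypothetical stand-in for a Cloud Storage object: immutable once created."""
    name: str
    data: bytes


@dataclass
class Bucket:
    """Hypothetical bucket: a flat container of objects keyed by object name."""
    name: str
    location: str
    objects: dict[str, StoredObject] = field(default_factory=dict)

    def upload(self, name: str, data: bytes) -> None:
        # Objects are never modified in place: uploading to an existing
        # name stores a brand-new object that replaces the old one.
        self.objects[name] = StoredObject(name, data)

    def download(self, name: str) -> bytes:
        return self.objects[name].data


@dataclass
class Project:
    """Hypothetical project that owns a set of buckets."""
    project_id: str
    buckets: dict[str, Bucket] = field(default_factory=dict)

    def create_bucket(self, name: str, location: str) -> Bucket:
        bucket = Bucket(name, location)
        self.buckets[name] = bucket
        return bucket


# Usage: create a bucket, upload an object, then download it again.
project = Project("demo-project")
bucket = project.create_bucket("demo-bucket", "US")
bucket.upload("report.csv", b"id,total\n1,42\n")
print(bucket.download("report.csv"))
```

With the real `google-cloud-storage` client library, the corresponding calls are `storage.Client().create_bucket(name, location=...)`, `bucket.blob(name).upload_from_string(data)`, and `blob.download_as_bytes()`; those require a Google Cloud project and credentials.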

What are Buckets?

Buckets are the basic containers that hold your data. Everything that you store in Cloud Storage must be contained in a bucket. You can use buckets to organize your data and control access to your data, but unlike directories and folders, you cannot nest buckets.

  • There is no limit to the number of buckets you can have in a project or location, although there are limits on how frequently you can create or delete buckets.
  • When you create a bucket, you give it a globally unique name and a geographic location where the bucket and its contents are stored.
  • You cannot change the name or location of an existing bucket. Instead, you can create a new bucket with the desired name or in the desired location and move the contents from the old bucket to the new bucket. 
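Because bucket names are global and a bucket's name and location are fixed, "renaming" or "relocating" a bucket really means creating a new one and copying the data across. The sketch below models that workaround in memory; `ToyStorage`, `BucketNameTaken`, and their methods are hypothetical and are not the Cloud Storage API. It ignores details such as permissions, billing, and the time a large copy takes.

```python
from __future__ import annotations


class BucketNameTaken(Exception):
    """Raised when a bucket name is already in use (names are global)."""


class ToyStorage:
    """Hypothetical model of Cloud Storage's single, global bucket namespace.

    A bucket's name and location are fixed when it is created, so this model
    offers no rename or relocate operation: only create, copy, and delete.
    """

    def __init__(self) -> None:
        # bucket name -> (location, {object name: object data})
        self._buckets: dict[str, tuple[str, dict[str, bytes]]] = {}

    def create_bucket(self, name: str, location: str) -> None:
        if name in self._buckets:
            raise BucketNameTaken(name)
        self._buckets[name] = (location, {})

    def exists(self, name: str) -> bool:
        return name in self._buckets

    def location_of(self, name: str) -> str:
        return self._buckets[name][0]

    def upload(self, bucket: str, obj: str, data: bytes) -> None:
        self._buckets[bucket][1][obj] = data

    def list_objects(self, bucket: str) -> dict[str, bytes]:
        return dict(self._buckets[bucket][1])

    def move_contents(self, old: str, new: str, new_location: str) -> None:
        """Emulate a rename/relocation: create new bucket, copy, delete old."""
        # Creating the new bucket first means a name clash fails
        # before anything in the old bucket is touched.
        self.create_bucket(new, new_location)
        for obj, data in self._buckets[old][1].items():
            self.upload(new, obj, data)
        del self._buckets[old]


store = ToyStorage()
store.create_bucket("sales-data-us", "US")
store.upload("sales-data-us", "q1.csv", b"region,total\nus,100\n")
# "Move" the bucket to a new name and location.
store.move_contents("sales-data-us", "sales-data-eu", "EU")
print(store.exists("sales-data-us"), store.location_of("sales-data-eu"))
```

In practice, the copy step is usually done with a tool such as `gcloud storage rsync` or, for very large buckets, Storage Transfer Service.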

LOCATION RECOMMENDATIONS

For guidance on choosing a bucket location, see Google's location recommendations:

https://cloud.google.com/storage/docs/locations#location_recommendations

STORAGE CLASSES

  • In this section, we’ll discuss the different storage classes available in Google Cloud Storage and how they can be used to optimize your storage costs.

What is a Storage Class?

  • A storage class is a piece of metadata that every object carries.
  • The storage class set for an object affects the object’s availability and pricing model.
  • When you create a bucket, you can specify a default storage class for the bucket. Objects added to the bucket inherit this storage class unless you explicitly set a different one. If you don’t specify a default, the bucket uses Standard storage.
  • Google Cloud Storage offers four storage classes: Standard, Nearline, Coldline, and Archive. Each class is designed for different types of data and usage patterns.
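The following minimal sketch shows how a bucket's default storage class flows to its objects. `ToyBucket` is a hypothetical class, not the Cloud Storage API; the uppercase class names match the values Cloud Storage uses for its `storageClass` field, and the fallback to `STANDARD` mirrors the service's behavior when no default is specified.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Storage class names as used by Cloud Storage.
STORAGE_CLASSES = ("STANDARD", "NEARLINE", "COLDLINE", "ARCHIVE")


def _check(storage_class: str) -> str:
    """Reject names that are not one of the four storage classes."""
    if storage_class not in STORAGE_CLASSES:
        raise ValueError(f"unknown storage class: {storage_class}")
    return storage_class


@dataclass
class ToyBucket:
    """Hypothetical bucket that tracks the storage class of each object."""
    name: str
    # Standard is used when no default class is given at creation time.
    default_storage_class: str = "STANDARD"
    object_classes: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        _check(self.default_storage_class)

    def add_object(self, name: str, storage_class: Optional[str] = None) -> str:
        # Inherit the bucket default unless a class is set explicitly.
        effective = _check(storage_class or self.default_storage_class)
        self.object_classes[name] = effective
        return effective


bucket = ToyBucket("monthly-reports", default_storage_class="NEARLINE")
print(bucket.add_object("2024-01.csv"))                 # inherits NEARLINE
print(bucket.add_object("dashboard.json", "STANDARD"))  # explicit override
```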

Types of Storage Classes

  • Standard Storage
  • Nearline Storage
  • Coldline Storage
  • Archive Storage

STANDARD STORAGE

  • The Standard storage class is the default class and is best for "hot" data that is accessed frequently or stored only for brief periods. It’s well suited to use cases such as serving website content, streaming media, and interactive or analytics workloads.

NEARLINE STORAGE

  • Nearline storage is a low-cost, highly durable storage service for storing infrequently accessed data. It is a better choice than Standard storage when slightly lower availability, a 30-day minimum storage duration, and data access costs are acceptable trade-offs for lower at-rest storage costs.
  • Nearline storage is ideal for data you plan to read or modify on average once per month or less. For example, if you want to continuously add files to Cloud Storage and plan to access those files once a month for analysis, Nearline storage is a great choice.

COLDLINE STORAGE

  • Coldline storage is a very low-cost, highly durable storage service for storing infrequently accessed data. It is a better choice than Standard or Nearline storage when slightly lower availability, a 90-day minimum storage duration, and higher data access costs are acceptable trade-offs for lower at-rest storage costs.
  • Coldline storage is ideal for data you plan to read or modify at most once a quarter.

ARCHIVE STORAGE

  • Archive storage is the lowest-cost, highly durable storage service for data archiving, online backup, and disaster recovery. 
  • Like Nearline storage and Coldline storage, Archive storage has a slightly lower availability than Standard storage.
  • Archive storage is the best choice for data that you plan to access less than once a year. For example:
    • Cold data storage – This could be archived data, such as data stored for legal or regulatory reasons. Such data can be stored at low cost with Archive storage, yet still be available if you need it.
    • Disaster recovery – In a disaster, recovery time is key. Cloud Storage provides low-latency access to data stored in Archive storage, so it can be retrieved quickly when needed.
  • Nearline, Coldline, and Archive storage all have a minimum storage duration.
  • You can delete, replace, or move an object before it has been stored for the minimum duration, but when you do, you are charged as if the object had been stored for the full minimum duration.
  • The minimum storage duration is 30 days for Nearline storage, 90 days for Coldline storage, and 365 days for Archive storage.
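The access-frequency guidance and minimum storage durations in this section can be summarized in a small planning helper. This is a simplified sketch: the thresholds restate this section's rules of thumb, `suggest_storage_class` and `billed_storage_days` are hypothetical names, and the model ignores retrieval fees, operation charges, and availability differences.

```python
from __future__ import annotations

# Minimum storage duration in days for each class (Standard has none).
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def suggest_storage_class(reads_per_year: float) -> str:
    """Pick a class from how often the data is expected to be read."""
    if reads_per_year > 12:   # more often than about once a month
        return "STANDARD"
    if reads_per_year > 4:    # at most monthly, but more than quarterly
        return "NEARLINE"
    if reads_per_year >= 1:   # at most about once a quarter
        return "COLDLINE"
    return "ARCHIVE"          # less than once a year


def billed_storage_days(storage_class: str, days_stored: float) -> float:
    """Days of at-rest storage billed when an object is deleted, replaced, or
    moved after `days_stored` days: never less than the class's minimum."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


print(suggest_storage_class(12))            # NEARLINE
print(billed_storage_days("COLDLINE", 60))  # 90
```

For example, an object in Coldline storage that is deleted after 60 days is billed as if it had been stored for 90 days.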