DATA | ENGINEERING | TECHNOLOGY | CLOUD
[Part 1] — Road to Google Cloud Professional Data Engineer — Introduction to the Google Cloud Platform
The Path to Becoming a Certified Google Professional Data Engineer
I am currently preparing to take the Google Cloud Professional Data Engineer certification. I will be blogging about my experiences on this journey to help others who are interested find the information they need to pass the exam.
There will be many parts to this Series — how many I do not yet know.
As I go through the different course materials and resources which I find online, I will be jotting down notes here. These blog posts are merely my understanding of the topics as I am absorbing content, so please, take everything with a grain of salt. These posts should not act as your study material but rather as pointers to help you structure your preparation.
The Google Cloud Platform
As one of the top three cloud service providers, the Google Cloud Platform (GCP) offers a plethora of Big Data and Machine Learning products and services.
Compute resources are services that provide processing power, usually in terms of CPU, TPU, and RAM, such as a virtual machine (VM).
One of the main differences between cloud and desktop computing is that in cloud systems, storage is independent of compute. You should not think of the disks attached to a compute instance as a limit on how much data you can process and store.
Our first priority is getting the data into our instance and transforming it as required (i.e. building the appropriate pipelines).
One simple example of GCP storage is Cloud Storage, GCP's elastic object store, which organises data into buckets. It offers four storage classes:
- Standard — used for frequently accessed ("hot") data
- Nearline — used for data accessed about once a month or less
- Coldline — used for data accessed about once a quarter or less
- Archive — used for data accessed less than once a year, such as backups
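The storage classes above trade storage cost against access cost: the colder the class, the cheaper it is to store data and the more expensive it is to read. As a rough illustration of that rule of thumb, here is a toy helper (not a GCP API, just a sketch based on the access frequencies listed above) that suggests a class from the expected days between accesses:

```python
# Toy illustration only, not part of any GCP library. The class names are
# the real Cloud Storage storage classes; the day thresholds follow the
# rough monthly / quarterly / yearly guidance above.
STORAGE_CLASSES = [
    ("STANDARD", 0),    # frequently accessed data
    ("NEARLINE", 30),   # accessed roughly once a month or less
    ("COLDLINE", 90),   # accessed roughly once a quarter or less
    ("ARCHIVE", 365),   # accessed less than once a year (e.g. backups)
]

def suggest_storage_class(days_between_accesses: int) -> str:
    """Return the coldest storage class suited to the access pattern."""
    best = "STANDARD"
    for name, min_days in STORAGE_CLASSES:
        if days_between_accesses >= min_days:
            best = name
    return best

print(suggest_storage_class(1))    # STANDARD
print(suggest_storage_class(45))   # NEARLINE
print(suggest_storage_class(400))  # ARCHIVE
```

In practice the storage class is a property you set per bucket (or per object), and lifecycle rules can transition objects to colder classes automatically as they age.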
Google’s network interconnects with the public internet at more than 100 Points of Presence (PoPs) worldwide. Requests are served from the closest edge PoP to ensure the lowest latency.
Most of the lower levels of security are handled entirely by GCP. This includes the physical security of the hardware, the integrity of the data, and the integrity of the network.
As GCP users, we are responsible for managing user access and securing our data. Cloud IAM helps us implement these security policies.
Stored data is automatically encrypted at rest by GCP. For example, data in a BigQuery table is encrypted using a data-encryption key (DEK); the DEKs are in turn encrypted using key-encryption keys (KEKs), a process known as envelope encryption.
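To make the envelope idea concrete, here is a deliberately simplified sketch. It is not real cryptography (a repeating XOR stands in for a proper cipher, whereas GCP actually uses AES and manages KEKs centrally, e.g. via Cloud KMS), but it shows the two-step structure: the DEK encrypts the data, and the KEK encrypts the DEK.

```python
import os

def xor(data: bytes, key: bytes) -> bytes:
    # Repeat the key so it covers the whole payload (toy cipher only).
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def encrypt_envelope(plaintext: bytes, kek: bytes):
    dek = os.urandom(32)              # fresh per-object data-encryption key
    ciphertext = xor(plaintext, dek)  # step 1: encrypt the data with the DEK
    wrapped_dek = xor(dek, kek)       # step 2: wrap (encrypt) the DEK with the KEK
    # Only the ciphertext and the wrapped DEK are stored; the plaintext
    # DEK never needs to be persisted anywhere.
    return ciphertext, wrapped_dek

def decrypt_envelope(ciphertext: bytes, wrapped_dek: bytes, kek: bytes) -> bytes:
    dek = xor(wrapped_dek, kek)       # unwrap the DEK with the KEK
    return xor(ciphertext, dek)       # then decrypt the data with the DEK

kek = os.urandom(32)
ct, wrapped = encrypt_envelope(b"a row of table data", kek)
assert decrypt_envelope(ct, wrapped, kek) == b"a row of table data"
```

The benefit of the scheme is key management at scale: rotating or revoking one KEK controls access to many DEKs without re-encrypting the underlying data itself.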
The Google Cloud Platform Resource Hierarchy
The GCP resource hierarchy is made up of four levels:
- Resources — The most granular objects of the entire GCP ecosystem. These are the actual services and processes that we use, such as BigQuery, Datastore, or a Compute Engine instance. Every resource must belong to a specific project.
- Projects — The base-level organising entity. Projects allow us to create and use resources or services, manage their billing, and control their permissions. They logically organise all GCP resources. Projects can be created, destroyed, and also recovered from accidental deletion.
- Folders — Collections of projects (and of other folders). To use folders, we must have an organisation.
- Organisation — The root node of the entire GCP hierarchy. It allows us to set policies that apply throughout all projects and folders created within our enterprise. We can also fine-tune access control enterprise-wide through policies at this level.
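A key consequence of this hierarchy is policy inheritance: an IAM policy set on the organisation or a folder applies to every project and resource beneath it. The sketch below (a minimal model, not the real Resource Manager API; the node and policy names are made up) shows how a resource's effective policies are the union of its own and its ancestors':

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """One level of the hierarchy: organisation, folder, project, or resource."""
    name: str
    kind: str
    policies: set = field(default_factory=set)
    parent: Optional["Node"] = None

    def effective_policies(self) -> set:
        # A node inherits every policy attached to its ancestors.
        inherited = self.parent.effective_policies() if self.parent else set()
        return inherited | self.policies

# Hypothetical hierarchy: organisation -> folder -> project -> resource.
org = Node("acme-corp", "organisation", {"org:require-2fa"})
folder = Node("data-team", "folder", parent=org)
project = Node("analytics-prod", "project", {"project:bq-admin"}, parent=folder)
dataset = Node("sales-dataset", "resource", parent=project)

print(sorted(dataset.effective_policies()))
# ['org:require-2fa', 'project:bq-admin']
```

This is why the organisation level matters for enterprises: a policy set once at the root is enforced on every folder, project, and resource below it.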