AWS Data Storages for Machine Learning
This article walks through the AWS services that we can use to store data for our machine learning problems.
1. Amazon Simple Storage Service (S3) :-
Amazon Simple Storage Service, or S3, provides unlimited, object-based storage for any type of data.
It is essentially the go-to place for storing our machine learning data, because the core services that AWS offers for machine learning integrate directly with S3, which makes it very easy to read and stream our input and testing data from S3 into those services.
AWS also uses S3 to store the output of anything that we create within our machine learning process, so let’s continue and talk about a few other features of S3.
S3 objects, or files, can range from zero bytes to terabytes in size, and the total storage is unlimited.
What I mean by unlimited storage is that AWS monitors how much data is on S3, and if it looks like they are hitting a threshold or getting close to one, they provision more resources so there is enough space to store all your data.
Files and objects are stored in buckets, which are similar to folders. S3 has a universal namespace, meaning that bucket names are unique globally: you cannot create a bucket with the same name as one of my buckets.
Here is an example of what the endpoint of an S3 bucket looks like:
“s3-<region>.amazonaws.com/<bucket-name>”
In our case, the bucket name would be machine learning data.
AWS supports two different ways of addressing endpoints: path style, like I just showed you, and virtual-hosted style.
AWS announced that support for the path-style model would end in September 2020 (a deprecation that was later delayed), so from this point forward we will use the virtual-hosted style.
A virtual-hosted bucket address looks like this:
“<bucket-name>.s3.amazonaws.com” (as shown in image)
So this is what the new, virtual-hosted endpoints are going to look like.
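To make the difference between the two addressing styles concrete, here is a minimal sketch that builds both kinds of URL. The bucket name `machine-learning-data`, the key `train.csv`, and the region `us-east-1` are made up for illustration, and the region-qualified host form `s3.<region>.amazonaws.com` is an assumption taken from the current S3 documentation rather than from this article.

```python
from typing import Optional


def s3_host(region: Optional[str] = None) -> str:
    """Return the S3 host name, region-qualified when a region is given."""
    return f"s3.{region}.amazonaws.com" if region else "s3.amazonaws.com"


def path_style_url(bucket: str, key: str = "", region: Optional[str] = None) -> str:
    """Path style: the bucket name is the first segment of the URL path."""
    return f"https://{s3_host(region)}/{bucket}/{key}"


def virtual_hosted_url(bucket: str, key: str = "", region: Optional[str] = None) -> str:
    """Virtual-hosted style: the bucket name is part of the host name."""
    return f"https://{bucket}.{s3_host(region)}/{key}"


if __name__ == "__main__":
    # Hypothetical bucket, key, and region, used only for illustration.
    print(path_style_url("machine-learning-data", "train.csv", "us-east-1"))
    print(virtual_hosted_url("machine-learning-data", "train.csv", "us-east-1"))
```

The only thing that changes between the two is where the bucket name sits: in the path for path style, in the host name for virtual-hosted style.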
The next question is: how do we get data into S3?
If your data is already there, you are good to go, but if you need to upload data into S3,
there are a few different ways you can do this.
The first and most straightforward way is to upload through the console: you can simply click the Upload button and upload your datasets or objects manually.
You can also upload datasets or objects into S3 from code, using one of the many SDKs that AWS offers, or you can use the command line interface.
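As a rough sketch of the SDK route, the snippet below uses boto3, the AWS SDK for Python. The file name, bucket name, and key are hypothetical, and the helper `upload_dataset` is just an illustrative wrapper, not something the SDK provides. The client is passed in as a parameter so the helper can be tried out without AWS credentials.

```python
def upload_dataset(s3_client, local_path: str, bucket: str, key: str) -> str:
    """Upload one local file to S3 and return its s3:// URI."""
    # upload_file(Filename, Bucket, Key) is the boto3 S3 client call for a single file.
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


def main() -> None:
    # Imported here so the helper above can be exercised without boto3 installed.
    import boto3

    # Credentials and region are assumed to come from the environment or AWS config.
    s3 = boto3.client("s3")
    # Hypothetical file, bucket, and key names.
    uri = upload_dataset(s3, "train.csv", "machine-learning-data", "datasets/train.csv")
    print(uri)


if __name__ == "__main__":
    main()
```

With the command line interface, the equivalent would be something like `aws s3 cp train.csv s3://machine-learning-data/datasets/train.csv`, again with a made-up bucket name.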
2. Amazon Relational Database Service (RDS) :-
The next type of data store that we will discuss is RDS, the Amazon Relational Database Service, which is for relational databases.
RDS offers several engines that you can choose from to create a fully managed relational database. Relational databases are for application data stores that need transactional-style databases.
In the AWS console we can create an RDS instance: choose our engine type, set all the parameters, and create a relational database right there.
3. DynamoDB :-
The next service we will talk about is DynamoDB, a NoSQL data store for non-relational data that stores key-value pairs.
This service is best for schemaless data and for unstructured or semi-structured data.
An example of what the data in DynamoDB might look like is the following.
Here we have some data about Star Wars characters, which is semi-structured data in JSON format.
All of this data together is considered a table in DynamoDB, and the table name is defined as “Character”. The table contains items, two in this example, and within each item we have key-value pairs.
For example, a key would be “weapon” and the value for it would be “lightsaber”, and the combination of a key and its value makes up an attribute.
The AWS console gives us a user interface where we can interact with our NoSQL database.
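To picture the table / item / attribute vocabulary, here is a small sketch in plain Python. It does not talk to DynamoDB at all; the two items and all of their values are invented for illustration, and `id` is assumed to play the role of the table's key attribute.

```python
import json

# Hypothetical contents of the "Character" table: each dict is one item.
CHARACTER_ITEMS = [
    {"id": 1, "name": "Luke Skywalker", "weapon": "lightsaber"},
    {"id": 2, "name": "Han Solo", "weapon": "blaster", "ship": "Millennium Falcon"},
]


def attributes(item: dict) -> list:
    """Return the item's attributes as (key, value) pairs."""
    return list(item.items())


def shared_keys(items: list) -> set:
    """Return the keys that appear in every item of the table."""
    return set.intersection(*(set(item) for item in items))


if __name__ == "__main__":
    for item in CHARACTER_ITEMS:
        # Each item is semi-structured data that serializes naturally to JSON.
        print(json.dumps(item))
    print(sorted(shared_keys(CHARACTER_ITEMS)))
```

Notice that the second item carries a `ship` attribute that the first one lacks. That is what schemaless means here: items in the same table do not have to share the same set of attributes.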
4. Amazon Redshift :-
Amazon Redshift is a fully managed, clustered, petabyte-scale data warehousing solution that consolidates data from other data sources like S3, DynamoDB, and more.
It allows you to store massive amounts of relational or non-relational, structured or semi-structured data to create a data warehousing solution.
Once your data is in Redshift, you can use SQL client tools, business intelligence tools, or other analytics tools to query that data and find important information in your data warehouse. Within the console you can launch your Amazon Redshift cluster,
selecting the number of nodes that you want, their storage size, and other parameters to create your data warehousing solution.
Another cool feature of Redshift is a tool called Redshift Spectrum, which allows you to query data that lives in S3 from your Redshift cluster.
So essentially it allows you to query your S3 data, and then you can use tools like QuickSight to create charts and graphs to visualize that data.
5. Amazon Timestream :-
The next data store that I want to discuss is Amazon Timestream, which was announced at re:Invent 2018.
It is built for time series data, which is data that comes from things like IoT devices, IT systems, smart industrial machines, server logs, or any other time-ordered source.
Stock market prices are another example. Amazon Timestream is a fully managed time series database service that allows you to plug in business intelligence tools and run SQL queries on your time series data.
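To give a feel for what running SQL on time series data can look like, here is a hedged sketch. The database name `iot`, the table name `sensors`, and the measure name `temperature` are all hypothetical, and the helpers are illustrative wrappers rather than part of any SDK. The query is meant to be sent through the boto3 `timestream-query` client, which is passed in as a parameter so the sketch can be tried without an AWS account.

```python
def recent_average_sql(database: str, table: str, measure: str, minutes: int = 15) -> str:
    """Build a query that averages one measure per minute over the recent past.

    The names are interpolated directly, so they are assumed to be trusted values.
    """
    return (
        "SELECT BIN(time, 1m) AS minute, AVG(measure_value::double) AS avg_value "
        f'FROM "{database}"."{table}" '
        f"WHERE measure_name = '{measure}' AND time >= ago({minutes}m) "
        "GROUP BY BIN(time, 1m) ORDER BY minute"
    )


def run_query(query_client, sql: str) -> list:
    """Run the query and return the rows of the first result page only."""
    response = query_client.query(QueryString=sql)
    return response["Rows"]


def main() -> None:
    # Imported here so the helpers above can be exercised without boto3 installed.
    import boto3

    client = boto3.client("timestream-query")
    # Hypothetical database, table, and measure names.
    for row in run_query(client, recent_average_sql("iot", "sensors", "temperature")):
        print(row)


if __name__ == "__main__":
    main()
```

This sketch reads only the first page of results; a longer-running query would need to follow the pagination token in the response.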
6. DocumentDB :-
The last data store that we will discuss is DocumentDB.
It was announced at the beginning of 2019, and DocumentDB is essentially a place to migrate your MongoDB data. It provides better performance and scalability than traditional MongoDB instances that are running on something like EC2.