How to build, train and deploy your own ML algorithms on AWS SageMaker from scratch




Before moving forward, clone the project repository using the following command.

git clone https://github.com/osemrt/AWS-SageMaker

Now we are ready to implement the workflow above step by step. The numbers before the titles indicate where we are in the workflow. I am excited to start, how about you? 🙂 Without further ado, let's jump into the implementation details.

1. Upload the training data to Amazon S3 Bucket

The first step in the workflow is to upload the training data to an S3 bucket. If you don't know how to create a bucket, visit the following link:

After creating the bucket, go to the project GitHub repository, download the dataset folder and upload it into the bucket.
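If you would rather script the upload than use the console, a minimal boto3 sketch could look like the one below. The bucket name is taken from the application.yml file shown later in this post; replace it with your own.

import os
import boto3

# Assumed bucket name, matching the s3Uri used later in application.yml.
BUCKET = "amazon-sagemaker-s3bucket"

s3 = boto3.client("s3")

# Walk the local dataset folder and upload every file under the dataset/ prefix.
for root, _, files in os.walk("dataset"):
    for name in files:
        local_path = os.path.join(root, name)
        s3_key = local_path.replace(os.sep, "/")
        s3.upload_file(local_path, BUCKET, s3_key)
        print(f"Uploaded {local_path} to s3://{BUCKET}/{s3_key}")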

2. Push the docker images to Amazon Elastic Container Registry (Amazon ECR)

SageMaker uses Docker containers for runtime tasks: containers are used both to train and to deploy machine learning algorithms, and they let us deploy quickly at any scale. In this step, I will show how you can push your training and inference Docker images to ECR (Elastic Container Registry). First, we need an access key and secret key pair to use the AWS Command Line Interface. To create your access keys, go to the following link:

When you have your access and secret keys, run the aws configure command in your terminal and enter your keys, default region, etc. In my case, the output looks like the one below.

Now, we need two repositories in ECR to push the training and inference images. To create the repositories, go to the link below.

That's it! You should see your repositories as shown below.
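If you prefer to create the repositories programmatically, here is a hedged boto3 sketch; the repository names training and inference are my assumption, matching the folder names in the project repository.

import boto3

ecr = boto3.client("ecr")

# Assumed repository names, matching the container folder names in the project.
for repo in ("training", "inference"):
    response = ecr.create_repository(repositoryName=repo)
    print(response["repository"]["repositoryUri"])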

Before pushing our images, we need to tell Docker where to upload them when we run the docker push command. First, run the aws ecr get-login --no-include-email command to get a temporary authentication command. Your output will look like this:

Copy the command from the output and paste it into your terminal. You should see a "Login Succeeded" message. Once you have logged in, you are ready to push your Docker images to ECR.
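For reference, the temporary credentials that the get-login command wraps can also be fetched with boto3; a minimal sketch:

import base64
import boto3

ecr = boto3.client("ecr")

# The token is a base64-encoded "user:password" pair, valid for 12 hours.
auth = ecr.get_authorization_token()["authorizationData"][0]
user, password = base64.b64decode(auth["authorizationToken"]).decode().split(":", 1)
registry = auth["proxyEndpoint"]

# Equivalent to the docker login command that aws ecr get-login prints.
print(f"docker login -u {user} -p {password} {registry}")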

We will go to the training and inference folders under the container folder, then build the images and push them. Each of these folders contains a Dockerfile where we define the base image and the commands to execute when the container starts running.

Let's start with the training image. Inside the training folder, run the following command with your repository and tag name. You can give any tag name you want when building images; in my case, I used the latest tag.

docker build --tag <repository_name>:<tag_name> .

Execute the same command in the inference folder.

You can see the created images when you execute the docker image ls command.

Now it is time to push the training and inference images to ECR. Execute the following command for each image:

docker push <repository_name>:<tag_name>

Be patient, pushing your images to ECR can take a while. When it completes, you will see the images in the repositories.
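If you want to verify the push from a script rather than the console, a small boto3 check could look like this (the repository names are the assumed ones from above):

import boto3

ecr = boto3.client("ecr")

# List the image tags now stored in each repository.
for repo in ("training", "inference"):
    for image in ecr.describe_images(repositoryName=repo)["imageDetails"]:
        print(repo, image.get("imageTags"))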

3. Train the model

There are many ways of starting a training job in SageMaker: you can start it from a Python script, a Jupyter Notebook, a Lambda trigger, etc. In this post, however, we will do it from a Java Spring project.

Go to the service folder in the GitHub repository to access the project code. Make sure you replace the following configuration values in the application.yml file with your own and install the dependencies in pom.xml. As the names suggest, the keys without comments are self-explanatory; I have added comments for the others.

amazon:
  accessKey:
  secretKey:
  region:
sagemaker:
  training:
    s3Uri: s3://amazon-sagemaker-s3bucket/dataset/
    trainingJobName: myTrainingJob
    roleArn: # your SageMaker execution role ARN
    s3OutputPath: s3://amazon-sagemaker-s3bucket/model/
    channelName: training
    trainingImage: <account_id>.dkr.ecr.us-east-2.amazonaws.com/training:latest

Once the project is up and running, send a POST request to start training.
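For readers who prefer Python over the Spring service, here is a hedged sketch of the equivalent boto3 CreateTrainingJob call, assembled from the application.yml values above. The instance type, volume size, and runtime limit are my assumptions.

import boto3

sm = boto3.client("sagemaker")

# Values mirror application.yml; the resource settings are assumptions.
sm.create_training_job(
    TrainingJobName="myTrainingJob",
    AlgorithmSpecification={
        "TrainingImage": "<account_id>.dkr.ecr.us-east-2.amazonaws.com/training:latest",
        "TrainingInputMode": "File",
    },
    RoleArn="<your SageMaker execution role ARN>",
    InputDataConfig=[{
        "ChannelName": "training",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://amazon-sagemaker-s3bucket/dataset/",
            "S3DataDistributionType": "FullyReplicated",
        }},
    }],
    OutputDataConfig={"S3OutputPath": "s3://amazon-sagemaker-s3bucket/model/"},
    ResourceConfig={"InstanceType": "ml.m5.large", "InstanceCount": 1, "VolumeSizeInGB": 10},
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)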

Click Training jobs in the left panel of SageMaker to see the status of your training job and wait for it to complete.

When it is completed, SageMaker automatically zips the files under /opt/ml/model and uploads the archive to the S3 output path.

4. Create Model

Go to Models under Inference in the left panel and then click the Create Model button. Give the model a name, specify an IAM role, and choose the "Provide model artifacts and inference image location" option.

In this step, we specify the inference container image URI and the S3 path of the model artifacts. When we provide them to create a SageMaker model, the files at that path are downloaded into the folder /opt/ml/model in the inference container. The serve script runs a server that exposes the endpoints defined in predictor.py.

You should now see the model listed under its name.
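The same step can also be done through the API; here is a boto3 sketch, where the model name is my own choice and the artifact path follows SageMaker's standard <output_path>/<job_name>/output/model.tar.gz layout:

import boto3

sm = boto3.client("sagemaker")

sm.create_model(
    ModelName="my-sagemaker-model",  # assumed name
    PrimaryContainer={
        "Image": "<account_id>.dkr.ecr.us-east-2.amazonaws.com/inference:latest",
        "ModelDataUrl": "s3://amazon-sagemaker-s3bucket/model/myTrainingJob/output/model.tar.gz",
    },
    ExecutionRoleArn="<your SageMaker execution role ARN>",
)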

5. Create Endpoint Configuration

After you've created the model, create an Endpoint Configuration and specify the model you just created. The configuration determines which model will be served at the endpoint and which AWS instance type will run it. It is up to you how many instances and which instance type you use; in this post, I will use the default configuration.
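Programmatically, a default-style configuration might look like the sketch below; the model and configuration names carry over from the previous sketch, and the instance type is an assumption.

import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config",  # assumed name
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-sagemaker-model",
        "InstanceType": "ml.t2.medium",  # assumed instance type
        "InitialInstanceCount": 1,
    }],
)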

6. Create Endpoint

The final step before serving predictions is to create a SageMaker Endpoint. The endpoint you obtain in this step will be called by AWS Lambda. Give the endpoint a name, choose "Use an existing endpoint configuration", and then select the endpoint configuration that you've just created.

The endpoint name will be used in Lambda to call the endpoint, so copy it for now.
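The boto3 version of this step, continuing with the assumed names from the sketches above:

import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint(
    EndpointName="my-sagemaker-endpoint",  # assumed name; Lambda will reference it
    EndpointConfigName="my-endpoint-config",
)

# Block until the endpoint is InService (creation usually takes a few minutes).
sm.get_waiter("endpoint_in_service").wait(EndpointName="my-sagemaker-endpoint")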

7. Use Lambda Function to make predictions

Now that the SageMaker endpoint is ready, it is time to create a Lambda function to invoke it. To create the Lambda function, give it a function name, choose "Python 3.6" as the runtime, and click the "Create function" button.

Give the endpoint name you defined in the previous step to your Lambda function as an environment variable.

Then paste the following Python script into your Lambda function and click the "Deploy" button.
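The original script is embedded in the source post; in case it does not render, here is a minimal sketch of what such a handler can look like. The ENDPOINT_NAME variable name, the {"data": "..."} payload shape, and the text/csv content type are my assumptions; adapt them to your inference container.

import json
import os

import boto3

# Assumed name of the environment variable holding the endpoint name.
ENDPOINT_NAME = os.environ["ENDPOINT_NAME"]
runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # Assumes the event carries the features as a CSV string, e.g. {"data": "2,42,1,3"}.
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=event["data"],
    )
    prediction = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": json.dumps(prediction)}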

Check that the Lambda function works by running a test case like the one below.
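A test event matching the handler sketch above would be as simple as:

{
  "data": "2,42,1,3"
}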

If you did everything correctly, you will see the prediction result corresponding to your data. For my feature array [2, 42, 1, 3], the model says it is setosa.

8. Call the Lambda function from API Gateway

This is the last step in the workflow. Here, we will create an API and make it publicly available. To do that, go to the Amazon API Gateway service, create an API, choose the following options, and give the API a name.

Click the Actions dropdown and choose "Create Resource", give a resource name, and click "Create Resource".

Under the resource you just created, create a Method from the same dropdown. Choose Lambda Function as the integration type, select the region, and type your Lambda function's name.

Click OK in the popup informing you that API Gateway will be given permission to call the Lambda function. Then test your API Gateway: provide the feature array in proper JSON format to get the prediction.

If everything is okay, let's deploy the API. Click "Deploy API" in the menu that opens when you click "Actions". Give a stage name and that's it: you have now deployed the API.

Go to the "Stages" section in the left panel, click your API name, and copy your invoke URL to use the API.

Test your API

Now we have successfully trained and deployed our own custom model on SageMaker. You are ready to test your API anywhere you want using the invoke URL. When I send a POST request to the invoke URL with my data, I get a prediction back. If you do too, congratulations! You've completed all the steps and created your ML pipeline on SageMaker :).
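For example, a quick check from Python (the URL is a placeholder; use the invoke URL you copied, and the payload shape follows the Lambda sketch above):

import requests

# Placeholder invoke URL; replace it with the one copied from the Stages section.
url = "https://<api_id>.execute-api.us-east-2.amazonaws.com/<stage_name>/<resource_name>"

response = requests.post(url, json={"data": "2,42,1,3"})
print(response.json())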
