Chapter 2: Questions & Answers

1. Where do text models currently have a major deficiency?

Text models currently struggle to produce factually correct responses when asked about factual information. They can generate answers that appear compelling to laypeople but are entirely incorrect. The problem is attributed to current challenges in natural language processing, which include contextual words, homonyms, synonyms, sarcasm, and ambiguity.

  • Contextual words: refers to words that carry different meanings that depend on the context of the sentence, such as “running” to the store and “running” out of milk.
  • Homonyms: refers to words that are spelled and pronounced the same but have different meanings, such as “bat,” as in the animal and the piece of baseball equipment.
  • Synonyms: refers to words that have the same or similar meanings in some contexts but not all, such as “big” and “large,” which are interchangeable in “a big house” but not in “big brother.”
  • Sarcasm: refers to sentences that have a positive or negative sentiment by definition but actually imply the opposite sentiment.
  • Ambiguity: refers to sentences that have multiple interpretations, such as “I saw a dog on the beach with my binoculars.”

2. What are the possible negative societal implications of text generation models?

The main negative societal implications of text generation models are fake news and the spread of disinformation. They can be used to produce believable content at massive scale with much greater efficiency and lower barriers to entry. They can also be used for harmful activities like spam, phishing, abuse of legal and government processes, fraudulent academic writing, and pretexting.

3. In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?

Artificial Intelligence (AI): A subcategory of computer science that builds smart machines to perform tasks that require human intelligence. It allows machines to simulate human perception, learning, problem-solving, and decision-making. It also encompasses machine learning and deep learning.

Augmented Intelligence: A subcategory of artificial intelligence that uses technology to enhance human intelligence rather than replace it. It relieves humans from demanding, time-consuming, and repetitive tasks. It also supports human decision-making, but the final decisions are still made by humans.

  • The best alternative to artificial intelligence is augmented intelligence. It expects humans to interact closely with the model. It makes humans up to 20 times more productive than strictly using manual methods. It also makes processes more efficient and accurate than strictly using humans.

4. What kind of tabular data is deep learning particularly good at?

Deep Learning: A subcategory of machine learning that uses artificial neural networks with three or more layers to learn how to perform a task with increasing accuracy. It involves 4 learning methods which include supervised, semi-supervised, unsupervised, and reinforcement learning.

  • Deep learning is good at analyzing tabular data that contains columns with categorical variables that have high cardinality. It can outperform machine learning algorithms in that case, but it takes longer to train, is hard to interpret, involves hyperparameter tuning, and requires a GPU.

5. What’s a key downside of directly using a deep learning model for recommendation systems?

The key downside is that recommendation systems can only recommend products the user might like rather than products the user may need or find useful. They only recommend similar products based on things like purchase history, product sales, and product ratings. They also can’t recommend novel products that haven’t been discovered by many users.

6. What are the steps of the Drivetrain Approach?

Drivetrain Approach: A framework that helps design systems that solve complex problems with deep learning. It emphasizes using data to produce actionable outcomes rather than just generating more data in the form of predictions. It also uses the following 4-step process to build data products.

  1. Define the clear outcome you are wanting to achieve
  2. Identify the levers you can pull to influence the outcome
  3. Consider the data you will need to produce the outcome
  4. Identify the models you can use to achieve the outcome

7. How do the steps of the Drivetrain Approach map to a recommendation system?

  1. Outcome: to capture additional sales by recommending products to customers that wouldn’t have made the purchase without receiving the recommendation.
  2. Levers: the methods that are used to choose the recommendations that are shown to customers.
  3. Data: identify the recommendations that cause new sales by conducting randomized experiments that test a wide range of recommendations for a wide range of customers.
  4. Model: two separate models that predict the purchase probability for products based on whether the customers are shown the recommendation or not.

8. Create an image recognition model using data you curate and deploy it on the web.

Binder: A web service that converts the notebook documents in a specified repository into a web application. It creates sharable notebooks that can be accessed by anyone with a single click. It also runs the notebooks on its own virtual machine in the cloud, which stores all the files needed to run them.

  • Fastai suggests deploying prototypes as a web application using Binder.

9. What are DataLoaders?

DataLoaders: A class that separates the data in the Datasets object into mini-batches. It can shuffle the data before it’s separated. It can pass the mini-batches to the Learner object using multiprocessing. It sets the dataset parameter to the Datasets object to specify the data. It also returns the mini-batches in the DataLoader objects in the DataLoaders object.

10. What four things do we need to tell Fastai to create DataLoaders?

Data Block: A class that stores all the preprocessing steps to prepare the dataset for the model. It sets the blocks parameter to two or more of the TransformBlock classes to specify the input and output data types. It sets the get_items parameter to a function to specify how to get the data. It sets the splitter parameter to a split function to specify how to split the training and validation sets. It sets the get_y parameter to a function to specify how to get the label values. It also returns the DataBlock object.

  • The DataBlock class needs the blocks, get_items, splitter, and get_y parameters to be specified to create the DataLoaders object.

11. What does the splitter parameter to DataBlock do?

Splitter: A parameter in the DataBlock class that specifies how to split the dataset into the training and validation sets. It expects one of the split functions that Fastai provides for splitting the dataset in different ways.

12. How do we ensure a random split always gives the same validation set?

Random Seed: A number that enables the random number generator to initialize the weights with the same numbers in the same exact sequence each time. It allows users to reproduce their results by using the same code, data, and weights. It also allows users to optimize the performance of the model by experimenting with the code using the same data and weights.
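The idea can be illustrated with Python’s standard random module; in Fastai the same effect comes from passing a seed to the splitter, e.g. RandomSplitter(seed=42):

```python
import random

random.seed(42)
first = [random.random() for _ in range(3)]

random.seed(42)  # re-seeding restarts the generator from the same state
second = [random.random() for _ in range(3)]

print(first == second)  # → True: the same seed yields the same sequence
```

Because the sequence is identical on every run, a seeded random split always assigns the same items to the validation set.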

13. What letters are often used to signify the independent and dependent variables?

Independent Variable: A variable that represents the input values that are passed to the model to predict the output values. It directly affects the output value as its value changes. It doesn’t depend on any other variables in the equation. It is conventionally represented by the letter X.

Dependent Variable: A variable that represents the label values that the model learns to predict. It changes as the value of the independent variable changes. It depends on the independent variable in the equation. It is conventionally represented by the letter Y.

14. What’s the difference between the crop, pad, and squish resize approaches? When might you choose one over the others?

Crop: A technique that saves a portion of the image that fits in a square shape of the specified image size. It helps improve the performance of the model by adding images to the training set where the object isn’t fully visible. It can also lose important details in the image that are cropped out.

Pad: A technique that resizes the image to the specified image size while preserving the aspect ratio. It helps create the square shape that the model expects by adding black pixels to the shortest sides of the image. It can also create blank spaces and lower the resolution of the useful part of the image.

Squish: A technique that squeezes or stretches the image to the specified image size without preserving the aspect ratio. It helps resize the image to the square shape that the model expects. It can also cause unrealistic proportions in the image that confuses the model and lowers the accuracy.

  • Fastai suggests randomly cropping different areas of the image to help the model learn to focus on objects of different sizes and in different places in the image. It also helps present the images in a way that reflects the real world, where the same object is framed differently in different images.

15. What is data augmentation? Why is it needed?

Data Augmentation: A technique that artificially increases the size of the training dataset by creating modified versions of the images in the dataset. It can involve flipping, rotating, scaling, padding, cropping, moving, and resizing images. It also helps prevent overfitting when training the model.

  • Data augmentation is needed because it helps prevent overfitting.

16. Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data.

The bear classification model may work poorly in production because the images in the training set are different from the images in the real world. It contains images that were downloaded from the internet, which display the bears much more clearly and artistically than they appear in the real world.

17. What is the difference between item_tfms and batch_tfms?

Item Transforms (Item_Tfms): A parameter that applies the specified Transform functions to the images in the dataset before separating them into mini-batches. It also performs the transformations on the CPU.

Batch Transforms (Batch_Tfms): A parameter that applies the specified Transform functions to the mini-batches after resizing and separating them from the dataset. It also performs the transformations on the GPU.

18. What is a confusion matrix?

Confusion Matrix: A table that helps visualize the performance of the model. It displays the predicted values in the columns. It displays the label values in the rows. It also displays correct predictions in the matching rows and columns and incorrect predictions everywhere else.
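In Fastai the matrix is produced with `ClassificationInterpretation.from_learner(learn).plot_confusion_matrix()`; the underlying idea can be sketched in plain Python with made-up predictions:

```python
from collections import Counter

labels    = ["grizzly", "black", "teddy"]
actual    = ["grizzly", "grizzly", "black", "teddy", "black", "teddy"]
predicted = ["grizzly", "black",   "black", "teddy", "black", "grizzly"]

# Rows = actual class, columns = predicted class.
counts = Counter(zip(actual, predicted))
matrix = [[counts[(a, p)] for p in labels] for a in labels]

for row_label, row in zip(labels, matrix):
    print(row_label, row)
# The diagonal entries (1, 2, 1) are correct predictions;
# every off-diagonal entry is a misclassification.
```

Here 4 of the 6 predictions land on the diagonal, so the model was correct 4 times and confused grizzly/black and teddy/grizzly once each.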

19. What does export save?

Export: A function that saves the trained model to make predictions in production. It saves everything that’s needed to build the Learner object using the pickle protocol which includes the architecture, weights, and biases, and definitions that specify how to create the DataLoaders object.

20. What is it called when we use a model for making predictions, instead of training?

Inference: A process that uses the trained model to make predictions about unseen data. It makes predictions by performing the forward pass without including the backward pass, calculating the error, or updating the weights.

21. What are IPython widgets?

IPython Widget: A graphical user interface element that enhances the interactive experience in Jupyter Notebook. It includes elements like buttons, sliders, dropdowns, text boxes, and progress bars. It also allows users to control and visualize changes in the data by executing functions in response to events.
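A small sketch of the event-driven pattern with the ipywidgets library (the widget names and the printed message are illustrative):

```python
import ipywidgets as widgets

upload = widgets.FileUpload()                   # widget for uploading an image
run    = widgets.Button(description="Classify")
out    = widgets.Output()                       # area that captures printed results

def on_click(_):
    with out:
        print("running the model on the uploaded file...")

run.on_click(on_click)   # register the function to execute on the click event
# display(upload, run, out)   # in a notebook, this renders the widgets
```

Clicking the button triggers on_click, and anything printed inside the Output context appears below the widgets in the notebook.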

22. When would you use a CPU for deployment? When might a GPU be better?

The CPU is the best option when price is the primary concern. It can do a decent job performing one inference at a time. It can be cheaper to rent CPU servers because there’s higher market competition for them than for GPUs. It can also be cost-effective when there’s low volume and speed isn’t important.

The GPU is the best option when performance is the primary concern. It can process much larger quantities of data in less time than CPUs. It can perform multiple inferences in parallel. It can also be cost-effective when there’s enough volume to perform inferences in batches.

23. What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?

The textbook provides four downsides to deploying the model to a server.

  1. It requires users to have an internet connection to use the model.
  2. It causes delays while the data is transmitted to and from the server.
  3. It requires protecting the sensitive data that’s uploaded by users.
  4. It adds overhead for managing, scaling, and protecting the server.

24. What are three examples of problems that could occur when rolling out a bear warning system in practice?

The textbook provides a few examples that are caused by out-of-domain data.

  1. It might detect bears correctly but take too long to be useful in practice.
  2. It might detect bears incorrectly and trigger false alarms.
  3. It might not work because the training and production data don’t match.

25. What is out-of-domain data?

Out-of-Domain Data: Data that’s largely different in some aspect from the training data. It can lead to unexpected behaviors by the model that create all kinds of problems in practice. It can also be mitigated by using carefully thought-out processes and by doing first-hand data collection and labeling.

26. What is domain shift?

Domain Shift: A problem that occurs because the production data changes over time until it no longer represents the training data that was used to train the model. It can cause the model to be less effective and potentially ineffective. It can also be partially mitigated by using a thought-out process.

27. What are the three steps in the deployment process?

  1. Use a Manual Process: The first step is to have humans perform the entire process manually. It runs the model with human supervision to check the predictions and identify problems. It also doesn’t use the predictions to drive any actions.
  2. Limit the Scope: The second step is to limit the scope of the model and supervise the deployment. It runs the model in a small geographical area for a limited length of time. It also uses the predictions that are approved by humans to drive actions.
  3. Expand the Scope: The third step is to expand the scope of the model. It runs the model in larger geographical areas and gradually reduces the level of supervision. It also requires good reporting systems to address actions that are different from the manual process.

