Practice quiz: Clustering
Question 1: Which of these best describes unsupervised learning?
[Correct] A form of machine learning that finds patterns using unlabeled data (x).
[Explanation] Unsupervised learning uses unlabeled data: the training examples have no targets or labels "y". Recall the T-shirt example, where the data consisted of height and weight but no target size.
Question 2: Which of these statements are true about K-means? Check all that apply. [Not all correct answers were selected]
[Correct] If you are running K-means with K=3 clusters, then each c^{(i)} should be 1, 2, or 3.
[Explanation] c^{(i)} describes which centroid example i is assigned to. If K=3, then c^{(i)} is one of 1, 2, or 3, assuming counting starts at 1.
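The assignment step described above can be sketched as follows. This is a minimal illustration, not the course's implementation: the function name `assign_clusters` and the sample points and centroids are made up for the example, and indices are shifted by one so that they start at 1 as in the quiz.

```python
def assign_clusters(points, centroids):
    """Return c, where c[i] is the 1-based index of the centroid closest to points[i]."""
    assignments = []
    for x in points:
        # Squared Euclidean distance from x to every centroid.
        sq_dists = [sum((a - b) ** 2 for a, b in zip(x, mu)) for mu in centroids]
        # list.index returns a 0-based position; add 1 to count from 1.
        assignments.append(sq_dists.index(min(sq_dists)) + 1)
    return assignments


# Hypothetical data: four 2-D points and K=3 centroids.
points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (9.0, 1.0)]
centroids = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0)]

c = assign_clusters(points, centroids)
print(c)  # [1, 1, 2, 3]
```

Every entry of `c` falls in the range 1..K, which is the property the quiz answer relies on.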
Question 3: You run K-means 100 times with different initializations. How should you pick from the 100 resulting solutions?
[Correct] Pick the one with the lowest cost J.
[Explanation] K-means can arrive at different solutions depending on initialization. After running repeated trials, choose the solution with the lowest cost.
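As a rough sketch of this procedure, the code below runs a small pure-Python K-means 100 times and keeps the run with the lowest cost J. The helper names (`sq_dist`, `assign_points`, `kmeans`), the synthetic data, and the choice to initialize centroids by sampling training points are all assumptions made for illustration. Cluster indices here are 0-based, unlike the 1-based convention in the quiz.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_points(points, centroids):
    """Index (0-based) of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))
            for x in points]


def kmeans(points, k, rng, max_iters=100):
    """One K-means run from a random initialization; returns (cost, centroids, assignments)."""
    centroids = rng.sample(points, k)  # assumption: initialize from k training points
    for _ in range(max_iters):
        assign = assign_points(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [x for x, c in zip(points, assign) if c == j]
            # Move the centroid to the mean of its members; keep it if the cluster is empty.
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[j]
            )
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    assign = assign_points(points, centroids)
    cost = sum(sq_dist(x, centroids[c]) for x, c in zip(points, assign)) / len(points)
    return cost, centroids, assign


# Synthetic data: three hypothetical blobs of 30 points each.
rng = random.Random(0)
centers = [(0, 0), (5, 5), (0, 5)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _ in range(30)]

runs = [kmeans(points, 3, rng) for _ in range(100)]
best_cost, best_centroids, best_assign = min(runs, key=lambda r: r[0])
print("lowest cost J over 100 runs:", best_cost)
```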
Question 4: You run K-means and compute the value of the cost function J(c^{(1)}, …, c^{(m)}, \mu_1, …, \mu_K) after each iteration. Which of these statements should be true?
[Correct] Because K-means tries to minimize cost, the cost is always less than or equal to the cost in the previous iteration.
[Explanation] The cost never increases: reassigning points to their nearest centroid and moving each centroid to the mean of its points can each only lower J or leave it unchanged. K-means always converges.
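One way to check this behaviour empirically is to record J after every iteration. The sketch below is illustrative only: `kmeans_cost_history` and the synthetic data are hypothetical, and the small tolerance in the final check allows for floating-point rounding.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_cost_history(points, k, rng, iters=10):
    """Run K-means for a fixed number of iterations and return J after each one."""
    centroids = rng.sample(points, k)
    history = []
    for _ in range(iters):
        # Step 1: assign every point to its nearest centroid.
        assign = [min(range(k), key=lambda j: sq_dist(x, centroids[j])) for x in points]
        # Step 2: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [x for x, c in zip(points, assign) if c == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
        # Cost J: mean squared distance between points and their assigned centroids.
        cost = sum(sq_dist(x, centroids[c]) for x, c in zip(points, assign)) / len(points)
        history.append(cost)
    return history


rng = random.Random(42)
points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(60)]
history = kmeans_cost_history(points, 3, rng)

# Each cost should be no larger than the previous one (up to rounding error).
non_increasing = all(later <= earlier + 1e-9 for earlier, later in zip(history, history[1:]))
print(history)
print("cost never increased:", non_increasing)
```

If the cost ever went up between iterations, that would point to a bug in the implementation rather than a property of the data.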
Question 5: In K-means, the elbow method is a method to
[Correct] Choose the number of clusters K.
[Explanation] The elbow method plots the cost function against the number of clusters K. The "bend" in the cost curve can suggest a natural value for K. Note that this feature may not exist or be significant in some data sets.
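A small sketch of how the numbers behind such a plot might be produced: for each K, run K-means several times and keep the lowest cost. The 1-D data, the helper `kmeans_1d_cost`, and the choice of 20 restarts are assumptions for illustration; plotting is left out so the example needs only the standard library.

```python
import random


def kmeans_1d_cost(values, k, rng, iters=50):
    """Run K-means once on 1-D data and return the final cost J."""
    centroids = rng.sample(values, k)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: (v - centroids[j]) ** 2) for v in values]
        for j in range(k):
            members = [v for v, c in zip(values, assign) if c == j]
            if members:
                centroids[j] = sum(members) / len(members)
    # Recompute assignments against the final centroids before measuring the cost.
    assign = [min(range(k), key=lambda j: (v - centroids[j]) ** 2) for v in values]
    return sum((v - centroids[c]) ** 2 for v, c in zip(values, assign)) / len(values)


# Synthetic 1-D data drawn around three hypothetical centers.
rng = random.Random(1)
values = [rng.gauss(center, 0.3) for center in (0.0, 4.0, 8.0) for _ in range(40)]

# Best cost over 20 random restarts for each candidate K.
costs = {k: min(kmeans_1d_cost(values, k, rng) for _ in range(20)) for k in range(1, 7)}
for k, cost in costs.items():
    print(k, round(cost, 3))
```

On data like this, the cost would be expected to drop sharply up to the true number of groups and flatten afterwards. On real data the bend is often much less clear, as the explanation notes.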
Practice quiz: Anomaly detection
Question 1: You are building a system to detect if computers in a data center are malfunctioning. You have 10,000 data points of computers functioning well, and no data from computers malfunctioning. What type of algorithm should you use?
[Correct] Anomaly detection
[Explanation] Creating an anomaly detection model does not require labeled data, so it can be trained on the normal examples alone.
Question 2: You are building a system to detect if computers in a data center are malfunctioning. You have 10,000 data points of computers functioning well, and 10,000 data points of computers malfunctioning. What type of algorithm should you use?
[Correct] Supervised learning
[Explanation] You have a sufficient number of anomalous examples to build a supervised learning model.
Question 3: Say you have 5,000 examples of normal airplane engines, and 15 examples of anomalous engines. How would you use the 15 examples of anomalous engines to evaluate your anomaly detection algorithm?
[Correct] Put the data of anomalous engines (together with some normal engines) in the cross-validation and/or test sets to measure if the learned model can correctly detect anomalous engines.
[Explanation] Anomalous examples are used to evaluate rather than train the model.
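The split described above might look like the following sketch: a Gaussian is fit on normal examples only, and a small labeled cross-validation set (which includes the anomalies) is used to count hits and misses. The single-feature data, the threshold `epsilon = 0.05`, and the helper names are all invented for illustration.

```python
import math


def fit_gaussian(values):
    """Estimate the mean and variance of one feature from the training examples."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, var


def gaussian_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical data. Training set: normal engines only, no labels needed.
train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
# Cross-validation set: (x, y) pairs with y = 1 for a known anomaly.
cv = [(10.0, 0), (9.9, 0), (10.2, 0), (12.5, 1), (7.0, 1)]

mu, var = fit_gaussian(train)
epsilon = 0.05  # assumed threshold; in practice it would be tuned on the CV set

preds = [1 if gaussian_pdf(x, mu, var) < epsilon else 0 for x, _ in cv]
tp = sum(1 for p, (_, y) in zip(preds, cv) if p == 1 and y == 1)  # anomalies caught
fp = sum(1 for p, (_, y) in zip(preds, cv) if p == 1 and y == 0)  # false alarms
fn = sum(1 for p, (_, y) in zip(preds, cv) if p == 0 and y == 1)  # anomalies missed
print(preds, tp, fp, fn)  # [0, 0, 0, 1, 1] 2 0 0
```

Because anomalies are rare, counts like these (or precision and recall derived from them) are usually more informative than plain accuracy.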
Question 4: Anomaly detection flags a new input x as an anomaly if p(x) < ϵ. If we reduce the value of ϵ, what happens?
[Correct] The algorithm is less likely to classify new examples as an anomaly.
[Explanation] When ϵ is reduced, fewer inputs satisfy p(x) < ϵ, so the probability of an event being classified as an anomaly is reduced.
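The effect of lowering ϵ can be seen on a handful of made-up density values. The list `p_values` and the function `count_flagged` are hypothetical and only serve to illustrate the threshold rule p(x) < ϵ.

```python
def count_flagged(p_values, epsilon):
    """Number of examples whose density falls below the threshold epsilon."""
    return sum(1 for p in p_values if p < epsilon)


# Hypothetical values of p(x) for five new examples.
p_values = [0.30, 0.12, 0.04, 0.015, 0.002]

flagged = {epsilon: count_flagged(p_values, epsilon) for epsilon in (0.05, 0.01, 0.001)}
for epsilon, n in flagged.items():
    print(epsilon, n)
# 0.05 3
# 0.01 1
# 0.001 0
```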
Question 5: You are monitoring the temperature and vibration intensity on newly manufactured aircraft engines. You have measured 100 engines and fit the Gaussian model described in the video lectures to the data. The 100 examples and the resulting distributions are shown in the figure below. The measurements on the latest engine you are testing have a temperature of 17.5 and a vibration intensity of 48. These are shown in magenta on the figure below. What is the probability of an engine having these two measurements?
[Correct] 0.0738 * 0.02288 = 0.00169
[Explanation] The model described in lecture treats the features as independent, so p(A, B) = p(A) * p(B).
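The arithmetic in the answer can be reproduced directly. The two per-feature values come from the quiz; the variable names are chosen here for readability, and `math.prod` (Python 3.8+) is used so the same line would work for more than two features.

```python
import math

# Per-feature densities read off the fitted Gaussians in the quiz.
p_temperature = 0.0738
p_vibration = 0.02288

# Under the independence assumption, the joint value is the product of the per-feature values.
p_joint = math.prod([p_temperature, p_vibration])
print(round(p_joint, 5))  # 0.00169
```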