Anurag : Oct 31, 2017 10:00:00 PM
Machine learning (ML) is a field of data science and artificial intelligence that has generated massive buzz in the business community. Developers and researchers come up with new algorithms and ideas every day, and ML technologies have become sophisticated and versatile at information retrieval. Applications range from banking to healthcare to marketing. Unfortunately, we haven’t reached a level of artificial intelligence where we can say that our algorithms are one hundred percent accurate. ML-enabled computers aren’t as smart as humans, and it takes rigorous coding to make them show some level of intelligence. That said, data-driven companies are working hard to get the best from their algorithms by aiming for relevant results with the highest accuracy possible. But is accuracy really all you should be aiming for, or is it just a fad? Let’s look at an example to understand this.
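Before diving into the details of machine learning algorithms, it is important to understand the basic terms used in this blog. You might feel overwhelmed at first, but rest assured that it is not as complicated as it appears. The outputs of any binary classification algorithm fall into four groups: true positives (TP), positive cases correctly classified as positive; true negatives (TN), negative cases correctly classified as negative; false positives (FP), negative cases wrongly classified as positive; and false negatives (FN), positive cases wrongly classified as negative. Accuracy is the share of all predictions that are correct: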
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Here we are going to analyze a classification model to understand accuracy. Say we have a classifier that identifies spam, and it shows the following results:
| | Classified Positive | Classified Negative |
| Actual Positive | 10 (TP) | 15 (FN) |
| Actual Negative | 25 (FP) | 100 (TN) |
In this case, accuracy = (10 + 100) / (10 + 100 + 25 + 15) = 73.3%. Looks like a decent algorithm. Now let’s see what happens when we switch it for a dumb classifier that marks everything as “no spam”:
| | Classified Positive | Classified Negative |
| Actual Positive | 0 (TP) | 25 (FN) |
| Actual Negative | 0 (FP) | 125 (TN) |
Now accuracy = (0 + 125) / (0 + 125 + 0 + 25) = 83.3%. Did you see what happened? Although we moved to a dumb classifier with exactly zero predictive power, accuracy went up. This is called the accuracy paradox.
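The two accuracy figures can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `accuracy` and the dictionary names are made up for illustration, and the counts are copied from the two tables above.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Counts from the spam classifier table.
spam_filter = {"tp": 10, "fn": 15, "fp": 25, "tn": 100}

# Counts from the "everything is no spam" classifier table.
always_negative = {"tp": 0, "fn": 25, "fp": 0, "tn": 125}

acc_filter = accuracy(**spam_filter)
acc_dumb = accuracy(**always_negative)

print(f"spam filter:     {acc_filter:.1%}")  # 73.3%
print(f"always negative: {acc_dumb:.1%}")    # 83.3%
```

The dumb classifier wins on accuracy only because 125 of the 150 messages are not spam, so predicting "no spam" every time is right most of the time.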
Whenever TP < FP, accuracy will always increase when we move to a classification rule that always gives a “negative” output. Similarly, whenever TN < FN, accuracy will increase when we move to a rule that always gives a “positive” output. Fortunately, there is a way to solve this issue. Here come precision, recall, and F1 to the rescue:
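The reason for these two conditions is simple bookkeeping. If every prediction becomes “negative”, the old true positives turn into false negatives and the old false positives turn into true negatives, so accuracy changes by (FP - TP) / N, where N is the total number of cases. With an always “positive” rule the change is (FN - TN) / N. The sketch below checks this on the spam example; the function names are hypothetical and chosen only for this illustration.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def gain_from_always_negative(tp, tn, fp, fn):
    """Accuracy change when every prediction is replaced by 'negative'.

    Former TPs become FNs and former FPs become TNs.
    """
    before = accuracy(tp, tn, fp, fn)
    after = accuracy(tp=0, tn=tn + fp, fp=0, fn=fn + tp)
    return after - before


def gain_from_always_positive(tp, tn, fp, fn):
    """Accuracy change when every prediction is replaced by 'positive'.

    Former FNs become TPs and former TNs become FPs.
    """
    before = accuracy(tp, tn, fp, fn)
    after = accuracy(tp=tp + fn, tn=0, fp=fp + tn, fn=0)
    return after - before


# Spam classifier from the first table: TP=10, TN=100, FP=25, FN=15.
counts = {"tp": 10, "tn": 100, "fp": 25, "fn": 15}

gain_neg = gain_from_always_negative(**counts)  # TP < FP, so this is positive
gain_pos = gain_from_always_positive(**counts)  # TN > FN, so this is negative

print(f"always negative: {gain_neg:+.3f}")
print(f"always positive: {gain_pos:+.3f}")
```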
Precision = TP / (TP + FP)
Precision is the ratio of correctly predicted positive values to the total predicted positive values. This metric highlights the correct positive predictions out of all the positive predictions. High precision means that few of the predicted positives are false positives.
Recall = TP / (TP + FN)
Recall is the ratio of correctly predicted positive values to the actual positive values. Recall highlights the sensitivity of the algorithm, i.e. out of all the actual positives, how many were caught by the program. High recall means that an algorithm returns most of the relevant results (whether or not irrelevant ones are also returned).
F1 Score = 2*(Recall * Precision) / (Recall + Precision)
It is the harmonic mean of Precision and Recall. At first glance, F1 might appear complicated, but it is a more informative metric than accuracy because it takes both false positives and false negatives into account. Accuracy is suitable only when false positives and false negatives have similar costs, which is rarely the case. Precision, Recall, and F1 Score offer a suitable alternative to the traditional accuracy metric and give more detailed insight into the algorithm under analysis.
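To see how these metrics expose the dumb classifier, here is a minimal sketch that applies the three formulas to the counts from the earlier tables. The function names are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here (the always-negative classifier predicts no positives, so its precision is otherwise undefined).

```python
def precision(tp, fp):
    # Assumed convention: 0.0 when nothing was predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Assumed convention: 0.0 when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * (r * p) / (r + p) if (r + p) else 0.0


classifiers = {
    "spam filter": {"tp": 10, "fp": 25, "fn": 15},
    "always negative": {"tp": 0, "fp": 0, "fn": 25},
}

scores = {}
for name, c in classifiers.items():
    scores[name] = (
        precision(c["tp"], c["fp"]),
        recall(c["tp"], c["fn"]),
        f1_score(c["tp"], c["fp"], c["fn"]),
    )
    p, r, f1 = scores[name]
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

The spam filter scores roughly 0.286 precision, 0.400 recall, and 0.333 F1, while the always-negative classifier scores zero on all three, even though its accuracy was higher.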
A common aim of every business executive would be to maximize both precision and recall, and that is entirely logical. But machine learning technologies are not as sophisticated as they are expected to be. In practice there is usually a trade-off: an algorithm can be tuned to favor one metric over the other, so making it more sensitive tends to make it less precise, and vice versa. Which metric matters more depends on your business goal.
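One common way this tuning happens is by moving the decision threshold of a classifier that outputs a score. The sketch below is an illustration under assumptions: the scores and labels are invented toy data (1 means spam), and `precision_recall_at` is a hypothetical helper, not part of any library.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when every score >= threshold is called positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented toy data: model scores and the true labels (1 = spam, 0 = not spam).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

results = {t: precision_recall_at(t, scores, labels) for t in (0.85, 0.45, 0.15)}

for threshold, (p, r) in results.items():
    print(f"threshold={threshold:.2f} precision={p:.3f} recall={r:.3f}")
```

On this toy data, lowering the threshold from 0.85 to 0.15 raises recall from 0.4 to 1.0 while precision falls from 1.0 to about 0.556, which is the trade-off described above.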
For instance, in a fraud detection algorithm, recall is the more important metric. It is important to catch as many frauds as possible, even if it means that the authorities need to go through some false positives. On the other hand, if the algorithm is built for sentiment analysis and all you need is a high-level idea of the emotions expressed in tweets, then aiming for precision is the way to go.
The ultimate aim is to reach the highest F1 score, but we usually reach a point beyond which we can’t go any further. Whenever you decide to create a machine learning algorithm, keep your priorities defined from the very beginning. Hopefully, our guide on precision vs recall will help you define your targets.
At NewGenApps we specialize in developing Machine Learning applications whether on mobile or web. If you have a project like this then feel free to get in touch.