January 10, 2013

Still Don't Like Our AV Study? A Response to the Critics

Imperva CTO Amichai Shulman:

Let me start by saying that I’m not a big fan of back and forth argumentative discussions taking place in the blogosphere. However, the religious rage that erupted over the past couple of weeks with respect to our paper, Assessing the Effectiveness of Antivirus Solutions, compels me to provide some response.

To avoid dragging the reader through a lengthy text full of complex arguments, I'll take this backwards (think of the "Backwards Episode" from Seinfeld). The bottom line is that many people have questioned the core aspects of our research: the choice of malware samples and the method of sample evaluation. However, even among those who questioned our methodology, there seems to be a consensus around our conclusions: that standard AV solutions have reached the point of diminishing returns, and that organizations must shift their investments toward other solutions that protect them from the effects of infection. I have to assume that if our methodology leads, in a logical way, to conclusions that are so widely accepted, it can't be all that wrong.

Criticism #1:  Sampling
The first part of the criticism targeted our choice of malware samples. Let me again put the bottom line forward: our critics basically claim that our results are so different from theirs because the method we used to collect the samples is incorrect. Let me put this in different words: if attackers choose malware the way AV vendors instruct them to, detection rates are blissful. If attackers choose malware in a different manner, you're toast.

Poor sampling would be a fair argument to make if we had used some mysterious technique for collecting malware that could only be applied by high-end criminal gangs. That is, of course, not the case. We used Google searches with terms that get us close to sporadic malware repositories on publicly accessible web pages. We salted that with links obtained through casual searches in soft-core hacker forums. We did focus on Russian-language forums, but I do not believe that this is controversial. Meanwhile, the "cream of the crop" was supplied by links we took from traffic obtained through anonymous proxies. All this collection work was done by unbiased people: people who are NOT in the business of hacking and are not employed by antivirus companies.

Moreover, if we inspect the claim made by antivirus vendors as to what the "right" set of malware samples is, it actually supports our finding. They claim that across the sample volume they deal with, 100K per day, they achieve higher than 90% detection (98% according to one report). At 98%, that still means they miss 2,000 malware samples out of every 100K. How hard do you think it is for an attacker (and I intentionally did not add the term "skilled") to get his hands on a couple of those 2,000 undetected samples? I should add that all the samples we included in our statistics, out of the samples we collected and tested, were eventually detected by a large enough set of AV products, and that none of them was brand-new malicious code; rather, they were all variations and instances of existing malware.
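The arithmetic behind that claim can be made explicit. A minimal sketch, using the figures quoted above (the 100K/day volume and the 90%-98% detection rates are the vendors' numbers, not measurements of ours):

```python
def missed_samples(daily_volume: int, detection_rate: float) -> int:
    """Daily malware samples that slip past detection at a given rate."""
    return round(daily_volume * (1.0 - detection_rate))

# Figures quoted in the text: 100K samples/day, 90%-98% detection.
print(missed_samples(100_000, 0.98))  # 2000 undetected samples per day
print(missed_samples(100_000, 0.90))  # 10000 undetected samples per day
```

Even at the most optimistic 98% rate, an attacker needs only a handful of those 2,000 daily misses.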

Criticism #2:  Using VirusTotal
The second part of the criticism touches on our use of VirusTotal.com (VT) as a tool for conducting an experiment related to AV effectiveness. We recognize the limitations of using VT, and we described those limitations in our paper. However, bottom line first: we are not the first to publish comparative studies of AV efficiency, or to publish analysis of AV efficiency based on VT. We drew explicit conclusions that are put not in technical terms but in plain business terms: organizations should start shifting their budgets to other solutions for the malware infection problem.

The first and foremost statement made by critics is "you should not have used VT because they say so." Again, bottom line first: we used VT in a prudent and polite way. We did not use undocumented features, we did not subvert APIs, and we did not feed it data with the purpose of skewing the results of AV vendor decisions (which would be an interesting experiment on its own). So basically, our only wrongdoing with respect to VT is the way we interpreted the results and the conclusions we drew from them; there is no term for condemning that other than "thought police". This is before even mentioning the fact that various recent reports and publications have used VT for the same purpose (including Brian Krebs). I know that VT does not claim or purport to be an anti-malware detection tool, and that VT is not intended to be used as an AV replacement. However, it cannot claim to be merely a collection tool for the AV industry whose per-sample results are completely meaningless; having an upload / get-results API further undermines that claim. I deeply regret being dragged into this debate with VT, since I truly value their role in the anti-malware community and have the utmost respect for their contribution to improvements in AV detection techniques and malware research.
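For context, the "upload / get-results API" mentioned above refers to VirusTotal's public HTTP API. A hedged sketch of how a per-sample report can be retrieved and summarized; the endpoint and field names follow VT's public API v2, and the API key and hash below are placeholders, not real values:

```python
import json
import urllib.parse
import urllib.request

VT_REPORT_URL = "https://www.virustotal.com/vtapi/v2/file/report"

def fetch_report(api_key: str, file_hash: str) -> dict:
    """Fetch a per-sample scan report via VirusTotal's public API (v2)."""
    params = urllib.parse.urlencode({"apikey": api_key, "resource": file_hash})
    with urllib.request.urlopen(f"{VT_REPORT_URL}?{params}") as resp:
        return json.load(resp)

def detection_summary(report: dict) -> str:
    """Summarize a report: how many engines flagged the sample."""
    return f"{report['positives']}/{report['total']} engines detected"

# Canned example mirroring the structure a v2 report returns
# (a live call would use fetch_report with a real key and hash):
sample_report = {"positives": 17, "total": 42, "scans": {}}
print(detection_summary(sample_report))  # 17/42 engines detected
```

The point stands regardless of the exact endpoint: an API that returns per-sample verdicts invites per-sample interpretation.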

One of the most adamant arguments against the validity of VT as a measurement for effectiveness is that it uses the command-line version of AV products and that configuration may not be ideal. I’d like to quote:

  • VirusTotal uses command-line versions: that also affects execution context, which may mean that a product fails to detect something it would detect in a more realistic context.
  • It uses the parameters that AV vendors indicate: if you think of this as a (pseudo)test, then consider that you’re testing vendor philosophy in terms of default configurations, not objective performance.
  • Some products are targeted for the gateway: gateway products are likely to be configured according to very different presumptions to those that govern desktop product configuration.
  • Some of the heuristic parameters employed are very sensitive, not to mention paranoid.

Regarding the first point, I personally do appreciate the potential difference between a command-line version of an AV tool and other deployed versions. However, in terms of signatures and reputation heuristics, I don't really get it. I'd love to see AV vendors explain that difference in detail, in particular pointing out which types of malware are not detected by their command-line version but are detected by their other versions, and why. I am certainly willing to accept that our results would have been somewhat different had we tested an actually installed version of the product rather than the command-line version. However, I do think the command-line results are a good approximation. If AV vendors claim this is far from true, I'd really like to see the figures. Is the command-line version 10%, 50% or 90% less effective than the installed product?

I don’t see the point in the second argument. Are they really claiming that VT configuration is not good because it is the recommended vendor configuration?

As for the third argument, this is really puzzling. According to this, we should have experienced a high ratio of false positives, rather than the high ratio of false negatives that we have observed in practice.

Quoting again:

VirusTotal is self-described as a TOOL, not a SOLUTION: it’s a highly collaborative enterprise, allowing the industry and users to help each other. As with any other tool (especially other public multi-scanner sites), it’s better suited to some contexts than others. It can be used for useful research or can be misused for purposes for which it was never intended, and the reader must have a minimum of knowledge and understanding to interpret the results correctly. With tools that are less impartial in origin, and/or less comprehensively documented, the risk of misunderstanding and misuse is even greater.

Again, the writer agrees that VT is indeed a tool that can be used for research as long as results are correctly interpreted. Yes, it is possible that we’ve misinterpreted the results. If that is your opinion then argue with our interpretation of the results. Unfortunately most critics chose not to do so, but rather argued that we used the wrong tools.

I could continue; however, I think I've addressed the main criticism against our work and shown that most of it is immaterial. I would like to see a livelier debate around our interpretation of the results and the conclusion: AV solutions attempting to prevent infection have reached a point of diminishing returns and are thus providing attackers with a large enough window of opportunity, time-wise and device-wise, to penetrate organizations and remain undetected for extremely long periods. That does not mean we have to throw AV solutions away; it means we need to start shifting some of the money toward solutions that detect and prevent the effects of infection.


Defending a flawed report makes Imperva look childish and suggests they don't know what they are doing. They say AV allows successful attacks even after several years; if it's not successful, why do you think people are using it? It is a bit of a reactive type of software, but definitely not what you have suggested in your report. From this blog's comments it looks like Imperva will go downwards in the future, playing the dirty game of throwing mud at others to get PR while it has its own shortcomings to sort out.

Shahar, "While I do recognize the logic and validity of some of your arguments, I think an outright acknowledgement of the erroneous methodology would have been the best course of action, PR wise."

You have made a critical statement as though it were fact, yet have not had the courtesy to advance a single argument as to why you think it is fact. Nobody knows you (so we don't know if you have skin in the game or business motives for your criticism), you're putting nothing on the line and making no relevant arguments, yet you expect the reader of your comment to concur with your position.

Wrong way to go, as I see it.

While I do recognize the logic and validity of some of your arguments, I think an outright acknowledgement of the erroneous methodology would have been the best course of action, PR wise.
You could have made the same "business points" without the persecuted-martyr tone.
The defensive/aggressive attitude relayed here runs against the prevailing opinion of a large part of the security professional community, and can be expected to further alienate readers from the Imperva brand.
Wrong way to go, as I see it.
