Automating incident response lets IDT take battle to the enemy
Security staffers had their hands full dealing with a constant inflow of attacks against the company's infrastructure.
Sorting out real attacks from false positives, cleaning up malware, and ensuring that infections didn't spread could take hours -- or longer -- for a single incident. Meanwhile, every additional minute that an infected machine stayed on the network was that much more opportunity for the attackers to bury themselves deep or to make lateral jumps to other machines.
By automating the incident response process, IDT was able to reduce the time before the infection was quarantined, shorten the remediation cycle, reduce investigation time, and free up security staff to go after the bad guys themselves.
Quicker quarantine
At the end of 2013, it took about 30 minutes to isolate an infected device and remove it from the company's network, said Golan Ben-Oni, IDT's CSO and senior vice president of network architecture.
"Because of the danger of what happens when a compromised asset sits on the network, we wanted that time to be reduced from about 30 minutes to just seconds," he said.
To do this, the company used the application programming interfaces from Palo Alto Networks, its firewall vendor, and Splunk, its big data analytics platform.
Previously, a WildFire alert would be sent to the company's security information and event management system, at which point a security professional would manually isolate the suspicious host and start looking for the downloaded malware file.
Now, the WildFire alert is delivered to Splunk in about one second. Within seven seconds, Palo Alto isolates the device, the user gets an alert that their machine is being investigated, and the WildFire alert is sent on for analysis.
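The sequence described above (alert arrives, device is isolated, user is notified, alert is passed on for analysis) can be sketched in a few lines. This is only an illustration, not IDT's implementation: the `Alert` fields, the `handle_alert` function, and the three callables standing in for the firewall, notification, and analysis integrations are all hypothetical names, and the check for a "malware" verdict is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Alert:
    host: str      # address of the suspected machine (hypothetical field)
    user: str      # person to notify (hypothetical field)
    verdict: str   # assumed values: "malware" or "benign"


@dataclass
class QuarantineResult:
    host: str
    steps: List[str] = field(default_factory=list)


def handle_alert(
    alert: Alert,
    isolate: Callable[[str], None],
    notify: Callable[[str, str], None],
    forward: Callable[[Alert], None],
) -> QuarantineResult:
    """Run the three automated steps in order and record which ones ran."""
    result = QuarantineResult(host=alert.host)
    if alert.verdict != "malware":
        return result  # assumption: only malware verdicts trigger quarantine

    isolate(alert.host)                       # stand-in for the firewall call
    result.steps.append("isolated")

    notify(alert.user, f"{alert.host} is being investigated")
    result.steps.append("notified")

    forward(alert)                            # hand off for further analysis
    result.steps.append("forwarded")
    return result
```

Passing the integrations in as callables keeps the sketch independent of any vendor API, which the article does not describe in detail.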
"We might get an alert from a user analytics platform that a user ID was being used improperly, or that malware was detected on an end user device," said Ben-Oni.
Ready remediation
Then the company turned to the remediation process, which previously took more than eight hours of manual labor.
Those alerts that scored high and were most likely to be real rather than false positives are now handled automatically.
"We locate all the newly downloaded files and initiate forensics on memory and disk to try to identify more information about the event," he said. "Once we've collected all this, we'll go ahead and image that system to our forensic capture platform, and re-image it, bringing that system to a golden image."
It takes about five minutes to collect the initial round of data, he said, then another 30 minutes to collect all the disk information for deeper forensic analysis.
Then the computer, and all user files, are restored and a user can get back to work within about an hour.
The process takes longer if a user is working remotely and doesn't have access to the company's 10 Gigabit network.
"So, for the mobile workforce, we actually do something else," said Ben-Oni. "We'll direct them to a workspace in the cloud."
After the user is back at work, the machine is watched for the next 48 hours.
"We make sure that that host cannot execute any code that we did not install -- a white list -- because there's always an opportunity that the host will get reinfected after you reimage it and reinstall user files back on the system," he said.
For production servers, the process is even faster. If there's a secondary system in the environment, the infected server is simply taken offline and the backup goes to work, with no impact on delivered services.
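The swap described above, taking the infected server out of service and letting a backup take over, can be sketched as a small function over two lists of host names. This is a simplified illustration under stated assumptions: `fail_over` and the host names are hypothetical, and a real deployment would make the equivalent change through its own load balancer or cloud tooling.

```python
from typing import List, Tuple


def fail_over(
    active: List[str], standby: List[str], infected: str
) -> Tuple[List[str], List[str]]:
    """Return new (active, standby) lists with the infected host out of rotation.

    If a standby host is available, the first one is promoted to take the
    infected host's place. The input lists are not modified.
    """
    if infected not in active:
        return list(active), list(standby)

    new_active = [host for host in active if host != infected]
    new_standby = list(standby)
    if new_standby:
        new_active.append(new_standby.pop(0))  # promote the first standby
    return new_active, new_standby


# Example with hypothetical host names:
active, standby = fail_over(["web1", "web2"], ["web3"], infected="web1")
```

Returning new lists instead of mutating the inputs makes it easy to review or log the change before it is applied.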
"In the production environment, we have automation tools on Amazon and VMware to spin up new hosts or change the load balance configuration to direct traffic to backups or hot standbys," he said.
Incident investigation
Each high-priority, high-fidelity alert processed automatically would save an employee up to nine hours of work, or more, time that they could now spend on alerts that require human investigation.
There were plenty of these alerts coming into IDT every day, alerts that would not normally be considered high-fidelity.
In the past, most of these alerts would have been ignored because there was simply not enough time to handle them.
Over the past couple of years, there were plenty of news headlines about what happens then.
"With many of the data breaches -- like Target, Home Depot and others -- security teams were sent alerts but the teams were unable to determine which were the highest risk," said Muddu Sudhakar, co-founder and CEO at security vendor Caspida.
This is the kind of thing that keeps IDT's Ben-Oni up at night.
"When we started to manually investigate them, it became clear that many of these alerts were actually very serious," Ben-Oni said. "Are we seeing everything we need to see? And once we do see things, are we reacting to them appropriately? For us, reacting to them appropriately means reacting to every event, and determining if they are significant."
But even with automated remediation, IDT still didn't have enough resources to investigate 80 to 90 percent of all the alerts coming in.
And the investigations that were involved were very time-consuming. Analysts had to pivot between different systems, look at the context of what was happening on the machine and networks, and sandbox and analyze code.
Automation was needed here as well, so IDT turned to another vendor, Hexadite.
Today, Hexadite receives the alert within a second after it comes in, and sends behavioral data to Palo Alto and other sandboxes for analysis.
A full behavioral alert is ready within 18 seconds, and other information is collected in the next 40 to 60 seconds, he said.
The entire alert investigation process now takes a total of one and a half minutes, and those alerts that turn out to be significant are funneled into the automated remediation process.
For user workstations, a confidence level of 95 percent or so knocks the machine off the network and sends it in for automatic reimaging. For production systems, that happens at a confidence level of around 30 percent, since it's easier to rebuild them quickly.
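The two thresholds can be expressed as a simple lookup. The sketch below is illustrative only: the article gives the figures as approximate ("95 percent or so," "around 30 percent"), and the function name, the asset-type labels, and the choice of a greater-than-or-equal comparison are assumptions.

```python
# Approximate thresholds from the figures quoted above; the keys are
# hypothetical labels for the two kinds of asset.
AUTO_REMEDIATE_THRESHOLD = {
    "workstation": 0.95,
    "production": 0.30,
}


def should_auto_remediate(asset_type: str, confidence: float) -> bool:
    """Return True if an alert's confidence is high enough to act automatically."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if asset_type not in AUTO_REMEDIATE_THRESHOLD:
        raise ValueError(f"unknown asset type: {asset_type}")
    return confidence >= AUTO_REMEDIATE_THRESHOLD[asset_type]
```

With these values, an alert at 50 percent confidence would trigger an automatic rebuild of a production system but would leave a user workstation for an analyst to review.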
There are still occasions when real people need to get involved, Ben-Oni added.
"But what they're looking at right now is a clearer storyboard of what actually happened," he said. "They get the results of a full automated investigation on their screen."
Hexadite came in about six months ago, Ben-Oni said. It took about a week to get started with the first set of 20 to 30 machines, and the system was then extended to cover the rest of the company's infrastructure in stages.
At the end of the day, automation was not a choice, but a necessity, he said.
"As a public organization, it's incumbent on me to do this," he said.
Not everyone is ready to go this far in automating their security response, however.
"It's not practical in a business setting," said Andy Woods, director of commercial cybersecurity at BAE Systems. The company provides outsourced incident response services.
The big risk, he said, is overreacting to false positives and trying to re-image too many desktops at once.
"It could take down your network," he said. "You could be performing a DDOS on yourself."
In addition, he said, attackers are always innovating and threat indicators change constantly.
It takes a trained analyst to tell whether a threat is real or not, and to adjust indicators as needed, he said.
"Most security professionals are wary of enabling automated 'active' responses that could cause an interruption to the very services they're chartered to protect," said Mike Paquette, vice president of security products at Prelert, a security analytics company.
But many organizations already use some automation, for example to block network traffic to known bad sites or to run sandboxes that detect and block malicious executables.
"I predict that we'll see accelerating adoption of automated incident response over the coming years, guided by the combination of machine learning and human expertise," he said.
That's where Hexadite comes in, said CEO Eran Barak, whose company has been training security analysts for many years.
That deep knowledge of the security analysis process allows the company to go beyond simple rules and indicators to a complex decision tree based on an extensive library of actions.
"What you do as a cyberanalyst, we do it automatically and faster," he said. "We close the loop in seconds instead of hours or days or months."
Barak said that his company has customers for whom it processes hundreds of alerts daily, and others with thousands of alerts. Hexadite also offers a semi-automated system, in which the user controls the remediation process instead of the appliance triggering it automatically.
Automation has another benefit as well, said Paul Nguyen, CEO at CSG Invotas, another security automation vendor.
"Automation significantly reduces human error, which is responsible for 52 percent of data breaches," he said.
Engage the enemy
Meanwhile, at IDT, Ben-Oni said that his security organization is now able to do more than simply react to incoming attacks -- and hope that nothing gets missed.
There is now time to do more, he said. "Maybe on the other side of this, the action side, or the attribution side, learning more about our adversary to better iterate or protect the organization going forward."
And that's just the start, he said.
"We can then enable our security operations center to take the next step, work on attribution and eliminating the source of the threat by working with law enforcement," he said. "And that's where we're going to go into the future."