By implementing these vulnerability assessment and vulnerability management best practices, you will reduce the attack surface of your infrastructure.
We’re human, and many things we build aren’t perfect. That’s why we take our cars for a periodic inspection, or why we have organizations certifying that products are safe to use.
Software is no different.
Over the last decade, software has increased in complexity, and we have become used to constant updates that ship with more bug fixes than new functionality.
Software errors range from common, annoying bugs to edge cases that can give a regular user the power to bring down the whole internet. And they often include vulnerabilities that, combined with misconfigurations, can be exploited to compromise your whole infrastructure.
We recently covered how a vulnerable container can be exploited, enabling a lateral movement attack that can compromise a whole cloud account. We also looked at how CVE-2021-20291 in CRI-O and Podman can completely take your hosts out of service if a malicious actor places a crafted image in your repositories.
If software vulnerabilities are inevitable, is there anything you can do about them?
By following these 10 vulnerability assessment and vulnerability management best practices, your infrastructure will be less open to potential attacks, and newly discovered vulnerabilities won’t distract you as much from shipping your applications.
1. Enable vulnerability assessment and management with image scanning
Before you can assess how badly vulnerabilities are affecting your infrastructure, and before you start building an action plan to manage them, you need to know what vulnerabilities you’re dealing with.
Automated image scanning is the tool that enables this process.
Scanning everything you deploy in production, and comparing it against known vulnerability databases, provides a picture of what vulnerabilities affect you. It’s the first step in assessing your security posture.
It’s good to review image scanning best practices from time to time. After all, the deeper your image scanning digs, the easier it will be to assess and manage the vulnerabilities you discover. For example, is your scanner looking only at the image’s metadata, or does it also check the third-party libraries the image contains?
2. Block vulnerabilities from reaching production
Now that we agree that image scanning is fundamental to vulnerability assessment, the next question is: When should you perform the scans?
The answer is: whenever possible.
The sooner you detect a vulnerability, the earlier you can act and the easier it is to fix.
If developers get a warning while writing the software, they can address it right away and the vulnerabilities will never reach production.
Some container image registries, like Harbor, will scan your images directly from the registry. To strengthen security even further, they can block your services from pulling images that don’t pass the scan.
And finally, to cover for images that may be deployed from external repositories, you can perform image scanning at the Kubernetes admission level.
3. Catch up with new vulnerabilities
We just covered how image scanning can be used to prevent known vulnerabilities from reaching production. But what about those that haven’t been disclosed yet?
A continuous scan of your runtime workloads will alert you to newly discovered vulnerabilities, so you can hopefully implement a fix before they are exploited.
Keep in mind that image scanning does not replace runtime security. It takes some time to discover vulnerabilities. Until then, they can only be mitigated by detecting abnormal behavior in your runtime. We’ll cover this later on.
4. Scan your cloud tasks
Serverless services, like Fargate or the newly introduced AWS App Runner, are trending. By leveraging them, teams can deploy their containers without worrying too much about the infrastructure. That way, they can focus on what’s really important: shipping new features faster.
One could naively think that, as a managed service, you can forget about security. But that’s pretty far from the truth. Check this example scenario, where a vulnerable container can enable lateral movement to compromise a whole cloud account.
Cloud providers operate under a shared responsibility model. So, providers will secure the underlying service, providing fixes for low-level vulnerabilities like Meltdown and Spectre. Meanwhile, you’ll secure the apps you deploy on that service. You can dig deeper into this subject by checking the ECS Fargate threat modeling.
Whether via a CI/CD pipeline or from a container registry, it is a vulnerability management best practice to scan the cloud tasks you deploy on serverless services.
5. Don’t forget host scanning as part of your vulnerability assessment
It would be easy to forget about the hosts that are actually running your containers. After all, access to them is highly secured, which makes exploiting a vulnerability on a host seem nearly impossible.
As a result, we often see teams implement host scanning just to check a box in their compliance validation. However, not integrating the scan results into the vulnerability assessment and management process renders host scanning useless.
Keep these two scenarios in mind:
- After compromising one container, an attacker will look for a vulnerability that allows them to escape the container. If they succeed, they may exploit vulnerabilities in the host to perform cloud lateral movement, compromising your whole infrastructure.
- Some vulnerabilities, like CVE-2021-20291 for CRI-O and Podman, can be exploited without access to your hosts. Uploading a crafted image to your registry is enough to successfully perform a DoS attack on affected hosts.
It is a vulnerability assessment best practice to treat host scanning at the same level as regular image scanning.
6. Define global policies for vulnerability assessment and management
There are all kinds of vulnerabilities. Should you block them all?
Without some global policies, your team may take too long to address important vulnerabilities, and spend too many resources on unimportant ones.
The main metric you can use to prioritize vulnerabilities is their reported severity.
You may decide to let low severity vulnerabilities pass, but block the deployment of all images containing vulnerabilities with a severity higher than medium.
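A severity gate like the one just described can be sketched in a few lines. This is only an illustration: the `Finding` type, the severity scale, the `should_block` function, and the placeholder CVE identifiers are assumptions made for the example, not the output format or API of any particular scanner.

```python
from __future__ import annotations

from dataclasses import dataclass

# Severity names ordered from least to most severe (an assumed scale).
SEVERITY_ORDER = ["negligible", "low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Finding:
    cve_id: str    # placeholder identifiers are used in the example below
    severity: str  # one of SEVERITY_ORDER


def severity_rank(severity: str) -> int:
    """Return the position of a severity on the assumed scale."""
    return SEVERITY_ORDER.index(severity.lower())


def should_block(findings: list[Finding], max_allowed: str = "medium") -> bool:
    """Block the image if any finding is more severe than max_allowed."""
    limit = severity_rank(max_allowed)
    return any(severity_rank(f.severity) > limit for f in findings)


if __name__ == "__main__":
    scan = [Finding("CVE-0000-0001", "low"), Finding("CVE-0000-0002", "high")]
    print("block" if should_block(scan) else "pass")
```

Keeping the threshold as a parameter lets different environments share the same gate while applying stricter or looser limits.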
But what about the vulnerabilities affecting runtime workloads?
If you are lucky, you may be able to just kill that service for a while. However, if you cannot disturb the service right away, you can implement some mitigation measures, such as further isolating vulnerable services with tools like network policies, or moving them to their own cluster. You can also put runtime security policies in place to detect when a vulnerability is exploited.
Beyond the reported score of a vulnerability, you should also assess how it truly affects you. A medium severity vulnerability may require a very specific, unusual configuration. But, if you happen to have deployed that configuration, that vulnerability may be of critical severity for you.
And lastly, cover part of your development process in your policies.
Implementing Dockerfile best practices can greatly reduce the attack surface of your containers. For example, the ubuntu:xenial-20210114 image may contain around one hundred vulnerabilities that are inherited by other images that use it as a base.
If you build your own company-wide distroless images, you’ll gain two benefits:
- By including only what is needed, you can greatly reduce the number of vulnerabilities inherited from base images.
- By reusing base images, if a vulnerability is found, you can fix all the affected images at once.
To summarize, your teams have too many questions to answer when a vulnerability is found. Company-wide policies can help them react faster and more efficiently. Remember to cover the following points in your policy:
- How to handle pre-deployment vulnerabilities: When to block? When to pass?
- How to handle runtime vulnerabilities: When to kill? When to isolate?
- How to isolate a vulnerable service?
- Development practices to reduce attack surface and speed up response times, such as:
  - Using company-wide base images.
  - Defining a versioning strategy.
7. Include updates as part of your vulnerability management process
The most common fix for a vulnerability is to update to a newer version.
However, if you haven’t upgraded in a while, that can mean a notable development effort to address the breaking changes that may have been introduced between versions.
Another thing your company policy can address is how to handle versioning of software to smooth these updates. A policy like that should cover:
What versions should everyone use: Whether we are talking about libraries, applications, or base images, if everyone is using the same version, it’s easier to keep track of vulnerabilities and fixes can be applied by everyone at the same time.
When should people update: It’s understandable not wanting to update to the latest version as soon as it is released. However, never upgrading leads to deploying year-old libraries, adding to the technical debt of your team.
You should define some guidelines so people can confidently upgrade. For example:
- Apply security patches as soon as they are out.
- Upgrade minor releases two weeks after they are released.
- Upgrade major releases one month after release.
Plan end of life: Software versions only receive security updates for a while. After that, you’ll have to upgrade to a major version. As this usually includes addressing breaking changes, it’s better to plan ahead and perform the big upgrade when it causes less disruption.
This is more painful with newer technologies, like Kubernetes, where the releases are more frequent and ship with bigger changes to the core components.
Starting with Kubernetes 1.19, the support window was increased to one year. If you upgrade to the latest version, you will be able to apply security updates for a year without having to worry about breaking changes.
Initiatives like that make it easier to integrate updates into your development process. For example, you can pick the least busy month of the year, and dedicate it to both upgrading software components and addressing some technical debt.
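The example upgrade guidelines from this section (security patches right away, minor releases after two weeks, major releases after one month) could be encoded so that tooling can flag overdue upgrades. The following is a minimal sketch under stated assumptions: the function names and release kinds are hypothetical, and "one month" is approximated as 30 days.

```python
from datetime import date, timedelta

# Waiting periods taken from the example guidelines in this section.
# "One month" is approximated as 30 days (an assumption).
WAIT_BY_RELEASE_KIND = {
    "security": timedelta(days=0),
    "minor": timedelta(weeks=2),
    "major": timedelta(days=30),
}


def upgrade_due_date(released_on: date, kind: str) -> date:
    """Return the earliest date the policy expects an upgrade to a release."""
    return released_on + WAIT_BY_RELEASE_KIND[kind]


def is_upgrade_due(released_on: date, kind: str, today: date) -> bool:
    """Return True once the waiting period for this release has passed."""
    return today >= upgrade_due_date(released_on, kind)


if __name__ == "__main__":
    released = date(2021, 6, 1)
    for kind in WAIT_BY_RELEASE_KIND:
        print(kind, upgrade_due_date(released, kind).isoformat())
```

The waiting periods live in a single table, so adjusting the policy does not require touching the logic.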
8. Make your vulnerability assessment and management reports useful
There is a huge gap between taking inventory of your vulnerabilities and actually fixing them. The way information is presented and shared dictates how fast and efficiently that gap is closed.
As a rule of thumb, if notifying the appropriate person about a vulnerability takes too much time or effort, you are missing an opportunity.
For critical vulnerabilities, you’ll want to alert the appropriate teams as soon as possible.
Some best practices to make alerts useful:
- Avoid noise: Only alert on those items that need immediate attention.
- Be surgical: Only alert the people that need to take action.
- Use the appropriate channels: Your team might not read an urgent email in time.
- Provide context: If the alert contains all the needed information, like what image is affected, what namespace, or what cloud task, it will save critical time.
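The four alerting practices above can be illustrated with a small routing function. This is a sketch, not a real integration: the `Vulnerability` type, the channel names, the team names, and `build_alert` are all hypothetical, and the CVE identifiers are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical routing table: the urgent channel each team actually watches.
URGENT_CHANNEL_BY_TEAM = {
    "payments": "pager:payments-oncall",
    "web": "chat:#web-alerts",
}
FALLBACK_CHANNEL = "chat:#security"


@dataclass(frozen=True)
class Vulnerability:
    cve_id: str
    severity: str
    image: str
    namespace: str
    owner_team: str


def build_alert(vuln: Vulnerability) -> Optional[dict]:
    """Return an alert for critical findings only, or None to stay quiet."""
    # Avoid noise: anything below critical goes into reports, not alerts.
    if vuln.severity != "critical":
        return None
    # Be surgical and use the appropriate channel: notify only the owning team.
    channel = URGENT_CHANNEL_BY_TEAM.get(vuln.owner_team, FALLBACK_CHANNEL)
    # Provide context: include what is affected and where it runs.
    message = (
        f"{vuln.cve_id} ({vuln.severity}) in image {vuln.image}, "
        f"namespace {vuln.namespace}"
    )
    return {"channel": channel, "message": message}
```

Returning `None` for non-urgent findings makes the "no alert" decision explicit, so the caller can pass those findings to a report.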
For not so urgent vulnerabilities, reports can come in quite handy.
You’ll want your reports to be complete and granular.
Complete enough so you can validate the compliance of your whole infrastructure without having to dig between several spreadsheets and tools.
And granular enough so you can easily group vulnerabilities by team, service, or cloud account, which helps you decide where to start acting first.
An example would be a weekly report for each team that consolidates all their assets and contains the information they need to plan the fixes.
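As a rough illustration of that granularity, the sketch below groups findings by an arbitrary field and builds a per-team list sorted by severity. The finding fields (`team`, `service`, `severity`, `cve`) and the function names are assumptions made for the example, not the export format of any reporting tool.

```python
from collections import defaultdict

# Assumed severity scale, most severe first, used only for sorting.
SEVERITY_PRIORITY = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def group_findings(findings: list, key: str) -> dict:
    """Group finding dicts by one field, such as "team" or "service"."""
    groups = defaultdict(list)
    for finding in findings:
        groups[finding[key]].append(finding)
    return dict(groups)


def team_report(findings: list, team: str) -> list:
    """Return one team's findings, most severe first."""
    own = group_findings(findings, "team").get(team, [])
    return sorted(own, key=lambda f: SEVERITY_PRIORITY[f["severity"]])
```

Because the grouping key is a parameter, the same data can be sliced by team, service, or cloud account without a separate tool for each view.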
When it comes to reporting, tools can be too opinionated. Make sure you can take something useful from your reporting tools, beyond checking a box in your compliance requirements.
Also, what good are reports if you spend hours working with several tools to craft them? Look for tools that either consolidate all your vulnerabilities, or that play well with the other tools you are using.
9. Alert on vulnerable configurations
It doesn’t matter that you patched all your software vulnerabilities if your configuration is leaving the door open.
It is a vulnerability assessment best practice to check and alert on vulnerable configurations.
Most image scanners can check for exposed ports or leaked credentials. Leverage those features.
When static configuration is managed through a code repository, it can pass the same QA processes that the rest of your software goes through. Avoid changing configurations without a review process.
And for cloud services, you can leverage tools like Cloud Custodian to alert on misconfigurations, like having S3 buckets accessible to the public.
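To make the public S3 bucket example concrete, here is a simplified, standalone check on a bucket policy document that has already been loaded as a Python dictionary. It is not how Cloud Custodian works and it is deliberately incomplete: it only looks for `Allow` statements with a wildcard principal, and it ignores `Condition` blocks, ACLs, and account-level public access settings. The function names are hypothetical.

```python
def _is_wildcard_principal(principal) -> bool:
    """Return True if the Principal element matches anyone ("*")."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        for value in principal.values():
            values = value if isinstance(value, list) else [value]
            if "*" in values:
                return True
    return False


def allows_public_access(policy: dict) -> bool:
    """Flag policies with an Allow statement granted to a wildcard principal."""
    statements = policy.get("Statement", [])
    # A policy may hold a single statement object instead of a list.
    if isinstance(statements, dict):
        statements = [statements]
    return any(
        s.get("Effect") == "Allow" and _is_wildcard_principal(s.get("Principal"))
        for s in statements
    )
```

A check like this is best treated as a trigger for an alert and a human review, since a flagged policy may still be restricted by conditions the sketch does not evaluate.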
Network security is especially sensitive, as it’s the main tool to isolate services and block malicious actors from performing lateral movement. Take special care to monitor changes to network configurations.
10. Plan for vulnerability exploits
We covered the limits of vulnerability management several times in this article. From vulnerabilities not yet discovered to those that we haven’t fixed yet, and also vulnerable configurations, there is a lot that can go wrong.
Luckily, the behavior of microservices is rather predictable, so it’s easy to spot when a vulnerability is exploited. That is, if you are looking for it.
Runtime security tools can read system events, so they can detect suspicious things like shells spawning in containers, new processes spawning, or configuration changes.
One example is Falco, the cloud-native runtime security project that is the de facto Kubernetes threat detection engine.
So, you should leverage runtime security tools.
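To illustrate the idea of detecting a shell spawning in a container, here is a toy filter over a list of system events. It is not Falco's rule language or API: the event fields (`type`, `container_id`, `process_name`), the list of shell names, and the function names are all hypothetical, chosen only to show the kind of condition a runtime security tool evaluates.

```python
# Shell binaries that rarely need to run inside a production container
# (an assumed, non-exhaustive list).
SHELL_BINARIES = {"sh", "bash", "zsh", "dash", "ash"}


def is_shell_in_container(event: dict) -> bool:
    """Return True for a process-execution event that starts a shell in a container."""
    return (
        event.get("type") == "execve"
        and event.get("container_id") is not None
        and event.get("process_name") in SHELL_BINARIES
    )


def suspicious_events(events: list) -> list:
    """Keep only the events that match the shell-in-container condition."""
    return [event for event in events if is_shell_in_container(event)]
```

Because microservices usually run a small, predictable set of processes, even a simple condition like this one can stand out clearly from normal activity.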
Going one step further, vulnerability management and runtime security tools should enable the forensic investigation that will take place after a vulnerability is exploited.
Imagine that a vulnerability in a container is exploited, and your runtime security tool detects it and immediately kills the container to block the attack.
Even if you get notified via alerts, how can you investigate the incident to prevent it from happening again? After all, once you kill the container, it doesn’t exist anymore.
Knowing that the container had vulnerabilities can significantly narrow the investigation, but you’ll need actual evidence to reconstruct the incident. These features in security tools can help you:
Capture system activity: Runtime security tools rely on system events to keep track of what’s happening on your system. Some tools can go further and record those events in the time around a security event. With them, you can know what processes the attacker ran, what files they changed, and what network activity was involved.
Activity Audit: Sometimes, you don’t need all the system activity of a machine to investigate an event. Keeping separate audit logs per workload can be a less noisy complement. You can check the audit log of a specific container, then jump into a system event capture if you need to fill the gaps.
Correlated data: In the cloud and microservices world, incidents are rarely isolated. You may find a crypto miner running in a container, kill the container, and call it a day.
But if you could watch security events from your whole infrastructure in a single overview, you may also find that, right before the attack, one of your developers signed in without multi-factor authentication. Maybe what was compromised wasn’t just a container, but some user credentials. The investigation just gained another dimension.
Look for tools that help you correlate misconfigurations with security events.
Software is not perfect and that causes security issues. Thankfully, with image scanning and other best practices, we can mitigate the impact of vulnerabilities.
Blocking vulnerabilities is not always enough, so keep an eye on runtime security.
And overall, look for security tools that make your life easier. Better reporting and faster response times can make the difference between a security incident and becoming a trending topic.
Implement vulnerability assessment and vulnerability management with Sysdig Secure
Sysdig Secure consolidates image and host scanning, speeding up fixes and making it easier to validate compliance across your whole infrastructure. And it’s radically simple to deploy.
With Sysdig Secure for cloud, you can continuously flag cloud misconfigurations before the bad guys get in, and detect suspicious activity like unusual logins from leaked credentials. This all happens in a single console, making it easier to validate your cloud security posture. And it only takes a few minutes to get started!