YARA matching in Malzoo Serverless

TL;DR

Malzoo Serverless now collects YARA matching results for submitted samples via a custom container image for AWS Lambda. Users of Malzoo Serverless can add their own rules to the designated folder, rebuild the Docker image locally and push it to AWS for use by the deployed Malzoo Serverless stack. Hooray!

YARA? YARA? What the hell is YARA?!

From the homepage:

YARA is a tool aimed at (but not limited to) helping malware researchers to identify and classify malware samples. With YARA you can create descriptions of malware families (or whatever you want to describe) based on textual or binary patterns. Each description, a.k.a. rule, consists of a set of strings and a boolean expression which determine its logic.
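To make the "strings plus boolean expression" idea concrete, here is a toy model in plain Python. It is not YARA and uses neither YARA's rule syntax nor its engine; it only mirrors the structure the quote describes: a rule holds named byte patterns, and a condition decides the verdict based on which patterns were found. The `ToyRule` class, rule name and patterns are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ToyRule:
    """Toy stand-in for a YARA rule: named patterns plus a boolean condition."""

    name: str
    strings: Dict[str, bytes]  # identifier -> literal byte pattern
    condition: Callable[[Dict[str, bool]], bool]  # sees which strings were found

    def matches(self, data: bytes) -> bool:
        # Check every string independently, then let the condition decide.
        found = {ident: pattern in data for ident, pattern in self.strings.items()}
        return self.condition(found)


# Loosely mirrors a rule with strings $mz and $text and the condition "$mz and $text".
rule = ToyRule(
    name="Toy_Example",
    strings={"mz": b"MZ", "text": b"hello"},
    condition=lambda found: found["mz"] and found["text"],
)

print(rule.matches(b"MZ\x90\x00 hello world"))  # True
print(rule.matches(b"MZ\x90\x00"))  # False
```

Real YARA rules are written in YARA's own language, where the condition above would read `$mz and $text`, and the engine supports much more than literal strings, such as hex patterns with wildcards, regular expressions, offsets and match counts.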

It’s one of the best-known tools for malware analysis, with dozens of blogs, videos, commercial and free trainings, and open-source rules available from security researchers and organizations.

Malzoo & YARA

Malzoo has been matching submitted samples against the rules in the designated rules folder for a few years now, via the YARA Python library. Malzoo Serverless, however, is designed to use only AWS Lambda functions to analyze the binaries uploaded to the malware S3 bucket. YARA isn’t a native Python library that analyzes samples by itself, but rather a complete application of its own that needs to be present in the execution environment.
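A minimal sketch of how a rules folder could be gathered for the yara-python bindings, assuming rule files use the `.yar` or `.yara` extension and that each file gets its own namespace named after its path relative to the folder. The `collect_rule_files` helper and the `rules` folder name are hypothetical, not Malzoo's code; only the standard-library part runs here, and the yara-python calls are shown as comments.

```python
from pathlib import Path
from typing import Dict

# Assumption: rule files use one of these extensions.
RULE_SUFFIXES = {".yar", ".yara"}


def collect_rule_files(rules_dir: str) -> Dict[str, str]:
    """Map a namespace (relative path without extension) to each rule file path.

    The result has the {namespace: filepath} shape that yara-python's
    yara.compile(filepaths=...) accepts.
    """
    root = Path(rules_dir)
    mapping: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in RULE_SUFFIXES:
            namespace = path.relative_to(root).with_suffix("").as_posix()
            mapping[namespace] = str(path)
    return mapping


# With yara-python installed (not required for this sketch):
#   import yara
#   rules = yara.compile(filepaths=collect_rule_files("rules"))
#   rules.save("rules.compiled")          # one way to precompile, e.g. at image build time
#   rules = yara.load("rules.compiled")   # ...and load later without recompiling
#   names = [m.rule for m in rules.match(data=sample_bytes)]
```

Using the relative path as namespace keeps two files with the same name in different subfolders from overwriting each other in the mapping.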

AWS Lambda added support for running functions from custom container (Docker) images built by users. Malzoo Serverless uses this feature to build a Docker image that has the Lambda prerequisites and comes with YARA installed. For performance, the rules are built into the image. This means that whenever a new rule is added, the Docker image needs to be rebuilt and pushed with AWS SAM.

Samples are submitted by uploading them to the S3 bucket that’s created when the Malzoo Serverless stack is deployed with AWS SAM, for example:

aws s3 cp sample.bin s3://malzoo-serverless-v1-arn-malware/

The distributor Lambda then adds every new sample to the YARA worker queue. Once the YARA worker has the analysis results, the names of the matching rules are stored as a list in the DynamoDB table.
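The distributor step can be sketched as a function that turns an S3 upload notification into one message per sample. This assumes the YARA worker queue is an SQS queue and that a message only needs the bucket name and object key; the message format, the `s3_event_to_messages` name and `YARA_QUEUE_URL` are hypothetical, not Malzoo's actual code. The event fields used (`Records[].s3.bucket.name`, `Records[].s3.object.key`) follow the standard S3 event notification layout.

```python
import json
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def s3_event_to_messages(event: Dict[str, Any]) -> List[str]:
    """Build one JSON message body per uploaded object in an S3 notification."""
    bodies: List[str] = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in S3 notifications (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        bodies.append(json.dumps({"bucket": bucket, "key": key}))
    return bodies


# Trimmed-down notification for the upload command shown above.
event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "malzoo-serverless-v1-arn-malware"},
                "object": {"key": "sample.bin"},
            }
        }
    ]
}
for body in s3_event_to_messages(event):
    print(body)
    # In a real handler, each body would go to the worker queue, e.g. with boto3:
    #   sqs.send_message(QueueUrl=YARA_QUEUE_URL, MessageBody=body)
```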

Architecture overview

[Figure: Malzoo Serverless architecture overview]

Example analysis

The JSON blob below shows what Malzoo Serverless stores in DynamoDB for a PE32 executable file. The sample belongs to the WannaCry family, a nasty little piece of ransomware that most cybersecurity folks are familiar with.

{
  "md5": {
    "S": "84c82835a5d21bbcf75a61706d8ab549"
  },
  "imphash": {
    "S": "68f013d7437aa653a8a98a05807afeb1"
  },
  "imports": {
    "SS": [
      "ADVAPI32.dll",
      "KERNEL32.dll",
      "MSVCRT.dll",
      "USER32.dll"
    ]
  },
  "comp_time": {
    "N": "1290243905"
  },
  "filesize": {
    "N": "3514368"
  },
  "sha1": {
    "S": "5ff465afaabcbf0150d1a3ab2c2e74f3a4426467"
  },
  "filetype": {
    "S": "PE32 Executable"
  },
  "yara_matches": {
    "L": [
      {
        "S": "Win32_Ransomware_WannaCry"
      },
      {
        "S": "WINDOWS_EXECUTABLE_0"
      },
      {
        "S": "ZIP_ARCHIVE_0"
      },
      {
        "S": "MS_COFF_OBJECT"
      },
      {
        "S": "ZIP_ARCHIVE_2"
      }
    ]
  }
}
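Every value in the blob is wrapped in a DynamoDB type descriptor (`S` for string, `N` for number, `SS` for string set, `L` for list). A small converter back to plain Python makes such an item easier to work with. This is a minimal sketch covering only the types used here plus `M` (map); boto3 ships `boto3.dynamodb.types.TypeDeserializer` for the same job, which returns `Decimal` for numbers. The helper names are hypothetical, and decoding `comp_time` assumes it holds the PE header timestamp in Unix seconds.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def from_dynamo(attr: Dict[str, Any]) -> Any:
    """Convert one DynamoDB attribute value such as {"S": "abc"} to plain Python."""
    [(type_tag, value)] = attr.items()  # exactly one type tag per attribute value
    if type_tag == "S":
        return value
    if type_tag == "N":
        # Numbers travel as strings; try an integer first, fall back to float.
        try:
            return int(value)
        except ValueError:
            return float(value)
    if type_tag == "SS":
        return set(value)
    if type_tag == "L":
        return [from_dynamo(item) for item in value]
    if type_tag == "M":
        return {key: from_dynamo(val) for key, val in value.items()}
    raise ValueError(f"unsupported DynamoDB type tag: {type_tag}")


def item_to_dict(item: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Convert a whole item, i.e. a top-level object like the blob above."""
    return {name: from_dynamo(attr) for name, attr in item.items()}


# Excerpt of the stored item above.
stored = {
    "comp_time": {"N": "1290243905"},
    "imports": {"SS": ["ADVAPI32.dll", "KERNEL32.dll"]},
    "yara_matches": {"L": [{"S": "Win32_Ransomware_WannaCry"}, {"S": "ZIP_ARCHIVE_0"}]},
}
sample = item_to_dict(stored)
print(sample["yara_matches"])  # ['Win32_Ransomware_WannaCry', 'ZIP_ARCHIVE_0']
# Assumes comp_time is the PE header timestamp in Unix seconds.
print(datetime.fromtimestamp(sample["comp_time"], tz=timezone.utc))  # 2010-11-20 09:05:05+00:00
```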

Use cases

Build your own static malware analysis capability without any servers to operate. Once you have a decent library of data, YARA matches can serve as an additional clustering field for grouping malware samples. Happy analyzing :)
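As a sketch of using YARA matches as a clustering field, the snippet below inverts per-sample match lists into rule-to-samples groups, and adds Jaccard similarity as one simple way to compare two samples' match sets. The sample labels and their match lists are made up for illustration (the rule names are borrowed from the WannaCry example), and the choice of Jaccard similarity is an assumption, not something Malzoo provides.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def group_by_rule(samples: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Invert {sample_id: yara_matches} into {rule_name: [sample_id, ...]}."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample_id, matches in samples.items():
        for rule in set(matches):  # count each rule once per sample
            groups[rule].append(sample_id)
    return {rule: sorted(ids) for rule, ids in groups.items()}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Fraction of rules shared by two match sets (1.0 means identical sets)."""
    if not a and not b:
        return 0.0  # treat two samples without any matches as unrelated
    return len(a & b) / len(a | b)


# Made-up samples; rule names borrowed from the WannaCry example above.
samples = {
    "sample_a": ["Win32_Ransomware_WannaCry", "WINDOWS_EXECUTABLE_0", "ZIP_ARCHIVE_0"],
    "sample_b": ["Win32_Ransomware_WannaCry", "WINDOWS_EXECUTABLE_0"],
    "sample_c": ["WINDOWS_EXECUTABLE_0"],
}
groups = group_by_rule(samples)
print(groups["Win32_Ransomware_WannaCry"])  # ['sample_a', 'sample_b']
print(jaccard(set(samples["sample_b"]), set(samples["sample_c"])))  # 0.5
```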