Identifying Threat Hunting opportunities in your data
Published Oct 15 2019 08:56 AM

Key Principles

Starting Point, Data Knowledge and Learning, Pivoting

Starting Point

For this article, our starting point will be Self-identified or Free Hunting opportunities.


A few items we need to understand when we Free Hunt:

  • What do you already know about your data?
  • What do you need to learn about your data?
  • How do you pivot in the data from one entity to other interesting entities so you can build a picture or story of what occurred?

We want to learn from our data, build queries from what we learn to identify interesting hunting opportunities, and then pivot from the items we identify to build an attack sequence or story.


Data Knowledge and Learning


Data Knowledge

Leverage your practical knowledge

You already have some information about your environment, and you may know all the datatypes (logs) you have along with the interesting properties in those data sets, but it is best to clearly document and understand what you know.

To start building an understanding of the norms for your data, ask yourself these questions (among others); the answers can lead you to hunting methods.


  • What do you know about your environment?
  • What datatypes do you have and what properties (entities) do they have in them?
  • What datatypes can you bring together to clarify activity when hunting?

Build a matrix of what you have and identify the common entities; you can then target areas of opportunity.
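As a minimal sketch of this idea, the Python below builds a small matrix from datatypes to the entities they carry and then lists the entities shared by more than one datatype. The column lists are partial and purely illustrative (an assumption, not a full schema), and the function names are hypothetical.

```python
# Hypothetical, partial column lists for illustration only;
# check the actual schema in your own workspace.
DATATYPE_ENTITIES = {
    "SecurityEvent": {"Computer", "Account", "IpAddress", "Process"},
    "Heartbeat": {"Computer", "ComputerIP"},
    "WireData": {"Computer", "LocalIP", "RemoteIP"},
    "OfficeActivity": {"UserId", "ClientIP"},
}


def build_entity_matrix(datatype_entities):
    """Return {entity: sorted list of datatypes that contain it}."""
    matrix = {}
    for datatype, entities in datatype_entities.items():
        for entity in entities:
            matrix.setdefault(entity, []).append(datatype)
    return {entity: sorted(tables) for entity, tables in matrix.items()}


def common_entities(matrix, min_datatypes=2):
    """Keep entities found in at least min_datatypes datatypes (pivot candidates)."""
    return {e: t for e, t in matrix.items() if len(t) >= min_datatypes}


matrix = build_entity_matrix(DATATYPE_ENTITIES)
pivots = common_entities(matrix)
print(pivots)
```

With these sample columns, Computer is the only entity shared across datatypes, which makes it a natural first pivot point.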

Here is an example of a matrix we put together:




Data Learning

Learn from your data

Ask yourself some questions that help you identify an area of interest.  For example:


  • What’s anomalous for my environment? – We should never see an inbound connection from a public IP address to my internal web server.
  • What is rare activity? – It would be rare for this account type to access this system type except at a given time or during a given exercise like patching.
  • What is common? – How can we identify what is common and what is rare, without knowing it is?
  • What is expected? – It is normal to see this type of download on my SharePoint server.


The first two, Anomalous and Rare, are really what we are trying to get to with our hunting.  To find them, we first need to understand Common and Expected.  How do we identify what is Common or what is Expected in the environment?  How do we exclude what is expected?


Focus on a specific datatype

In this case, we chose the SecurityEvent datatype available in Azure Sentinel.  It has a rich set of information related to process executions and, as one of the more commonly collected datatypes, it is likely available to most Azure Sentinel users via Log Analytics.  What do we want to know about this data?  Let’s use the question related to Commonality (or Rarity): “How can we identify what is common and what is rare, without knowing it is?”  We can use simple counts and distributions to understand this, but what about using something like Entropy to understand what is Rare in our data?


Side Note: My team was asked about using Entropy to determine Process rarity in Security Event data for Azure VMs.  I am not an expert in this, so I did some research and found a nice simplified explanation of Shannon Entropy, which I used to create a query that shows Process Rarity for a given VM and across all VMs in the Azure environment.




We won’t go through the specifics of the Azure Sentinel hunting query that was built for this.  Just know that it allowed me to see something interesting that I might otherwise have missed because it was lost in the noise of the data: rare, but maybe not rare enough to show up in other algorithms with more constraints.  This query is not meant to be an alert.  It can produce False Positives, but that’s fine…we are hunting after all!
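The query itself is not reproduced here.  As a rough illustration of the underlying idea only, the Python sketch below computes, for a list of process names, each process's share of executions and its term in the Shannon entropy sum, -p·log2(p).  Treating these two numbers as analogues of a Weight and a ProcessEntropy value is an assumption made for this sketch, and the function names are hypothetical.  Note that -p·log2(p) shrinks toward zero as p shrinks only once p is below roughly 0.37 (1/e), so the sketch sorts on the share first.

```python
import math
from collections import Counter


def process_rarity(process_names):
    """Return (process, count, weight, entropy_term) rows, rarest first.

    weight       = share of all executions (assumed meaning, see lead-in)
    entropy_term = -weight * log2(weight), this process's Shannon entropy term
    """
    counts = Counter(process_names)
    total = sum(counts.values())
    rows = []
    for name, count in counts.items():
        weight = count / total
        entropy_term = -weight * math.log2(weight)
        rows.append((name, count, weight, entropy_term))
    # Sort by share of executions, then by name for a stable order.
    rows.sort(key=lambda row: (row[2], row[0]))
    return rows


def shannon_entropy(process_names):
    """Shannon entropy (in bits) of the process-name distribution."""
    return sum(row[3] for row in process_rarity(process_names))


# Made-up sample: one process appears only once in 100 executions.
sample = ["svchost.exe"] * 50 + ["chrome.exe"] * 30 + ["cmd.exe"] * 19 + ["pstest.exe"]
rarest = process_rarity(sample)[0]
print(rarest)
```

Running the same function per computer and again across all computers gives two views of rarity: rare on one host versus rare everywhere.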


Again, this is just one example of understanding your data.  You can apply the same concepts to any datatype to identify a starting point for your hunting.


Now that we understand our data and have a query that gives us a view into an interesting data point, what next?

Use the matrix you built above to:

  • Identify common entities across datatypes (logs)
  • Identify the datatypes you think are most likely related and interesting

Then Pivot to:

  • Provide additional context
  • Build out the attack sequence or story
  • Eventually build custom alerts and hunting queries and, where appropriate, contribute them to the community



Process related pivoting


Let’s see this in action

We will start with the home-grown query from above and then pivot into associated datatypes to reveal the security story. (The names have been changed to protect the innocent.)  Of note, the lower the ProcessEntropy and Weight, the rarer the process. Below are the results of my ProcessEntropy query.  Sorting by the lowest Weight and ProcessEntropy values, we can see some rare processes executing. Near the bottom of the list, we see something more interesting.


Seeing Pstest.exe together with IEX on the command line is unexpected. IEX is the alias for the PowerShell cmdlet Invoke-Expression, so this call coming from a process other than PowerShell is odd.




Now that we have the entities (TimeGenerated, Computer, Account and so on) listed above, we are probably interested in everything that occurred around this time.  We can take existing queries in Azure Sentinel and modify them slightly for our needs.


For example, we may want to know what else occurred alongside the activity identified by the ProcessEntropy query above, and we will want to decode that string so we can verify whether it is malicious or expected behavior.  By looking at the Analytics tab in Azure Sentinel, we can see various associated queries for the Data Source (datatype) of SecurityEvent.




Specifically, look at the New processes observed in the last 24 hours hunting query and use it as a starter.  Simply choose that query and select View Results, which will take you directly into the Log Analytics blade.

Modify the query to include only the Computer we are interested in.  This query aggregates the Computers into one field, so we use the following to keep only that Computer:


| where Computers has "GodzillaProd1"


Additionally, we want to decode the base64 string to determine what is occurring.  Two methods of extracting the string are shown below, one via the parse operator and another via the split function, followed by a base64 decode function. Additional methods of decoding are also available when using Jupyter and MSTICPY.


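// Method 1: the parse operator captures the text between FromBase64STring(' and the next '
| parse ProcessCommandLine with * "FromBase64STring('" parse_Base64 "'" *
// Method 2: split on (' and take the last piece, then split on ') and take the first piece
| extend split_Base64 = tostring(split(tostring(split(ProcessCommandLine, "('")[-1]),"')")[0])
// Decode the extracted string (base64_decode_tostring supersedes the deprecated base64_decodestring)
| extend split_Decoded = base64_decode_tostring(split_Base64)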
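For readers working in a notebook, the same extraction and decode can be sketched with the Python standard library alone.  This is not MSTICPY and not the query above.  The function name, the regular expression, and the sample command line are all made up for illustration, and the sketch assumes the payload decodes as UTF-8.

```python
import base64
import re

# Case-insensitive so odd casing such as FromBase64STring still matches.
_B64_CALL = re.compile(r"FromBase64String\('([^']+)'\)", re.IGNORECASE)


def decode_embedded_base64(command_line, encoding="utf-8"):
    """Return the decoded FromBase64String('...') payload, or None if absent."""
    match = _B64_CALL.search(command_line)
    if match is None:
        return None
    raw = base64.b64decode(match.group(1))
    # errors="replace" keeps the output readable if the assumed encoding is wrong.
    return raw.decode(encoding, errors="replace")


# Hypothetical command line built for this example.
payload = base64.b64encode(b"whoami /all").decode("ascii")
cmd = (
    "pstest.exe IEX ([Text.Encoding]::UTF8.GetString("
    f"[Convert]::FromBase64STring('{payload}')))"
)
print(decode_embedded_base64(cmd))
```

If the decoded text looks like alternating characters and null bytes, try encoding="utf-16-le", since PowerShell payloads are often encoded that way.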


When we run the query, we can see the decoded results, and this looks potentially bad.  Downloading information-gathering tools, such as Sysinternals tools, via encoded scripts is generally done to hide activity.  We will want to investigate this further:




Now that we are pretty sure this is unwanted behavior at a minimum, and more likely malicious, we need to build the attack sequence or story.


What are my next steps? – Let’s ask questions about the entities we have.  Some questions we would ask:

  • What datatypes do we have where we can see what happened related to network connections so we can get the related IP?
  • What datatypes do we have that show group memberships for the account?
  • What datatypes do we have in cloud specific services that can show us activity related to the account?
  • How widespread is this?

To help understand which datatypes contain your interesting entity, search on it.  Just be sure to scope the time properly.

For example, we can see which datatypes have information about one of the interesting accounts:
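The same idea can be sketched offline in Python as a rough analog, assuming you have exported a handful of records per datatype as dictionaries.  The function name and the sample records are hypothetical, and this simple substring match is an assumption of the sketch rather than how the search feature itself matches terms.

```python
def datatypes_containing(term, tables):
    """Count records per datatype where any field contains term (case-insensitive)."""
    needle = term.lower()
    hits = {}
    for name, records in tables.items():
        count = sum(
            1
            for record in records
            if any(needle in str(value).lower() for value in record.values())
        )
        if count:
            hits[name] = count
    return hits


# Made-up sample records for illustration.
tables = {
    "SecurityEvent": [
        {"Account": "CONTOSO\\baduser", "Computer": "GodzillaProd1"},
        {"Account": "CONTOSO\\alice", "Computer": "GodzillaProd1"},
    ],
    "OfficeActivity": [{"UserId": "baduser@contoso.com"}],
    "Heartbeat": [{"Computer": "GodzillaProd1"}],
}
print(datatypes_containing("baduser", tables))
```

The per-datatype counts tell you where to pivot next and which datatypes you can skip for this entity.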




Network related Pivoting


Let’s look at how the VM shows up

We will look at the cloud data, and specifically at some network details.

We know the name of the Computer where the initial hits were found, so we can look at the Heartbeat table to see what the IP is for the VM.




At this point, we can easily pivot into the network data, in this case WireData related to the IP of the VM. 

A good thing to check is which Outbound connections have occurred and whether there was any outbound data transfer.

We have a query that looks for outbound connections and keeps only public ones by excluding connections to internal IPs:


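// Assumed lookback window (not given in the article); adjust to the period of interest.
let starttime = 7d;
let endtime = 0d;
// Matches loopback and RFC 1918 private IPv4 ranges.
let PrivateIPregex = @'^127\.|^10\.|^172\.1[6-9]\.|^172\.2[0-9]\.|^172\.3[0-1]\.|^192\.168\.';
WireData
| where TimeGenerated between (ago(starttime)..ago(endtime))
| where Direction == "Outbound"
| extend RemoteIPType = iff(RemoteIP matches regex PrivateIPregex, "private", "public")
| where RemoteIPType == "public"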
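To make the filtering logic concrete, here is a small Python sketch that applies the same private-range regular expression to a list of records and totals the bytes sent to each public address.  The record layout (Direction, RemoteIP, SentBytes keys), the function name, and the sample values are assumptions for illustration.

```python
import re
from collections import defaultdict

# Same pattern as in the query: loopback plus the RFC 1918 private ranges.
PRIVATE_IP = re.compile(
    r"^127\.|^10\.|^172\.1[6-9]\.|^172\.2[0-9]\.|^172\.3[0-1]\.|^192\.168\."
)


def outbound_public_bytes(records):
    """Total SentBytes per public RemoteIP for Outbound records."""
    totals = defaultdict(int)
    for record in records:
        if record["Direction"] != "Outbound":
            continue
        if PRIVATE_IP.search(record["RemoteIP"]):
            continue  # internal destination, not interesting here
        totals[record["RemoteIP"]] += record["SentBytes"]
    return dict(totals)


# Made-up records using documentation IP ranges.
records = [
    {"Direction": "Outbound", "RemoteIP": "203.0.113.10", "SentBytes": 5000},
    {"Direction": "Outbound", "RemoteIP": "203.0.113.10", "SentBytes": 4000},
    {"Direction": "Outbound", "RemoteIP": "10.0.0.5", "SentBytes": 100},
    {"Direction": "Inbound", "RemoteIP": "198.51.100.7", "SentBytes": 999},
    {"Direction": "Outbound", "RemoteIP": "172.32.0.1", "SentBytes": 1},
]
print(outbound_public_bytes(records))
```

Note that 172.32.0.1 counts as public because only 172.16 through 172.31 fall in the private range.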


The query results show that we do have connections to an external IP, with around 9 MB of data sent.




We next want to see if your curated Bring Your Own Threat Intel (BYOTI) has any references to this IP so we can further validate malicious activity.




You may be asking why no alert was fired for this.  There potentially could have been an alert, but in this case, this was in custom BYOTI that was added after the malicious activity already occurred, so it was not available at the time of the attack.  Therefore, it is important to use multiple methods across time to identify malicious behavior.
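The point about time can be sketched in Python: re-check historical connections against the indicator list as it stands today, and flag the matches that happened before the indicator was added.  The data layout, field names, and function name are hypothetical and exist only for this sketch.

```python
from datetime import datetime


def retro_match(connections, indicators):
    """Match past connections against indicators, noting which predate the indicator.

    connections: list of dicts with RemoteIP and TimeGenerated (datetime)
    indicators:  dict of indicator IP -> datetime the indicator was added
    """
    matches = []
    for conn in connections:
        added = indicators.get(conn["RemoteIP"])
        if added is None:
            continue
        matches.append(
            {
                **conn,
                "IndicatorAdded": added,
                # True means no alert could have fired at connection time.
                "PredatesIndicator": conn["TimeGenerated"] < added,
            }
        )
    return matches


# Made-up sample data.
connections = [
    {"RemoteIP": "203.0.113.10", "TimeGenerated": datetime(2019, 10, 1, 12, 0)},
    {"RemoteIP": "198.51.100.7", "TimeGenerated": datetime(2019, 10, 1, 12, 5)},
]
indicators = {"203.0.113.10": datetime(2019, 10, 5)}
result = retro_match(connections, indicators)
print(result)
```

Matches where PredatesIndicator is True are exactly the ones a real-time alert would have missed.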


We now know that this is malicious, and we need to understand what else happened. 


Account Related Pivoting


Let’s look at what groups may be involved

The “baduser” and “shainw” accounts identified initially will provide us our pivot point.  We have some built-in hunting queries that will help us.




The great thing about this query is that we have already included WellKnownLocalSID and WellKnownGroupSID mappings so you can see whether an account was added to a high privilege group.  As you can see below, the “baduser” account was added to the Builtin\Administrators group by the “shainw” account.
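As a minimal sketch of what such a mapping does (not the built-in query itself), the Python below maps a few well-known SIDs to group names and filters Windows group-membership events (4728, 4732 and 4756, member added to a global, local and universal security group respectively).  The event field names and function names are assumptions for illustration, and the SID tables are deliberately short.

```python
# A few well-known SIDs; the real mappings cover many more groups.
WELL_KNOWN_LOCAL_SIDS = {
    "S-1-5-32-544": "Builtin\\Administrators",
    "S-1-5-32-551": "Builtin\\Backup Operators",
    "S-1-5-32-555": "Builtin\\Remote Desktop Users",
}
# Domain groups are identified by the final part of a S-1-5-21-<domain>-<RID> SID.
WELL_KNOWN_DOMAIN_RID_SUFFIXES = {
    "-512": "Domain Admins",
    "-518": "Schema Admins",
    "-519": "Enterprise Admins",
}
GROUP_ADD_EVENT_IDS = {4728, 4732, 4756}


def resolve_group(sid):
    """Return the well-known group name for a SID, or None if not listed."""
    if sid in WELL_KNOWN_LOCAL_SIDS:
        return WELL_KNOWN_LOCAL_SIDS[sid]
    if sid.startswith("S-1-5-21-"):
        for suffix, name in WELL_KNOWN_DOMAIN_RID_SUFFIXES.items():
            if sid.endswith(suffix):
                return name
    return None


def privileged_group_additions(events):
    """Return (actor, member, group) for additions to listed privileged groups."""
    results = []
    for event in events:
        if event["EventID"] not in GROUP_ADD_EVENT_IDS:
            continue
        group = resolve_group(event["TargetSid"])
        if group is not None:
            results.append((event["SubjectUserName"], event["MemberName"], group))
    return results


# Made-up events; field names are assumed for this sketch.
events = [
    {"EventID": 4732, "TargetSid": "S-1-5-32-544",
     "SubjectUserName": "shainw", "MemberName": "baduser"},
    {"EventID": 4624, "TargetSid": "S-1-5-32-544",
     "SubjectUserName": "shainw", "MemberName": "baduser"},
]
print(privileged_group_additions(events))
```

Only the 4732 event is reported, since 4624 is a logon event rather than a group change.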




Now let’s see what the cloud specific logs show

For the accounts we are interested in, what do we know:

  • We know the “shainw” account added the “baduser” account to the group
  • We can assume “shainw” is compromised
  • We know this account has Azure level access, just not sure exactly what level.

We have this information from the Security Events:




The same concept of groups applies to AzureAD.  We need to see if the “shainw” account added any users to any groups in Azure.  Here we will look for events where a member is added to a group in Azure, specifically those initiated by our known high privileged account.


As you can see, the user was added to the CloudAdmins group.  Not good.




Things seem to be getting worse, and now you remember that your WireData showed a sizable amount of data sent from the VM that was compromised.


Data Exfiltration

Now let’s see what might have been taken. 

There are potentially many datatypes to look at, but we will be smart and identify where this baduser shows up first, before digging in any deeper.  Here is where we can use the search feature again.  We can clearly see below that the account shows up in many logs and very interestingly in OfficeActivity.




You are probably getting a bit nervous now that we know there is a compromise and that data may have been pulled from the environment, but what data?


We are most interested in the OfficeActivity data as we know emails and documents flow through that space.  Looking up the UserKey for our account indicates some very useful information was accessed by the adversary during the attack.




How widespread is this?

As we are getting a bit long on this investigation, we will only do one more pivot back into the VM and on-prem side.  Let’s just have a quick look at what Computers and Resources this account may have shown up on:





In summary, malicious activity can be discovered in many ways; this is just one method used by Threat Hunters at Microsoft.  Go find your next adversary and share your findings with the community.  Some of the activity identified in this blog is generally available as Detections or Hunting Queries in the Azure Sentinel GitHub.  There is opportunity for other Detections and Hunting Queries that we will continue to produce over time.  Feel free to improve the current queries or come up with your own queries for pivoting.


Our mission: There are always new attack techniques

  • Go Discover it
  • Create a Detection or Hunting query for it
  • Share the query on the community GitHub
  • Your query may eventually be included in the Detection or Hunting sections of Azure Sentinel









Version history
Last update: Oct 15 2019 08:55 AM