We live in a world that is hyper communicative, with much of this communication occurring on the Internet.
On the Internet, companies/products want to communicate their value to customers and people want to communicate with other people.
Open Source Intelligence (OSINT) is a byproduct of these online communications, and as the quantity of these communications continues to increase, the resources that yield OSINT will also increase.
In hacking (a term under which I include Red Teaming and Penetration Testing), we weaponize information through the application of our experience, intelligence, strategy, creativity and tooling.
This capacity is a major distinction that separates an experienced hacker from most other users on the Internet.
For example: if a normal user accesses log files on a misconfigured web server and an experienced hacker accesses those same logs on that same host, what is the difference between the two users in this crude example?
The hacker has the capacity to convert the data on those log files into a resource that could allow them to access the web server and/or other hosts on the network (or beyond it).
OSINT is the real world equivalent of these metaphorical log files mentioned above, except it seems like every user and company on the Internet is writing their own log files…
And they seem to leave these log files they write for themselves all over the Internet.
OSINT is usually gathered for the purpose of analysis/reporting and is refined through any number of well defined processes. This yields intelligence that can be applied toward answering a specific question or deciding/reinforcing an appropriate response or path of action.
Below: The image provides an example of a process used to gather, analyze and refine OSINT.
OSINT as a hacking/Red Team/penetration testing resource can yield a great number of advantages that can be leveraged for enumeration and exploitation during an operation/engagement.
This data is usually much more information dense than the output provided by tooling such as a port scanner or vulnerability scanner.
However, this data also tends to require more time to process effectively…it generally cannot be put into direct action as quickly as other data types due to the extra analysis/critical thinking needed to process it.
This is because OSINT often consists of what I call “layers”; sometimes these layers consist of data that targets contribute themselves.
Examples of the layers that targets can contribute to OSINT include things like metadata, overt/underlying conditions conveyed by the wording of a target’s expression(s) and opportunities to better identify/quantify a target’s relationships (and the condition/nature of those relationships).
This is the real trick of making OSINT work during offensive operations/engagements: extracting the most relevant advantages possible from the layers of data provided without the time periods or logistical resources that most conventional processes for working with OSINT assume.
For an added bonus, OSINT tends to be an excellent resource for providing us with contingencies, which are infinitely useful when making decisions on things like target acquisition.
These contingencies can be a real boon for recouping time/energy lost in pursuing rabbit holes and/or running into dead ends, all of which happen to the best of us from time to time.
Two of the personal strategies I utilize most when working with OSINT during engagements include “Key and Layer” and “Contingency Seeding”.
I will demonstrate both processes (as well as some other methodologies, resources and tactics I use when working with OSINT) by detailing a real world example from an engagement I ran against a corporate client.
Key and Layer, Contingency Seeding: Maximizing OSINT for Hacking/Red Team/Penetration Testing
During an engagement I utilized Google Dorking to discover the archives of an internal mailing list that belonged to the target company; it was publicly accessible on the Internet.
This was an ongoing archive that contained well over a decade’s worth of internal emails, some of which were employee-to-employee communications and others that were employee-to-client/customer communications.
This archive contained thousands of emails, the earliest of which dated back to 2001 with the most current dating just days before the engagement began.
By accident or on purpose, the majority of companies I’ve engaged keep some manner of OSINT rich archives sitting somewhere on the Internet, publicly available.
I found the archive in the image above after reading online user manuals for a network solution the target developed. I knew from reading blogs on the target’s sites that they used this solution in most of their own networks, in addition to selling it as a managed and unmanaged solution to their customers/clients.
One of the manuals for this solution stated that it used default/hardcoded credentials with a username that I had never seen before; a fair approximation of this username would be XYZuser.
To search the Internet for material that contained the default/hardcoded password alongside this username, I used manual Google Dorking against the string intext:xyzuser.
Below: Manual Google Dorking against the string intext:xyzuser uncovered an indexed entry that led to the e-mail archive. This entry also displayed the username I searched for plus its matching password (redacted in red).
The archive included a “Search this archive” function, which I used to search the archive for keywords like “password”, “credentials” and “Administrator”.
A search using the word “password” located hundreds of archived emails like the one pictured below (many of which contained credentials).
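As a rough illustration of the keyword sweep described above, the sketch below flags archive pages that mention any of those search terms. The archive pages and their layout here are invented stand-ins; the real archive’s “Search this archive” function did this work server-side.

```python
# Sketch: sweep mailing-list archive pages for credential-related keywords.
# The page contents below are invented; a real run would fetch each page first.
import re

KEYWORDS = ("password", "credentials", "administrator")

def flag_messages(pages):
    """Return (page_id, keyword) pairs for every page mentioning a keyword."""
    hits = []
    for page_id, text in pages.items():
        for kw in KEYWORDS:
            if re.search(kw, text, re.IGNORECASE):
                hits.append((page_id, kw))
    return hits

# Offline demo with two fake archive pages:
pages = {
    "msg001": "Re: router setup -- the Password is unchanged",
    "msg002": "Lunch on Friday?",
}
print(flag_messages(pages))  # [('msg001', 'password')]
```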
We will use the “Key and Layer” process against one of the archived emails I found (shown below) to demonstrate the capacity of the technique to help extract many of the advantages/resources an instance of OSINT can provide.
The first step of this process is to evaluate this instance of OSINT for the primary, most visible advantage/resource it provides (the “Key” advantage).
NOTE: I had already established a Key value for the archive as a whole at this point, based on the set of credentials I found via the Google search indexed entry above: the Key value of the archive was that it contained emails containing credentials.
The Key value is important for many reasons, a few of which include:
1) If it is your first time reviewing a single instance of OSINT and you can’t find a strong Key value that leaps out at you, it may be time to move on to another instance (or perhaps another source altogether) for the time being (and perhaps come back to this one later).
PLEASE NOTE: Where the Key and Layer technique is concerned, an example of an OSINT instance would be any of the individual emails stored by this archive.
An example of an OSINT source would be the entire archive of emails itself and all the functionality the archive entails.
2) This Key advantage helps to ensure our investment of time in investigating an instance or source of OSINT pays off; offensive engagements/operations are dynamic challenges where time is ultimately the enemy.
*Should an event outside of our control force us to suddenly abandon an OSINT source/instance after establishing a Key value, at least we have yielded some tangible value from that source/instance.
Over time, this strategy will pay dividends: we are ensuring that we are gaining advantages/resources from the material we choose to invest our time/attention in, while more quickly putting aside the material with layers that are not yielding these advantages/resources.*
This is especially important when working post-exploitation in a LAN, where your investment of attention elsewhere could (essentially) leave an active session out in the open. If we’re going to put a session (and the implant/infrastructure that maintains it) in a position where there is an increased probability of it being detected or disrupted (perhaps accidentally by an unsuspecting user), then we should ensure we maximize the reward to balance the added risk.
3) I’ve found that the Key value concept helps me gain better value from the materials I do set aside; it helps me move through a greater quantity of OSINT content more quickly, while also helping me maintain/define a sense of the value these resources could provide.
This is because defining the reasons an OSINT instance/source does not have a Key value also helps me define the resources it does have.
This process helps me create a store of resources that can be drawn from at any point during an engagement. This allows me to understand how these resources could be used to connect the dots, even if the data doesn’t represent a dot itself.
Reading through the archived e-mail below, I establish the Key value: the email possesses another set of hardcoded/default credentials (redacted in red, a different set of credentials from those redacted above) for the same network solution, with information in the body of the email quantifying their relative effectiveness.
As is common with instances of OSINT, there are many more layers of advantages/resources present.
However, we also have Contingency Seeding occurring: given the context of this e-mail, the credentials it contains seem like a good path to investigate/follow.
Should this set of credentials lead us to a dead end or toward a disadvantageous situation, or should we learn from other recon that these credentials do not have the value we think they do, we have a second set of credentials to utilize (the credentials found in the Google Dorking search entry).
With the Key value already established (the credentials redacted in red), let’s work through the archived email layer by layer; when using “Key and Layer”, I work from the top of the source down to the bottom.
A layer constitutes every section of a source where resources/advantages are present.
Starting from the top of the e-mail (below), we have the full name of the target’s employee and their e-mail address; though the archive has tried to sanitize the e-mail address, it still hints at the naming convention the target’s internal/corporate e-mail addresses use ([email protected]).
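A hinted-at naming convention can be put to work mechanically. The sketch below expands a guessed convention into candidate addresses; the convention shown (first initial + last name) and the domain are hypothetical stand-ins, since the real pattern would need to be confirmed against the sanitized headers.

```python
# Sketch: expand a guessed corporate email naming convention into candidate
# addresses. The convention and domain here are assumptions, not the target's.
def candidate_emails(full_names, domain, fmt="{f}{last}"):
    """Build addresses like jdoe@domain from 'Jane Doe' + a format template."""
    out = []
    for name in full_names:
        first, last = name.lower().split()
        out.append(fmt.format(f=first[0], first=first, last=last) + "@" + domain)
    return out

print(candidate_emails(["Alexander Smith", "Jane Doe"], "example.com"))
# ['asmith@example.com', 'jdoe@example.com']
```

Swapping `fmt` for templates like `"{first}.{last}"` covers the other conventions commonly seen in the wild.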
The date/time in the email was close to the date range when this engagement was occurring.
This helped quantify the relevance of a couple of things: it helped to confirm that the credentials included in the email were likely still effective and that the situation surrounding them (the technical difficulty associated with changing those credentials), which was addressed in the e-mail, still applied.
Also, given the thousands of emails present in the archive, we could search all of the emails this employee sent/responded to and quantify all of the days/times associated with those e-mails.
This would leave us with a statistical range as to when the employee may or may not have been active on their corporate email/corporate host.
As this archive would not contain all the employee’s emails, this statistical range would not be perfect.
However, should we ever gain access to this employee’s credentials/corporate host, this range could:
Help us lower the probability of being detected when accessing the employee’s email account or corporate host to utilize it as a resource (read the emails their account contains, launch social engineering attacks at other employees from it, utilize/search the employee’s host for resources with a Meterpreter session, etc.).
Also, if we wanted to conduct attacks like taking screenshots of this employee’s online sessions, this statistical range could help us activate implants on their host during days/times the employee was most likely to be active.
Finally, if the archives show statistical norms for when this employee e-mails other employees, this could help us launch social engineering attacks against those employees at those times/on those days, which can help add an extra layer of feigned authenticity to the attacks.
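The statistical-range idea is straightforward to sketch: parse the Date: header of every archived email the employee sent and count messages per weekday/hour. The header strings below are invented stand-ins for the archive’s real headers.

```python
# Sketch: estimate when a sender is active from email Date: headers.
# Headers below are invented examples, not taken from the real archive.
from collections import Counter
from email.utils import parsedate_to_datetime

def activity_profile(date_headers):
    """Count messages per (weekday, hour) to estimate active periods."""
    profile = Counter()
    for hdr in date_headers:
        dt = parsedate_to_datetime(hdr)
        profile[(dt.strftime("%a"), dt.hour)] += 1
    return profile

headers = [
    "Mon, 04 Mar 2019 09:12:44 -0500",
    "Mon, 11 Mar 2019 09:47:02 -0500",
    "Tue, 05 Mar 2019 14:03:10 -0500",
]
print(activity_profile(headers).most_common(1))  # [(('Mon', 9), 2)]
```

Run over hundreds of archived emails, the top of `most_common()` becomes the statistical range described above.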
Now, we move on to another layer of resources: the body of the e-mail itself, a layer added by the target’s employee…as is often the case with OSINT, this is the most valuable intelligence we can find.
This is where the Key resource is located, provided by the words of the target’s own employee: active (at the time at least), hardcoded/default credentials in an e-mail that is destined to be stored on the public Internet within hours or days.
The employee continues adding valuable layers to this OSINT by explaining why his fellow employees should not change these default credentials, furthering the statement by saying it is unlikely the target or its clients/customers (both of whom make heavy use of the target’s products) will go through the trouble of doing so.
The body of this e-mail has helped solidify the value of the credentials it provides, ensuring that there is a high likelihood that any enumerated hosts running this product will be accessible via these credentials.
Logging into a host always carries the danger of being detected, but logging into a host after multiple failed attempts is generally much worse.
The body of this e-mail could aid in crafting social engineering attacks that mimic this employee’s writing style, word choices, demeanor and formatting; our imitation of these elements could likely be improved by studying any number of emails in the archive that belong to this employee.
This study could include noting historic differences/similarities in how this employee interacts with certain employees/clients over many emails, with targets of social engineering attacks that impersonate this employee targeting employees/clients that display a dynamic we feel can be leveraged.
For social engineering attacks against this employee, we know Alexander likes to go by Alex. For social engineering attacks that pose as them, we know the name they go by and how they prefer to sign off on their correspondence (using “Best,”, which we can verify by studying other emails that belong to this employee in the archive).
We could use this email and others in the archive that belong to this employee to build custom wordlists with CeWL to bruteforce any resources attributed to them.
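CeWL itself crawls a URL and harvests words of a minimum length; as a minimal sketch of that word-harvesting step applied directly to archived email bodies (the email text below is invented):

```python
# Sketch: a minimal CeWL-style wordlist builder run over email bodies.
# CeWL proper crawls a site; this only mimics its word-harvesting step.
import re
from collections import Counter

def build_wordlist(texts, min_len=5):
    """Harvest lowercase words of at least min_len letters, most frequent first."""
    words = Counter()
    for text in texts:
        for w in re.findall(r"[A-Za-z]{%d,}" % min_len, text):
            words[w.lower()] += 1
    return [w for w, _ in words.most_common()]

emails = ["The staging router uses the default image. Staging rollout Friday."]
print(build_wordlist(emails))
# ['staging', 'router', 'default', 'image', 'rollout', 'friday']
```

The resulting list can be fed to whatever bruteforcing tool the engagement calls for, exactly as a CeWL-generated list would be.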
Moving on to the layer shown above, the archived emails that belonged to the target’s employees seemed to always include the internal corporate email signature of each participant (this varied where the target’s clients/customers are concerned).
These signatures pretty much mangled the scant security measures that were taken to obscure information like internal corporate emails.
Here, we have data that helped us quantify the viability/quality of this instance of OSINT: since this employee is a System Architect, there is the highest reasonable probability that the data is factual, given the employee’s position in the company.
We also have other data helpful in attempting other attacks (especially social engineering attacks) such as an address, phone number, fax and this employee’s division.
This layer also adds extra value to other data found earlier: having discovered the facility this employee works at, we could take an established statistical range of the days/times this employee worked (using the headers of their archived e-mails) and use that to make a safe assumption about what days/times the majority of that facility’s employees work.
This data could be used to improve the probability of successfully launching any number of attacks against the facility’s network infrastructure or the employees themselves.
The archive’s attempts to hide employee email addresses fail due to the presence of the employees’ email signatures:
We already have the naming convention of this employee’s email address taken from headers.
The obscured e-mail address is repeated here above the target company’s domain name.
The number of Xs used to hide the remainder of the employee’s email address equals the number of characters in the company’s/target’s domain name shown after the www in the website address field (the company’s emails could also be found using Hunter.io; this method recreates the employee’s email address exactly).
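That de-masking logic can be sketched directly: compare the length of the X-run against the domain visible in the website field. All values below are invented placeholders, not the target’s.

```python
# Sketch: recover a masked email address when the masking-character run is
# the same length as the domain shown elsewhere in the signature.
# The masked address and website below are invented placeholders.
def unmask_email(masked, website):
    local, masked_domain = masked.split("@")
    domain = website.removeprefix("www.")
    # If the X-run matches the visible domain's length, assume they are the same.
    if len(masked_domain) == len(domain):
        return local + "@" + domain
    return None

print(unmask_email("asmith@XXXXXXXXXXX", "www.example.com"))
# asmith@example.com
```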
We also have the layout/appearance of the target company’s internal e-mail signature and their confidentiality agreement; both of these could aid in producing convincing phishing/spearphishing emails that appear to originate from within the company.
We have the employee’s phone number; we could attempt to gain access to this employee’s voicemail box, possibly leveraging data from this and other emails in the archive to do so.
For example: employee ID numbers are often the default code used to secure an employee’s voicemail; we could search archived emails belonging to this employee to see if their ID number was ever an included value in the email headers/signature or mentioned by this employee or other employees/clients/customers.
Since there is no cellphone field in the signature, this could be a deskphone within the target company’s facility; if a BlackHat could gain access to the employee’s voicemail box, PBX-type exploitation for profit could be possible.
The BlackHat PBX attack scenario adds value to the idea of quantifying times/dates listed in the header of this/other archived emails belonging to this employee to help establish a statistical range of normal working days on/days off and hours on/hours off, as well as the likely norms worked by other company employees at that facility.
PBX exploitation for profit often involves a BlackHat exploiting PBX employee voicemail message-forwarding functionality, causing company phone lines to call pay-by-minute (1-900) numbers belonging to the BlackHat, charging the company for the time elapsed via repeated calls; some BlackHats have earned hundreds of thousands of dollars over a weekend by doing this…it helps to utilize this attack when company employees are off and not near their desk phones.
Due to the presence of the fax machine number, we now know a fax machine is likely present at the target facility, which is an added, viable attack vector.
This adds a third contingency via Contingency Seeding: our first contingencies should utilize the credentials that have been found. Should these credentials not work, another set of contingencies could involve trying to locate fax machines on the target’s perimeter to attack; or, if we already have a foothold in this facility’s LAN, we could try to locate the fax and utilize it for post exploitation activities.
Alternatively, since there is no deskphone field in the email signature, this employee may only use a cellphone or softphone for work, which could create an avenue for other attacks against them.
For instance, if we somehow discover that the employee uses a cellphone instead of a deskphone, we could attempt SMS-based social engineering attacks against them.
Or, if we learn that the employee uses a softphone, we could attempt to enumerate the software used and attempt to leverage any vulnerabilities that the software possesses at some point of this engagement/operation.
Above: The final layer of data for this instance of OSINT is the “Other related posts” field.
The messages in this thread could allow us to discover departments, networks and facilities where we can best utilize the credentials we have found to access hosts in the target’s network.
For instance, if messages in this section contain emails where company employees state having the network solution in play, we could enumerate what department/location they work at through data contained in their email signature (and possibly the email itself or other emails in the archive).
The same strategy could also apply to any customers/clients involved in this dialogue. If their email signatures didn’t leak identifying information in any of this section’s threads, we could also attempt to search emails in the archive for the necessary information.
Then it becomes a matter of enumeration to find the hosts that data in an email or emails points at.
Perhaps an employee in one of the emails listed in this section states they constantly have to work around the problem these credentials present (confirming the hosts are present in their network); this employee’s e-mail signature states they work in the target’s Engineering Department in another facility.
We could enumerate the target’s domains/subdomains with tools like Amass, Aquatone, Fierce or Sublist3r and/or sites like DNSDumpster, searching for domains/subdomains that seem likely to be part of the Engineering department (naming conventions containing words like “dev” or “staging” are a good start).
Image above: OWASP Amass DNS Enumeration output enumerates a target IP range using words in the naming convention indicative of Engineering/Developer department subdomains.
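A sketch of the filtering step described above: given the subdomains produced by tools like Amass or DNSDumpster, keep the ones whose naming convention suggests Engineering/Development infrastructure. The subdomains and hint words below are assumptions for illustration.

```python
# Sketch: filter enumeration output for Engineering/Development-style names.
# The subdomain list is invented; in practice it comes from Amass, Fierce, etc.
ENG_HINTS = ("dev", "staging", "test", "build", "eng")

def likely_engineering(subdomains):
    """Keep subdomains whose leftmost label contains an engineering hint word."""
    return [s for s in subdomains if any(h in s.split(".")[0] for h in ENG_HINTS)]

found = ["www.example.com", "staging.example.com", "dev2.example.com", "mail.example.com"]
print(likely_engineering(found))  # ['staging.example.com', 'dev2.example.com']
```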
After locating the necessary target domains/subdomains, we could then search these IPs in Shodan to see if the necessary network solution is present, pending other enumeration such as portscans of the target’s full IP ranges.
As DNS and other enumeration commences, we could also utilize Shodan search filters to expedite the process of finding these hosts (we could use the archive to isolate attributes that allow us to do so).
This is especially helpful, as Shodan search filters allow us to search by a huge number of parameters, examples of which include state, country, port, type of software and network IP range.
NOTE: Daniel Miessler’s overview of Shodan search filters can be found here https://danielmiessler.com/study/shodan/
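As a sketch of filter composition, the helper below builds a Shodan query string from attributes isolated via the archive, then shows (commented out) how the official shodan Python library would run it. The product name, port and country are invented placeholders, and a real run requires an API key.

```python
# Sketch: compose a Shodan filter query from attributes isolated via the
# archive. Filter values below are invented placeholders, not the target's.
def shodan_query(**filters):
    """Build a filter string like 'product:"XYZ Router" port:23 country:US'."""
    parts = []
    for key, value in filters.items():
        value = f'"{value}"' if " " in str(value) else str(value)
        parts.append(f"{key}:{value}")
    return " ".join(parts)

query = shodan_query(product="XYZ Router", port=23, country="US")
print(query)  # product:"XYZ Router" port:23 country:US

# With a valid API key, the official library would execute the search:
# import shodan
# api = shodan.Shodan("YOUR_API_KEY")
# for match in api.search(query)["matches"]:
#     print(match["ip_str"], match.get("port"))
```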
We could also use the e-mails that make up this thread to further isolate the best targets: what are the employees/customers/clients saying about the credentials and the situation surrounding them?
Are they working on fixing their dependence on these credentials regardless of the difficulties?
Are employees from any of the target’s departments/locations stating that they are just fine with the credentials or do they state they’ve resigned themselves to the situation?
This type of data is excellent for Contingency Seeding; let’s say we read the emails linked in this section and established some contingencies based off them.
Contingency Seeding often includes establishing different levels of viable targets…we will state this example from a BlackHat’s perspective, where all of the company and its clients/customers are fair game:
Target 1 - The second, confirmed set of credentials vs. hosts/networks where employees/customers/clients have stated/hinted the credentials are used and they have no plans of fixing their dependence on them…search for these networks using the data contained in employee email signatures and any clues that may be included in emails contained in this thread.
Contingency Target 1 - The second, confirmed set of credentials vs. hosts/networks where employees/clients/customers have stated/hinted the credentials are used and they plan on remediating the issue, but they have not begun this process…search for these networks using the data contained in client/customer email signatures and any clues that may be included in emails contained in this thread.
Contingency Target 2 - First set of credentials found in Google search engine entry (above, redacted in red) vs. hosts/networks where employees/clients/customers have stated/hinted credentials are used and they have no plans of fixing their dependence on them…use these credentials against hosts where the other confirmed credentials have failed (these may have been networks identified using the data contained in employee email signatures and any other clues that may have been included in emails contained in this thread).
Contingency Target 3 - First set of credentials found in Google search engine entry (above, redacted in red) vs. hosts/networks where employees/clients/customers have stated/hinted credentials are used and they plan on remediating the issue, but they have not begun this process…use these credentials against hosts where the other confirmed credentials have failed (these may have been networks identified using the data contained in employee email signatures and any other clues that may have been included in emails contained in this thread).
Other resources/advantages present in the layers of “Other related posts”:
Reading all of the emails in this thread may present opportunities to improve the probability of successful social engineering attacks.
For instance, if messages became heated between two participants in this thread, we could fashion a social engineering attack leveraging sentiments like “you were right, I found this document (really a malicious attachment). I apologize…” or “I looked into it for you and I think I found a solution, which I attached (really a malicious attachment)”.
An attack of this type could be made especially potent given our access to both the thread and archived e-mails.
The archive is likely to include a wealth of materials that could help us impersonate the target company’s employees (or even its clients/customers) through studying (and then mimicking) idiosyncrasies like: individual word choices, the layout of their writing, habitual spelling mistakes, favored expressions, the aesthetics conveyed by individual email signatures, changes of demeanor between interactions with different employees (or different clients/customers for that matter) and quotes used in their e-mails, all garnished by details like the company privacy disclosure.
While the archived emails likely lack some of the aesthetic qualities of the company’s emails, these can be learned/mimicked by searching the web for images of company emails or by signing up via a temporary email address (with Guerrilla Mail perhaps) for company resources that are available to anyone (such as promotional materials or employment opportunities).
At the time of this engagement, the emails contained in this thread were dated only days/weeks before the beginning of the engagement.
This gave me the opportunity to use tooling like Spiderfoot or Pwned (which queries Have I Been Pwned?) to locate any of the email addresses in the thread that had been included in a credentials dump shortly before the start of the engagement.
If Spiderfoot stated that any of the emails in the thread were associated with or used to create external accounts on other sites, it may be worth seeing if any of those sites had recent dumps associated with them as well.
If any of the target’s employees/customers/clients had credentials in these dumps, there is always the chance they are reusing passwords between accounts…this could translate to taking creds from a recent dump and using those credentials to access an employee’s/client’s/customer’s corporate email account.
It’s always possible that by using dumped credentials to access other accounts belonging to an employee/customer/client, we could eventually gain access to an internal corporate account.
Perhaps the account is linked to or contains intelligence/data that grants us access to an internal corporate account (for instance, I have found corporate email credentials saved in the “Drafts” folder of other external email accounts).
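The dump-checking step described above can be sketched against the Have I Been Pwned v3 API (which requires a paid API key); only the request construction is shown here, with the network call left commented out and the addresses invented.

```python
# Sketch: build a Have I Been Pwned v3 breached-account lookup for addresses
# harvested from the archive. Addresses and key are invented placeholders.
from urllib.parse import quote

API = "https://haveibeenpwned.com/api/v3/breachedaccount/"

def hibp_request(email, api_key):
    """Return the (url, headers) pair for a HIBP v3 breached-account lookup."""
    url = API + quote(email)
    headers = {"hibp-api-key": api_key, "user-agent": "osint-sketch"}
    return url, headers

url, headers = hibp_request("asmith@example.com", "YOUR_API_KEY")
print(url)

# A real run would then be:
# import urllib.request
# req = urllib.request.Request(url, headers=headers)
# print(urllib.request.urlopen(req).read())  # HTTP 404 means no breach found
```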