1. Types of Reconnaissance
1) Active reconnaissance
(1) Port Scanning:
- Risk of Detection: High (direct interaction with the target can trigger IDS and firewalls)
(2) Vulnerability Scanning:
- Risk of Detection: High
(3) Network Mapping
- Mapping the target's network topology, including connected devices and their relationships.
- Using traceroute to determine the path packets take to reach the target server, revealing potential network hops and infrastructure.
- Risk of Detection: Medium to High
(4) OS Fingerprinting
- Risk of Detection: Low
(5) Service Enumeration
- Risk of Detection: Low
(6) Web Spidering
- Risk of Detection: Low to Medium
2) Passive reconnaissance
- Risk of Detection: Very Low
(1) Search Engine Queries
(2) WHOIS Lookups
(3) DNS
(4) Web Archive Analysis
(5) Social Media Analysis
(6) Code Repositories
2. WHOIS
- Database that stores information about network users, hostnames, and domain names
yeon0815@htb[/htb]$ whois inlanefreight.com
[...]
Domain Name: inlanefreight.com
Registry Domain ID: 2420436757_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.registrar.amazon
Registrar URL: https://registrar.amazon.com
Updated Date: 2023-07-03T01:11:15Z
Creation Date: 2019-08-05T22:43:09Z
[...]
* Registrar: The company where the domain was registered
* Registrant Contact: The person or organization that registered the domain
* Administrative Contact: The person responsible for managing the domain
* Technical Contact: The person handling technical issues related to the domain
* Name Servers: Servers that translate the domain name into an IP address.
1) Utilising WHOIS
(1) Phishing Investigation
(2) Malware Analysis
(3) Threat Intelligence Report
- Tactics, Techniques, and Procedures (TTPs)
* Tactics: High-level goals
* Techniques: Specific methods
* Procedures: Detailed steps
3. DNS
1) How DNS Works
(1) Your computer asks for directions (DNS Query): when you enter the domain name, your computer first checks its memory (cache) to see if it remembers the IP address from a previous visit. If not, it reaches out to a DNS resolver, usually provided by your ISP (internet service provider)
(2) The DNS resolver checks its map (Recursive Lookup): The resolver also has a cache, and if it doesn't find the IP address there, it starts a journey through the DNS hierarchy. It begins by asking a root name server.
(3) Root name server points the way: The root server doesn't know the exact address but knows who does - the TLD (Top-Level Domain) name server responsible for the domain's ending (e.g., .com, .org). It points the resolver in the right direction.
(4) TLD name server narrows it down: The TLD name server is like a regional map. It knows which authoritative name server is responsible for the specific domain you're looking for (e.g., example.com) and sends the resolver there.
(5) Authoritative name server delivers the address: The authoritative name server is the final stop. It holds the correct IP address and sends it back to the resolver.
(6) The DNS resolver returns the information: The resolver receives the IP address and gives it to your computer. It also remembers it for a while (caches it), in case you want to revisit the website soon.
(7) Your computer connects
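The outcome of this whole chain can be observed with a few lines of Python, which hands the query to the system's resolver (queried here against localhost so it works without network access; any resolvable hostname would do):

```python
import socket

def resolve(hostname):
    """Ask the system's DNS resolver for the IPv4 addresses of a hostname."""
    results = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, (ip, port)); keep unique IPs.
    return sorted({entry[4][0] for entry in results})

print(resolve("localhost"))  # e.g. ['127.0.0.1']
```

Note that this hides the whole recursion: the resolver (or the local cache, or the hosts file) does steps 2-6 for you and only the final IP comes back.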
2) The Hosts file
- Simple text file used to map hostnames to IP addresses, providing a manual method of domain name resolution that bypasses the DNS process.
- This can be particularly useful for development, troubleshooting, or blocking websites.
(1) Redirecting a domain to a local server for development
127.0.0.1 myapp.local
(2) Testing connectivity by specifying an IP address
192.168.1.20 testserver.local
(3) Blocking unwanted websites by redirecting their domains to a non-existent IP address
0.0.0.0 unwanted-site.com
3) Key DNS Concepts
(1) Zone: Virtual container for a set of domain names. (e.g., example.com, mail.example.com, blog.example.com, etc.)
$TTL 3600 ; Default Time-To-Live (1 hour)
@ IN SOA ns1.example.com. admin.example.com. (
2024060401 ; Serial number (YYYYMMDDNN)
3600 ; Refresh interval
900 ; Retry interval
604800 ; Expire time
86400 ) ; Minimum TTL
@ IN NS ns1.example.com.
@ IN NS ns2.example.com.
@ IN MX 10 mail.example.com.
www IN A 192.0.2.1
mail IN A 198.51.100.1
ftp IN CNAME www.example.com.
- Authoritative name server: NS records
- Mail server: MX records
- IP addresses: A records
- IN: The record class; IN stands for Internet
- DNS resolver: Your ISP's DNS resolver or public resolvers like Google DNS (8.8.8.8)
- Root name server: There are 13 root server names worldwide, labelled A through M (e.g., a.root-servers.net)
- TLD name server: Verisign for .com, PIR for .org
(2) Record types
a. A: Maps a hostname to an IPv4 address
- e.g., 192.0.2.1
b. AAAA: Maps a hostname to an IPv6 address
- e.g., 2001:db8:85a3::8a2e:370:7334
c. SOA: Start of Authority record: Specifies administrative information about a DNS zone, including the primary name server, responsible person's email, and other parameters.
d. PTR: Pointer Record: Used for reverse DNS lookups, mapping an IP address to a hostname.
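A PTR lookup works by reversing the IP's octets and appending in-addr.arpa; a minimal sketch of that name construction (the resulting name still has to be queried from DNS to get the hostname):

```python
def ptr_name(ipv4):
    """Build the reverse-lookup (PTR) query name for an IPv4 address."""
    octets = ipv4.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"

print(ptr_name("192.0.2.1"))  # 1.2.0.192.in-addr.arpa
```

This is exactly the name that `dig -x 192.0.2.1` queries under the hood.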
4) DNS Tools
(1) dig
- dig domain.com: performs a default A record lookup for the domain
- dig A domain.com
- dig NS domain.com
- dig @1.1.1.1 domain.com: specifies a specific name server to query
- dig +trace domain.com: shows a full path of DNS resolution
- dig -x 192.168.1.1: performs a reverse lookup. You may need to specify a name server
- dig +short domain.com: provides a short, concise answer to the query.
- dig +noall +answer domain.com: displays only the answer section of the query output.
- dig any domain.com: retrieves all available DNS records for the domain (* Many DNS servers ignore ANY queries to reduce load and prevent abuse)
(2) nslookup: simpler DNS lookup tool, primarily for A, AAAA, and MX records.
(3) host: streamlined DNS lookup tool with concise output. Quick checks of A, AAAA, and MX records
(4) dnsenum: subdomain enumeration tool
(5) fierce: subdomain enumeration tool with recursive search and wildcard detection.
(6) dnsrecon: combines multiple DNS reconnaissance techniques and supports various output formats.
(7) theHarvester: OSINT tool that gathers information from various sources, including DNS records (email addresses).
5) Subdomains
(1) Active Subdomain Enumeration
- directly interacts with the target domain's DNS servers to uncover subdomains.
- e.g., DNS zone transfer: due to tightened security measures, this is rarely successful.
(2) Passive Subdomain Enumeration
- This relies on external sources of information to discover subdomains without directly querying the target's DNS servers.
- e.g., CT (Certificate Transparency) logs: public repositories of SSL/TLS certificates. These certificates often include a list of associated subdomains in their SAN field (Subject Alternative Name)
6) Subdomain Bruteforcing
dnsenum --enum inlanefreight.com -f /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt -r
-r: This option enables recursive subdomain brute-forcing, meaning that if dnsenum finds a subdomain, it will then try to enumerate subdomains of that subdomain.
7) DNS Zone transfers
- while brute-forcing can be a fruitful approach, there's a less invasive and potentially more efficient method for uncovering subdomains: DNS zone transfers
yeon0815@htb[/htb]$ dig axfr @nsztm1.digi.ninja zonetransfer.me
8) Virtual Hosts
- distinguishing between multiple websites or applications sharing the same IP address.
# Example of name-based virtual host configuration in Apache
<VirtualHost *:80>
ServerName www.example1.com
DocumentRoot /var/www/example1
</VirtualHost>
<VirtualHost *:80>
ServerName www.example2.org
DocumentRoot /var/www/example2
</VirtualHost>
<VirtualHost *:80>
ServerName www.another-example.net
DocumentRoot /var/www/another-example
</VirtualHost>
(1) Subdomains vs Virtual hosts
- Subdomains: Subdomains typically have their own DNS records. They can be used to organise different sections or services of a website.
- Virtual hosts (VHosts): Configurations within a web server that allow multiple websites or applications to be hosted on a single server. They can be associated with TLD or subdomains. VHosts can also be configured to use different domains, not just subdomains.
(2) If a virtual host does not have a DNS record
- you can still access it by modifying the hosts file on your local machine.
(3) VHost fuzzing
- Websites often have subdomains that are not public and won't appear in DNS records. These subdomains are only accessible internally or through specific configurations.
- VHost fuzzing is a technique to discover public and non-public subdomains and VHosts by testing various hostnames against a known IP address.
(4) Server VHost lookup
a. Browser Requests a Website: When you enter a domain name into your browser, it initiates an HTTP request to the web server associated with that domain's IP address.
b. Host header reveals the domain: The browser includes the domain name in the request's Host header, which acts as a label to inform the web server which website is being requested.
c. Web server determines the Virtual Host: The web server receives the request, examines the Host header, and consults its virtual host configuration to find a matching entry for the requested domain name.
d. Serving the Right content
- the Host header functions as a switch
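That "switch" behaviour can be sketched as a simple lookup table; the hostnames and document roots below are illustrative, not from a real server config:

```python
# Illustrative name-based vhost table: Host header value -> document root.
VHOSTS = {
    "www.example1.com": "/var/www/example1",
    "www.example2.org": "/var/www/example2",
}
DEFAULT_ROOT = "/var/www/default"  # served when no Host entry matches

def docroot_for(host_header):
    """Pick a document root the way a name-based vhost lookup would."""
    return VHOSTS.get(host_header.lower().strip(), DEFAULT_ROOT)

print(docroot_for("www.example1.com"))  # /var/www/example1
print(docroot_for("unknown.example"))   # /var/www/default
```

The fallback to a default root is also why fuzzing works: a hostname that hits the default gets a recognisably different response from one that matches a configured vhost.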
(5) Types of Virtual Hosting
a. Name-based Virtual hosting: This method relies solely on the HTTP Host header to distinguish between websites. It is the most common and flexible method, as it doesn't require multiple IP addresses.
b. IP-based Virtual hosting: This type of hosting assigns a unique IP address to each website hosted on the server. It requires multiple IP addresses, which can be expensive and less scalable.
c. Port-based Virtual hosting: This can be used when IP addresses are limited, but it's not as common or user-friendly as name-based virtual hosting and might require users to specify the port number in the URL
(6) Virtual Host Discovery Tools
a. gobuster
yeon0815@htb[/htb]$ gobuster vhost -u http://<target_IP_address> -w <wordlist_file> --append-domain
--append-domain: appends the base domain to each word in the wordlist. In newer versions of Gobuster, this flag is required to perform virtual host discovery.
-t: increases the number of threads for faster scanning
-k: ignores SSL/TLS certificate errors
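The --append-domain behaviour amounts to turning each wordlist entry into a Host header candidate; a minimal sketch of that step (the actual fuzzing then sends every candidate to the target IP and compares the responses):

```python
def vhost_candidates(words, base_domain):
    """Mimic gobuster's --append-domain: word -> word.base_domain."""
    return [f"{w}.{base_domain}" for w in words if w]

wordlist = ["www", "mail", "dev"]  # normally read from a wordlist file
print(vhost_candidates(wordlist, "example.com"))
# ['www.example.com', 'mail.example.com', 'dev.example.com']
```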
9) Certificate Transparency Logs
- public, append-only ledgers that record the issuance of SSL/TLS certificates. Whenever a Certificate Authority (CA) issues a new certificate, it must submit it to multiple CT logs.
(1) Searching CT logs
a. crt.sh: Quick and easy searches
b. Censys: In-depth analysis of certificates
(2) crt.sh lookup
yeon0815@htb[/htb]$ curl -s "https://crt.sh/?q=facebook.com&output=json" | jq -r '.[] | select(.name_value | contains("dev")) | .name_value' | sort -u
*.dev.facebook.com
*.newdev.facebook.com
*.secure.dev.facebook.com
dev.facebook.com
devvm1958.ftw3.facebook.com
facebook-amex-dev.facebook.com
facebook-amex-sign-enc-dev.facebook.com
newdev.facebook.com
secure.dev.facebook.com
- sort -u: This sorts the results alphabetically and removes duplicates
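The same filtering the jq pipeline performs can be sketched in Python; the JSON below is a canned sample in the shape of crt.sh's ?output=json response, not live data:

```python
import json

# Canned sample shaped like crt.sh's JSON output: one object per certificate,
# with newline-separated names packed into name_value.
sample = json.loads("""[
  {"name_value": "dev.example.com\\nwww.dev.example.com"},
  {"name_value": "www.example.com"},
  {"name_value": "newdev.example.com"}
]""")

names = {n for entry in sample for n in entry["name_value"].split("\n")}
dev_names = sorted(n for n in names if "dev" in n)  # like jq's select(contains("dev"))
print(dev_names)
# ['dev.example.com', 'newdev.example.com', 'www.dev.example.com']
```

The set plus sorted() does the same deduplicate-and-order work as `sort -u`.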
4. Fingerprinting
- focuses on extracting technical details about the technologies powering a website or web application
(1) Banner Grabbing
(2) Analysing HTTP Headers
(3) Probing for Specific Responses: Sending specially crafted requests to the target can elicit unique responses that reveal specific technologies or versions
(4) Analysing Page Content
1) Banner Grabbing
curl -I: fetches only the HTTP headers
yeon0815@htb[/htb]$ curl -I inlanefreight.com
HTTP/1.1 301 Moved Permanently
Date: Fri, 31 May 2024 12:07:44 GMT
Server: Apache/2.4.41 (Ubuntu)
Location: https://inlanefreight.com/
Content-Type: text/html; charset=iso-8859-1
yeon0815@htb[/htb]$ curl -I https://inlanefreight.com
HTTP/1.1 301 Moved Permanently
Date: Fri, 31 May 2024 12:12:12 GMT
Server: Apache/2.4.41 (Ubuntu)
X-Redirect-By: WordPress
Location: https://www.inlanefreight.com/
Content-Type: text/html; charset=UTF-8
yeon0815@htb[/htb]$ curl -I https://www.inlanefreight.com
HTTP/1.1 200 OK
Date: Fri, 31 May 2024 12:12:26 GMT
Server: Apache/2.4.41 (Ubuntu)
Link: <https://www.inlanefreight.com/index.php/wp-json/>; rel="https://api.w.org/"
Link: <https://www.inlanefreight.com/index.php/wp-json/wp/v2/pages/7>; rel="alternate"; type="application/json"
Link: <https://www.inlanefreight.com/>; rel=shortlink
Content-Type: text/html; charset=UTF-8
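Behind curl -I there is just a HEAD request and a scan of the response headers; a sketch of the parsing half, run here against a canned response modelled on the output above rather than a live server:

```python
def parse_banner(raw_response):
    """Extract the status line and headers from a raw HTTP response."""
    head = raw_response.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    status = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(": ")
        headers[name] = value
    return status, headers

# Canned response echoing the curl -I output shown above.
raw = ("HTTP/1.1 301 Moved Permanently\r\n"
       "Server: Apache/2.4.41 (Ubuntu)\r\n"
       "Location: https://inlanefreight.com/\r\n\r\n")
status, headers = parse_banner(raw)
print(status, "|", headers["Server"])
# HTTP/1.1 301 Moved Permanently | Apache/2.4.41 (Ubuntu)
```

The Server header is the banner; the X-Redirect-By and Link headers in the transcripts above are what give WordPress away.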
2) Wafw00f
- To detect the presence of a WAF (Web Application Firewall), we'll use the wafw00f tool.
yeon0815@htb[/htb]$ pip3 install git+https://github.com/EnableSecurity/wafw00f
yeon0815@htb[/htb]$ wafw00f inlanefreight.com
[... wafw00f ASCII-art banner ...]
~ WAFW00F : v2.2.0 ~
The Web Application Firewall Fingerprinting Toolkit
[*] Checking https://inlanefreight.com
[+] The site https://inlanefreight.com is behind Wordfence (Defiant) WAF.
[~] Number of requests: 2
* this website is protected by the Wordfence Web Application Firewall, developed by Defiant.
3) Nikto
- web server vulnerability assessment tool
yeon0815@htb[/htb]$ sudo apt update && sudo apt install -y perl
yeon0815@htb[/htb]$ git clone https://github.com/sullo/nikto
yeon0815@htb[/htb]$ cd nikto/program
yeon0815@htb[/htb]$ chmod +x ./nikto.pl
yeon0815@htb[/htb]$ nikto -h inlanefreight.com -Tuning b
- Nikto v2.5.0
---------------------------------------------------------------------------
+ Multiple IPs found: 134.209.24.248, 2a03:b0c0:1:e0::32c:b001
+ Target IP: 134.209.24.248
+ Target Hostname: www.inlanefreight.com
+ Target Port: 443
---------------------------------------------------------------------------
+ SSL Info: Subject: /CN=inlanefreight.com
Altnames: inlanefreight.com, www.inlanefreight.com
Ciphers: TLS_AES_256_GCM_SHA384
Issuer: /C=US/O=Let's Encrypt/CN=R3
+ Start Time: 2024-05-31 13:35:54 (GMT0)
---------------------------------------------------------------------------
+ Server: Apache/2.4.41 (Ubuntu)
+ /: Link header found with value: ARRAY(0x558e78790248). See: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Link
+ /: The site uses TLS and the Strict-Transport-Security HTTP header is not defined. See: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Strict-Transport-Security
+ /: The X-Content-Type-Options header is not set. This could allow the user agent to render the content of the site in a different fashion to the MIME type. See: https://www.netsparker.com/web-vulnerability-scanner/vulnerabilities/missing-content-type-header/
+ /index.php?: Uncommon header 'x-redirect-by' found, with contents: WordPress.
+ No CGI Directories found (use '-C all' to force check all possible dirs)
+ /: The Content-Encoding header is set to "deflate" which may mean that the server is vulnerable to the BREACH attack. See: http://breachattack.com/
+ Apache/2.4.41 appears to be outdated (current is at least 2.4.59). Apache 2.2.34 is the EOL for the 2.x branch.
+ /: Web Server returns a valid response with junk HTTP methods which may cause false positives.
+ /license.txt: License file found may identify site software.
+ /: A Wordpress installation was found.
+ /wp-login.php?action=register: Cookie wordpress_test_cookie created without the httponly flag. See: https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies
+ /wp-login.php:X-Frame-Options header is deprecated and has been replaced with the Content-Security-Policy HTTP header with the frame-ancestors directive instead. See: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Frame-Options
+ /wp-login.php: Wordpress login found.
+ 1316 requests: 0 error(s) and 12 item(s) reported on remote host
+ End Time: 2024-05-31 13:47:27 (GMT0) (693 seconds)
---------------------------------------------------------------------------
+ 1 host(s) tested
5. Crawling
- often called spidering
- automated process of systematically browsing the World Wide Web
1) Types of crawling strategies
(1) Breadth-First Crawling
- prioritizes exploring a website's width before going deep.
- This is useful for getting a broad overview of a website's structure and content.
(2) Depth-First Crawling
- prioritizes depth over breadth
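Breadth-first order can be sketched with a queue over a canned link graph (the page-to-links mapping below is made up; a real crawler would fetch and parse each page to discover its links):

```python
from collections import deque

# Made-up site graph: page -> links found on that page.
SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/team"],
    "/blog": ["/blog/post1"],
    "/team": [],
    "/blog/post1": [],
}

def bfs_crawl(start):
    """Visit pages level by level: all of a page's siblings before its children."""
    seen, order = {start}, []
    frontier = deque([start])
    while frontier:
        page = frontier.popleft()
        order.append(page)
        for link in SITE.get(page, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order

print(bfs_crawl("/"))
# ['/', '/about', '/blog', '/team', '/blog/post1']
```

Swapping the deque's popleft() for pop() turns the same code into a depth-first crawler.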
2) robots.txt
- A guide for bots, outlining which areas of a website they are allowed to access and which are off-limits (i.e., which paths they may and may not crawl).
- Technically, robots.txt is a simple text file placed in the root directory of a website. (e.g., www.example.com/robots.txt)
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
User-agent: Googlebot
Crawl-delay: 10
Sitemap: https://www.example.com/sitemap.xml
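These directives can be checked programmatically; Python's standard library ships a parser, fed here with the example rules above:

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, fed to the stdlib parser line by line.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
"""
rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "/admin/secret.html"))  # False
print(rp.can_fetch("*", "/public/index.html"))  # True
```

A well-behaved crawler runs every candidate URL through a check like this before fetching it.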
(1) Why respect robots.txt?
- while robots.txt is not strictly enforceable (a rogue bot could still ignore it), most legitimate web crawlers and search engine bots will respect its directives. This is important for several reasons:
a. Avoiding overburdening servers
b. Protecting sensitive information
c. Legal and ethical compliance
(2) robots.txt in Web reconnaissance
a. Uncovering hidden directories
b. Mapping website structure
c. Detecting crawler traps
3) .well-known URIs
- The .well-known standard serves as a standardized directory within a website's root domain
- This designated location, typically accessible via the /.well-known/ path on a web server, centralizes a website's critical metadata, including configuration files and information related to its servers, protocols, and security mechanisms.
- The IANA (Internet Assigned Numbers Authority) maintains a registry of .well-known URIs, each serving a specific purpose defined by various specifications and standards.
(1) examples
a. security.txt: contains contact information for security researchers to report vulnerabilities
b. /.well-known/change-password: provides a standard URL for directing users to a password change page.
c. openid-configuration: defines configuration details for OpenID Connect, an identity layer on top of the OAuth 2.0 protocol.
d. assetlinks.json: Used for verifying ownership of digital assets (e.g., apps) associated with a domain
e. mta-sts.txt: specifies the policy for SMTP MTA Strict Transport Security (MTA-STS) to enhance email security.
* OpenID: Authentication protocol that lets users log in to multiple websites using one account
* MTA-STS: Mail Transfer Agent Strict Transport Security. Security protocol that protects email communications by enforcing the use of encrypted connections (via TLS) between email servers.
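As an illustration, an MTA-STS policy file (served at https://mta-sts.<domain>/.well-known/mta-sts.txt) looks like this; the mx value is a placeholder:

```
version: STSv1
mode: enforce
mx: mail.example.com
max_age: 86400
```

mode can also be "testing" (report failures but deliver anyway) or "none" (policy disabled).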
(2) Web recon and .well-known
- Particularly useful URI is openid-configuration
- https://example.com/.well-known/openid-configuration: This endpoint returns a JSON document containing metadata
{
"issuer": "https://example.com",
"authorization_endpoint": "https://example.com/oauth2/authorize",
"token_endpoint": "https://example.com/oauth2/token",
"userinfo_endpoint": "https://example.com/oauth2/userinfo",
"jwks_uri": "https://example.com/oauth2/jwks",
"response_types_supported": ["code", "token", "id_token"],
"subject_types_supported": ["public"],
"id_token_signing_alg_values_supported": ["RS256"],
"scopes_supported": ["openid", "profile", "email"]
}
* Authorization endpoint: Identifying the URL for user authorization requests
* Token endpoint: Finding the URL where tokens are issued
* Userinfo endpoint: Locating the endpoint that provides user information
* JWKS URI: Reveals the JSON Web Key Set (JWKS), detailing the cryptographic keys used by the server
* Supported Scopes and Response Types
* Algorithm Details: Information about supported signing algorithms can be crucial for understanding the security measures in place.
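Mining that discovery document is a one-liner per endpoint once it is parsed; the sketch below re-parses a subset of the example metadata above instead of fetching it live:

```python
import json

# Subset of the example openid-configuration document shown above.
metadata = json.loads("""{
  "issuer": "https://example.com",
  "authorization_endpoint": "https://example.com/oauth2/authorize",
  "token_endpoint": "https://example.com/oauth2/token",
  "jwks_uri": "https://example.com/oauth2/jwks",
  "scopes_supported": ["openid", "profile", "email"]
}""")

# During recon, this JSON would come from
# https://<target>/.well-known/openid-configuration instead.
for key in ("authorization_endpoint", "token_endpoint", "jwks_uri"):
    print(f"{key}: {metadata[key]}")
```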
4) Creepy Crawlies
(1) Popular Web Crawlers
a. Burp Suite Spider: widely used
b. OWASP ZAP (Zed Attack Proxy): free, open-source
c. Scrapy (Python framework): versatile and scalable framework for building custom web crawlers
d. Apache Nutch (Scalable Crawler): highly extensible and scalable open-source web crawler written in Java. It's designed to handle massive crawls across the entire web or focus on specific domains.
(2) Scrapy
- installing scrapy
yeon0815@htb[/htb]$ pip3 install scrapy
- custom scrapy spider
yeon0815@htb[/htb]$ wget -O ReconSpider.zip https://academy.hackthebox.com/storage/modules/144/ReconSpider.v1.2.zip
yeon0815@htb[/htb]$ unzip ReconSpider.zip
yeon0815@htb[/htb]$ python3 ReconSpider.py http://inlanefreight.com
- results.json
{
"emails": [
"lily.floid@inlanefreight.com",
"cvs@inlanefreight.com",
...
],
"links": [
"https://www.themeansar.com",
"https://www.inlanefreight.com/index.php/offices/",
...
],
"external_files": [
"https://www.inlanefreight.com/wp-content/uploads/2020/09/goals.pdf",
...
],
"js_files": [
"https://www.inlanefreight.com/wp-includes/js/jquery/jquery-migrate.min.js?ver=3.3.2",
...
],
"form_fields": [],
"images": [
"https://www.inlanefreight.com/wp-content/uploads/2021/03/AboutUs_01-1024x810.png",
...
],
"videos": [],
"audio": [],
"comments": [
"<!-- #masthead -->",
...
]
}
* external_files: Lists URLs of external files such as PDFs.