From Zero to One: Practical Tips for Efficiently Collecting Information from the Entire Internet

Information Collection Tools

1. SecurityTrails — Deep DNS and Domain Intelligence

SecurityTrails is a premium tool that provides in-depth domain and DNS intelligence. It allows users to track domain history, subdomains, and WHOIS records, which is very useful for footprint tracking during reconnaissance.

🔹 Key Features:

✔️ Domain history tracking (ownership changes, WHOIS records)
✔️ Subdomain enumeration for attack surface mapping
✔️ Reverse DNS lookup and IP intelligence

🔹 Use Case:
Penetration testers can use SecurityTrails to enumerate a target's subdomains and spot forgotten or exposed assets that may be vulnerable.

2. GreyNoise — Internet Background Noise Filtering

GreyNoise helps cybersecurity professionals distinguish targeted attacks from background internet noise. It collects data on large-scale scanning activity, such as bots and automated scripts, so analysts can filter out alerts that are not aimed at them specifically.

🔹 Key Features:

✔️ Identify IP addresses executing large-scale scans
✔️ Help differentiate legitimate traffic from threats
✔️ API for automation and integration with security tools

🔹 Use Case:
Security researchers can use GreyNoise to check whether a suspicious IP belongs to a large-scale scanning botnet or is conducting targeted probes against their infrastructure.
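
A rough sketch of how such a check might be scripted. It assumes the GreyNoise Community API endpoint shown in the code and assumes the response carries noise, riot, and classification fields; both should be verified against the current GreyNoise documentation. The function names are illustrative only.

```python
import json
import urllib.request

# Assumed Community API endpoint; verify against the current GreyNoise docs.
GREYNOISE_COMMUNITY_URL = "https://api.greynoise.io/v3/community/{ip}"


def fetch_ip_context(ip: str, api_key: str, timeout: float = 10.0) -> dict:
    """Query GreyNoise for one IP (network call; needs a valid API key)."""
    request = urllib.request.Request(
        GREYNOISE_COMMUNITY_URL.format(ip=ip),
        headers={"key": api_key, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def triage(record: dict) -> str:
    """Turn a response record into a coarse triage label.

    The field names (riot, noise, classification) are assumptions.
    """
    if record.get("riot"):
        return "known benign service"
    if record.get("noise"):
        label = record.get("classification", "unknown")
        return f"background scanner ({label})"
    return "not seen mass-scanning; possibly targeted"
```

Calling triage(fetch_ip_context("198.51.100.7", api_key)) would then give a one-line verdict for an IP pulled from a firewall log.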

3. Hunter.io — Email Address Enumeration

Hunter.io is an excellent OSINT tool for finding email addresses associated with specific domains. It helps security researchers and red team members discover the email formats used by organizations.

🔹 Key Features:

✔️ Find publicly available email addresses associated with domains
✔️ Predict email formats (e.g., first.last@example.com)
✔️ Verify email deliverability

🔹 Use Case:
Attackers can use Hunter.io to collect company email addresses for phishing activities, while security teams use it to identify and protect exposed company emails.
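
To make the format-prediction idea concrete, here is a minimal sketch that expands a naming pattern into an address. The placeholder syntax ({first}, {last}, {f}, {l}) is an assumption modeled on the patterns Hunter.io reports, and apply_pattern is a hypothetical helper, not part of any Hunter.io client.

```python
def apply_pattern(pattern: str, first: str, last: str, domain: str) -> str:
    """Expand a pattern such as "{first}.{last}" into an email address.

    Assumed placeholders: {first}, {last}, and the initials {f} and {l}.
    """
    first = first.strip().lower()
    last = last.strip().lower()
    local_part = (
        pattern.replace("{first}", first)
        .replace("{last}", last)
        .replace("{f}", first[:1])
        .replace("{l}", last[:1])
    )
    return f"{local_part}@{domain.lower()}"
```

A security team could run this over its own staff list to see which addresses an outsider would be able to guess once the pattern is known.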

4. FOFA — A Chinese Alternative to Shodan

FOFA is a cyberspace search engine similar to Shodan. It is developed in China but covers networks worldwide, scanning and indexing exposed assets on the internet, including web servers, IoT devices, and industrial control systems (ICS).

🔹 Key Features:

✔️ Deep fingerprinting of online devices and services
✔️ Particularly useful for researching assets hosted in China
✔️ Advanced search filters for precise reconnaissance

🔹 Use Case:
Penetration testers use FOFA to find publicly reachable services in specific regions or industries and to identify vulnerable or misconfigured systems among them.
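
The sketch below shows how a region-scoped search could be prepared for FOFA's API. It rests on several assumptions that should be checked against FOFA's documentation: that filters are written as field="value" joined by &&, that the search endpoint is the URL in the code, and that the query is passed base64-encoded in a qbase64 parameter. The helper names are hypothetical.

```python
import base64
from urllib.parse import urlencode

# Assumed search endpoint; verify against the current FOFA API docs.
FOFA_SEARCH_URL = "https://fofa.info/api/v1/search/all"


def build_fofa_query(filters: dict) -> str:
    """Join filters as field="value" pairs combined with && (assumed syntax)."""
    return " && ".join(f'{field}="{value}"' for field, value in filters.items())


def encode_query(query: str) -> str:
    """Base64-encode the query text, as the qbase64 parameter is assumed to expect."""
    return base64.b64encode(query.encode("utf-8")).decode("ascii")


def fofa_search_url(query: str, api_key: str, size: int = 100) -> str:
    """Build the request URL without sending it."""
    params = {"key": api_key, "qbase64": encode_query(query), "size": size}
    return f"{FOFA_SEARCH_URL}?{urlencode(params)}"
```

For example, build_fofa_query({"port": 3389, "country": "CN"}) yields a query for RDP services in one country, which fofa_search_url then turns into a request URL.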

5. LeakIX — Search Engine for Leaked Databases

LeakIX is a dedicated OSINT tool that helps find exposed databases and misconfigured cloud storage. It continuously scans the internet for open Elasticsearch, MongoDB, and other databases.

🔹 Key Features:

✔️ Detect misconfigurations and open databases
✔️ In-depth insights into leaked credentials and sensitive data
✔️ API access for automation

🔹 Use Case:
Ethical hackers use LeakIX to notify organizations about insecure databases, while threat actors may exploit this data for credential stuffing attacks.

6. Shodan — Search Engine for IoT and Devices

Shodan is a search engine for discovering internet-connected devices such as servers, webcams, databases, and industrial control systems. Unlike Google, which indexes websites, Shodan indexes open ports, services, and vulnerabilities of publicly accessible systems.

🔹 Key Features:

✔️ Find exposed databases (MongoDB, Elasticsearch, etc.)
✔️ Identify unprotected webcams and IoT devices
✔️ Discover vulnerable servers with outdated software
✔️ Monitor industrial control systems (ICS/SCADA)

🔹 Usage Example:
To find all open RDP ports (3389) in a specific country/region, use:

port:3389 country:CN
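
The same filter string can be assembled programmatically. This is a minimal sketch that assumes Shodan's REST search endpoint takes key and query parameters; the helper names are illustrative, and nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Assumed REST endpoint for host search; check Shodan's API reference.
SHODAN_SEARCH_URL = "https://api.shodan.io/shodan/host/search"


def build_shodan_query(filters: dict) -> str:
    """Join filters as name:value pairs, quoting values that contain spaces."""
    parts = []
    for name, value in filters.items():
        text = str(value)
        if " " in text:
            text = f'"{text}"'
        parts.append(f"{name}:{text}")
    return " ".join(parts)


def shodan_search_url(query: str, api_key: str) -> str:
    """Build the request URL for a search (no request is made)."""
    return f"{SHODAN_SEARCH_URL}?{urlencode({'key': api_key, 'query': query})}"
```

build_shodan_query({"port": 3389, "country": "CN"}) reproduces the query above, and further filters can be added as extra dictionary entries.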

🔹 Why is it useful for information gathering?
Helps security researchers find misconfigured systems, assists penetration testing by identifying weak infrastructure, and provides real-time threat intelligence.

7. Censys — A More Advanced Alternative to Shodan

Censys is similar to Shodan but offers deeper analysis of internet-connected assets. It lets researchers search hosts, domains, and certificates to uncover security risks.

🔹 Key Features:

✔️ Real-time internet scanning for vulnerable services
✔️ TLS/SSL certificate analysis to find expired certificates or weak encryption
✔️ Detailed network and IP reporting

🔹 Usage Example:
To find all HTTPS websites running outdated TLS versions, use:

services.service_name = "HTTP" AND services.tls.version = "TLSv1"
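
Once results are exported, the same "outdated TLS" test can be applied locally. The sketch below assumes each exported record has been flattened into a dictionary with ip, port, and tls_version keys; that layout is hypothetical, not the Censys response format.

```python
# Protocol versions generally treated as deprecated.
OUTDATED_TLS = {"SSLv2", "SSLv3", "TLSv1", "TLSv1.0", "TLSv1.1"}


def is_outdated_tls(version: str) -> bool:
    """Return True for deprecated protocol versions.

    Underscores are normalized so that "TLSv1_1" and "TLSv1.1" compare equal.
    """
    return version.strip().replace("_", ".") in OUTDATED_TLS


def flag_outdated(services: list) -> list:
    """Keep only the records whose tls_version is deprecated."""
    return [s for s in services if is_outdated_tls(s.get("tls_version", ""))]
```

Running flag_outdated over an export gives a short list of hosts to follow up on, independent of which search engine produced the data.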

🔹 Why is it useful for information gathering?
Provides better filtering and in-depth analysis compared to Shodan, helps detect SSL misconfigurations, and assists in tracking changes in attack surfaces over time.

8. Wayback Machine — View Old Versions of Websites

The Wayback Machine is an internet archive that stores historical snapshots of websites. Researchers can use it to view old versions of websites, including past login pages, leaked directories, and sensitive files.

🔹 Key Features:

✔️ Retrieve previous versions of websites
✔️ Recover deleted pages or hidden endpoints
✔️ Analyze security changes over time

🔹 Usage Example:
To see what example.com looked like in the past, visit:

https://web.archive.org/web/*/example.com
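
For bulk work, the archive also exposes a CDX index that lists captured URLs. The following sketch assumes the CDX endpoint and parameters shown in the code (url, output, fl, collapse, limit) and assumes the JSON output is a list of rows whose first row is a header; the function names are illustrative.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed CDX endpoint; check the Internet Archive's CDX server docs.
CDX_URL = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str, limit: int = 50) -> str:
    """Build a CDX query listing distinct captured URLs under a domain."""
    params = {
        "url": f"{domain}/*",
        "output": "json",
        "fl": "timestamp,original",
        "collapse": "urlkey",  # one row per distinct URL
        "limit": limit,
    }
    return f"{CDX_URL}?{urlencode(params)}"


def snapshot_links(rows: list) -> list:
    """Convert CDX JSON rows (header row first) into replay links."""
    if not rows:
        return []
    header, *records = rows
    ts = header.index("timestamp")
    original = header.index("original")
    return [f"https://web.archive.org/web/{r[ts]}/{r[original]}" for r in records]


def fetch_snapshots(domain: str, timeout: float = 30.0) -> list:
    """Run the query and return replay links (network call)."""
    with urllib.request.urlopen(cdx_query_url(domain), timeout=timeout) as resp:
        return snapshot_links(json.load(resp))
```

The resulting links can be scanned for paths such as old admin panels or backup files that no longer exist on the live site.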

🔹 Why is it useful for information gathering?
Reveals old endpoints and exposed files, lets security researchers track how a site's weaknesses change over time, and helps locate sensitive content that has since been deleted.

9. Have I Been Pwned — Check if Credentials Have Been Leaked

Have I Been Pwned lets users check whether their email address or password has appeared in known data breaches.

🔹 Key Features:

✔️ Detect compromised emails from public breaches
✔️ Provide API access for automated checks
✔️ Alert users when new breaches occur

🔹 Usage Example:
To check if an email has been leaked, enter:

user@example.com
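
Email lookups through the API require a key, but the password side can be checked with the keyless Pwned Passwords range endpoint, which only ever receives the first five characters of the SHA-1 hash. The sketch below assumes that endpoint and its SUFFIX:COUNT line format; the helper names and the User-Agent string are illustrative.

```python
import hashlib
import urllib.request

# Assumed k-anonymity endpoint of the Pwned Passwords service.
RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def split_hash(password: str) -> tuple:
    """Return (first 5 hex chars, remaining 35) of the uppercase SHA-1 digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_from_range(suffix: str, body: str) -> int:
    """Find our suffix among the SUFFIX:COUNT lines; 0 means not listed."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str, timeout: float = 10.0) -> int:
    """Look up how often a password appears in breaches (network call)."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL.format(prefix=prefix),
        headers={"User-Agent": "osint-example"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return count_from_range(suffix, response.read().decode("utf-8"))
```

Because only the hash prefix leaves the machine, the full password is never disclosed to the service.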

🔹 Why is it useful for information gathering?
Helps ethical hackers identify leaked company credentials, assists in security audits and awareness, and is useful for enforcing password reset policies.

10. SpiderFoot — Automated OSINT Framework

SpiderFoot is a powerful OSINT automation tool that integrates with over 200 data sources. It collects and analyzes various types of information, such as IP addresses, domains, emails, and leaked credentials.

🔹 Key Features:

✔️ Fully automated reconnaissance tool
✔️ API integration with multiple OSINT platforms
✔️ Custom scan configurations for targeted information gathering

🔹 Use Case:
Security analysts can use SpiderFoot to scan domains and automatically collect WHOIS details, related subdomains, and potential data leaks.

11. IntelligenceX — Search Historical and Dark Web Data

IntelligenceX is an OSINT search engine that indexes not only public data but also dark web sources, leaked file archives, and historical website snapshots.

🔹 Key Features:

✔️ Access to dark web leaks, breaches, and historical records
✔️ Search blockchain transactions, WHOIS, and social media footprints
✔️ API for automation and bulk data retrieval

🔹 Use Case:
Investigators can use IntelligenceX to find out whether confidential company documents or employee credentials have been leaked in past breaches.

12. BuiltWith — Website Technology Fingerprinting

BuiltWith helps security researchers identify the technologies used to build websites, including CMS platforms, analytics tools, and security mechanisms.

🔹 Key Features:

✔️ Detect website frameworks, tracking scripts, and hosting providers
✔️ Reveal third-party integrations used by websites
✔️ API support for automated reconnaissance

🔹 Use Case:
Penetration testers can use BuiltWith to identify outdated CMS platforms and plugins that may contain vulnerabilities.

13. Common Crawl — OSINT Research Web Archive

Common Crawl is a nonprofit project that publishes a vast, openly available repository of web crawl data. Researchers can analyze past versions of websites, metadata, and online footprints.

🔹 Key Features:

✔️ Large historical web crawl datasets
✔️ Helps find deleted or hidden pages
✔️ API access for advanced automation

🔹 Use Case:
Researchers can use Common Crawl to retrieve pages that were once public but were later taken down, for example to hide sensitive information.
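
A minimal sketch of looking up a domain in Common Crawl's URL index. It assumes the index server URL pattern shown in the code and assumes the server answers with one JSON object per line containing at least url and timestamp; the crawl ID used in the example is a placeholder that has to be swapped for a real crawl.

```python
import json
from urllib.parse import urlencode


def cc_index_url(crawl_id: str, domain: str) -> str:
    """Build an index query for every captured URL under a domain.

    crawl_id is a crawl label such as "CC-MAIN-2024-10" (placeholder).
    """
    params = {"url": f"{domain}/*", "output": "json"}
    return f"https://index.commoncrawl.org/{crawl_id}-index?{urlencode(params)}"


def parse_index_lines(body: str) -> list:
    """Parse newline-delimited JSON into (timestamp, url, status) tuples."""
    records = []
    for line in body.splitlines():
        if line.strip():
            item = json.loads(line)
            records.append((item["timestamp"], item["url"], item.get("status")))
    return records
```

Comparing the URLs from an older crawl with the live site is one way to spot pages that have since disappeared.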

14. URLScan.io — Real-time Website Scanning and Analysis

URLScan.io is a website scanning tool that provides detailed information about web pages, including HTTP headers, security configurations, and third-party requests.

🔹 Key Features:

✔️ Visual representation of how web pages load resources
✔️ Detect malicious domains and phishing sites
✔️ API support for bulk scanning and automation

🔹 Use Case:
Security teams use URLScan.io to identify malicious domains impersonating legitimate websites for phishing attacks.
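
As a sketch of how that hunt might be automated, the code below builds a search request and summarizes the hits. The search endpoint, the q and size parameters, and the results[].page.domain and results[].result fields are assumptions about URLScan.io's search API; the function names are illustrative.

```python
from urllib.parse import urlencode

# Assumed search endpoint; check URLScan.io's API documentation.
URLSCAN_SEARCH_URL = "https://urlscan.io/api/v1/search/"


def search_url(query: str, size: int = 100) -> str:
    """Build the search request URL (no request is made)."""
    return f"{URLSCAN_SEARCH_URL}?{urlencode({'q': query, 'size': size})}"


def scanned_domains(payload: dict) -> dict:
    """Map each distinct page domain to the first result link seen for it."""
    seen = {}
    for hit in payload.get("results", []):
        domain = hit.get("page", {}).get("domain")
        if domain and domain not in seen:
            seen[domain] = hit.get("result")
    return seen
```

Feeding the parsed JSON response into scanned_domains gives a compact list of domains to review for impersonation.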

15. OSINT Automation

With the APIs provided by OSINT tools, security professionals can automate reconnaissance and integrate OSINT into their workflows.

1️⃣ OSINT Automation with Python

Here is an example script to extract subdomains using the SecurityTrails API:

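import requests

API_KEY = "your_securitytrails_api_key"
domain = "example.com"
url = f"https://api.securitytrails.com/v1/domain/{domain}/subdomains"

# The API key is sent in the APIKEY request header.
headers = {"APIKEY": API_KEY}

# A timeout keeps the script from hanging on a stalled connection.
response = requests.get(url, headers=headers, timeout=10)
if response.status_code == 200:
    # The response lists bare labels (e.g. "www"), so append the domain.
    subdomains = response.json().get("subdomains", [])
    for subdomain in subdomains:
        print(f"Found: {subdomain}.{domain}")
else:
    # Include the status code so auth and rate-limit errors are visible.
    print(f"Error fetching data: HTTP {response.status_code}")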

2️⃣ Automating WHOIS Queries with Python

The python-whois package wraps a WHOIS lookup in a single call:

import whois

domain = "example.com"
info = whois.whois(domain)

print("Domain Name:", info.domain_name)
print("Registrar:", info.registrar)
print("Expiration Date:", info.expiration_date)
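
One practical wrinkle: depending on the registry, the expiration date may come back as a single datetime, a list of datetimes, or nothing at all. The helper below is a sketch for normalizing that value and computing the days left; the function names are hypothetical, and dropping the timezone is a simplification that can shift the result by a day.

```python
from datetime import datetime


def earliest_date(value):
    """Return the earliest datetime from a value that may be a datetime,
    a list of datetimes, or None. Timezone info is dropped for comparison."""
    candidates = value if isinstance(value, list) else [value]
    dates = [d.replace(tzinfo=None) for d in candidates if isinstance(d, datetime)]
    return min(dates) if dates else None


def days_until_expiry(expiration_date, now: datetime):
    """Whole days from now until expiry, or None if the date is unknown."""
    expires = earliest_date(expiration_date)
    if expires is None:
        return None
    return (expires - now.replace(tzinfo=None)).days
```

With the script above, days_until_expiry(info.expiration_date, datetime.now()) could flag domains that are about to lapse.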

16. Spyse — Advanced Cybersecurity Search Engine

Spyse is a powerful search engine designed for cybersecurity intelligence. It collects real-time data about domains, IPs, SSL certificates, and vulnerabilities, enabling researchers to assess risks effectively.

🔹 Key Features:

✔️ Collect detailed infrastructure data (DNS, subdomains, SSL, etc.)
✔️ Detect open ports and exposed services
✔️ Provide vulnerability intelligence based on CVE

🔹 Use Case:
Penetration testers can use Spyse to scan the infrastructure of target organizations and detect weaknesses before attackers do.

17. Netlas.io — Advanced Asset Discovery and Risk Analysis

Netlas.io is a cybersecurity intelligence search engine that indexes exposed assets, much like Shodan and Censys, and adds more detailed analysis on top.

🔹 Key Features:

✔️ Scan IPs, domains, and web applications
✔️ Detect vulnerable network services
✔️ Support automated queries via API

🔹 Use Case:
Red team members can use Netlas.io to find outdated software running on public servers and report potential risks.

18. FullHunt — Attack Surface Monitoring and Asset Detection

FullHunt continuously monitors an organization's attack surface, identifying misconfigurations and exposed services.

🔹 Key Features:

✔️ Detect publicly exposed assets and services
✔️ Track domain changes and new subdomains
✔️ Monitor CVE-based vulnerabilities

🔹 Use Case:
Security analysts can integrate FullHunt into their vulnerability management workflows to receive alerts about newly exposed assets.

19. Onyphe — Cyber Threat Intelligence for IPs and Domains

Onyphe is a cyber intelligence platform that collects data from various sources, including the deep web, dark web, and open databases, to provide a comprehensive view of target exposure.

🔹 Key Features:

✔️ Collect information about IPs, domains, and leaked credentials
✔️ Track cyber threats and potential attack vectors
✔️ Provide API access for automated threat intelligence

🔹 Use Case:
Cybersecurity researchers can use Onyphe to investigate suspicious IP addresses and check if they have been involved in malicious activities.

20. CertStream — Real-time SSL/TLS Certificate Monitoring

CertStream provides a real-time feed of newly issued SSL/TLS certificates, helping researchers detect phishing and malicious domains.

🔹 Key Features:

✔️ Spot newly active domains through the certificates issued for them
✔️ Assist in real-time detection of phishing activities
✔️ Support integration with security automation tools

🔹 Use Case:
Threat intelligence teams can use CertStream to identify lookalike domains that have just obtained certificates and may be used for phishing attacks against their organization.

🔹 Why is it useful?
Provides an early warning system for phishing detection and helps track malicious actors who set up fake domains, which makes it valuable for security operations teams monitoring brand abuse.
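
A sketch of the lookalike check such a team might run on each message from the feed. It assumes every message is a dictionary with message_type set to "certificate_update" and the certificate's host names under data.leaf_cert.all_domains; that layout, the 0.8 similarity threshold, and the function names are all assumptions to adjust.

```python
from difflib import SequenceMatcher


def looks_like(brand: str, domain: str, threshold: float = 0.8) -> bool:
    """True if any hyphen- or dot-separated token (TLD excluded) contains
    the brand or is a close misspelling of it."""
    name = domain.lower().lstrip("*.")
    labels = name.split(".")[:-1]  # drop the TLD
    tokens = [t for label in labels for t in label.split("-") if t]
    return any(
        brand in t or SequenceMatcher(None, brand, t).ratio() >= threshold
        for t in tokens
    )


def suspicious_domains(message: dict, brand: str, own_domain: str) -> list:
    """Return domains from one feed message that imitate the brand,
    skipping the organization's own domain and its subdomains."""
    if message.get("message_type") != "certificate_update":
        return []
    domains = message.get("data", {}).get("leaf_cert", {}).get("all_domains", [])
    hits = []
    for domain in domains:
        name = domain.lower().lstrip("*.")
        if name == own_domain or name.endswith("." + own_domain):
            continue
        if looks_like(brand, domain):
            hits.append(domain)
    return hits
```

Token-level fuzzy matching is deliberately simple; it catches single-character swaps such as "examp1e" but will miss homoglyph or heavily padded names.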