Nov 01 2006

Protect Your Assets from Search Engines

Hackers love to exploit security holes. With Google hacking, they've found a doorway into sensitive information from Web servers, applications, error messages and subdomains.

Many businesses reveal a lot of highly sensitive information to the world through the Web. Trouble is, most don’t even know it.

It might come as a surprise that, simply by querying a search engine, hackers can find credit card and Social Security numbers, user names and password information about their targets. The information comes from vulnerable servers and Web applications, descriptive error messages, supposedly hidden or private pages and subdomains, pages containing internal network data and various Internet-connected devices such as printers and copiers. Using search engines such as Google and Yahoo to find and access sensitive and exploitable information about targets is called “Google hacking.”

Luckily, you can put the same hacker techniques to work to find and plug the holes in your network. The first step is to uncover what hackers could discover about your company’s data from search engines. Then remove the sensitive data from the Web and get Google or the search engine in question to do the same. This closes the current exposures, but you’ll need to audit yourself regularly to keep your data safe.

How It Works

Hackers craft special search engine query snippets to ferret out Web sites hosting sensitive content. For example, a simple Google search for the snippet “‘admin account info’ filetype:log” returns a list of sites in Google’s index that host log files containing server account information, such as user names and passwords.

A search for the snippet “!Host=*.* intext:enc_UserPassword=* ext:pcf” reveals a list of Web sites hosting virtual private network profiles.

Google hacking may pose a threat to your Web servers and data if a hacker decides to use a search engine to dig out information about you. It remains a concern even when no one is targeting you specifically: Web application worms such as Santy have used search engines to discover vulnerable servers at random and attack them indiscriminately.

Moreover, businesses that deploy search engines on their intranets and various internal enterprise search products (such as the Google search appliance) are vulnerable to the same types of threats as if they used an external search engine. As the number of businesses utilizing these tools increases, so do the threats.

The examples and techniques in this article are Google-specific, though most of them can also be used with other search engines.

Finding and Plugging Holes

Find out what’s already revealed to the outside world. Use the same techniques a hacker would to learn what information about your site search engines have already indexed.

Use the “site:” operator to see all the information on your Web site that is discoverable using Google. For instance, search for this snippet: “site:[yourbusiness].com.”

The Google Hacking Database has a large collection of snippets that can be used to find sensitive or exploitable information. Go to johnny.ihackstuff.com and click on the Google Hacking Database link. Combine these snippets with the “site:” operator and use them to see if your Web site is vulnerable to any of them.

For instance, the snippets shown in the example in the previous section return a general list of vulnerable sites from Google’s index. Combining these snippets with the “site:” operator, you can find out if your site is vulnerable to administrative or hosting attacks:

• “admin account info” filetype:log site:[yourbusiness].com

• !Host=*.* intext:enc_UserPassword=* ext:pcf site:[yourbusiness].com
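
The combination step above can be scripted so that every snippet is scoped to your own domain before you paste it into a search engine. The following is a minimal Python sketch, not a tool from the Google Hacking Database. The function name `scope_queries` and the domain `example.com` are placeholders chosen for illustration; the two snippets are the ones quoted in this article.

```python
# Sketch: scope search snippets to a single domain with the "site:" operator.
# scope_queries and example.com are illustrative names, not part of any tool.

SNIPPETS = [
    '"admin account info" filetype:log',
    '!Host=*.* intext:enc_UserPassword=* ext:pcf',
]


def scope_queries(snippets, domain):
    """Return each snippet with a site: restriction appended."""
    domain = domain.strip().lower()
    if not domain:
        raise ValueError("domain must not be empty")
    return [f"{snippet} site:{domain}" for snippet in snippets]


if __name__ == "__main__":
    # Print the scoped queries so they can be pasted into a search box.
    for query in scope_queries(SNIPPETS, "example.com"):
        print(query)
```

The sketch only builds query strings. It does not submit them, so it can be run freely, and the resulting queries should be used only against domains you are responsible for.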

An automated tool for auditing your site is Gooscan. You can find it and other tools at johnny.ihackstuff.com by clicking on the Downloads link and then selecting the “Tools – Google Hacking” folder.

Remove the sensitive information from your servers. Once you have identified what sensitive information your Web site is exposing to the outside world, take steps to immediately remove it from your public Web servers. All links to files containing such information should be removed from all Web pages. Additionally, you’ll want to take two precautions:

• Disable directory listings for all folders.

• Do not reveal unhandled error messages generated by your Web applications to anyone browsing your Web site. SQL error messages, for example, can reveal SQL injection vulnerabilities in your Web application. Search engine crawlers can index these error messages.
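
One way to spot-check both precautions is to look at the text of your own pages for telltale strings. The sketch below is a heuristic under stated assumptions: the marker strings (such as "Index of /" for auto-generated directory listings and a few common database error phrases) are illustrative examples rather than an exhaustive list, and `find_leak_markers` is a hypothetical helper name.

```python
# Heuristic sketch: flag page text that looks like a directory listing or a
# raw database error. The marker lists are illustrative, not exhaustive.

LISTING_MARKERS = ["index of /", "directory listing for"]
SQL_ERROR_MARKERS = [
    "you have an error in your sql syntax",
    "unclosed quotation mark",
    "warning: mysql_",
]


def find_leak_markers(page_text):
    """Return the markers found in page_text, grouped by kind."""
    lowered = page_text.lower()  # compare case-insensitively
    return {
        "directory_listing": [m for m in LISTING_MARKERS if m in lowered],
        "sql_error": [m for m in SQL_ERROR_MARKERS if m in lowered],
    }


if __name__ == "__main__":
    sample = "<html><title>Index of /backup</title></html>"
    print(find_leak_markers(sample))
```

The page text can come from any HTTP client or from saved copies of your pages. A clean result does not prove a page is safe; it only means none of the listed markers appeared.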

Get Google to remove the sensitive information from its database. It’s not enough to remove the sensitive information from your Web servers since Google already has it indexed in its databases. This information might be retrievable from various caches long after you have removed it from your servers. Hence, it is important to get Google to remove it from its databases.

Contact Google’s Webmaster to get the search engine to remove your content from its index. The Google Help Center offers instructions on removing all or part of your Web site, snippets, cached pages and outdated links. Go to www.google.com/support, select the Webmasters link from the top menu and then select the link on removing content from the Google index.

Avoid future incidents. To avoid exposures, it’s best to create a companywide policy preventing any sensitive information from being placed on public Web servers.

Use the Robots Exclusion Standard, a convention that asks search engine spiders and other Web robots to stay out of all or part of your site. A special file called robots.txt is placed in the top-level directory of the site to specify which parts should not be indexed by the search engines. Because the file is publicly accessible, take care with what you list in it: hackers can read it to learn which parts of your site you want to keep secret.

The robots.txt file uses a special syntax for controlling Web spiders. For example, if you want to disallow all search engines from indexing any part of your site, you would put this in the file:

User-agent: *
Disallow: /

To disallow all search engines from accessing certain directories, use this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
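
Before publishing rules like these, it can help to confirm offline that they block what you expect. The following sketch uses Python's standard `urllib.robotparser` module to evaluate the rules above. The helper name `build_parser`, the host `www.example.com` and the sample paths are assumptions made for illustration.

```python
# Sketch: check robots.txt rules offline with Python's standard library.
from urllib.robotparser import RobotFileParser

# The directory rules quoted in the article.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text and return a parser ready for can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(RULES)
    # www.example.com and both paths are placeholders.
    for path in ("/index.html", "/private/report.html"):
        url = "https://www.example.com" + path
        print(path, parser.can_fetch("*", url))
```

A check like this shows only how a compliant robot would read the file. It says nothing about robots that ignore robots.txt, so sensitive content still should not sit on a public server.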

You can find more information on how to use the Robots Exclusion Standard at www.robotstxt.org/wc/robots.html.

To control how Google spiders behave on your site using robots.txt, set the “User-agent:” to “Googlebot.” For instance, to disallow access to all pages in a directory except one page, use this:

User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html
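
The example above works because, as Google's documentation describes it, Googlebot applies the most specific matching rule, meaning the one with the longest path, and prefers Allow when two matching rules are equally long. The sketch below models that precedence in simplified form. It is not Google's implementation: it ignores wildcards and other details, and the function name `is_allowed` and the rule tuples are hypothetical.

```python
# Simplified sketch of longest-match precedence for Allow/Disallow rules.
# Assumption: plain path prefixes only, no wildcard ("*" or "$") support.

RULES = [
    ("disallow", "/folder1/"),
    ("allow", "/folder1/myfile.html"),
]


def is_allowed(rules, path):
    """Return True if path may be crawled under the given rules."""
    best_len = -1       # length of the most specific matching rule so far
    best_allow = True   # with no matching rule, crawling is allowed
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_prefers_allow = len(prefix) == best_len and directive == "allow"
        if longer or tie_prefers_allow:
            best_len = len(prefix)
            best_allow = directive == "allow"
    return best_allow


if __name__ == "__main__":
    for path in ("/folder1/myfile.html", "/folder1/other.html", "/index.html"):
        print(path, is_allowed(RULES, path))
```

Other robots may resolve such conflicts differently, for example by taking the first matching line, so test rules that mix Allow and Disallow against each search engine you care about.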

Perform periodic audits. The techniques above should keep you from revealing potentially dangerous or sensitive information through the Web, but only if you repeat them as your site changes. Make search engine checks a part of your regular Web security audits.

S.G. Masood is a Web security researcher for F-Secure (www.f-secure.com), a network security services provider with headquarters in Helsinki, Finland.

CEO takeaway

Search engine hacking, or Google hacking, may pose a threat to your Web servers and data if a hacker decides to use a search engine to dig out information. To reduce your exposure, ask your IT team to consider the following:

• Does your regular security auditing plan include checks for information leaks through search engines?
• Does your Web site use the Robots Exclusion Standard (robots.txt) to disallow spiders from indexing certain content?
• Do you have a companywide policy to clearly define what content can be put on publicly accessible Web servers? Is the policy communicated clearly to all employees who have authority to place content on those servers?
• Do not forget that Google is not the only search engine. When an audit turns up sensitive information that is erroneously exposed to the world and you get Google to remove it, make sure you also get it removed from other search engines such as MSN, Yahoo and Ask.