Killer Robots From Outer SEO Space: How to Dominate the Robots.txt File

Seth Ellsworth

If you haven’t heard of Mr. Robots, a.k.a. the robots.txt file, don’t blame yourself. It wasn’t even on the SEO map until just a couple of years ago. Most of you, however, know what it is but don’t know exactly how to dominate the robots.

Robots.txt files are no secret. You can spy on literally anyone’s robots file by simply typing their domain followed by /robots.txt into your browser. The robots.txt should always, and only, be in the root of the domain, and EVERY website should have one, even if it’s generic. I’ll tell you why.

There’s mixed communication about the robots file. Use it. Don’t use it. Use meta robots instead. You may even have heard advice to abandon the robots.txt altogether. Who is right?

Here’s the secret sauce. Check it out.

First things first, understand that the robots.txt file was not designed for human usage. It was designed to tell search ‘bots’ exactly how they may behave on your site. It sets parameters that well-behaved bots obey and spells out what information they can and cannot access.

This is critical for your site’s SEO success. You don’t want the bots looking through your dirty closets, so to speak.

What is a Robots.txt File?

The robots.txt is nothing more than a simple text file that should always sit in the root directory of your site. Once you understand the proper format, it’s a piece of cake to create. This system is called the Robots Exclusion Standard.

Always be sure to create the file in a basic text editor like Notepad or TextEdit and NOT in an HTML editor like Dreamweaver or FrontPage. That’s critically important. The robots.txt is NOT an HTML file and is not even remotely close to any web language. It has its own format that is completely different from any other language out there. Lucky for us, it’s extremely simple once you know how to use it.

Robots.txt Breakdown

The robots file is simple. It consists of two main directives: User-agent and Disallow.

User Agent
Every item in the robots.txt file is specified by what is called a ‘user agent.’ The user agent line specifies the robot that the command refers to.


User-agent: googlebot

On the user agent line you can also use what is called a ‘wildcard character’ that specifies ALL robots at once.


User-agent: *

If you don’t know what the user agent names are, you can easily find these in your own site logs by checking for requests to the robots.txt file. The cool thing is that most major search engines have names for their spiders. Like pet names. I’m not kidding. Slurp.
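Those robots.txt requests can be pulled out of a raw access log with a few lines of Python. A rough sketch, assuming the combined log format, where the user agent is the last quoted field on each line (the sample lines and log path are made up; substitute your server’s own access log):

```python
import re

def robots_fetchers(log_lines):
    """Return the user-agent string of every request for /robots.txt.

    Assumes combined log format, where the user agent is the last
    quoted field on each line.
    """
    agents = []
    for line in log_lines:
        if "GET /robots.txt" in line:
            # Grab all quoted fields; the user agent is the last one.
            agents.append(re.findall(r'"([^"]*)"', line)[-1])
    return agents

# Hypothetical sample lines; a real log lives somewhere like
# /var/log/apache2/access.log.
sample = [
    '1.2.3.4 - - [10/Mar/2022] "GET /robots.txt HTTP/1.1" 200 58 "-" "Googlebot/2.1"',
    '5.6.7.8 - - [10/Mar/2022] "GET /index.html HTTP/1.1" 200 99 "-" "Mozilla/5.0"',
]
print(robots_fetchers(sample))  # ['Googlebot/2.1']
```

Run that over a few weeks of logs and you’ll have the exact user agent names knocking on your door.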

Here are some major bots:

Yahoo! Slurp
Mediapartners-Google (Google AdSense Robot)
Xenu Link Sleuth

The second most important part of your robots.txt file is the ‘disallow’ directive line, which is usually written right below the user agent. Remember, the presence of a disallow directive does not mean that the specified bots are completely barred from your site; you can pick and choose what they can and can’t index or download.

The disallow directives can specify files and directories.

For example, if you want to instruct ALL spiders not to download your privacy policy, you would enter (note that the path always starts with a forward slash):

User-agent: *
Disallow: /privacy.html
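If you want to double-check how a rule like that is interpreted, Python’s standard library ships a parser for the Robots Exclusion Standard; a minimal sketch:

```python
from urllib.robotparser import RobotFileParser

# Feed the rules straight to the parser instead of fetching them over HTTP.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /privacy.html",
])

# Any bot is barred from the privacy policy but free to crawl the rest.
print(rules.can_fetch("Googlebot", "/privacy.html"))  # False
print(rules.can_fetch("Googlebot", "/index.html"))    # True
```

The same parser works against a live site via `set_url()` and `read()`, which is a quick way to audit your own file after you upload it.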

You can also specify entire directories with a directive like this:

User-agent: *
Disallow: /cgi-bin/

Again, if you only want a certain bot to be disallowed from a file or directory, put its name in place of the *.

This will block spiders from your cgi-bin directory.

Super Ninja Robots.txt Trick

Security is a huge issue online. Naturally, some webmasters are nervous about listing the directories that they want to keep private thinking that they’ll be handing the hackers and black-hat-ness-doers a roadmap to their most secret stuff.

But we’re smarter than that aren’t we?

Here’s what you do: if the directory you want to exclude or block is “secret,” all you need to do is abbreviate it and add an asterisk to the end. (The asterisk in a disallow path is a nonstandard extension, but the major engines support it.) You’ll want to make sure that the abbreviation is unique. Name the directory you want protected ‘/secretsizzlesauce/’ and just add this line to your robots.txt:

User-agent: *
Disallow: /sec*

Problem solved.

This directive will disallow spiders from indexing directories that begin with “sec.” You’ll want to double check your directory structure to make sure you won’t be disallowing any other directories that you wouldn’t want disallowed. For example, this directive would disallow the directory “secondary” if you had that directory on your server.

To make things easier, just as with the user agent line, there is similar pattern behavior built into the disallow directive: it matches by prefix. If you disallow /tos, then by default it blocks every path that begins with /tos, such as a file named /tos.html as well as any file inside the /tos directory, such as /tos/terms.html.

Important Tactics For Robot Domination

  • Always place your robots.txt in the root directory of your site so that it can be accessed at www.yoursite.com/robots.txt.
  • If you leave the disallow line blank, it indicates that ALL files may be retrieved.
  • You can add as many disallow directives under a single user agent as you need, but every user agent group must contain at least one disallow line, even one that disallows nothing.
  • To be SEO kosher, get the format right. You don’t want the bots to misread your stuff; if the syntax is off, they may just ignore the entire file, and that is not cool. Most people who find pages indexed that they wanted hidden have syntax errors in their robots file.
  • Use the robots.txt testing tool in your Google Search Console account to make sure you set up your robots file correctly.
  • An empty robots.txt is exactly the same as not having one at all. So, if nothing else, use at least the basic directive that allows the entire site.
  • To add comments to your robots file, just put a # at the front of a line and that entire line will be ignored. DO NOT put comments at the end of a directive line; that is bad form, and some bots may not read it correctly.
  • What should you disallow in your robots file?
    • Any folder that you don’t want the public eye to find, or any that should be password protected but isn’t.
    • Printer-friendly versions of pages (mostly to avoid the duplicate content filter).
    • Your image directory, to protect images from leeches and keep the spiders focused on your content.
    • Your cgi-bin, which houses some of the programming code on your site.
    • Any bots you find in your site logs that are sucking up bandwidth and not returning any value.
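Pulling several of these tactics together, a commented robots.txt might look like this (the directory and bot names are placeholders):

User-agent: *
# Keep all bots out of scripts and printer-friendly duplicates.
Disallow: /cgi-bin/
Disallow: /print/

# This bot eats bandwidth and sends nothing back, so shut it out entirely.
User-agent: HungryBot
Disallow: /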

Killer Robot Tactics

• This setup lets the bots visit everything on your site, and sometimes on your server, so use it carefully. The * specifies ALL robots, and the empty disallow directive applies no restrictions to ANY bot.

User-agent: *
Disallow:

• This setup prevents your entire site from being indexed or downloaded. In theory, this will keep ALL bots out.

User-agent: *
Disallow: /

• This setup keeps out just one bot. In this case, we’re denying the heck out of Ask’s bot, Teoma.

User-agent: Teoma
Disallow: /

• This setup keeps ALL bots out of your cgi-bin and your image directory:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/

• If you want to disallow Google from indexing your images in their image search engine but allow all other bots, do this:

User-agent: Googlebot-Image
Disallow: /images/

• If you create a page that is perfect for Yahoo!, but you don’t want Google to see it:

User-Agent: Googlebot
Disallow: /yahoo-page.html
# Don’t use user agents or robots.txt for cloaking. That’s SEO suicide.
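You can sanity-check a per-bot rule like that last one with Python’s standard-library robots parser; a minimal sketch showing that the rule binds only the named agent:

```python
from urllib.robotparser import RobotFileParser

# Feed the rules straight to the parser instead of fetching them.
rules = RobotFileParser()
rules.parse([
    "User-agent: Googlebot",
    "Disallow: /yahoo-page.html",
])

# Googlebot is blocked from the page; every other bot falls through
# to the default, which is to allow everything.
print(rules.can_fetch("Googlebot", "/yahoo-page.html"))  # False
print(rules.can_fetch("Slurp", "/yahoo-page.html"))      # True
```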

If You Don’t Use a Robots.txt File…

A well-written robots.txt file can help your site get indexed up to 15% deeper. It also lets you control your content so that your site’s SEO footprint is clean, indexable, and literal fodder for search engines. That is worth the effort.

Everyone should have and employ a solid robots.txt file. It is critical to the long term success of your site.

Get it done.


