Yang Li Eng  

Your experienced SEO Specialist and Web Developer

All about robots.txt


What’s this

Located at your domain root (like our site’s yangli.com.au/robots.txt), robots.txt is a file that tells search engine robots how to crawl your site: which pages and files to crawl and which ones to skip. Crawling is how Google and other search engines discover and read your pages (we usually call them URLs) so that they can understand and index them.

What you can do

1. You can use the robots.txt file to tell search engine robots not to crawl certain pages, such as yangli.com.au/secret/, by adding the following lines to the file:

User-agent: *
Disallow: /secret/

The “User-agent” line, as shown above, specifies which search engine robots the rules below it apply to. “*” means all robots, or you can target a specific one, such as “User-agent: googlebot”. You can see all available bots’ names here.
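If you want to double-check how a “User-agent” group behaves before uploading the file, here is a minimal sketch using Python’s standard urllib.robotparser module. The page URLs and the second bot name are only illustrative assumptions, and this parser is an approximation of how real crawlers read the file, not a copy of Google’s own logic.

```python
from urllib.robotparser import RobotFileParser

# A rule group that targets only Googlebot (illustrative content).
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /secret/
"""

parser = RobotFileParser()
# parse() accepts the file content as a list of lines, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot is covered by the group, so /secret/ is off limits for it.
googlebot_secret = parser.can_fetch("Googlebot", "https://yangli.com.au/secret/page.html")
# Other pages are still fine for Googlebot.
googlebot_blog = parser.can_fetch("Googlebot", "https://yangli.com.au/blog/")
# A bot that no group mentions is not restricted by this file.
bingbot_secret = parser.can_fetch("Bingbot", "https://yangli.com.au/secret/page.html")

print(googlebot_secret, googlebot_blog, bingbot_secret)
```

Changing the first line of the rules to “User-agent: *” would make the same Disallow rule apply to every bot.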

2. Conversely, you can tell the bots that certain pages or files are fine to crawl by using:

Allow: /not-a-secret/

3. Moreover, you can even use it to specify the location of your XML sitemap (what’s this?) using the following line:

Sitemap: http://yangli.com.au/sitemap_index.xml

Replace the link above with your actual sitemap location. This line is not required for SEO, though, because you can also submit the sitemap via Google’s Search Console (formerly Webmaster Tools).
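The three directives above can be combined in one file. As a rough sketch, assuming Python 3.8 or newer (where the site_maps() method is available), you could read the Sitemap line back and test the Allow and Disallow rules like this. The combined file content is my own illustration put together from the snippets in this post.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file that combines the three directives discussed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /secret/
Allow: /not-a-secret/

Sitemap: http://yangli.com.au/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the list of Sitemap URLs, or None if the file has none.
sitemaps = parser.site_maps()

# Check one blocked path and one explicitly allowed path for any bot.
secret_ok = parser.can_fetch("*", "http://yangli.com.au/secret/")
public_ok = parser.can_fetch("*", "http://yangli.com.au/not-a-secret/")

print(sitemaps, secret_ok, public_ok)
```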

Yang’s tips

1. Never place the following lines in your robots.txt file unless you want to block search engines from crawling your whole site, which will stop it from appearing properly in Google search results:

User-agent: *
Disallow: /

2. Blocking a page in robots.txt does NOT mean Google won’t index it. Google can still index a page even if you block it in the file, for example when other pages link to it. You have probably seen a search result like the following before. Read another post of mine to learn how to safely and effectively deindex your pages.

A page blocked by robots.txt

3. Allow Google to crawl all your .css and .js files. This is crucial because Google needs these files to render your web pages, especially if you’re blocking many folders that contain plugin files or even theme files.

Allow: /*.css*
Allow: /*.js*
Allow: /*.CSS*
Allow: /*.JS*
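To see which URLs patterns like these actually cover, here is a small sketch of wildcard matching, where “*” stands for any run of characters and a trailing “$” marks the end of the URL. The helper names and the sample paths are hypothetical, and the code is a simplified approximation rather than Google’s actual matcher. It also shows why the upper-case lines are listed separately: path matching is case-sensitive.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    # A trailing "$" means the pattern must match up to the end of the URL path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let every "*" match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def pattern_matches(pattern, url_path):
    """Return True if the path (including any query string) matches from its start."""
    return pattern_to_regex(pattern).match(url_path) is not None

# Hypothetical paths, used only for illustration.
css_lower = pattern_matches("/*.css*", "/wp-content/themes/demo/style.css?ver=1.2")
css_wrong_case = pattern_matches("/*.css*", "/assets/STYLE.CSS")
css_upper = pattern_matches("/*.CSS*", "/assets/STYLE.CSS")

print(css_lower, css_wrong_case, css_upper)
```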

If you have any questions about how to handle your robots.txt file, or you need any SEO services, feel free to contact me now.