Block Dynamic URLs From Googlebot Using Your Robots.txt File

I’ve been trying to find out how to block some dynamic URLs from Googlebot. Yahoo! Slurp and MSNBot use the same or very similar syntax to block dynamic URLs. For example, I have one line in my .htaccess file that lets me serve static pages instead of dynamic pages, but I found that Googlebot will sometimes still crawl my dynamic pages. This can lead to duplicate content, which none of the major search engines condone.

I’m trying to clean up my personals site because it currently ranks well with Yahoo but not Google. I believe MSN Live has similar algorithms to Google, but this isn’t scientifically proven by any means; I state it only from my own experience with SEO and my clients’ sites. I believe I’ve found some answers on ranking well with Google, MSN and possibly Yahoo, and I’m in the midst of testing right now. I’ve already managed to rank a client’s site well on Google for relevant keywords. Anyway, here is how to block the dynamic pages from Google using your robots.txt file. The following is an extract of my .htaccess file:

RewriteRule personals-dating-(.*)\.html$ /index.php?page=view_profile&id=$1

This rule, in case you are wondering, lets me serve static-looking pages such as personals-dating-4525.html from the dynamic link index.php?page=view_profile&id=4525. However, this has caused problems, because Googlebot can and has “charged” me with duplicate content: the same profile is reachable at both URLs. Duplicate content is frowned upon. It makes more work for Googlebot, which has to crawl the extra pages, and the algorithm can view it as spammy. The moral is that duplicate content should be avoided at all costs.
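To make the mapping concrete, here is a minimal Python sketch of what the rewrite rule does to an incoming path. It only imitates the pattern with Python's re module; it is not how Apache itself evaluates the rule, and the function name rewrite is made up for illustration. The query parameter is assumed to be page, as in the test URL used later in this article.

```python
import re
from typing import Optional

# Same pattern as the RewriteRule, with the dot before "html" escaped.
REWRITE_PATTERN = re.compile(r"personals-dating-(.*)\.html$")


def rewrite(path: str) -> Optional[str]:
    """Return the dynamic target for a static-looking path, or None if the rule does not match."""
    match = REWRITE_PATTERN.search(path)
    if match is None:
        return None
    # $1 in the RewriteRule corresponds to the first captured group.
    return f"/index.php?page=view_profile&id={match.group(1)}"


print(rewrite("personals-dating-4525.html"))  # /index.php?page=view_profile&id=4525
print(rewrite("about.html"))                  # None
```

Both the static path and the dynamic target return the same profile, which is exactly why a crawler that finds both sees duplicate content.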

What follows is an extract of my robots.txt file:

User-agent: Googlebot
Disallow: /index.php?page=view_profile&id=*

Notice the “*” (asterisk) at the end of the second line. This tells Googlebot to match any number of characters in the asterisk’s place. For example, Googlebot will ignore index.php?page=view_profile&id=4525, or the same URL with any other number or set of characters after “id=”. In other words, these dynamic pages will not be crawled.

You can check whether the rules in your robots.txt file will work correctly by logging into your Google webmaster tools control panel. If you don’t have a Google account, simply create one through Gmail, AdWords or AdSense and you will have access to the Google webmaster tools and control panel. It is simple to set up and free, and if you want to achieve higher rankings you should have one. Once logged in, click the “Diagnostics” tab and then the “robots.txt analysis tool” link under the Tools section in the left column.

By the way, your robots.txt file should be in your webroot folder. Googlebot checks your site’s robots.txt file about once a day, and the copy shown under the “robots.txt analysis tool” section of your Google webmaster control panel is updated to match.
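Since crawlers look for robots.txt only at the root of the host, the address they request can be derived from any page URL. The following is a small sketch using Python's standard urllib.parse module; the function name robots_url and the www.example.com host are placeholders, not my real site.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/index.php?page=view_profile&id=4525"))
# http://www.example.com/robots.txt
```

A robots.txt file placed in a subfolder would never be requested at that address, so its rules would have no effect.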

To test your robots.txt file and validate that your rules will work correctly with Googlebot, simply type the URL you want to test into the field “Test URLs against this robots.txt file”. I added the following line to this field: page=view_profile&id=4235

Then I clicked the “Check” button at the bottom of the page. The tool reported that Googlebot will be blocked from this URL under the given rules. I believe this is a better way to block Googlebot than the “URL Removal” tool, which you can also use. The “URL Removal” tool is in the left column of your Google webmaster control panel. I’ve read in a few threads in the Google groups that people have had problems with the “URL Removal” tool.
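If you want a rough local check before using the analysis tool, the wildcard matching can be approximated in a few lines of Python. This is only a sketch of Google-style matching as I understand it (“*” matches any run of characters, a trailing “$” anchors the end, and a rule otherwise matches as a prefix of the path); it is not Google's own code, and the function names are made up for illustration. The full path /index.php?... in the test values is also an assumption based on the rule above.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern":
    """Translate a robots.txt path rule into a compiled regular expression."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with ".*" where "*" stood.
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(rule: str, path: str) -> bool:
    """True if a Disallow rule matches the path, starting from its first character."""
    return rule_to_regex(rule).match(path) is not None


RULE = "/index.php?page=view_profile&id=*"
print(is_blocked(RULE, "/index.php?page=view_profile&id=4235"))  # True
print(is_blocked(RULE, "/personals-dating-4235.html"))           # False

# Because rules match as prefixes, the trailing "*" is harmless but redundant.
print(is_blocked(RULE.rstrip("*"), "/index.php?page=view_profile&id=4235"))  # True
```

The static personals-dating pages stay crawlable while every view_profile URL is blocked, which is the result I wanted from the rule.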

