Robots.txt for multiple websites

dav3evans · 6 Aug 2013 at 11:55

Hey all,

So I've been reading up on the use of robots.txt to prevent crawling etc... and I'm trying to get my head around the use of this if you have multiple sites.

Example:

Say you have your main hosting "www.myhostingdomain.co.uk" but then within that you have 3 separate websites within 3 folders:

awesomewebsite1a.co.uk in folder website1a
awesomewebsite2a.co.uk in folder website2a
awesomewebsite3a.co.uk in folder website3a

To get the above sites to map to the correct addresses you can set up domain mapping, but what would this mean in terms of needing to have the robots.txt in the root directory?

Logically I would think that because of the mapping, the root directory of awesomewebsite1a.co.uk would be the top level of the folder website1a, and so the robots.txt would go in there, meaning it would be like this:

awesomewebsite1a.co.uk = website1a/robots.txt
awesomewebsite2a.co.uk = website2a/robots.txt
awesomewebsite3a.co.uk = website3a/robots.txt

Is this correct? Or would the robots.txt need to go in the root folder of "www.myhostingdomain.co.uk", with every robots setting for all three sites in that one file?

It sounds almost a bit redundant to have separate files as I'm writing this, but I'm thinking about it from the perspective of 3 separate owners of sites managing their own stuff, rather than having to contact their host to make changes.

As always, questions, assistance and general abuse welcome

Dave.

thenewoc · 6 Aug 2013 at 12:17

AFAIK each web site would have a separate robots.txt in its root folder, as you've described.
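
As a rough illustration of why that works: a crawler asks for /robots.txt at the root of whichever hostname it is visiting, so it never sees the folder layout on the host. A minimal Python sketch of that lookup follows; the function name robots_txt_url and the page path are made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for page_url.

    Hypothetical helper: keeps the scheme and hostname, swaps the path
    for /robots.txt, and drops any query string or fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The three mapped domains from the question; the page path is illustrative.
sites = (
    "awesomewebsite1a.co.uk",
    "awesomewebsite2a.co.uk",
    "awesomewebsite3a.co.uk",
)
for site in sites:
    print(robots_txt_url(f"http://{site}/some/page.html"))
```

With domain mapping in place, each of those URLs is served from the matching folder (website1a/robots.txt and so on), which is the layout described in the question.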

If you think about it in terms of what you would tell a search engine when you register a web site, for example through Google Webmaster Tools, it would be about that individual web site, not the fact that it sits on the same server as several other web sites. It's logical for a crawler to look for the robots.txt rules in the same place it looks for something like index.html.

Unfortunately, robots.txt rules are only a request, so there is no guarantee that a search engine will take any notice of them.
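
For what it's worth, this is roughly how a well-behaved crawler would consult the file before fetching a page, using Python's standard urllib.robotparser. The robots.txt contents, the "ExampleBot" user agent and the paths are all invented for the sketch; a crawler that skips this check is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents of website1a/robots.txt (not from a real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
base = "http://awesomewebsite1a.co.uk"

# A polite crawler checks each URL first and skips the disallowed ones.
for path in ("/index.html", "/private/notes.html"):
    allowed = parser.can_fetch("ExampleBot", base + path)
    print(path, "allowed" if allowed else "disallowed")
```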

You can also add 'rel' attributes, such as 'nofollow', to your link code:
http://www.w3schools.com/TAGS/att_a_rel.asp

dav3evans · 6 Aug 2013 at 12:18

That's what I thought, as you tell it to index the URL, and that is directed to the specific folder which is, in effect, its root folder. It's something that I want to test, but I thought I would check my reasoning is sound beforehand in case there is something that I overlooked.

thenewoc · 6 Aug 2013 at 12:38

deleted

dav3evans · 6 Aug 2013 at 12:43

They wouldn't link to each other, but the 3 sites in the folders would have a link to the hosting domain, as a means of saying "created or hosted by" in the footer of each site.

thenewoc · 6 Aug 2013 at 12:46

dav3evans said:
They wouldn't link to each other, but the 3 sites in the folders would have a link to the hosting domain, as a means of saying "created or hosted by" in the footer of each site.

I think that would be fine, as it should be treated as an external link. That's where the choice of using the rel attribute comes in, but in your case you would probably want crawlers to follow those links.

dav3evans · 6 Aug 2013 at 12:47

The links at the bottom of the sites would be the full URL rather than linking back up the folder stack.

When you mention the rel attribute here, can you elaborate what you mean?

Thanks,

Dave.

thenewoc · 6 Aug 2013 at 13:19

You can use the 'rel' attribute on individual links, or you can use a robots meta tag at the page level to act as the directive for the whole page. The meta tag accepts 'nofollow', 'noindex', or both, like so:

Code:

<meta name="robots" content="nofollow">

or

<meta name="robots" content="noindex">

or

<meta name="robots" content="noindex,nofollow">

The Google help page linked below explains the 'rel' attribute with 'nofollow'.

e.g.

Code:

<a href="http://destination.com/" rel="nofollow">link text</a>

https://support.google.com/webmasters/answer/96569?hl=en

One difference between the two is 'noindex', which is valid in the page-level meta tag but not in the 'rel' attribute on link code.
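
To make the two levels concrete, here is a small Python sketch that reads a page the way a crawler might: it collects the page-wide directives from the robots meta tag and, separately, the individual links marked rel="nofollow". The class name RobotsHintParser and the sample HTML are invented for the example, and real crawlers will differ in the details.

```python
from html.parser import HTMLParser


class RobotsHintParser(HTMLParser):
    """Collect page-level robots directives and per-link nofollow hints."""

    def __init__(self):
        super().__init__()
        self.page_directives = set()   # from <meta name="robots">
        self.nofollow_links = []       # hrefs of <a rel="nofollow">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # content is a comma-separated list, e.g. "noindex,nofollow"
            content = attrs.get("content") or ""
            self.page_directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )
        elif tag == "a":
            # rel is a space-separated list of values
            rel_values = (attrs.get("rel") or "").lower().split()
            if "nofollow" in rel_values and attrs.get("href"):
                self.nofollow_links.append(attrs["href"])


# Illustrative page: one nofollow link and one normal "hosted by" link.
SAMPLE = """<html><head>
<meta name="robots" content="noindex,nofollow">
</head><body>
<a href="http://destination.com/" rel="nofollow">link text</a>
<a href="http://www.myhostingdomain.co.uk/">hosted by</a>
</body></html>"""

hints = RobotsHintParser()
hints.feed(SAMPLE)
print(sorted(hints.page_directives))
print(hints.nofollow_links)
```

The "hosted by" footer link carries no rel attribute, so it is not collected as a nofollow link.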

This is also a useful resource.

http://tools.seobook.com/robots-txt/

dav3evans · 6 Aug 2013 at 13:20

Thanks matey, I'll get my reading on

thenewoc · 6 Aug 2013 at 13:24

The meta tag does a similar job to the rules you would put in the robots.txt file, albeit only for the current page, but having it in both places may mean that some crawlers pick it up from one where they might miss it in the other.

dav3evans · 6 Aug 2013 at 13:28

Ah ok so it's definitely worth implementing both where possible then. Good to know