Talk:Robots Exclusion Standard
From Wikipedia, the free encyclopedia
For the old archive, please see Talk:Robots.txt protocol.
- This is a red link. Either whoever moved the page made a mistake, or the page was moved by some bot or app that posted the message without checking whether a talk page existed.--|333173|3|_||3 05:38, 27 June 2006 (UTC)
Perhaps this article should be named Robots exclusion standard instead of Robots Exclusion Standard? Wmahan. 00:17, 2004 Sep 12 (UTC)
I have fixed the Warning section, reduced it to a level 3 heading (=== ... === instead of == ... ==), and removed the {{tone}} tag.--|333173|3|_||3 05:38, 27 June 2006 (UTC)
Google info removed
- Google uses comments for the same purpose: <!--googleoff: index--> ... <!--googleon: index-->
A source is needed. - Ta bu shi da yu 13:53, 14 August 2006 (UTC)
NOINDEX
Can anyone confirm this? It sounds like a general convention, but I don't know of a single reference for it anywhere. projectphp 00:13, 15 August 2006 (UTC)
AFAIK, the NOINDEX tag was introduced by Yandex, a Russian search engine; see the Yandex help page (in Russian). 212.176.39.52 12:12, 15 August 2006 (UTC)
<!--noindex--> / <!--/noindex-->
Another way to exclude a portion of a web page from indexing is used by the ASPSeek and DataparkSearch search engines: two special comments, <!--noindex--> and <!--/noindex-->, mark the beginning and the end of the region to exclude; see DataparkSearch's documentation.
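To make the idea concrete, here is a minimal sketch of how an indexer might drop such regions before indexing. It is not taken from ASPSeek or DataparkSearch; the function name strip_noindex and the sample page are hypothetical, and it assumes the markers are written exactly as shown above and are not nested.

```python
import re

# Assumption: the markers appear exactly as <!--noindex--> and <!--/noindex-->
# and are never nested. Real engines may handle whitespace or nesting differently.
NOINDEX_RE = re.compile(r"<!--noindex-->.*?<!--/noindex-->", re.DOTALL | re.IGNORECASE)


def strip_noindex(html: str) -> str:
    """Return the page text with every noindex region removed (hypothetical helper)."""
    return NOINDEX_RE.sub("", html)


page = (
    "<p>Indexed text.</p>"
    "<!--noindex--><p>Navigation menu.</p><!--/noindex-->"
    "<p>More indexed text.</p>"
)
print(strip_noindex(page))
```

The non-greedy match keeps two separate excluded regions on the same page from swallowing the indexed text between them.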
Examples section
There's a bit of a discrepancy between the first two examples and the others: the first two talk about "robots", while the later ones talk about "crawlers". Should this be fixed/changed? Aeluwas 21:14, 30 May 2007 (UTC)
When search engines talk about their robots, they tend to call them "crawlers". However, robots.txt applies to all robots, even the ones that don't crawl (and just check sites). Accordingly, I suggest that we use "robots" as the standard term for this article, except in a section that is very clearly only about a search engine crawler (such as the crawl delay). Ian McAnerin (talk) 05:20, 21 November 2007 (UTC)
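As a rough illustration of that distinction, the sketch below uses Python's standard urllib.robotparser. The robots.txt content and the agent names ExampleCrawler and SomeLinkChecker are made up for the example. The wildcard record applies to any robot, crawling or not, while the crawl delay is set only for the one named crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a catch-all record plus one crawler-specific record.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleCrawler
Crawl-delay: 10
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A non-crawling robot (e.g. a link checker) falls under the "*" record.
checker_private = parser.can_fetch("SomeLinkChecker", "/private/page.html")
checker_public = parser.can_fetch("SomeLinkChecker", "/public/page.html")

# The crawl delay exists only in the record for the named crawler.
crawler_delay = parser.crawl_delay("ExampleCrawler")
checker_delay = parser.crawl_delay("SomeLinkChecker")

print(checker_private, checker_public, crawler_delay, checker_delay)
```

Note that Crawl-delay is a non-standard extension that only some crawlers honor, which fits with treating it as crawler-specific wording in the article.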
Spam / Useless Links
I just removed the following external link: *[ht tp://www.google-msn-yahoo.info/ Windows XP Update Repaire] It caught my eye when I noticed "repaire" was spelled wrong. When I followed the link, it went to one of the spammier sites I've ever seen. The top half was all about wooden flooring, and there was a little tiny note at the bottom saying that robots.txt is important. Ian McAnerin (talk) 05:05, 21 November 2007 (UTC)
History
http://yro.slashdot.org/comments.pl?sid=377285&cid=21554125 gives the history of the robots.txt standard. However, I'm not sure the information is particularly encyclopedic, and I'm betting a Slashdot comment isn't a reliable, verifiable source. OTOH, the people monitoring this talk page might want to chase it down. Theorbtwo 23:54, 2 December 2007 (UTC)
Dynamic Links
There's no info on dynamic links. ceo 13:21, 7 December 2007 (UTC)

