Wikipedia talk:WikiProject New page


Name of the project

I think the name of the project should be Wikipedia:WikiProject New pages. Otolemur crassicaudatus (talk) 15:36, 30 December 2007 (UTC)

I have changed the project name to Wikipedia:WikiProject New page studies. Otolemur crassicaudatus (talk) 16:26, 30 December 2007 (UTC)

I have again changed the project name, this time to Wikipedia:WikiProject New page. Otolemur crassicaudatus (talk) 16:31, 30 December 2007 (UTC)

Good luck with this, it seems a worthwhile thing to do. Nick mallory (talk) 06:43, 31 December 2007 (UTC)

Facts on newly created pages

The section "Facts on newly created pages" needs to be expanded. Otolemur crassicaudatus (talk) 08:05, 31 December 2007 (UTC)

Userbox

I should create a userbox for this project shortly. :) Rt. 22:12, 31 December 2007 (UTC)

It's not much, but you can view it by adding {{User WikiProject New page}} or just clicking the link. :) Rt. 20:40, 1 January 2008 (UTC)

I have changed the display text from "new page wikiproject" to "WikiProject New page". This is clearer. Otolemur crassicaudatus (talk) 14:43, 2 January 2008 (UTC)

End-of-shift report

On a voluntary basis, a newpage patroller could fill out a form that looks like this:

Newpage patrol end-of-shift report
Date | Begin | End | Total pages patrolled | Bot-created | CSD-tagged | SOBMTI | CSD-deleted | CSD-denied | CSD-removed | CSD-pending | PROD / AFD | Redirects | Cleanup / tagged
(date) | 12:30 | 13:30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30

Total pages patrolled - Actually, the total number of pages that were created during your shift, regardless of whether or not you actually saw them. Approximated by the number of entries in Special:Newpages from the beginning to the end of your shift.

Bot-created - Number of stubs created by bots, mass-creators, etc., during your shift. Easily determined from the bot's list of contributions.


The following can be determined from your list of personal contributions. It is a good idea to wait a while before filling out that section.

CSD-tagged - Number of pages you tagged with CSD during your shift, as determined by the number of user warnings you produced.

SOBMTI - "SomeOne Beat Me To It" - Number of pages you wanted to tag for speedy, but someone else did before you could (no practical way to follow up).

CSD-deleted - Number of pages you tagged that no longer existed at the end of your shift, as determined by the number of user warnings not associated with an existing page.

CSD-denied - Number of CSD tags you inserted that were removed by admins and/or editors deemed reliable (including yourself), or removed following edits that made the article satisfactory.

CSD-removed - Number of CSD tags you inserted that were removed by article authors, requiring reinsertion.

CSD-pending - Number of CSD tags you inserted that were still in place at the end of your shift.

PROD/AFD - Number of pages you nominated for PROD or AFD during your shift.

Redirects - Number of pages you redirected during your shift, including page moves you made because of inappropriate titles.

Cleanup/tagged - Number of pages you tagged but not for deletion, or cleaned up yourself.

--Blanchardb-MeMyEarsMyMouth-timed 05:04, 1 January 2008 (UTC)
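The tallying described above could be partly automated. Below is a minimal sketch in Python, assuming a hypothetical list of (page, action) records extracted from a patroller's contributions and a page_exists callback; no such tool exists, and the field and action names are invented for illustration only.

# A minimal sketch (not an existing tool) of how the tallies described above
# could be derived from a patroller's contribution list. The ShiftReport
# fields mirror the report columns; the shape of the `contributions` records
# and the `page_exists` callback are assumptions made for illustration.
from dataclasses import dataclass

@dataclass
class ShiftReport:
    begin: str                 # shift start time (UTC)
    end: str                   # shift end time (UTC)
    total_patrolled: int = 0   # pages created during the shift (from Special:Newpages)
    bot_created: int = 0       # stubs mass-created by bots
    csd_tagged: int = 0        # pages you tagged for speedy deletion
    sobmti: int = 0            # "someone beat me to it"
    csd_deleted: int = 0       # your tags whose pages no longer exist
    csd_denied: int = 0        # tags removed by admins or trusted editors
    csd_removed: int = 0       # tags removed by the article's author
    csd_pending: int = 0       # tags still in place at the end of the shift
    prod_afd: int = 0          # PROD or AfD nominations
    redirects: int = 0         # redirects, including moves for bad titles
    cleanup: int = 0           # pages tagged or cleaned up rather than deleted

def tally(report, contributions, page_exists):
    """Fill in the columns that can be read off your own contribution list."""
    for page, action in contributions:         # e.g. ("Some new article", "csd-tag")
        if action == "csd-tag":
            report.csd_tagged += 1
            if not page_exists(page):          # warning left, page gone -> deleted
                report.csd_deleted += 1
        elif action in ("prod", "afd"):
            report.prod_afd += 1
        elif action == "redirect":
            report.redirects += 1
        elif action == "cleanup":
            report.cleanup += 1
    # Denied and removed tags still have to be counted by hand from the page
    # histories, so this pending figure is only a rough approximation.
    report.csd_pending = (report.csd_tagged - report.csd_deleted
                          - report.csd_denied - report.csd_removed)
    return report

Total pages patrolled and Bot-created would still be read off Special:Newpages and the bots' contribution lists, as described above.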

This chart is an excellent idea, Blanchardb. I support this. However, I think the "Begin" and "End" columns are not really necessary. Otolemur crassicaudatus (talk) 08:59, 1 January 2008 (UTC)

Another chart could be created showing which CSD criteria were used. Otolemur crassicaudatus (talk) 09:01, 1 January 2008 (UTC)

My first real one

Newpage patrol end-of-shift report
Date | Begin | End | Total pages patrolled | Bot-created | CSD-tagged | SOBMTI | CSD-deleted | CSD-denied | CSD-removed | CSD-pending | PROD / AFD | Redirects | Cleanup / tagged
January 1st | 14:13 UTC | 14:56 UTC | 106 | 78+ | 9 | 0 | 8 | 1 | 1 | 0 | 1 | 4 | 1
January 2nd | 2:55 UTC | 3:45 UTC | 73 | 11++ | 8 | 9 | 6 | 2 | 1 | 0 | 1 | 3 | 5
January 2nd | 12:45 UTC | 14:00 UTC | 98 | 30 | 8 | 2 | 7 | 1 | 0 | 0 | 0 | 1 | 6
January 3rd | 23:43 UTC | 00:49 UTC | 150 | 114 | 21 | 5 | 20 | 1 | 2 | 0 | 3 | 2 | 5
January 6th | 12:45 UTC | 14:22 UTC | 99 | 15 | 13 | 3 | 11 | 1 | 1 | 1 | 1 | 1 | 5
January 20th | 13:23 UTC | 14:22 UTC | 85 | 33 | 7 | 2 | 7 | 1 | 0 | 0 | 0 | 1 | 10

+55 by Olivier (talk · contribs), 7 by Dpaajones (talk · contribs), 7 by Blofeld of SPECTRE (talk · contribs), 9 by Rtphokie (talk · contribs)

++All by Rtphokie (talk · contribs)

--Blanchardb-MeMyEarsMyMouth-timed 15:15, 1 January 2008 (UTC)

My End-of-shift report

Newpage patrol end-of-shift report
Date | CSD-tagged | SOBMTI | CSD-deleted | CSD-denied | CSD-removed | PROD | AFD | Cleanup / tagged
January 2nd | 5 | 2 | 4 | 1 | 0 | 0 | 0 | 3

Otolemur crassicaudatus (talk) 14:34, 2 January 2008 (UTC)

New db- template

I have created a new speedy deletion template to be used on attempted copy-and-paste page moves (speedy G6 housekeeping). The template is {{db-copypaste}}. --Blanchardb-MeMyEarsMyMouth-timed 15:29, 2 February 2008 (UTC)

ClueBot V

Yesterday, a bot that would automatically analyze new pages and look for speedy deletion candidates was approved for a three-day trial period, but it was deactivated about three hours later after complaints that an inordinate number of false positives violated WP:BITE. An admin even threatened to block such a bot. The bot's creator, himself an admin, is looking for ways to improve his bot for a second trial, and I would like your input. Please comment at Wikipedia:Bots/Requests for approval/ClueBot V. --Blanchardb-MeMyEarsMyMouth-timed 21:46, 20 February 2008 (UTC)


Stopping Leaking of New Page

One of the things I have noticed when doing quick research on notability and sources for new pages I check is that within minutes of being added they are already indexed by Google and friends. Does anyone know if MediaWiki supports some kind of waiting period to block outside robot crawlers (like Google's) from seeing pages before editors and admins have had a chance to triage the obvious problems? --Marcinjeske (talk) 03:05, 15 April 2008 (UTC)

This is beyond our jurisdiction, actually. You could state your concerns at MediaWiki, which makes the software that Wikipedia uses. --Blanchardb-MeMyEarsMyMouth-timed 11:09, 15 April 2008 (UTC)


Well, I was mainly wondering if MediaWiki already had this functionality - checking some other resources, it does not. The best it does is use a static robots.txt to ward search bots off from User space, talk pages, and other special-purpose pages. But because 1) it depends on cooperation from the search bots and 2) it happens at the URL level, before any info is known about the page contents, age, or status, there is no way to leverage that functionality to do the above.

But mainly I posted here because this is the logical place where I could get a sense of whether this is something editors think should be done. No point in convincing developers to do something if it is never going to get used. So, what would people think of the following? (A rough sketch of how these checks might combine follows the list.)

The ability to block search bots (generally, but in particular those from the big engines like Google) from visiting:

  • articles newly created - to provide time for wiki editors to vet and improve a page before it is exposed to the world
    • a major idea being that articles filled with self-promotion or spam which get speedily deleted would never show up in Google results, reducing the incentive for that sort of vandalism
    • a big question is how long an article should be quarantined? A few hours at least? At most a few days? We wouldn't want it to end up like US copyright, where a definite timeline would get extended to an indefinite timeline.
    • what if, instead of time, it was edit count, or more specifically the number of unique editors? We would pick some threshold, say 3, as the minimal consensus needed for something to be accessible to search engines. That would cover the most blatant CSD stuff (user 1 creates, user 2 tags, admin 1 deletes), and no casual users would get misled by Wikipedia's high PageRank into reading an article touting services or random thoughts.
  • articles tagged for copyvio or speedy deletion - if there is such high doubt that the article should exist, why let the search engines see it
    • the danger here is that malicious editors might use this feature to keep legitimate pages out... but then they would be quickly corrected for inappropriately tagging pages - by definition, pages like this are either going to disappear very soon, or the tags will be removed and the search engine will be let in.
  • articles recently edited - again, while vandalism and other inappropriate content is usually quickly removed (thanks to vigilant editors and our own awesome WikiBots), we run the risk that in the meantime, that content gets cached in the external world.
    • again, the question is how long... I would say on the order of a few minutes to an hour after the last edit. The best determination would be made by looking at logs of articles and seeing how long vandalism takes to revert on average.
    • this would remove at least a bit of the incentive to vandalize/spam Wikipedia
  • detecting requests coming from bots in an active way would also allow MediaWiki to enforce the currently voluntary constraints of robots.txt
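To make the combined idea concrete, here is a minimal sketch of the decision a crawler-facing check might make, assuming a hypothetical page object that exposes its creation time, last-edit time, wikitext, and a unique-editor count. None of these names are real MediaWiki APIs, and the thresholds are placeholders for whatever values gain consensus.

# Illustrative sketch only -- MediaWiki has no such hook today. The page
# object, its attributes, and the thresholds are assumptions chosen to
# mirror the proposal above, not an existing API.
from datetime import datetime, timedelta

QUARANTINE_AGE = timedelta(hours=24)       # hide brand-new pages for a while
FRESHNESS_WINDOW = timedelta(minutes=30)   # hide pages edited very recently
MIN_UNIQUE_EDITORS = 3                     # "minimal consensus" threshold
DELETION_TAGS = ("{{db-", "{{speedy", "{{copyvio")  # checked near the top of the page

def serve_to_search_bot(page, now=None):
    """Decide whether a detected search crawler may index this page."""
    now = now or datetime.utcnow()
    if now - page.created < QUARANTINE_AGE:
        return False                       # newly created: give editors time to vet it
    if now - page.last_edited < FRESHNESS_WINDOW:
        return False                       # recent edit: vandalism may not be reverted yet
    head = page.wikitext[:500].lower()
    if any(tag in head for tag in DELETION_TAGS):
        return False                       # tagged for speedy deletion or copyvio
    if page.unique_editor_count() < MIN_UNIQUE_EDITORS:
        return False                       # not enough independent editors yet
    return True                            # ordinary, settled article: index as usual

The point of collapsing everything into one yes/no answer is that the check would run only when the requesting agent has already been identified as a search bot, so ordinary readers never hit it.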

There are possible negative effects:

  • detecting what is a robot is not an exact science... there could be false positives where legitimate readers may be blocked from reading articles...
    • but this can be addressed by providing a way to view the article in these cases that is easy for a human but difficult for a machine.
    • if the technical ability exists to block one or more types of users based on the content or state of the article, that ability could be misused in the future
    • there is the potential for page creators to be confused about why their new page is not showing up in search engines, but there really should not be a user expectation that listing is instant (even though in practice Google has the page within minutes of creation)
  • the MediaWiki software would have increased work in identifying search bot requests and examining the page for tests of freshness and "minimal consensus"
    • as long as the tests are kept simple, they can be introduced naturally in the course of serving the page, and only under the condition that the requesting agent is a search bot.
      • given the minimal editor test, you do not need to exhaustively search the history... any article with more than a few revisions is likely to have multiple editors, and once three unique editors are counted, the test succeeds (see the sketch after this list)
      • since copyvio and speedy deletion tags should appear at the beginning of an article, only the first few lines of the article need to be examined
    • The processing saved by not serving a portion of new/revised pages to search bots may offset the computational cost of doing these checks
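As a rough sketch of the two cheap tests just mentioned (the early-terminating unique-editor count and the top-of-page tag scan), assuming a newest-first revision iterator whose items carry a user attribute; these are illustrative names, not an existing MediaWiki interface.

# Sketch of the two cheap tests described above; `revisions` is assumed to be
# an iterator over a page's history, newest first, and is not a real API.
def has_minimal_consensus(revisions, threshold=3):
    """Stop walking the history as soon as enough unique editors have been seen."""
    editors = set()
    for rev in revisions:                  # no need to read the whole history
        editors.add(rev.user)
        if len(editors) >= threshold:
            return True
    return False

def tagged_for_deletion(wikitext, scan_chars=500):
    """Deletion and copyvio templates sit at the top, so only the first lines matter."""
    head = wikitext[:scan_chars].lower()
    return any(tag in head for tag in ("{{db-", "{{speedy", "{{copyvio"))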

So, how good/bad does that sound as a technological measure to augment and reflect consensus? --Marcinjeske (talk) 15:58, 19 April 2008 (UTC)

Nonconstructive comments re new pages

Having created a stub article for Daniel J. O'Hern, in advance of further expansion of the article, I received this comment on my talk page stating that "It is quite annoying to see a new page completely depending on only one reference. Try use some more references for this article. Review WP:RS and WP:V before creating any other article." from User:Otolemur crassicaudatus within seconds of the article's creation. While I couldn't care less what this one editor finds annoying, I sincerely hope that attacks on editors of new articles are not the goal and objective of this WikiProject. When I see new articles that I have an issue with, I try my best to edit and expand them. While I appreciate the fact that people enjoy criticizing rather than doing, the simple statement that "this article would benefit from additional expansion and sources" -- without the personal annoyance commentary or references to policies already fulfilled by the article -- would be far more productive in the future. Alansohn (talk) 17:28, 4 June 2008 (UTC)