MediaWiki talk:Titleblacklist

From Wikipedia, the free encyclopedia

Editing the blacklist

The following instructions were copied from mw:Extension:Title Blacklist.

The title blacklist is maintained as a system message MediaWiki:Titleblacklist.

This page consists of regular expressions, each on a separate line, for example:

Foo <autoconfirmed|noedit|errmsg=blacklisted-testpage> 
[Bb]ar #No one should create article about it

Each entry may also contain optional attributes, enclosed in <> and divided by |

  • autoconfirmed — only non-autoconfirmed users are unable to create/upload/move such pages
  • noedit — users are also unable to edit this article
  • casesensitive — don't ignore case when checking title for being blacklisted
  • errmsg — the name of the message that should be displayed instead of standard
Also, you shouldn't add "^" at the beginning of the entry and "$" at the end. They will be added automatically — VasilievVV (talk) 10:11, 1 January 2008 (UTC)
There is a global title blacklist available at meta:Title blacklist awaiting activation. ~Kylu (u|t) 06:06, 5 March 2008 (UTC)
The global title blacklist has been enabled. --MZMcBride (talk) 03:08, 10 April 2008 (UTC)
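
For readers who want to experiment with the entry format described above, here is a minimal sketch in Python (the extension itself is PHP, and this is not its code). The names `parse_entry` and `title_matches` and the dictionary layout are made up for illustration. It assumes that `#` always starts a comment and that attributes only appear at the end of a line, which is a simplification.

```python
import re

def parse_entry(line):
    """Split one blacklist line into a compiled regex and its attributes.

    Hypothetical helper for illustration only; the real extension is PHP.
    """
    # Drop a trailing "#comment", as in the "[Bb]ar #..." example.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    attrs = {}
    # Optional attributes sit at the end of the line as <a|b|key=value>.
    m = re.search(r"<([^<>]*)>\s*$", line)
    if m:
        for part in m.group(1).split("|"):
            key, _, value = part.strip().partition("=")
            attrs[key] = value or True
        line = line[:m.start()].strip()
    # Entries ignore case unless marked <casesensitive>.
    flags = 0 if "casesensitive" in attrs else re.IGNORECASE
    return {"regex": re.compile(line, flags), "attrs": attrs}

def title_matches(entry, title):
    # fullmatch plays the role of the automatically added "^" and "$".
    return entry["regex"].fullmatch(title) is not None

foo = parse_entry("Foo <autoconfirmed|noedit|errmsg=blacklisted-testpage>")
bar = parse_entry("[Bb]ar #No one should create article about it")
print(foo["attrs"])
print(title_matches(foo, "foo"), title_matches(foo, "Foobar"))
```

Because the whole title must match, an entry like `Foo` only hits the exact title; catching titles that merely contain a phrase needs `.*` on both sides, as in the entries discussed further down this page.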

HAGGER

Why have you blocked all page creations containing HAGGER? עוד מישהו Od Mishehu 09:20, 1 January 2008 (UTC)

It's a common vandal meme -- if you're familiar with "Willy on Wheels"-style pagemove vandalism, the HAGGER?!?!?!??!?! vandal(s) often take a similar approach. As I understand, the title blacklist was partially created specifically to try and deal with these sorts of things. – Luna Santin (talk) 12:01, 1 January 2008 (UTC)
Isn't this going to cause a problem if people try to create articles about notable people with the last name "Hagger", such as Nicholas Hagger, David Osborne Hagger, Lloyd Hagger, and Kim Hagger? All of these turned up after a quick search. Even if it is a troll meme, all the trolls have to do is change one letter and they can get around it, whereas someone who has a valid reason for creating the hypothetical article ?????? Hagger can't reasonably be expected to change the subject's name just to get around the blacklist. Or is the blacklist case-sensitive?--69.118.143.107 (talk) 14:16, 1 January 2008 (UTC)
The section above contains the instruction about making an entry case-sensitive. Some admin should add a <casesensitive> after .*HAGGER.*. — Kalan ? 17:19, 1 January 2008 (UTC)
Done. EVula // talk // // 19:33, 1 January 2008 (UTC)
Awhoops. Thanks for pointing that out! – Luna Santin (talk) 00:00, 2 January 2008 (UTC)
Hi, I tried that code on testwiki, and the current addition .*HAGGER.* <casesensitive> did not prevent that page from being created (without "casesensitive" it worked). Best regards, --birdy (:> )=| 00:08, 2 January 2008 (UTC)
P.S. Ah, I think you forgot the | -> .*HAGGER.*|<casesensitive> regards, --birdy (:> )=| 00:10, 2 January 2008 (UTC) Hm, no, I am wrong. It does match case-sensitively without the <casesensitive>, but with the code as it is right now it does not work at all. Best regards, --birdy (:> )=| 00:19, 2 January 2008 (UTC)
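
To make the intended difference concrete, here is a small Python sketch of the same rule evaluated with and without case folding. It only illustrates what the `casesensitive` attribute is documented to do; it says nothing about why the entry misbehaved on testwiki, and Python's `re` stands in for the PHP regex engine here.

```python
import re

RULE = r".*HAGGER.*"
titles = ["HAGGER?!?!", "Nicholas Hagger", "Kim Hagger"]

for title in titles:
    # Default behaviour as documented: case is ignored.
    insensitive = re.fullmatch(RULE, title, re.IGNORECASE) is not None
    # With <casesensitive>: only the all-caps spelling is caught.
    sensitive = re.fullmatch(RULE, title) is not None
    print(f"{title!r}: default={insensitive}, casesensitive={sensitive}")
```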

My Suggestion for dealing with Hagger

Obviously Hagger is having too much fun with this. That person keeps posting a URL that is meant to confuse people (given they put YouTube in the URL) *and* take them to his/her website. It looks like "nimp.org" is registered with GoDaddy. If enough people from Wikipedia complain to GoDaddy about Hagger's actions, perhaps we can get GoDaddy to revoke the domain registration based on their own Terms of Service seen here.

(Quote) "5. NO UNLAWFUL CONDUCT OR IMPROPER USE.

As a condition of Your use of Go Daddy's Software and Services, You agree not to use them for any purpose that is unlawful or prohibited by these terms and conditions, and You agree to comply with any applicable local, state, federal and international laws, government rules or requirements. You agree You will not be entitled to a refund of any fees paid to Go Daddy if, for any reason, Go Daddy takes corrective action with respect to Your improper or illegal use of its Services.

Go Daddy reserves the right at all times to disclose any information as Go Daddy deems necessary to satisfy any applicable law, regulation, legal process or governmental request, or to edit, refuse to post or to remove any information or materials, in whole or in part, in Go Daddy's sole discretion.

If You have purchased Services, Go Daddy has no obligation to monitor Your use of the Services. Go Daddy reserves the right to review Your use of the Services and to cancel the Services in its sole discretion. Go Daddy reserves the right to terminate Your access to the Services at any time, without notice, for any reason whatsoever.

Go Daddy reserves the right to terminate Services if Your usage of the Services results in, or is the subject of, legal action or threatened legal action, against Go Daddy or any of its affiliates or partners, without consideration for whether such legal action or threatened legal action is eventually determined to be with or without merit. Go Daddy may review every account for excessive space and bandwidth utilization and to terminate or apply additional fees to those accounts that exceed allowed levels.

Except as set forth below, Go Daddy may also cancel Your use of the Services, after thirty (30) days, if You are using the Services, as determined by Go Daddy in its sole discretion, in association with spam or morally objectionable activities. Morally objectionable activities will include, but not be limited to: activities designed to defame, embarrass, harm, abuse, threaten, slander or harass third parties; activities prohibited by the laws of the United States and/or foreign territories in which You conduct business; activities designed to encourage unlawful behavior by others, such as hate crimes, terrorism and child pornography; activities that are tortuous, vulgar, obscene, invasive of the privacy of a third party, racially, ethnically, or otherwise objectionable; activities designed to impersonate the identity of a third party; illegal access to other computers or networks (i.e., hacking); distribution of Internet viruses or similar destructive activities; and activities designed to harm or use unethically minors in any way. Notwithstanding anything to the contrary herein, in the event Go Daddy cancels Your Services during the first thirty (30) days after You purchase the Services, You will receive a refund of any fees paid to Go Daddy in connection with the Services being canceled. In the event Go Daddy deletes Your Services because they are being used in association with spam or morally objectionable activities, no refund will be issued. You agree You will not be entitled to a refund of any fees paid to Go Daddy if, for any reason, Go Daddy takes corrective action with respect to Your improper or illegal use of its Services. " (/END QUOTE)

and here.

(Quote) " GoDaddy.com, Inc. does not tolerate the transmission of spam. We monitor all traffic to and from our Web servers for indications of spamming and maintain a spam abuse compliant center to register allegations of spam abuse. Customers suspected to be using Go Daddy products and services for the purpose of sending spam are fully investigated. Once Go Daddy determines there is a problem with spam, Go Daddy will take the appropriate action to resolve the situation. Our spam abuse compliant center can be reached by email at abuse@godaddy.com.

How We Define Spam

We define spam as the sending of Unsolicited Commercial Email (UCE), Unsolicited Bulk Email (UBE) or Unsolicited Facsimiles (Fax), which is email or facsimile sent to recipients as an advertisement or otherwise, without first obtaining prior confirmed consent to receive these communications from the sender. This can include, but is not limited to, the following:

  1. Email Messages
  2. Newsgroup postings
  3. Windows system messages
  4. Pop-up messages (aka "adware" or "spyware" messages)
  5. Instant messages (using AOL, MSN, Yahoo or other instant messenger programs)
  6. Online chat room advertisements
  7. Guestbook or Website Forum postings
  8. Facsimile Solicitations

" (/END QUOTE)

And they have a SPAM reporting tool seen here:

Feel free to jump in.

CaribDigita (talk) 15:45, 3 May 2008 (UTC)

Ivan Drach

I suppose the Ukrainian poet Ivan Drach also has some /bad meaning/ to keep an entry from being made on him?

--Will Dockery

All-uppercase entry

I think that we should only allow autoconfirmed users to create pages in which all the letters are uppercase. There are probably few cases where such pages are needed, except for abbreviations which already have articles. עוד מישהו Od Mishehu 16:38, 1 January 2008 (UTC)

I disagree. There's still plenty of abbreviations that don't have pages. Also, most of such page creations will be honest errors of well-meaning people. Such attempts should be welcomed and then corrected (or the other way around), not stopped in their tracks. Finally, autoconfirmed blocking of page creation is still impossible, if I understand correctly. - Andre Engels (talk) 19:54, 1 January 2008 (UTC)
I agree with Andre, I think this would net us far too many false positives. Phrases are useful to block, but an entire style? Eh... EVula // talk // // 20:28, 1 January 2008 (UTC)
Worth discussing, but as said, probably a bit of a heavy tool for a smallish problem. On the upside, ALL-CAPS draw the rapid attention of newpage patrollers. ;) – Luna Santin (talk) 23:59, 1 January 2008 (UTC)
Blocking anonymous and new users from creating articles about abbreviations is worth discussing, as such users have WP:AFC. But blocking people from editing, say, KPMG or TIAA-CREF or AARP or MS-DOS or NEC? I don't think semi-protecting every page about an organization or product identified by a string of uppercase Latin letters is a good idea. --Damian Yerrick (talk | stalk) 00:37, 12 February 2008 (UTC)

Edit protected request

{{editprotected}} Request that "/w/" be banned to prevent the notorious /w/wiki.php? and /w/index.php? SPAMmers. 68.39.174.238 (talk) 01:27, 2 January 2008 (UTC)

The absence is intentional. east.718 at 02:28, January 2, 2008
Why, so you can get SPAMmed? 68.39.174.238 (talk) 09:46, 3 January 2008 (UTC)
There are legitimate reasons for allowing bots to create predictable vandalism. If a global title blacklist is implemented, which seems pretty likely, it will most certainly contain the /w/ and /index.php regexes. Cheers. --MZMcBride (talk) 23:13, 3 January 2008 (UTC)
Isn't this a "global title blacklist"? 68.39.174.238 (talk) 07:59, 4 January 2008 (UTC)
No, this blacklist is only for en.wikipedia. Identifying predictable spammers while instantly blocking them and cleaning up their spam on enwiki helps the stewards on the small wiki monitoring team shut them down on more vulnerable wikis before they do serious damage. east.718 at 08:07, January 4, 2008
By the way, the global blacklist is already implemented. You just need to file a Bugzilla request to have one created on Wikimedia — VasilievVV (talk) 14:49, 4 January 2008 (UTC)
bugzilla:12484 --MZMcBride (talk) 03:20, 8 January 2008 (UTC)

Interesting, but this is prone to host-phishing-style attacks

For instance, someone could replace the 'A' in 'HAGGER' with a Cyrillic 'A' or a Greek 'A' to go around the blacklist. And L337-speak would be another alternative (H4663R, anyone?). The blacklist might grow unwieldy if ever some vandal is dedicated enough. (But I guess, this can be used in conjunction with the user banning features, so maybe this is not too much of a problem.) --seav (talk) 02:49, 2 January 2008 (UTC)

The beauty of regex means that we can solve this with one giant expression, rather than a huge blacklist. east.718 at 03:03, January 2, 2008

I just made MediaWiki talk:Titleblacklist/log.

I figure we should probably have something to keep track of what is added when, etc, just like the local spam link list has, but better. :P I figure that having a sortable table would be much cooler than a boring flat file log, so I did it. :P Cheers. --slakrtalk / 06:33, 2 January 2008 (UTC)

BRIAN PEPPERS DAY???

This should be added to the list, as it is always a major disruption on February 21. Blueanode (talk) 20:09, 8 January 2008 (UTC)

Declined. I can't seem to find any extensive history of titles like it being deleted or salted. The title blacklist is primarily for titles for which normal deletion/salting is insufficient to resolve the problem. But, lemme know if I missed something. --slakrtalk / 21:24, 23 January 2008 (UTC)
I found this lot recently. Is that related? • Anakin (talk) 13:59, 28 February 2008 (UTC)

Removed addition

I've reverted the addition of the anti-phone-number regex. Wikipedia's already got several articles on phone numbers (555-1212 and 867-5309 come to mind). --Carnildo (talk) 10:42, 28 January 2008 (UTC)

Namespace?

Is the title blacklist restricted to the mainspace? If not, what's preventing us from replacing the fairly crude protected-image system that stops users uploading images with titles such as Image:Picture.jpg?? Using this list would be a much more elegant solution. Happymelon 20:09, 26 February 2008 (UTC)

Sure. It works for all namespaces — 213.181.10.210 (talk) 06:58, 15 March 2008 (UTC)

The "Jews did WTC" entry

It seems that this entry only prevents "JEWS DID WTC" (with quotation marks) from being created, whereas any variation without the quotation marks does not return the title blacklist message. Should the quotation marks be removed? TML (talk) 11:48, 4 March 2008 (UTC)

Hagger

People are using H.A.G.G.E.R now. They will probably move on to H..A..G..G..E..R.. or H-A-G-G-E-R. Should we add .*H\W*A\W*G\W*G\W*E\W*R.* now? Sceptre (talk) 01:56, 24 March 2008 (UTC)

Depends. How many false positives will it generate? --Carnildo (talk) 05:58, 24 March 2008 (UTC)
\W means non-"word" characters, so unless you consider H!@#A%^$#G!)(G./?E!)R)!)!!!!! to be a false positive, it shouldn't cause too many. --Random832 (contribs) 16:48, 9 April 2008 (UTC)
I've changed it to .*H\W*(A|Α)\W*G\W*G\W*E\W*R.* after a page had been moved to HΑGGER???????????????????. עוד מישהו Od Mishehu 07:13, 15 April 2008 (UTC)
Could someone please modify the HAGGER-regexp so that the Н sign, which has recently been used, is not allowed at the beginning ? I'm not sure if the correct regexp is .*(H|Н)\W*(A|Α)\W*G\W*G\W*E\W*R.*. Thanks --Oxymoron83 09:45, 16 April 2008 (UTC)
Done. John Vandenberg (chat) 12:03, 16 April 2008 (UTC)
I've also pre-emptively added other cyrillic and greek lookalikes --Random832 (contribs) 14:27, 18 April 2008 (UTC)
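
As a rough check of what the `\W*` rule in this thread does and does not catch, here is a short Python sketch. It uses the version with the Greek alpha (U+0391) and Cyrillic en (U+041D) alternatives quoted above, and assumes Python's `\W` is close enough to the PHP engine's for these samples. The sample titles other than those quoted in the thread are made up.

```python
import re

# Rule from this thread: any non-word characters may sit between the letters.
# \u041d is the Cyrillic capital en, \u0391 the Greek capital alpha.
RULE = re.compile(r".*(H|\u041d)\W*(A|\u0391)\W*G\W*G\W*E\W*R.*", re.IGNORECASE)

samples = [
    "H.A.G.G.E.R",
    "H-A-G-G-E-R",
    "H..A..G..G..E..R..",
    "H\u0391GGER???",      # Greek alpha instead of Latin A
    "Nicholas Hagger",     # genuine surname, still caught when case is ignored
    "Haggis recipe",       # no E after the double G
    "H4663R",              # digits are word characters, so this slips through
]
for title in samples:
    print(f"{title!r}: {RULE.fullmatch(title) is not None}")
```

The last two samples show the limits mentioned elsewhere on this page: punctuation between letters is covered, but substituted characters need their own alternatives.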

This vandal is also being disruptive at my sites, Palaeos and EvoWiki. Can someone please use checkuser on the Hagger usernames on Wikipedia and tell me what their IP addresses are, so I can block those IP addresses at Palaeos (and get someone else to block these usernames on EvoWiki, since I don't have administrator privileges there)? --Fang 23 (talk) 14:25, 26 April 2008 (UTC)

Declined - that would violate the privacy policy. Stifle (talk) 11:31, 15 May 2008 (UTC)

He's now starting to use the character Ң. Perhaps it's time to add that to the list. 128.2.152.135 (talk) 07:53, 4 May 2008 (UTC)

.*[!?]{3,}.*

I've modified the ! and ? regex to apply to all users except sysops. I can't think of any legitimate use of three or more question marks or exclamation points in a row, and we've had some page move vandalism lately that has abused this form of punctuation. --MZMcBride (talk) 00:19, 2 April 2008 (UTC)

There's an indie band by that name. Might come into trouble if they release a new album that needs to be disambiguated. Sceptre (talk) 02:54, 2 April 2008 (UTC)
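
A quick Python sketch of the rule in this section's heading shows the false positive being described: the title "!!!" is itself a run of three. The other sample titles are invented for illustration.

```python
import re

# Three or more consecutive "!" or "?" anywhere in the title.
RULE = re.compile(r".*[!?]{3,}.*")

for title in ["HAGGER?!?!?!", "!!!", "Who's Afraid of Virginia Woolf?", "Yes!!"]:
    print(f"{title!r}: {RULE.fullmatch(title) is not None}")
```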

Two rules removed

.*[\p{P}\p{Mc}\p{Z}]{4,}.* is causing problems at Portal:Indianapolis/On this day.../April 9 and other similar pages. .{135,} <autoconfirmed> blocks the creation of pages with long page titles. Both of these rules are evidently causing more harm than good, so I've removed them. --- RockMFR 19:12, 9 April 2008 (UTC)

Well, the second rule only applied to non-autoconfirmed users. Have there been any actual complaints about the regex or is this purely theoretical at the moment? --MZMcBride (talk) 20:02, 9 April 2008 (UTC)
By the way, I wouldn't totally remove .*[\p{P}\p{Mc}\p{Z}]{4,}.* because it catches excessive punctuation that's common in that herbie or whatever dude's vandal creates/moves. I simply didn't account for ".../". Instead, consider simply upping the minimum count to perhaps 6 or 7. --slakrtalk / 03:33, 17 April 2008 (UTC)
I should also stick a link to unicode character properties so that the \p{properties} stuff makes sense. E.g., \p{P} matches all punctuation characters— no matter how inverted or funky they are. There's a cool table on that page with the full list. --slakrtalk / 03:43, 17 April 2008 (UTC)
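
Python's built-in `re` has no `\p{...}` support, so the sketch below approximates `[\p{P}\p{Mc}\p{Z}]` with `unicodedata.category` and measures the longest run in a title. The helper name `longest_run` is made up, and Unicode data versions may differ slightly from what the wiki's regex engine uses.

```python
import unicodedata

def longest_run(title):
    """Length of the longest run of punctuation (P*), spacing marks (Mc)
    or separators (Z*) in the title."""
    best = current = 0
    for ch in title:
        cat = unicodedata.category(ch)
        if cat.startswith("P") or cat.startswith("Z") or cat == "Mc":
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best

title = "Portal:Indianapolis/On this day.../April 9"
run = longest_run(title)
print(run, "blocked at {4,}:", run >= 4, "blocked at {6,}:", run >= 6)
```

The ".../" sequence is what trips a minimum of four, which is why raising the minimum count would let this title through.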

Removed redundant entries

There were several HAGGER entries that were redundant with (i.e. matched strict subsets of) the most recently added one. Let's try to keep the list clean. Also, the first one (with nothing in between the letters) is case-sensitive (to prevent blocking articles that have "Hagger" as a genuine surname, I expect), but these are caught by one of the other entries. Any opinions on what to do about this? --Random832 (contribs) 14:45, 18 April 2008 (UTC)

Removed entry

I've removed .*[\p{Mc}]{4,}.* from the list.

  1. According to my regex reference (Programming Perl), that's four consecutive combining marks, not four consecutive space-like characters, which would be perfectly valid (if rare) in a title.
  2. How common a problem are such titles, anyways?

--Carnildo (talk) 05:42, 29 April 2008 (UTC)

Would two in a row ever be necessary? I think GRAWP, et al. have been abusing "special" spaces. --MZMcBride (talk) 06:11, 29 April 2008 (UTC)
If something's actually showing up often enough to be a problem, then adding it to the blacklist is fine. --Carnildo (talk) 06:21, 29 April 2008 (UTC)
Oops, that should have been \p{Z}, but technically multiple space combining characters in a row is equally problematic (especially for that user). --slakrtalk / 06:22, 29 April 2008 (UTC)
You mean special spaces like " ," " ," " ," " ," " ," " ," " ," " ," " ," " ," "​," " "? Surely these can be added.—Ryūlóng (竜龙) 06:25, 29 April 2008 (UTC)
Interestingly, MediaWiki doesn't, from what I can tell, support multiple regular spaces consecutively (i.e., _____ snaps back to _). Special spaces can be consecutive, though. --MZMcBride (talk) 06:25, 29 April 2008 (UTC)
Ryu: that's what the \p{Z} should nab. Though, if mw doesn't support multiple regular spaces, then it might be an idea to reduce the "4" down to "3" or "2", unless someone can think of a typical instance where it's needed. *shrug* --slakrtalk / 06:30, 29 April 2008 (UTC)
Those are unicode spaces. One is the unicode full width ideographic space. These should be added, but I don't know how to do it myself.—Ryūlóng (竜龙) 09:25, 29 April 2008 (UTC)

Why is throw under the bus blocked?

Come on, it's the hottest cliché out there right now! Why is it blocked?--The lorax (talk) 06:49, 1 May 2008 (UTC)

A regular space was accidentally included in the blacklist. That title should work again. --MZMcBride (talk) 07:27, 1 May 2008 (UTC)

Two odd inclusions

Any reason for including John Cabell Breckenridge, which should surely redirect to John C. Breckenridge, or Talk:Johann Jakob Breitinger (the talk page for Johann Jakob Breitinger)? Frickeg (talk) 07:02, 1 May 2008 (UTC)

Those titles were being blocked by .*[\x{2100}-\x{214F}].*. That regex has been removed for the time being. --MZMcBride (talk) 07:26, 1 May 2008 (UTC)
Okay, anyone have any idea what's wrong with it? It works in Perl. —Ilmari Karonen (talk) 07:29, 1 May 2008 (UTC)
Aha, found it. Apparently PHP's Unicode implementation considers the character "\x{212A}" (Kelvin sign, K) equivalent to the letter "K", and applies said equivalence rule even inside regular expressions, such that the letter "K" matches the Unicode character range "[\x{2100}-\x{214F}]". Same apparently goes for "\x{212B}" (Angstrom sign, Å) vs. "Å" as well as "\x{2126}" (Ohm sign, Ω) vs. "Ω".
Fortunately, none of those unit sign characters are usable in titles anyway: they get automatically normalized to their simpler equivalents. (Using them in wikilinks gives odd results, with apparent redlinks leading to existing pages: Ω, K, Å.) Anyway, removing those and some other non-letterlike symbols from the range, I get the following fixed regexp, which I've tested on a local MediaWiki installation:
.*[ℂ℃℄ℇ℈℉ℊℋℌℍℎℏℐℑℒℓℕ№℗℘ℙℚℛℜℝ℞℟℣ℤℨ℩ℬℭ℮ℯℰℱℲℳℴℹ℺⅁⅂⅃⅄ⅅⅆⅇⅈⅉⅎ].* <casesensitive>
That should take care of all the latin letter lookalikes in that range; of course, I'm sure there are more in other parts of the Unicode repertoire. And now, having spilled the beans, I'd better go add that regexp to the blacklist before our friend, who I'm sure is reading this talkpage, saves a copy of that list to cut and paste from. —Ilmari Karonen (talk) 01:21, 3 May 2008 (UTC)
Nice work. : - ) --MZMcBride (talk) 01:32, 3 May 2008 (UTC)
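
The two Unicode properties behind this can be checked from Python's standard library. This is only a sketch of the character data involved: it assumes the title normalization mentioned above behaves like NFC, and it does not reproduce PHP's regex behaviour itself.

```python
import unicodedata

UNIT_SIGNS = {
    "\u212a": "K",        # KELVIN SIGN vs. LATIN CAPITAL LETTER K
    "\u212b": "\u00c5",   # ANGSTROM SIGN vs. LATIN CAPITAL LETTER A WITH RING ABOVE
    "\u2126": "\u03a9",   # OHM SIGN vs. GREEK CAPITAL LETTER OMEGA
}

for sign, plain in UNIT_SIGNS.items():
    # Canonical normalization replaces the unit sign with the plain letter.
    normalized = unicodedata.normalize("NFC", sign)
    # Both characters share one lowercase form, so case folding equates them.
    same_lower = sign.lower() == plain.lower()
    print(f"U+{ord(sign):04X} {unicodedata.name(sign)}: "
          f"NFC gives U+{ord(normalized):04X}, same lowercase: {same_lower}")
```

A case-insensitive range that contains the unit sign can therefore end up matching the ordinary letter, which fits the false positives reported in this section.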

What's Going On?

All of a sudden, I can't leave warnings on anon talk pages. It's coming up "Unauthorized title". See [1] for example. Collectonian (talk) 02:57, 3 May 2008 (UTC)

A regular space was accidentally included in the blacklist. That title should work again. --MZMcBride (talk) 03:06, 3 May 2008 (UTC)
Ah...thanks, all good again :) Collectonian (talk) 03:08, 3 May 2008 (UTC)

Edward Henry Lewinski Corwin

Please check Edward H. L. Corwin which was rejected as Edward Henry Lewinski Corwin.-- Matthead  Discuß   02:59, 3 May 2008 (UTC)

A regular space was accidentally included in the blacklist. That title should work again. --MZMcBride (talk) 03:06, 3 May 2008 (UTC)
I wish people would stop trying to get cute with blocking unusual characters -- they keep blocking huge numbers of valid article titles by accident. --Carnildo (talk) 03:20, 3 May 2008 (UTC)
I wish Grawp and Grawp socks would stop moving tens of pages to obscure Unicode titles. The blacklist is updated when there are attacks. The most recent one was by User talk:I think 2 + 2 = 22.. When Random updated the blacklist, Firefox converted a non-breaking space into a regular space -- not really anyone's fault. The issue has been fixed going forward. --MZMcBride (talk) 03:24, 3 May 2008 (UTC)

Recent error

The brief inclusion of a regular space character in the title blacklist was due to an error in Firefox's implementation of forms, and has been fixed by replacing the non-breaking space in one of the regexes with the code "\x{00A0}". The problem should not recur. I apologize for the inconvenience. --Random832 (contribs) 03:21, 3 May 2008 (UTC)
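
One way to guard against this class of accident is to convert invisible characters to `\x{...}` escapes before pasting a regex into the edit form. The Python helper below is a hypothetical sketch (the name `escape_invisible` and the choice of categories are assumptions, not an existing tool); the same treatment would cover other non-printing characters such as a soft hyphen.

```python
import unicodedata

def escape_invisible(regex_line):
    """Rewrite invisible or space-like characters as \\x{HHHH} escapes.

    Hypothetical helper; a plain ASCII space is left alone.
    """
    out = []
    for ch in regex_line:
        cat = unicodedata.category(ch)
        # Z*: separators (e.g. U+00A0); Cf: format characters (e.g. U+00AD).
        if ch != " " and (cat.startswith("Z") or cat == "Cf"):
            out.append("\\x{%04X}" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)

print(escape_invisible(".*\u00a0.*"))  # non-breaking space
print(escape_invisible(".*\u00ad.*"))  # soft hyphen
```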

More Hagger substitutes

{{editprotected}}

Please modify the hagger regexp to blacklist the title ¿¿¿H Å G G Ệ Ŕ!. Thanks. MER-C 10:05, 10 May 2008 (UTC)

Done. It seems there are three blacklist lines for this title, where one would probably suffice. -- zzuuzz (talk) 10:36, 10 May 2008 (UTC)

Another one: ¿Я Ǝ อ อ Ά H. MER-C 11:46, 12 May 2008 (UTC)

I think this diff did that. Woody (talk) 11:50, 12 May 2008 (UTC)

And another... HΑGGĘRʔ (ʔs repeated). Kesac (talk) 02:18, 14 May 2008 (UTC)

Reorganized

I've just made some major changes to the list:

  • Reorganized the list into sections based on what each regexp is checking for, hopefully making it easier to maintain.
  • Added inline comments to most of the entries, except for the most self-explanatory ones.
  • Rewrote the "HAGGER" regexps based on some grepping of Unicode character tables. The new regexps should match everything the old ones did, and plenty more variants besides, but keep an eye open for additions just in case.
  • Added some new regexps to catch any mixed-script (i.e. Latin/Greek, Latin/Cyrillic, etc.) titles containing letters from the "HAGGER" regexps. This should help reduce the number of possible titles each new lookalike character makes available to the vandal.

I'm sure there are further improvements that could be made; for example, several characters in the HAGGER regexps may be obscure enough that they'd be worth blacklisting individually. On the converse side, I've expanded the whitelist to include all single-character titles; most of them could be potentially valid redirects to articles about the character or symbol in question. I've also compiled a list, User:Ilmari Karonen/Funnycode, of existing article titles (as of April 28) containing characters that the current blacklist disallows entirely; some of the characters in the list might be worth allowing after all (the "Other punctuation" class matches a whole lot of them), while others are just obvious errors and may require cleanup (though most are just leftover redirects). —Ilmari Karonen (talk) 19:27, 14 May 2008 (UTC)

I excluded the miscellaneous symbols and dingbats ranges, as well as the "№" and "™" signs, from the "other punctuation" regexp; the list of matching existing titles is a lot shorter now. A big fraction of the remaining ones contain non-breaking spaces; it might be worth adding a custom error message for those, since the default one we have isn't very informative. —Ilmari Karonen (talk) 20:10, 14 May 2008 (UTC)

...as I've just done: see MediaWiki:Titleblacklist-custom-nbsp. —Ilmari Karonen (talk) 20:33, 14 May 2008 (UTC)

Last addition

The last addition (this one) seems to be preventing User:Петър Петров from creating user subpages. See this ANI thread. Perhaps it was too wide and needs to be reverted. Stifle (talk) 11:24, 15 May 2008 (UTC)

Yeah, sorry, that regexp was just broken — I basically just forgot to consider non-article titles. I've prefixed the mixed-script regexps with (?!(User|Wikipedia|Image|MediaWiki|Template|Help|Category|Portal)( talk)?:|Talk:), which should prevent them from matching outside mainspace. —Ilmari Karonen (talk) 14:02, 15 May 2008 (UTC)
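
Here is a small Python sketch of how the negative-lookahead prefix quoted above keeps a rule out of other namespaces. The prefix is copied from the comment; the `MIXED` pattern is only a hypothetical stand-in for the real mixed-script regexps, which are not quoted on this page.

```python
import re

NS_GUARD = (r"(?!(User|Wikipedia|Image|MediaWiki|Template|Help|Category|Portal)"
            r"( talk)?:|Talk:)")
# Hypothetical stand-in for a mixed-script rule: a Latin letter and a
# Cyrillic letter in the same title, in either order.
MIXED = r"(?:.*[a-z].*[\u0400-\u04ff].*|.*[\u0400-\u04ff].*[a-z].*)"

unguarded = re.compile(MIXED, re.IGNORECASE)
guarded = re.compile(NS_GUARD + MIXED, re.IGNORECASE)

# \u0410 is the Cyrillic capital A, a lookalike for the Latin letter.
for title in ["User:Петър Петров/Sandbox", "H\u0410GGER", "Talk:H\u0410GGER"]:
    print(f"{title!r}: unguarded={unguarded.fullmatch(title) is not None}, "
          f"guarded={guarded.fullmatch(title) is not None}")
```

The lookahead is checked at the very start of the title, so any title beginning with one of the listed namespace prefixes is exempt from the rule that follows it.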

Soft hyphen

Could someone replace the soft hyphen with the appropriate \x{} code? Since it's a non-printing character, it could easily get lost when someone edits the page. --Carnildo (talk) 01:42, 17 May 2008 (UTC)

Existing matching titles

I ran a script that compared the blacklist against the latest database dump (from April 25), and uploaded the results at User:Ilmari Karonen/Badtitles. The list contains a lot of broken titles as well as plenty of vandal userpages, but there are also genuine false positives that may be worth looking into. For example, there seem to be quite a few titles that are listed because they contain three consecutive exclamation points; it might be worth loosening up that particular rule a little. Another similar case is titles like Talk:(-)-borneol dehydrogenase, where the colon in "Talk:" is enough to push the number of consecutive punctuation characters to five and thus hit the blacklist. —Ilmari Karonen (talk) 23:06, 17 May 2008 (UTC)

First, let me say you've been doing great work lately. Truly. : - )

As for the blacklist issues, I would be more inclined to write a few custom error messages for the special cases (!!!, etc.) than allow a lot of repeated punctuation. It seems more sensible to block moving on this specific (small) subset of articles, which in reality probably won't be moved much, than allow other page move vandalism that has tens of exclamation points or tens of question marks. --MZMcBride (talk) 23:23, 17 May 2008 (UTC)

True, but we could still make a specific exception for exactly three exclamation points in a row. Something like .*([?‽¿][!?‽¿][!?‽¿]|![?‽¿][!?‽¿]|!![?‽¿]|!!!!).* would do it, though it does look rather messy. The problem is that the blacklist doesn't just affect moves; it also, for example, prevents the creation of talk pages for the matched articles (assuming the regexp isn't explicitly made namespace-specific), as well as the archival of any existing talk pages. —Ilmari Karonen (talk) 00:39, 18 May 2008 (UTC)
Perhaps whitelist just "!!!"? --MZMcBride (talk) 05:01, 19 May 2008 (UTC)
The whitelist, unfortunately, is useless for this: if you were to add, say, .*!!!.* to the whitelist, that would allow every title containing "!!!", even ones that would be blacklisted for any other reason. Basically, it's only good for regexps narrow enough that one can be sure that any title matching any of them is valid. To exclude certain titles from matching only one blacklist entry, the entry itself needs to be altered. (It does occur to me, though, that it should be possible to simplify the version I suggested above using negative lookbehind: .*[!?‽¿]{3}(?<!!!!).*, with a separate regexp to match "!!!!".) —Ilmari Karonen (talk) 06:09, 19 May 2008 (UTC)
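
The two formulations suggested in this thread can be compared directly. The Python sketch below checks, on a handful of made-up sample titles, that the explicit alternation and the lookbehind version (together with a separate rule for "!!!!") agree. Python's `re` is assumed to treat the fixed-width lookbehind the same way as the wiki's engine.

```python
import re

EXPLICIT = re.compile(
    r".*([?‽¿][!?‽¿][!?‽¿]|![?‽¿][!?‽¿]|!![?‽¿]|!!!!).*")
# Lookbehind variant: any run of three, unless the three just matched are "!!!".
LOOKBEHIND = re.compile(r".*[!?‽¿]{3}(?<!!!!).*")
# The separate rule for four exclamation points in a row.
FOUR = re.compile(r".*!!!!.*")

def blocked_v1(title):
    return EXPLICIT.fullmatch(title) is not None

def blocked_v2(title):
    return (LOOKBEHIND.fullmatch(title) is not None
            or FOUR.fullmatch(title) is not None)

samples = ["!!!", "!!! (album)", "Wow!!!!", "What?!?", "Hi!!?", "¿¿¿Qué", "Yes!!"]
for title in samples:
    print(f"{title!r}: explicit={blocked_v1(title)}, lookbehind={blocked_v2(title)}")
```

Exactly three exclamation points pass under both versions, while any other run of three, or four exclamation points, is blocked.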

Latest Grawp stuff

Apparently Wikipedia:Ή.A.G.G.E.R.?.. wasn't protected. Can we fix this? NawlinWiki (talk) 01:08, 19 May 2008 (UTC)

  • And more of same: Wikipedia:ḤAGGER??. Looks like moves to the Wikipedia namespace aren't blacklisted. Let's fix this quick. NawlinWiki (talk) 01:11, 19 May 2008 (UTC)
    Done. Let me know if there's any more. —Ilmari Karonen (talk) 03:59, 19 May 2008 (UTC)
      • And the latest: see here. He's now using IBHHFS ("HAGGER" + 1 letter for each), and there are some interesting symbol combinations that I would have thought would be blocked. NawlinWiki (talk) 11:37, 19 May 2008 (UTC)

(unindent) Hm, I think it would be best if we banned any character from being repeated more than 5 times and if we added a custom error message for this particular case. Also, it seems that some type of upside-down question mark was able to be used. That should be added to the question mark line. --MZMcBride (talk) 17:01, 19 May 2008 (UTC)

  • Added the ¿ to the question mark line. I'll let Ilmari take care of the rest -- I'm afraid I'd break something if I tried. NawlinWiki (talk) 18:20, 19 May 2008 (UTC)
I've added some regexps to catch these. In particular, it turned out that PCRE's definition of "punctuation" ([\p{P}]) was rather narrow; I replaced it with "not a letter, 0-9 or space" ([^\p{L}\d ]), which should match for example "^" too. I also implemented MZMcBride's suggestion of blacklisting any character repeated five or more times (except "0", too many numbers with lots of zeros). Oh, and I added "IBHHFS" as well as "IFSNZ", though I'm sure these will not slow him much. (I think it's pretty safe to add simple rules like .*IBHHFS.* even if you're not familiar with regexps; it's only when you involve trickier regexp syntax or odd Unicode characters that things can break.) —Ilmari Karonen (talk) 22:08, 19 May 2008 (UTC)
    • Added ∑ to the e's section of the Haggers based on vandalism from last night. Please let me know if I didn't do this right. NawlinWiki (talk) 14:14, 28 May 2008 (UTC)
      Looks fine to me. —Ilmari Karonen (talk) 15:02, 28 May 2008 (UTC)
      Oops, looks like your edit lost the closing bracket ("]") from the character class. Didn't spot that the first time. Never mind, I've put it back. —Ilmari Karonen (talk) 01:06, 29 May 2008 (UTC)
      • Thanks. Latest from last night is moves like this -- any ideas? NawlinWiki (talk) 11:56, 29 May 2008 (UTC)
I made some changes to the regexp that should catch those. —Ilmari Karonen (talk) 15:33, 29 May 2008 (UTC)
        • Report from last night -- now using repeating letters such as HAGGGER, HAAGGER, etc., and one HWAGGER. Your next challenge... Thanks for all you do, NawlinWiki (talk) 11:40, 30 May 2008 (UTC)
            • He's using Greek/Cyrillic characters to get around the .*Grawp.* blacklist entry. Instead of expanding it to include lookalikes, as has been done for the HAGGER entries, I decided to add a blanket prohibition against pagemoves to mixed-script titles. This will probably cause some false positives, but hopefully not too many; page moves aren't all that common anyway. On the other hand, if it works it ought to make it much harder to get around other blacklist rules. Anyway, if it causes too much trouble for legitimate editors, please revert it. —Ilmari Karonen (talk) 03:40, 31 May 2008 (UTC)
  • I don't think this will be a problem -- thanks! NawlinWiki (talk) 03:44, 31 May 2008 (UTC)
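As a rough illustration of the mixed-script idea above (not the actual blacklist regexp, which is not quoted in this thread), the following Python sketch flags titles whose letters come from more than one of Latin, Greek and Cyrillic. The function names, and the trick of guessing a character's script from its Unicode name, are assumptions made for this example only.

```python
import unicodedata

# Scripts this sketch distinguishes; letters from any other script are ignored.
SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")


def scripts_used(title):
    """Return the subset of SCRIPTS used by the letters in title."""
    found = set()
    for ch in title:
        if not ch.isalpha():
            continue  # digits, spaces and punctuation carry no script here
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                found.add(script)
    return found


def is_mixed_script(title):
    """True if the title mixes letters from two or more of the scripts."""
    return len(scripts_used(title)) > 1


# "Grawp" with a Cyrillic "а" (U+0430) in place of the Latin "a".
print(is_mixed_script("Gr\u0430wp"))
# A legitimate title that would be a false positive: Latin text plus a Greek omega.
print(is_mixed_script("Omega (\u03a9)"))
```

The second call shows the kind of false positive mentioned above: a title that legitimately combines Latin text with a Greek letter is flagged just like a lookalike substitution.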

Incorrect blacklisting

Which blacklist rule was preventing Ιερουσαλήμ (Greek: Jerusalem) from being created? --Carnildo (talk) 23:00, 19 May 2008 (UTC)

This one: .*[ΉḤĤĦɧ⒣Ⓗⓗ].*. The first character is an uppercase eta with tonos, and since the rule is case-insensitive, it matches "ή". That regexp is a bit problematic in other ways too: the "Ḥ", for example, matches quite a few Arabic names. It might help if it were made case-sensitive; that would reduce the false positives a bit. Of course, that would make it somewhat less effective, too. —Ilmari Karonen (talk) 01:24, 20 May 2008 (UTC)
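The effect is easy to reproduce outside MediaWiki. The sketch below uses Python's re module rather than PCRE, so it is only an approximation, and it keeps just the first two characters of the class. The name `blocked` is made up for this example, and the code assumes (as the instructions at the top of this page suggest) that entries are anchored and matched case-insensitively unless marked casesensitive.

```python
import re

# First two characters of the class only:
# U+0389 (Greek capital eta with tonos) and U+1E24 (Latin capital H with dot below).
PATTERN = ".*[\u0389\u1e24].*"

# The Greek title from the question above, spelled out as code points.
TITLE = "\u0399\u03b5\u03c1\u03bf\u03c5\u03c3\u03b1\u03bb\u03ae\u03bc"


def blocked(title, casesensitive=False):
    """Approximate a blacklist check: anchored match, case-insensitive by default."""
    flags = 0 if casesensitive else re.IGNORECASE
    return re.fullmatch(PATTERN, title, flags) is not None


print(blocked(TITLE))                      # case-insensitive check
print(blocked(TITLE, casesensitive=True))  # case-sensitive check
```

In Python the case-insensitive check matches, because the lowercase form of U+0389 is the U+03AE that appears in the title, while the case-sensitive check does not.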

And another incorrect blacklisting: User_talk:Nooooob. I assume it's .*([^0])\1{4}.* that's the problem; would it make sense to make that an article-only rule? --Carnildo (talk) 23:31, 28 May 2008 (UTC)

Maybe. Or simply exclude the User and User_talk namespaces. Vandalism in Wikipedia:, while not as "bad," is still annoying. --MZMcBride (talk) 23:58, 28 May 2008 (UTC)
I agree, these should only be restricted from the article space, or at least permitted in User and User talk (for user pages/subpages and talk pages/archives), and Wikipedia and Wikipedia talk (for SSPs, RFAs, RFCUs and MFDs). --Snigbrook (talk) 23:59, 28 May 2008 (UTC)
See also the "moveonly" section below: repeated letters could be allowed in User and User talk, and allowed in Wikipedia and Wikipedia talk except as page move targets, if there is a problem with vandalism. --Snigbrook (talk) 00:04, 29 May 2008 (UTC)
Added <moveonly> to the .*([^0])\1{4}.* entry. Might be worth considering it for some other entries in that section too. —Ilmari Karonen (talk) 15:31, 29 May 2008 (UTC)
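For reference, this is how the repeated-character rule behaves when tried in Python's re module (an approximation of the extension's PCRE matching; `hits_repeat_rule` is a hypothetical helper). The group `([^0])` captures any single character other than "0", and `\1{4}` requires four more copies of it, so five in a row.

```python
import re

# The rule as quoted above; fullmatch() plays the role of the implicit anchors.
REPEAT_RULE = re.compile(r".*([^0])\1{4}.*", re.IGNORECASE)


def hits_repeat_rule(title):
    """True if some character other than "0" occurs five or more times in a row."""
    return REPEAT_RULE.fullmatch(title) is not None


print(hits_repeat_rule("User_talk:N" + "o" * 5 + "b"))  # five o's in a row
print(hits_repeat_rule("1" + "0" * 6))                  # a long run of zeros
print(hits_repeat_rule("Noob"))                         # only two o's
```

Only the first title is caught: the run of zeros is exempt by design, and two repeated letters fall well short of five.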

<moveonly>

I recently committed rev:35163, which adds support for a <moveonly> option that makes the blacklist entry apply only to page move targets. That means we can now more easily add regexps to catch pagemove vandalism without having to worry about also hitting things like legitimate redirects that would never be targets for a move (such as Ιερουσαλήμ above). I've added the option to the "HAGGER" regexps, but it occurs to me that some further simplification of the regexps might be possible. —Ilmari Karonen (talk) 08:28, 23 May 2008 (UTC)
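To make the semantics concrete, here is a small Python model of how an entry with attributes might be interpreted. It is not the extension's code: the parsing is deliberately simplified (for instance, it would mishandle a regexp that itself contains "#"), `parse_entry` and `applies` are invented names, and only the casesensitive and moveonly attributes are modelled.

```python
import re


def parse_entry(line):
    """Split a blacklist line into (regexp, set of attributes).

    Simplified: drops a trailing "# comment", then a trailing "<a|b|c>" group.
    """
    line = line.split("#", 1)[0].strip()
    attrs = set()
    m = re.search(r"<([^<>]*)>$", line)
    if m:
        attrs = {a.strip() for a in m.group(1).split("|") if a.strip()}
        line = line[: m.start()].strip()
    return line, attrs


def applies(entry, title, action):
    """True if the entry blocks `action` ("create" or "move") on `title`."""
    pattern, attrs = parse_entry(entry)
    if "moveonly" in attrs and action != "move":
        return False  # moveonly entries restrict page move targets only
    flags = 0 if "casesensitive" in attrs else re.IGNORECASE
    # fullmatch() stands in for the "^" and "$" that are added automatically
    return re.fullmatch(pattern, title, flags) is not None


entry = ".*skater girl.* <moveonly>"
print(applies(entry, "Some skater girl page", "move"))    # blocked as a move target
print(applies(entry, "Some skater girl page", "create"))  # creation is unaffected
```

Under this model a moveonly entry still stops a page from being moved to a matching title, but no longer prevents creating a page or redirect at that title.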

The "log"

The log isn't being kept up to date, and it duplicates the function of the page history. I'd like to propose discontinuing use of the log page. --Random832 (contribs) 16:37, 23 May 2008 (UTC)

.*skater.* <moveonly>

Is there really enough of a problem with pagemoves to "skater" that we need to make it harder to disambiguate articles for sports figures? --Carnildo (talk) 23:22, 7 June 2008 (UTC)

  • We had someone moving hundreds of user talk pages to titles including "skater girl". I'll change it to that rather than just "skater". NawlinWiki (talk) 03:07, 9 June 2008 (UTC)

"On wheels"

"On wheels" appears in the titles of several articles, and the entry here was recently changed from capitals to lowercase. As it stands, it causes problems for deletion nominations (and possibly featured article nominations) and, in at least one case, prevents an article from being discussed because its talk page cannot be created. The phrase has been used in page move vandalism but has not, as far as I know, been used often in the titles of vandalism articles. Could the entry be changed to have a <moveonly> next to it? --Snigbrook (talk) 15:26, 11 June 2008 (UTC)

Y Done. —Ilmari Karonen (talk) 16:46, 11 June 2008 (UTC)

Disable userpage moves

I propose disabling userpage moves, allowing non-admin users to move user subpages only. The code is currently live on testwiki, see testwiki:MediaWiki:Titleblacklist and the message testwiki:MediaWiki:Titleblacklist-custom-userpagemove. Technically, user_talk moves could be disabled as well, but unfortunately some users archive their discussions this way. —AlexSm 18:07, 11 June 2008 (UTC)

Would it be possible to allow non-admin users to move their own user pages but not the pages of other users? --Snigbrook (talk) 20:17, 11 June 2008 (UTC)
I filed a bug about this type of thing a while ago, bugzilla:13883. While it's trivial to block non-sysops from moving user pages and user talk pages, the issue is that the user themselves wouldn't be able to move their own user page, for better or for worse. I've seen quite a few users moving their userpage to the main namespace, however, so perhaps blocking the root user page from being moved would be beneficial. And, we can always include a custom error message that links to WP:RM or some such. --MZMcBride (talk) 23:10, 11 June 2008 (UTC)