MediaWiki talk:Usernameblacklist/Archive
From Wikipedia, the free encyclopedia
Note: This is an archive of past discussions held here
Working?
I'm not having much luck using this page (see my most recent username creations). If you have a chance, take a look; I must be doing something completely wrong. See here for documentation. alphachimp 05:24, 29 March 2007 (UTC)
- Doesn't appear to work. --Jeffrey O. Gustafson - Shazaam! - <*> 05:35, 29 March 2007 (UTC)
- As an admin, you can override it, but as anon, it works. →AzaToth 17:26, 29 March 2007 (UTC)
- Just tested this and confirmed it. alphachimp 17:31, 29 March 2007 (UTC)
- Hmm... WP:POINT? :-P Maybe MediaWiki needs some time to refresh the content of this list. -- ReyBrujo 05:38, 29 March 2007 (UTC)
Proposed edits
The list we have here is great, but it doesn't take advantage of... shall we say, the magic of regexes. I suggest the following list, and also suggest that someone who has worked with regexes for longer than I have take a look before implementing this. This blacklist is extremely powerful, in the sense that it allows no exceptions. I have them numbered here, but that should be changed to bullets:
- \bass(?!ess|oc|yr|em|is)
- Perhaps add the British spelling, but similar provisions (for assess, association, assyrian, assembly, assistance, etc.) would have to be made. Either that, or we could take the approach of only filtering out certain permutations -- when followed by nothing, "hole", "hat" (maybe), etc. GracenotesT § 19:49, 29 March 2007 (UTC)
- "ass" is just too short and simple. Even with \bass\b it could be appropriate to appear some where ("Bob the ass assin"?) —Centrx→talk • 21:37, 29 March 2007 (UTC)
- That name might be blocked for containing sexual innuendo, even unintentional. With what's here, "assassin" would not be allowed, but that can be fixed by adding "|assin". GracenotesT § 21:48, 29 March 2007 (UTC)
- There are many acceptable usernames that might contain the string "ass", for any letter of the alphabet that might follow, enough that \bass\b would be the only possibly appropriate blacklist entry. —Centrx→talk • 21:56, 29 March 2007 (UTC)
- Hm. As a compromise, how about \bass(hole|hat|es)?\b? GracenotesT § 22:01, 29 March 2007 (UTC)
- fu[c(]k
- sh[ia]t - William Shatner :) --Conti|✉ 19:38, 29 March 2007 (UTC)
- \bsh[ia]t(ter|ting|e)?\b GracenotesT § 19:47, 29 March 2007 (UTC)
- Gah, that means that "shatter" won't work. Getting rid of [ia], and replacing it with "i", may be good enough. Failing that, there can just be two items. GracenotesT § 20:15, 29 March 2007 (UTC)
- \ban(us|al) - Analog, analyze.. -- Conti|✉ 19:38, 29 March 2007 (UTC)
- \ban(us|al)\b -- GracenotesT § 19:47, 29 March 2007 (UTC)
- \ban(us|al)\b
- manuscript, canal, IcanuseWP, manalive, ... and what's the point if you force a word boundary, someone can just pick User:UpYourAnus Matchups 03:29, 8 April 2007 (UTC)
- So "it doesn't work in every goddamn case, so let's keep it out?" Not exactly. GracenotesT § 01:33, 12 April 2007 (UTC)
- Do you mean we should bar all names containing "anus" regardless of word boundary or that we should only bar ones with word boundary and accept that we will still have to deal with clever idiots who smusheverythingtogether? Matchups 03:10, 12 April 2007 (UTC)
- The latter. GracenotesT § 11:50, 12 April 2007 (UTC)
- phall(us|ic)
- faggot
- dick(s|head)?\b
- cunt
- scunthorpe. --Carnildo 03:45, 8 April 2007 (UTC)
- Yes, I'm aware of that, but 99% of the time, "cunt" is going to be used in a derogatory manner (see Special:Listusers). In addition, actually registering a name with "Scunthorpe" (as a reference) in it seems questionable, but a clause allowing "horpe" following it can be provided. GracenotesT § 11:54, 12 April 2007 (UTC)
- slut
- \blub(ric|es?\b)
- cock(s|sucker)?\b
- Potential collateral with this, I think, we've even got an article on Cocks (surname). (Cocksucker shouldn't be any problem). Seraphimblade Talk to me 05:58, 30 March 2007 (UTC)
- Huh. There could be sexual innuendo (e.g., "John Thomas Cocks with your mom")... but there's not that much harm in doing some stuff on a case-by-case basis. GracenotesT § 13:37, 30 March 2007 (UTC)
- vaginal?
- scrotum
- dildo - Not offensive, IMHO. We should only cover clear policy violations here. --Conti|✉ 19:38, 29 March 2007 (UTC)
- "Dildo" is currently on the list. I was thinking of adding it in order to prevent attack usernames, i.e. "Fuck Gracenotes with a dildo", or similar. Plus, if someone is thinking about sex as they create a user account... well, no solid conclusions there, but still. The username policy restricts even sexual innuendo... this is a sex toy; rather explicit. Compare this to the spam blacklist. There's only one way to link to a site, but here, there are many many usernames to choose from. GracenotesT § 20:48, 29 March 2007 (UTC)
- Gracenotes is not a sick individual
- on wheels
- colbert - Could be someone's name. --Conti|✉ 19:38, 29 March 2007 (UTC)
- admin - (collateral - User:Padminiraman, User:BadmintonL, User:Breadmine?, User:Toadminor?, User:Dreadminus ...)
- Thanks for pointing that out. The word can be isolated, then: \badmin(istrator)?\b. GracenotesT § 20:21, 29 March 2007 (UTC)
- \badmin(istrator)?\b
- banned
- bannedinboston? Matchups 03:29, 8 April 2007 (UTC)
- Why would that name be allowed? GracenotesT § 01:33, 12 April 2007 (UTC)
- It would be allowed because it doesn't violate policy. Matchups 03:10, 12 April 2007 (UTC)
- It has the word "banned" in it, and anything else would be a stretch, really. In addition, there's always WP:ACC GracenotesT § 11:50, 12 April 2007 (UTC)
- sysop
- steward - Could be a last name -- see List_of_people_by_name:_Stew-Stez Matchups 03:29, 8 April 2007 (UTC)
- username policy
- \.(com|org|co\.uk|net|info)(\b|/)
- By the way: right now we have \.(com|org|co\.uk|net|info)\b. However, someone would still be able to register an account with the name www.somespam.com/main.php. This is why there's the (\b|/), or even [\b/]. GracenotesT § 22:21, 29 March 2007 (UTC)
- [\b] (the metacharacter \b in a character class) does not mean what you think it does, but rather a backspace character. Kotepho 06:57, 30 March 2007 (UTC)
- Supposedly, the slash qualifies for the word boundary \b. Anyway, it doesn't matter because, when I tried testing it, such a username is prevented at the lower software level, not here. —Centrx→talk • 07:17, 30 March 2007 (UTC)
- Damn, I forgot about \b's special use in a character class. Parentheses will work, then. GracenotesT § 13:40, 30 March 2007 (UTC)
- What about ".net programmer"? Matchups 03:29, 8 April 2007 (UTC)
- Also, I suggest these 20:52, 29 March 2007 (UTC), per WP:U#Wikipedia:
- Wikipedia (done)
- Wikiquote (done)
- Wiktionary (done)
- Wikibooks (done)
- Wikiversity (done)
- Wikisource (done)
- Wikinews (done)
- nigger (done by User:Alphachimp April 11, 2007)
- Please change this to require that this be at the beginning of the word, so we don't rule out "SniggeringRomantic" or "Sniggerlet". Matchups 03:34, 12 April 2007 (UTC)
- If we did this to only whitespace at beginning of the word, it should work though. I can't think of any legitimate use other than "snigger", and a lot of very nasty ones could start off with "nigger" and add something. Seraphimblade Talk to me 04:06, 12 April 2007 (UTC)
- douche
- bitch
- nazi
- watch out for names like "MianAziz." Matchups 03:34, 12 April 2007 (UTC)
- What about doing this one whitespace-only before and after? Seraphimblade Talk to me 04:06, 12 April 2007 (UTC)
That's it... any other suggestions would be good. GracenotesT § 19:23, 29 March 2007 (UTC)
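The collateral cases raised in the list above can be checked mechanically before an entry goes live. What follows is a minimal sketch in Python, not the extension's own code: it assumes case-insensitive matching and relies on Python's `re` module behaving like PHP's PCRE for these simple patterns. The helper `is_blocked` is a hypothetical name used only for illustration.

```python
import re

def is_blocked(pattern: str, username: str) -> bool:
    """Return True if the pattern matches anywhere in the username.

    Case-insensitive search is an assumption made for this sketch.
    """
    return re.search(pattern, username, re.IGNORECASE) is not None

# An unanchored "admin" catches innocent names; word boundaries isolate it.
print(is_blocked(r"admin", "Padminiraman"))                 # True
print(is_blocked(r"\badmin(istrator)?\b", "Padminiraman"))  # False
print(is_blocked(r"\badmin(istrator)?\b", "Wiki Admin"))    # True

# Boundaries cut both ways: "canal" is spared, but so is a run-together name.
print(is_blocked(r"\ban(us|al)\b", "canal"))       # False
print(is_blocked(r"\ban(us|al)\b", "UpYourAnus"))  # False

# Inside a character class, \b means backspace, not a word boundary.
print(is_blocked(r"\.com[\b/]", "www.somespam.com"))   # False
print(is_blocked(r"\.com(\b|/)", "www.somespam.com"))  # True

# The domain entry also catches the ".net programmer" case.
print(is_blocked(r"\.(com|org|co\.uk|net|info)(\b|/)", ".net programmer"))  # True
```

Running candidate entries against a list of existing usernames in this way would show the likely collateral damage before anything is added.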
- I've commented a few of those out for numerous reasons, mostly since there'd be some collateral damage. --Conti|✉ 19:38, 29 March 2007 (UTC)
- How about changing "cock" to "cock(?!er)", since "cocker" and "cockerspaniel" are legit? GracenotesT § 19:41, 29 March 2007 (UTC)
- Oof. then there's cockatoo, cockle, cockroach... meh. Maybe just have it restricted to "\bcock(s|sucker)?\b", then. GracenotesT § 19:45, 29 March 2007 (UTC)
- I would appreciate it if you improved them rather than removed 'em. Namely, "\ban(us|al)" could become "\ban(us|al)\b"... etc. with other items. GracenotesT § 19:42, 29 March 2007 (UTC)
- Sorry, I don't know much about regular expressions (but I'm learning fast :)), so I didn't know how to correctly create such expressions. "cock(?!er)" would mean that everything that includes "cock" would be blacklisted, unless it's "cocker", right? If so, that's fine by me. Okay, it's not, just saw the other examples above. --Conti|✉ 19:48, 29 March 2007 (UTC)
- Yep! And this may be of interest to you, if that's the sort of thing you're interested in. GracenotesT § 19:51, 29 March 2007 (UTC)
- Thanks! Well, I'm mostly interested in creating as little collateral damage as possible with this list. ;) --Conti|✉ 19:53, 29 March 2007 (UTC)
- Me too. By the way, "cock(s|sucker)?\b" will restrict "cock", "cocks", and "cocksucker". GracenotesT § 19:54, 29 March 2007 (UTC)
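The difference between the negative-lookahead entry and the word-bounded entry discussed in this thread is easiest to see side by side. This is a rough sketch in Python, assuming case-insensitive matching; the sample names and the helper `blocked` are illustrative only.

```python
import re

# The two candidate entries from the thread (case-insensitivity is assumed).
LOOKAHEAD = re.compile(r"cock(?!er)", re.IGNORECASE)
BOUNDED = re.compile(r"\bcock(s|sucker)?\b", re.IGNORECASE)

def blocked(pattern, names):
    """Return the names in which the pattern matches somewhere."""
    return [name for name in names if pattern.search(name)]

NAMES = ["cockerspaniel", "cockatoo", "cockroach", "John Cocks"]

# The lookahead only exempts "cocker..."; every other continuation is caught.
print(blocked(LOOKAHEAD, NAMES))  # ['cockatoo', 'cockroach', 'John Cocks']

# The bounded form spares the longer words but still catches the surname.
print(blocked(BOUNDED, NAMES))    # ['John Cocks']
```

The lookahead version needs a new exemption for every legitimate word, while the bounded version leaves only the surname case to be handled individually.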
bot?
Someone want to add bot to the list?--VectorPotentialTalk 20:26, 29 March 2007 (UTC)
- That would be an effective way of stopping legitimate bots from getting accounts :) GracenotesT § 20:28, 29 March 2007 (UTC)
Good resource
Check out User:Lupin/badwords for a pretty exhaustive list of profanity. alphachimp 22:29, 29 March 2007 (UTC)
- It is extremely exhaustive, except I don't think we need to restrict burp(er|ing)s? :) GracenotesT § 22:36, 29 March 2007 (UTC)
- Unless people are actively creating usernames with these words, I think it would be a waste of effort (and possibly server processing time) to add them all. —Centrx→talk • 22:38, 29 March 2007 (UTC)
- Heh, it's just a nice starting point, I think. alphachimp 22:40, 29 March 2007 (UTC)
Case sensitive
A note: the regexes are all case sensitive, but you can force a particular regex to be case insensitive. Extracted from http://se.php.net/manual/en/reference.pcre.pattern.syntax.php →AzaToth 20:28, 29 March 2007 (UTC)
“The settings of PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, PCRE_UNGREEDY, PCRE_EXTRA, and PCRE_EXTENDED can be changed from within the pattern by a sequence of Perl option letters enclosed between "(?" and ")". The option letters are: i for PCRE_CASELESS, m for PCRE_MULTILINE, s for PCRE_DOTALL, and x for PCRE_EXTENDED.
For example, (?im) sets caseless, multiline matching. It is also possible to unset these options by preceding the letter with a hyphen, and a combined setting and unsetting such as (?im-sx), which sets PCRE_CASELESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED, is also permitted. If a letter appears both before and after the hyphen, the option is unset.
When an option change occurs at top level (that is, not inside subpattern parentheses), the change applies to the remainder of the pattern that follows. So /ab(?i)c/ matches only "abc" and "abC". This behaviour has been changed in PCRE 4.0, which is bundled since PHP 4.3.3. Before those versions, /ab(?i)c/ would perform as /abc/i (e.g. matching "ABC" and "aBc").
If an option change occurs inside a subpattern, the effect is different. This is a change of behaviour in Perl 5.005. An option change inside a subpattern affects only that part of the subpattern that follows it, so (a(?i)b)c matches abc and aBc and no other strings (assuming PCRE_CASELESS is not used). By this means, options can be made to have different settings in different parts of the pattern. Any changes made in one alternative do carry on into subsequent branches within the same subpattern. For example, (a(?i)b|c) matches "ab", "aB", "c", and "C", even though when matching "C" the first branch is abandoned before the option setting. This is because the effects of option settings happen at compile time. There would be some very weird behaviour otherwise.
The PCRE-specific options PCRE_UNGREEDY and PCRE_EXTRA can be changed in the same way as the Perl-compatible options by using the characters U and X respectively. The (?X) flag setting is special in that it must always occur earlier in the pattern than any of the additional features it turns on, even when it is at top level. It is best put at the start.”
- I would assume that it was already case insensitive, as two equivalent usernames (ignoring case) can't be registered. Maybe that caveat wasn't caught, however, in the code for this extension. GracenotesT § 20:30, 29 March 2007 (UTC)
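The inline option letters quoted above can be tried out in most regex engines. Here is a small sketch using Python's `re` module rather than PHP's PCRE, so the syntax differs in one respect: recent Python versions reject a global `(?i)` placed mid-pattern, and the scoped form `(?i:...)` (Python 3.6 or later) is used instead. The helper `full` is a hypothetical name.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# A leading (?i) makes the entire pattern case-insensitive.
print(full(r"(?i)abc", "aBC"))      # True

# A scoped group limits case-insensitivity to part of the pattern,
# which mirrors the /ab(?i)c/ example from the PCRE manual.
print(full(r"ab(?i:c)", "abC"))     # True
print(full(r"ab(?i:c)", "ABC"))     # False

# The same idea inside a subpattern, as in (a(?i)b)c.
print(full(r"(a(?i:b))c", "aBc"))   # True
print(full(r"(a(?i:b))c", "aBC"))   # False
```

For a blacklist entry, prefixing the fragment with `(?i)` would be the simplest way to force caseless matching, assuming the extension passes the fragment through to PCRE unchanged.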
Repetitions in usernames
How about adding this:
- (.+)\1{9}
A character (or sequence of characters) repeated at least 10 times. Миша13 22:21, 29 March 2007 (UTC)
- Careful with incredibly broad patterns like that. It's quite possible to create a regex that, when tested against a carefully-selected input, will take thousands of years to fail to match. --Carnildo 08:10, 30 March 2007 (UTC)
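To make the proposed repetition entry concrete: the group captures some sequence, and the backreference requires nine further copies, ten in total. A minimal sketch in Python follows; the function name `has_tenfold_repeat` is hypothetical, and Python's backreference semantics are assumed to match PCRE's here.

```python
import re

# (.+) captures a sequence; \1{9} demands nine more copies of it.
REPEATED = re.compile(r"(.+)\1{9}")

def has_tenfold_repeat(username: str) -> bool:
    """Return True if some sequence occurs ten times in a row."""
    return REPEATED.search(username) is not None

print(has_tenfold_repeat("a" * 10))    # True
print(has_tenfold_repeat("a" * 9))     # False
print(has_tenfold_repeat("ab" * 10))   # True
print(has_tenfold_repeat("abc" * 9))   # False
```

On inputs as short as a username this pattern should stay cheap, but the caution about backtracking is worth keeping in mind for any entry that combines an open-ended group with repetition.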
For future requests
Once the basic list has been established, does anyone think that it's worth it to create the page MediaWiki talk:Usernameblacklist/Requests for addition and removal? Thus, if there are any false positives, we can make a piped link to that page from MediaWiki:Blacklistedusernametext, so that newbies can say "Hey, I have [insert perfectly acceptable name]" somewhere. Or if there's an influx of vandals with a certain offensive username pattern, it can be stopped. (Although the blacklist should not be used as a temporary solution to anything, I think.) GracenotesT § 22:55, 29 March 2007 (UTC)
- I'm changing MediaWiki:Blacklistedusernametext to link to Wikipedia:Request an account (a page that already has this functionality). Admins can create accounts that bypass this blacklist. alphachimp 22:57, 29 March 2007 (UTC)
- Ah, great idea! It might also be a good idea to give people the chance to explain why a certain expression match should be removed. GracenotesT § 23:00, 29 March 2007 (UTC)
- This talk page is the most appropriate place to make requests for addition and removal. There will be almost nothing else going on here once the initial excitement dies down. There is no need for a subpage. See MediaWiki:Bad image list for a similar such page. —Centrx→talk • 23:02, 29 March 2007 (UTC)
- I have previously pondered upon the value of a subpage for the bad image list. I think that it would help. However, maybe a link to this page would work? GracenotesT § 23:06, 29 March 2007 (UTC)
- That might not be a good idea. Some could use it to find ways to bypass the blacklist. alphachimp 23:14, 29 March 2007 (UTC)
- That's right. Then perhaps admins working at WP:ACC should be aware of this MediaWiki page, so that they in turn can see if there's a faulty item. GracenotesT § 23:20, 29 March 2007 (UTC)
Malfunctioning?
I think this "username disallowal" may be malfunctioning. While working at WP:ACC I tried to fill several of the newest requests, but every time got a message saying "The username you have chosen is disallowed because it contains some forbidden string, such as an offensive word." But the usernames didn't seem offensive, they were:
- 555michael
- cybernerd1999
- scottlaverdiere
- DeathbyWiki
- jawdrops
It seems that every username is being denied; WP:ACC is being flooded with requests. Anyone know what is happening and how to fix it? jwillburtalk 23:37, 29 March 2007 (UTC)
The Race
While listing some common swearwords here is a good idea, we should avoid trying to enumerate every single inappropriate username, because turning this into a race between us and the people who like such names is simply a waste of effort. >Radiant< 08:06, 30 March 2007 (UTC)
- I suggested linking to this page from MediaWiki:Blacklistedusernametext. But as alphachimp mentioned above, security through obscurity is probably a good idea, so linking to it isn't. GracenotesT § 13:31, 30 March 2007 (UTC)
- I don't think security through obscurity is a very good idea, especially on a wiki. Vandals aren't stupid, I'm sure they won't have a problem finding this page at all. Is it possible to add to MediaWiki:Blacklistedusernametext the regex that led to the blocking of the username, so the user at least knows why his username was blocked? --Conti|✉ 14:40, 30 March 2007 (UTC)
- We'd have to bug Rob Church for that—he wrote this extension, after all! I think that a list of generally negative terms would be good. For example, check out {{Test5}}. It's basically a laundry list of bad things to do: a bag of beans, essentially. However, by staying general, we can list things that are technically impossible to do, and if we keep this list comprehensive enough, vandals would probably get frustrated and give up. This would be aided by keeping this MediaWiki page in relative obscurity, such that not everyone would know about it. If there's a freak exception, someone can post at WP:ACC, and then the admin could discuss the troublesome item here, and create the account in the meanwhile. GracenotesT § 14:47, 30 March 2007 (UTC)
- I agree that a few examples are nice to have. We should especially mention the not so obvious cases, IMHO. ("Hey! Why am I not allowed to register User:I'm_a_wine_steward??") I think it is inevitable that we'll soon enough have vandals watching this page, creating inappropriate usernames that are not on this list, whether we announce this page or not. But that's life, I guess. I just hope we're not going to panic and make this list bigger than it needs to be. --Conti|✉ 15:09, 30 March 2007 (UTC)
- Hm. Come to think of it, "steward" isn't that bad of a word to have; perhaps we can remove it. After all, if someone is not familiar with the Wikimedia Foundation enough to know what a steward is, they would probably not be aware that having "steward" in a user name does not imply that that person is a steward. Not the same for sysop, however: "sysop" has a common meaning. And in cases where there might be a user name "Wikimedia stewards suck", we can take that on a case-by-case basis. GracenotesT § 01:49, 31 March 2007 (UTC)
- I went ahead and removed steward. I think when (if) a case came up where it was being abused, we could handle that, rather than a (more likely) larger fallout of blocked usernames and backlog at WP:ACC. I think, per Gracenotes, that people outside the WMF don't know what a steward is (to us), so the potential for abuse is rather low. ^demon[omg plz] 04:21, 31 March 2007 (UTC)
Code
Please fill me in on how the coding works. BuickCenturyDriver (Honk, contribs, odometer) 22:19, 2 April 2007 (UTC)
- The text after an asterisk (rendered as a bullet when you load MediaWiki:Usernameblacklist) is treated as a regular expression and, if any of the regular expressions are detected when creating a new user name, that user name cannot be created. For general information about regular expressions, see regular expressions and http://www.codeproject.com/dotnet/RegexTutorial.asp; for information about their implementation in PHP (the native language of MediaWiki) see http://se.php.net/manual/en/reference.pcre.pattern.syntax.php; and for general information about the extension Username Blacklist which this is part of, see mw:Extension:Username Blacklist. Regards, Iamunknown 16:14, 6 April 2007 (UTC)
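The mechanism described above (each bulleted line is a regular expression, and a name is refused if any of them matches) can be sketched in a few lines. This is an illustration in Python under stated assumptions, not the extension's actual PHP code: the function names, the sample page text, and the `ignore_case` switch are all hypothetical, and whether the real extension matches case-insensitively is left open here.

```python
import re
from typing import List, Optional, Pattern

def parse_blacklist(page_text: str, ignore_case: bool = False) -> List[Pattern[str]]:
    """Compile every bulleted line ("* fragment") of the page into a regex."""
    flags = re.IGNORECASE if ignore_case else 0
    patterns = []
    for line in page_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("*"):
            fragment = stripped[1:].strip()
            if fragment:  # skip empty bullets
                patterns.append(re.compile(fragment, flags))
    return patterns

def first_matching_entry(username: str, patterns: List[Pattern[str]]) -> Optional[str]:
    """Return the first entry that matches the username, or None if allowed."""
    for pattern in patterns:
        if pattern.search(username):
            return pattern.pattern
    return None

# Hypothetical page content, for illustration only.
SAMPLE_PAGE = r"""
Lines without a leading asterisk are ignored in this sketch.
* \badmin(istrator)?\b
* on wheels
"""

patterns = parse_blacklist(SAMPLE_PAGE, ignore_case=True)
print(first_matching_entry("Willy on wheels", patterns))  # on wheels
print(first_matching_entry("Padminiraman", patterns))     # None
```

Returning the matching entry rather than a bare yes/no makes it easy to see which line of the list was responsible for a refusal.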
Unified login
I like this feature. I wonder, does this have any implications for the Single Unified Login that we're getting Real Soon Now? --Abu-Fool Danyal ibn Amir al-Makhiri 20:10, 17 April 2007 (UTC)
Implementation
I see that this has died down somewhat. Any chance we can implement some of the uncontested/resolved changes above? GracenotesT § 01:29, 21 May 2007 (UTC)
- {{editprotected}}? --Iamunknown 05:39, 21 May 2007 (UTC)
- Such as which ones? Most of these have collateral damage, and for the others, are people actually creating many usernames with these strings? —Centrx→talk • 06:15, 21 May 2007 (UTC)
These seem to be the least-collateral-damage ones.
- phall(us|ic)
- faggot
- slut
- cock(s|sucker)?\b
- vaginal?
- scrotum
- dildo
- on wheels
- \badmin(istrator)?\b
- douche
- bitch
- \bnazi\b
- \bass(hole|hat|es)?\b
I do not expect all of these to be put in there; certainly some may qualify, however. GracenotesT § 22:25, 22 May 2007 (UTC)
- The last regex could match 'assess', denying, for example, a property appraiser from creating User:I assess houses. —Crazytales (talk) (alt) 00:50, 5 June 2007 (UTC)
- Actually, it wouldn't match that (the \b). GracenotesT § 19:10, 17 June 2007 (UTC)
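The point about the trailing \b can be verified directly. A quick check in Python, assuming case-insensitive matching and that Python's word-boundary behaviour matches PCRE's for ASCII names; the function name is illustrative.

```python
import re

# The last entry from the proposed list above.
PATTERN = re.compile(r"\bass(hole|hat|es)?\b", re.IGNORECASE)

def is_caught(username: str) -> bool:
    """Return True if the entry matches somewhere in the username."""
    return PATTERN.search(username) is not None

print(is_caught("I assess houses"))   # False: no word boundary after "ass" or "asses"
print(is_caught("Bob the assassin"))  # False
print(is_caught("Bob the ass"))       # True
print(is_caught("asshat"))            # True
```

After "ass" the optional group can only be followed by a word boundary, so any longer word that merely begins with those letters is left alone.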
More regular expressions
I was at work, and I found some regexes used for data validation that might be of some use here. First off, we have one that should catch "Most Common Curses."
\b(?:(?:ass\s?(?:hole|wipe|bandit|clown|hat|licker)?['s]*)|(?:bi[a]?tch(?:s|ing|e[sd]?)?)|(?:blow[-]?\s?job(?:s|[b]?ing|[b]?ed)?)|(?:(?:cock[s]?|cunt[s]?)(?:suck(?:er[s]?|ing|ed)?)?)|(?:(?:mother[-]?|mutha[-]?|un|ass[-]?|finger[-]?|fist[-]?|dry[-]?)?fuck(?:[-]?all|able|er|a|ing|ed|[-]?head|pisswanktit|[-]?wit)?[']?(?:z|s)?)|(?:(?:puss(?:y|ies))(?![-]?cat|[-]?foot))|(?:(?:jack[-]?|dip[-]?)?shit(?:[-]?\w*)*)|(?:dick[-]?head[s]?)|(?:gang(?:sta[sz]?|bang(?:er|ing|s|a[z]?)?))|(?:pecker[-]?\s?(?:track[s]?|wood|head[s]?|cheese|face[s]?)))\b
Secondly, I have one that would capture phone numbers:
1?\s*-?\s*(\d{3}|\(\s*\d{3}\s*\))\s*-?\s*\d{3}\s*-?\s*\d{4}
Finally, I have one that I *know* is long, but supposedly it will catch any possible URL you could throw at it, seriously limiting the number of spam usernames people could register (as e-mails are already disabled on a software level).
\b(?:http://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-) *[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:( ?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)(?:/(?:(?:(?:(?:[a-zA-Z\d$\-_. +!*'(),]|(?:%[a-fA-F\d]{2}))|[;:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+! *'(),]|(?:%[a-fA-F\d]{2}))|[;:@&=])*))*)(?:\?(?:(?:(?:[a-zA-Z\d$\-_. +!*'(),]|(?:%[a-fA-F\d]{2}))|[;:@&=])*))?)?)|(?:ftp://(?:(?:(?:(?:(? :[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?&=])*)(?::(?:(?:(?:[a -zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;?&=])*))?@)?(?:(?:(?:(?:( ?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a -zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))? ))(?:/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=] )*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])* ))*)(?:;type=[AIDaid])?)?)|(?:news:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(), ]|(?:%[a-fA-F\d]{2}))|[;/?:&=])+@(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z \d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?) )|(?:(?:\d+)(?:\.(?:\d+)){3})))|(?:[a-zA-Z](?:[a-zA-Z\d]|[_.+-])*)|\ *))|(?:nntp://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d ])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\ .(?:\d+)){3}))(?::(?:\d+))?)/(?:[a-zA-Z](?:[a-zA-Z\d]|[_.+-])*)(?:/( ?:\d+))?)|(?:telnet://(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-f A-F\d]{2}))|[;?&=])*)(?::(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F \d]{2}))|[;?&=])*))?@)?(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)* [a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(? :\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?))/?)|(?:gopher://(?:(?:(?:(?:(? :[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a- zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?) 
(?:/(?:[a-zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))(?:(?:(?:[a- zA-Z\d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))*)(?:%09(?:(?:(?:[a-zA -Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[;:@&=])*)(?:%09(?:(?:[a-zA-Z\ d$\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))*))?)?)?)?)|(?:wais://(?:(? :(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z] (?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::( ?:\d+))?)/(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)(?:(?:/( ?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)/(?:(?:[a-zA-Z\d$\- _.+!*'(),]|(?:%[a-fA-F\d]{2}))*))|\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]| (?:%[a-fA-F\d]{2}))|[;:@&=])*))?)|(?:mailto:(?:(?:[a-zA-Z\d$\-_.+!*' (),;/?:@&=]|(?:%[a-fA-F\d]{2}))+))|(?:file://(?:(?:(?:(?:(?:[a-zA-Z\ d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|- )*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))|localhost)?/(?:(?:(?:( ?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*)(?:/(?:(?:(?: [a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[?:@&=])*))*))|(?:prosper o://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(? 
:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){ 3}))(?::(?:\d+))?)/(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d] {2}))|[?:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2 }))|[?:@&=])*))*)(?:(?:;(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\ d]{2}))|[?:@&])*)=(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}) )|[?:@&])*)))*)|(?:ldap://(?:(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\ d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)) |(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?))?/(?:(?:(?:(?:(?:(?:(?:[ a-zA-Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|oid)\ .(?:(?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(?: %20)*))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*))(?:(?:(?: %0[Aa])?(?:%20)*)\+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?:(?:(?:[a-zA-Z\d]| %(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|oid)\.(?:(?:\d+ )(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(?:%20)*))?(? :(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)))*)(?:(?:(?:(?:%0[A a])?(?:%20)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))(?:(?:(?:(?:(?:(?:[a-zA -Z\d]|%(?:3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|oid)\.(?: (?:\d+)(?:\.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(?:%20) *))?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*))(?:(?:(?:%0[A a])?(?:%20)*)\+(?:(?:%0[Aa])?(?:%20)*)(?:(?:(?:(?:(?:[a-zA-Z\d]|%(?: 3\d|[46][a-fA-F\d]|[57][Aa\d]))|(?:%20))+|(?:OID|oid)\.(?:(?:\d+)(?: \.(?:\d+))*))(?:(?:%0[Aa])?(?:%20)*)=(?:(?:%0[Aa])?(?:%20)*))?(?:(?: [a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))*)))*))*(?:(?:(?:%0[Aa])?( ?:%20)*)(?:[;,])(?:(?:%0[Aa])?(?:%20)*))?)(?:\?(?:(?:(?:(?:[a-zA-Z\d $\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+)(?:,(?:(?:[a-zA-Z\d$\-_.+!*'(),]| (?:%[a-fA-F\d]{2}))+))*)?)(?:\?(?:base|one|sub)(?:\?(?:((?:[a-zA-Z\d $\-_.+!*'(),;/?:@&=]|(?:%[a-fA-F\d]{2}))+)))?)?)?)|(?:(?:z39\.50[rs] )://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(? 
:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){ 3}))(?::(?:\d+))?)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d] {2}))+)(?:\+(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))*(?:\ ?(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))?)?(?:;esn=(?:(? :[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+))?(?:;rs=(?:(?:[a-zA-Z\ d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))+)(?:\+(?:(?:[a-zA-Z\d$\-_.+!*'(), ]|(?:%[a-fA-F\d]{2}))+))*)?))|(?:cid:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),] |(?:%[a-fA-F\d]{2}))|[;?:@&=])*))|(?:mid:(?:(?:(?:[a-zA-Z\d$\-_.+!*' (),]|(?:%[a-fA-F\d]{2}))|[;?:@&=])*)(?:/(?:(?:(?:[a-zA-Z\d$\-_.+!*'( ),]|(?:%[a-fA-F\d]{2}))|[;?:@&=])*))?)|(?:vemmi://(?:(?:(?:(?:(?:[a- zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z \d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)(?:/ (?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[/?:@&=])*)(?:(? :;(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[/?:@&])*)=(?: (?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[/?:@&])*))*))?)|(? :imap://(?:(?:(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2 }))|[&=~])+)(?:(?:;[Aa][Uu][Tt][Hh]=(?:\*|(?:(?:(?:[a-zA-Z\d$\-_.+!* '(),]|(?:%[a-fA-F\d]{2}))|[&=~])+))))?)|(?:(?:;[Aa][Uu][Tt][Hh]=(?:\ *|(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~])+)))(?:( ?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~])+))?))@)?(? :(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*(?:[a-zA -Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(? 
::(?:\d+))?))/(?:(?:(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d ]{2}))|[&=~:@/])+)?;[Tt][Yy][Pp][Ee]=(?:[Ll](?:[Ii][Ss][Tt]|[Ss][Uu] [Bb])))|(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~: @/])+)(?:\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~: @/])+))?(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-9] \d*)))?)|(?:(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~ :@/])+)(?:(?:;[Uu][Ii][Dd][Vv][Aa][Ll][Ii][Dd][Ii][Tt][Yy]=(?:[1-9]\ d*)))?(?:/;[Uu][Ii][Dd]=(?:[1-9]\d*))(?:(?:/;[Ss][Ee][Cc][Tt][Ii][Oo ][Nn]=(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),]|(?:%[a-fA-F\d]{2}))|[&=~:@/])+ )))?)))?)|(?:nfs:(?:(?://(?:(?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|- )*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?: (?:\d+)(?:\.(?:\d+)){3}))(?::(?:\d+))?)(?:(?:/(?:(?:(?:(?:(?:[a-zA-Z \d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?:(?:(?:[a-zA-Z\ d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?)))?)|(?:/(?:(?:(? :(?:(?:[a-zA-Z\d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)(?:/(?: (?:(?:[a-zA-Z\d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*))*)?))|( ?:(?:(?:(?:(?:[a-zA-Z\d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*) (?:/(?:(?:(?:[a-zA-Z\d\$\-_.!~*'(),])|(?:%[a-fA-F\d]{2})|[:@&=+])*)) *)?)))\b
Discussion? Implementation? ^demon[omg plz] 14:14, 31 May 2007 (UTC)
- I think they're too complicated. I can't seem to quote or escape the first one properly to test it against my dictionary, so I couldn't tell you if it matches anything it shouldn't. --Carnildo 01:42, 1 June 2007 (UTC)
- Is there a need for this? Are people registering gopher URLs as usernames for spam? —Centrx→talk • 04:18, 1 June 2007 (UTC)
- Why would we need a phone-number-catching regex? Someone can register 867-5309 or 555-1234 and have it be a within-policy number. —Crazytales (public computer) (talk) (main) 19:52, 1 June 2007 (UTC)
- Also, that wouldn't catch letter exchanges. For example, my fictitious home number is RIver1-4601, rendered often as RI1-4601, and dialed as 7414601. —Crazytales (public computer) (talk) (main) 19:55, 1 June 2007 (UTC)
- It also doesn't catch non-US numbers, or numbers written in a non-standard format. --Carnildo 21:37, 1 June 2007 (UTC)
- Yeah, like +044 0800 414560, a UK toll-free number. —Crazytales o.o 15:13, 2 June 2007 (UTC)
- Could be easily modified to catch some of those, I would think...
- But at what point does it start generating false positives? --Carnildo 21:12, 2 June 2007 (UTC)
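The objections in this thread can be checked against a toy pattern. The phone-number regex proposed earlier is not reproduced here, so `US_PHONE` below is a hypothetical stand-in for a hyphenated US-style matcher, not the pattern under discussion. A minimal sketch, assuming Python's `re` module:

```python
import re

# Hypothetical stand-in: optional 3-digit area code, then NNN-NNNN.
US_PHONE = re.compile(r"\b(?:\d{3}-)?\d{3}-\d{4}\b")

def looks_like_phone(username):
    """Return True if the username contains a hyphenated US-style number."""
    return US_PHONE.search(username) is not None

# Examples taken from the comments in this thread.
for name in ["867-5309", "555-1234", "RI1-4601", "7414601", "+044 0800 414560"]:
    print(name, looks_like_phone(name))
```

With this stand-in, the two names described above as within policy are caught, while the letter-exchange form, the bare dialed digits and the UK number all pass through, which is the false-positive and false-negative trade-off raised in the thread.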
2 More Additions
I went ahead and added two more, Oversight and Checkuser. Figured those shouldn't be allowed either. Hope I wasn't out of line adding them. ^demon[omg plz] 22:22, 6 August 2007 (UTC)
String ideas
I made a list of all blocks since 2007-04-12 whose reason contains 'username', some 8000 in all. I lowercased the names and split them into substrings. Below are the most common substrings of each length in blocked names. The first column is the number of blocked users whose name contained that substring. Some of these may be good to block.
Length 7
51 product
46 roducti
46 oductio
46 duction
46 1234567
44 2345678
42 asshole
42 3456789
39 uctions
39 bastard
38 vandali
38 sername
37 yourmom
37 usernam
37 aaaaaaa
35 sockpup
34 vandal
33 ockpupp
33 kpuppet
33 ckpuppe
32 ooooooo
31 oompapa
28 account
26 nationa
26 ational
25 n criss
25 len cri
25 helen c
25 en cris
25 elen cr
Length 6
102 vandal
84 is a
67 엄마
55 123456
52 retard
52 produc
51 roduct
51 nigger
50 puppet
48 uction
46 oducti
46 ductio
46 234567
44 dotcom
44 345678
43 asshol
43 456789
42 sshole
41 yourmo
41 aaaaaa
40 ctions
40 christ
39 will
39 poopoo
39 bastar
39 astard
38 sernam
38 ername
38 andali
37 userna
Length 5
151 bitch
149 the
137 sucks
119 jesus
107 isgay
103 vanda
103 ation
103 andal
90 is a
85 and
84 nigga
84 is a
77 ihate
77 admin
74 block
71 tions
71 12345
68 poopy
56 suck
56 23456
55 yourm
55 balls
54 retar
54 igger
54 ction
53 you
53 poopo
53 etard
53 chris
52 roduc
Length 4
374 poop
256 suck
234 is
209 the
194 tion
180 shit
170 your
169 itch
159 the
152 bitc
146 hate
141 ucks
141 nigg
133 ing
123 name
120 esus
120 anda
119 jesu
118 wiki
114 and
113 sgay
113 of
110 vand
110 s a
108 love
107 isga
106 cock
105 �의
104 atio
103 tard
103 ndal
101 you
100 1234
97 will
96 is a
95 like
94 dick
93 butt
88 ster
87 igga
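The tallies in this section (lowercase each blocked name, split it into fixed-length substrings, count how many names contain each one) could be reproduced roughly as follows. This is a sketch of the described procedure, not the script that produced the lists; the function name and the sample names are made up for illustration.

```python
from collections import Counter

def substring_counts(names, length):
    """Count, per substring of the given length, how many names contain it."""
    counts = Counter()
    for name in names:
        lowered = name.lower()
        # A set makes each name count at most once per substring,
        # matching "number of users blocked who had that substring".
        seen = {lowered[i:i + length] for i in range(len(lowered) - length + 1)}
        counts.update(seen)
    return counts

# Hypothetical sample; the real input was some 8000 blocked usernames.
blocked = ["Vandal123", "IAmAVandal", "vandalvandal", "Poopy"]
print(substring_counts(blocked, 6).most_common(3))
```

Counting each name once per substring keeps a name such as "vandalvandal" from inflating the total, and names shorter than the requested length contribute nothing.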
Attack usernames
{{editprotected}} Many username vandals create usernames which constitute personal attacks on editors: Here's a pile. As a frequent target of these types of vandals, can I have \bMER\-C\s.+ listed here, to stop this harassment? Other frequent targets include Misza13 and Slimvirgin. MER-C 10:04, 11 August 2007 (UTC)
Done. Cheers. --MZMcBride 21:35, 11 August 2007 (UTC)
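For reference, here is what the requested entry `\bMER\-C\s.+` would and would not match, checked with Python's `re` module. The blacklist itself runs on a different regex engine and may apply flags such as case folding, so this is only an approximation; the sample usernames are invented.

```python
import re

ATTACK_PATTERN = re.compile(r"\bMER\-C\s.+")

def is_blocked(username):
    """Return True if the username matches the requested blacklist entry."""
    return ATTACK_PATTERN.search(username) is not None

# Invented sample names, for illustration only.
for name in ["MER-C is a vandal", "MER-C", "HAMMER-C fan"]:
    print(name, is_blocked(name))
```

Because the pattern requires whitespace and at least one further character after "MER-C", the bare name itself is not matched, and the leading `\b` keeps names that merely end in "MER-C" from being caught.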
dot com problem?
I noticed User:Renuncio.com was registered today [1], despite having \.(com|org|co\.uk|net|info)\b in the blacklist. It's not the first time I've seen this happen. Is it just a bug, or does the regexp need tweaking? -- zzuuzz (talk) 20:58, 25 August 2007 (UTC)
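As a quick sanity check, the pattern quoted above can be run against the username in question. This uses Python's `re` module as a stand-in for the wiki's own regex engine, which is an assumption; it says nothing about how the blacklist page is parsed or applied.

```python
import re

TLD_PATTERN = re.compile(r"\.(com|org|co\.uk|net|info)\b")

def has_blocked_tld(username):
    """Return True if the username contains one of the listed domain suffixes."""
    return TLD_PATTERN.search(username) is not None

print(has_blocked_tld("Renuncio.com"))        # True
print(has_blocked_tld("Renuncio.community"))  # False: no word boundary after "com"
```

Under these assumptions the pattern does match "Renuncio.com", which hints that the expression itself may not be at fault, though it does not rule out engine differences or a problem in how the list is read.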

