Wikipedia talk:WikiProject Database analysis
From Wikipedia, the free encyclopedia
Interwiki link analysis
I've always had a hankering to perform some basic analysis of interwiki links, but have never managed to get enough disk space and processing power at once to do the job. In a nutshell, it should be possible to suggest interwiki links based purely on those that already exist. Some useful operations might be:
- Interwiki links to articles that do not exist (A:X links to B:X which does not exist)
- Interwiki links to redirect pages (A:X links to B:X which redirects to B:Y)
- Inconsistent interwiki links (A:X links to B:X, but B:X links to A:Y)
- Suggested reciprocal links (A:X links to B:X, should B:X link back to A:X?)
- Pages with multiple interwiki links to one language (A:X links to B:X and B:Y)
- Pages that are linked to from different articles in the same language (both A:X and A:Y link to B:X)
- Potential commutations (A:X links to B:X and C:X. Should B:X also link to C:X?)
- Potential inductions (A:X links to B:X and B:X links to C:X. Should A:X link to C:X?)
- TB 22:33, 5 July 2007 (UTC)
- That sounds like a great idea, and I'll look into doing such analysis in the near future. I'm just returning to rewriting my analysis scripts after an extended break, and am getting ready to process the latest en-wiki dump. I'll extract the interwiki links from that dump first, then proceed from there. --Sapphic 17:45, 30 July 2007 (UTC)
- I've started playing about with this (I was finally able to get a reasonably reliable UTF-32 system working). Results (such as they are) are at Wikipedia:WikiProject Interlanguage Links - TB 13:12, 19 August 2007 (UTC)
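A minimal Python sketch of the eight checks in the list above, assuming the interwiki links, the set of existing pages and the redirect table have already been extracted from the dumps into memory. The data model (a page as a `(language, title)` tuple) and the names `analyse` and `by_language` are hypothetical, not part of any existing tool. Titles are kept as a separate field because the same subject usually has a different title in each language; the `X` in the list is shorthand.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical data model: a page is a (language, title) tuple.
#   links:     page -> set of pages named in its interwiki links
#   exists:    set of pages present in the dumps (redirect pages included)
#   redirects: redirect page -> page it points to


def by_language(pages):
    """Group an iterable of pages by language code."""
    grouped = defaultdict(set)
    for page in pages:
        grouped[page[0]].add(page)
    return grouped


def analyse(links, exists, redirects):
    """Return {check name: sorted list of findings} for the eight checks."""
    found = defaultdict(set)
    inbound = defaultdict(set)  # target page -> pages that link to it
    for src, targets in links.items():
        for dst in targets:
            inbound[dst].add(src)

    for src, targets in links.items():
        for dst in targets:
            if dst not in exists:
                found["missing"].add((src, dst))
            elif dst in redirects:
                found["redirect"].add((src, dst, redirects[dst]))
            else:
                back = by_language(links.get(dst, ())).get(src[0], set())
                if not back:
                    found["reciprocal"].add((dst, src))  # suggest dst -> src
                elif src not in back:
                    found["inconsistent"].add((src, dst))  # dst links elsewhere
        # Several targets in one language (A:X -> B:X and B:Y).
        for group in by_language(targets).values():
            if len(group) > 1:
                found["multiple"].add((src, tuple(sorted(group))))
        # Commutation: A:X -> B:X and C:X, so suggest B:X -> C:X.
        for b, c in permutations(targets, 2):
            if (b[0] != c[0] and b in exists and c in exists
                    and c not in links.get(b, ())):
                found["commutation"].add((b, c))
        # Induction: A:X -> B:X -> C:X, so suggest A:X -> C:X.
        for b in targets:
            for c in links.get(b, ()):
                if c[0] != src[0] and c not in targets:
                    found["induction"].add((src, c))

    # Different same-language pages sharing one target (A:X and A:Y -> B:X).
    for dst, sources in inbound.items():
        for group in by_language(sources).values():
            if len(group) > 1:
                found["shared"].add((dst, tuple(sorted(group))))

    return {check: sorted(items) for check, items in found.items()}


if __name__ == "__main__":
    links = {
        ("en", "Cat"): {("de", "Katze"), ("fr", "Chat")},
        ("de", "Katze"): {("en", "Cat")},
        ("fr", "Chat"): {("en", "Felis")},
    }
    exists = {("en", "Cat"), ("de", "Katze"), ("fr", "Chat"), ("en", "Felis")}
    for check, items in analyse(links, exists, redirects={}).items():
        print(check, items)
```

Findings are collected in sets so that a suggestion implied by several source pages is reported once. Reciprocal suggestions are stored as (page that could link back, page it could link to). The commutation pass is quadratic in the number of interwiki links on each page, since it looks at every ordered pair of targets.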
Handling UTF characters
Not sure if anyone else has had similar problems, but until recently I was having trouble handling UTF characters in MySQL databases. Having deduced that the problems were down to Windows XP, I've moved my MySQL 5.0 installation onto an old PC running Ubuntu Linux. When I access it through a PuTTY terminal from my Windows XP desktop with the Translation option set to UTF-8, extended UTF characters still don't display properly, but they *can* be cut and pasted into a Firefox window for saving to Wikipedia or viewing. - TB 15:52, 27 August 2007 (UTC)
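A related check that may help when diagnosing this kind of problem, though it is only one possible source of trouble and not necessarily the one described above: MySQL 5.0's `utf8` character set stores at most three bytes per character, so characters outside the Basic Multilingual Plane (U+10000 and above, which need four bytes in UTF-8) cannot be stored in it; the `utf8mb4` character set only arrived in MySQL 5.5.3. This Python sketch, with hypothetical helper names `needs_four_bytes` and `utf8_report`, lists such characters in a string and compares its UTF-8 and UTF-32 sizes (UTF-32 always uses four bytes per code point).

```python
# Hypothetical helpers for spotting text that MySQL 5.0's 3-byte "utf8"
# character set cannot hold, and for comparing UTF-8 and UTF-32 sizes.


def needs_four_bytes(text):
    """Return the characters whose UTF-8 encoding is longer than 3 bytes."""
    return [ch for ch in text if len(ch.encode("utf-8")) > 3]


def utf8_report(text):
    """Summarise how a string is stored in UTF-8 versus UTF-32."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # The -le variant omits the 4-byte byte-order mark that "utf-32" adds.
        "utf32_bytes": len(text.encode("utf-32-le")),
        "four_byte_chars": needs_four_bytes(text),
    }


if __name__ == "__main__":
    print(utf8_report("Straße"))        # every character fits in 3 bytes
    print(utf8_report("a\U00010330"))   # U+10330 GOTHIC LETTER AHSA needs 4
```

Any non-empty `four_byte_chars` list marks text that would need `utf8mb4` (or a binary column) rather than MySQL 5.0's `utf8`.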

