User talk:Crispy1989/Dataset

From Wikipedia, the free encyclopedia

Naming the links

Should the links be numbered (the default) or named? Naming the links would help show which edits are which. --209.244.31.53 (talk) 17:46, 8 April 2008 (UTC)

It doesn't matter which is used; either way, only the link's URL is read. Preceding unsigned comment added by Crispy1989 (talk • contribs) 20:35, 10 April 2008 (UTC)
Some editors are removing the labels. The labels help with quality control and make it easier to see the current composition of the dataset. 209.244.31.53 (talk) 18:45, 27 May 2008 (UTC)

Easier way to build the data set?

Hi, I'd love to help build a better mousetrap, but there has to be a faster way to collect the volume of data needed. Could you fairly quickly put together a tool that lists links ClueBot has identified as vandalism, or that were reverted with admin rollback, along with a quick and dirty interface where you tick a checkbox to confirm that each link really is vandalism and then submit the whole list at once? With such a list I could open a batch of links in tabs or pop-ups, verify them, and check the boxes. I think this would let many people help gather your data very quickly. At the moment it means copying and pasting URLs from semi-random places; in short, it's tedious. - Taxman Talk 16:17, 23 May 2008 (UTC)

A method

Open a page's history with &limit=9999&action=history in the URL. Working from bottom to top (oldest to newest), look at each diff and capture every other one, unless it is a special edit, such as one that adds more than a few words of original text (or similar). Then mark that page's history as searched. 209.244.31.53 (talk) 18:50, 29 May 2008 (UTC)
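A rough sketch of the method above, assuming the English Wikipedia `index.php` endpoint and that the revision IDs have already been collected oldest-first. The helper names (`history_url`, `diff_url`, `every_other_diff`) and the `is_special` check are hypothetical; deciding which edits are "special" is still a manual judgement.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia's index.php endpoint.
BASE = "https://en.wikipedia.org/w/index.php"


def history_url(title: str, limit: int = 9999) -> str:
    """Build a page-history URL with the action=history and limit parameters."""
    return BASE + "?" + urlencode({"title": title, "action": "history", "limit": limit})


def diff_url(rev_id: int) -> str:
    """Build a link to the diff between a revision and the one before it."""
    return BASE + "?" + urlencode({"diff": "prev", "oldid": rev_id})


def every_other_diff(rev_ids, is_special=lambda rev: False, offset=0):
    """Keep every other revision, walking oldest to newest (bottom to top).

    rev_ids: revision IDs ordered oldest first.
    is_special: hypothetical predicate marking edits to skip, e.g. ones
        that add substantial original text.
    offset: 0 starts with the oldest revision, 1 with the second oldest.
    """
    picked = []
    for i, rev in enumerate(rev_ids):
        # Only alternate positions are candidates; special edits are skipped.
        if (i - offset) % 2 == 0 and i >= offset and not is_special(rev):
            picked.append(rev)
    return picked


if __name__ == "__main__":
    print(history_url("Example"))
    for rev in every_other_diff([101, 102, 103, 104, 105]):
        print(diff_url(rev))
```

The alternation rests on the assumption that vandalism and its revert usually come in pairs. Any page where that pattern breaks still needs to be checked by hand.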