Wikipedia:Edit filter/Instructions

From Wikipedia, the free encyclopedia

Creating a filter

This section explains how to create a filter with some preliminary testing, so you don't flood the history page.

  • Read the documentation at mw:Extension:AbuseFilter/Rules format.
  • Test some expressions with the debugging tools:
    for example, evaluate 'some string' rlike 'myregexp' to test your regexp;
    a true expression evaluates to 1, and a false one shows nothing.
  • Manually test your code on the batch testing page:
    find someone who recently made an edit of the kind you are trying to target,
    put that username in the "Changes by user" field, and click the "test" button.
    If you don't see any positive matches:
    check "show changes that do not match the filter" and click "test" again,
    find the edit that you targeted and click "(details)",
    then check the variables and, if necessary, return to the debugging tools.
  • Create an idle (logging-only) filter:
    in the notes field, add something like "Testing phase, will add a warning";
    let the idle filter run for a while to check for false positives and false negatives.
  • Post a message at WP:EFN so that other edit filter managers have a chance to improve it.
  • Finally, fully enable your filter, e.g. add a warning, prevention, tagging, etc.
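
For the debugging-tools step above, an expression of the following shape can be pasted in as a quick check. This is only a sketch: the sample text and the pattern are placeholders, and you would substitute the text and regexp you actually want to test.

```
/* Shows 1 if the sample text matches the pattern, nothing otherwise */
'some string' rlike 'str(ing|ung)'
```

Changing the sample text to something the pattern should not match is a simple way to check for false positives before the filter ever runs on real edits.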

Controlling efficiency

Because these filters are run on every single edit, a poorly written filter can severely slow down editing or even cause some larger pages to time out. However, even minor changes in how the conditions are ordered can greatly decrease a filter's running time. Making use of the order of operations in this way can make the difference between a good filter and one that must be disabled for performance reasons.

Order of operations

Operations are generally done left-to-right, but there is an order in which they are resolved. As soon as an edit fails one of the filter's conditions, the filter stops checking the rest of them (due to short-circuit evaluation) and evaluation moves on to the next filter. The evaluation order is:

  1. Anything surrounded by parentheses (( and )) is evaluated as a single unit.
  2. Variables and literals are turned into their respective data (e.g., article_namespace to 0)
  3. Function calls (norm, lcase, etc.)
  4. Unary + and - (defining positive or negative value, e.g. -1234, +1234)
  5. Keywords
  6. Boolean inversion (!x)
  7. Exponentiation (2**3 → 8)
  8. Multiplication-related (multiplication, division, modulo)
  9. Addition and subtraction (3-2 → 1)
  10. Comparisons (<, >, ==)
  11. Boolean operations (&, |, ^, in)
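
As a rough illustration of this order, the following expression can be pasted into the debugging tools. It relies only on rules 1 and 7 through 11 above, and the numbers are arbitrary.

```
/* Rule 8 before rule 9: 3 * 4 is computed first, so 2 + 3 * 4 is 14.
   Rule 1: (2 + 3) is evaluated as a unit, so (2 + 3) * 4 is 20.
   Rule 10 before rule 11: each == is resolved before & joins the results. */
2 + 3 * 4 == 14 &
(2 + 3) * 4 == 20 &
2 ** 3 == 8
```

If the rules apply as listed, all three comparisons hold and the whole expression is true. If the first comparison were false, short-circuit evaluation means the remaining two would not need to be checked at all.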

Making expensive operations cheaper

When using keywords such as rlike, in, or contains, the filter must scan the entire string variable for the text you're searching for. Variables such as old_wikitext tend to be very large. Sometimes you can approximate them with smaller variables such as added_lines or removed_lines, which the filter can process much faster. A check on old_size can also help ensure that the filter never attempts to search a large block of wikitext.
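
A minimal sketch of both ideas, assuming a placeholder pattern 'myregexp' and an arbitrary size threshold of 5000 (neither is a recommended value):

```
/* Skip very large pages before touching the full wikitext */
old_size < 5000 &
old_wikitext rlike 'myregexp'
```

Where only the newly added text matters, the same search can be pointed at the smaller variable instead:

```
/* Searches only the lines added by the edit, not the whole page */
added_lines rlike 'myregexp'
```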

You should always order the conditions in your filter so that the one that will knock out the largest number of edits comes first. Usually this is a check on the user's groups or edit count; in general, the last condition should be the regex that actually looks for the sort of vandalism you're targeting.
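
A sketch of this ordering, assuming the variables user_editcount and user_groups described in the rules-format documentation. The threshold of 50 edits, the group name, and the pattern 'myregexp' are placeholders, not recommendations.

```
/* Cheap checks that exclude most edits come first */
user_editcount < 50 &
!('autoconfirmed' in user_groups) &
/* The expensive regex runs only for edits that passed the checks above */
added_lines rlike 'myregexp'
```

Because evaluation stops at the first failed condition, edits by established users never reach the regex in a filter shaped like this one.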