Initial measures to manage trackback spam in KNotes blogs

06-September-2005

I had hoped we could put this one off until after an initial public release of KNotes, but recent experience tells us that we cannot recommend installing KNotes into high-page-rank Plone sites until we've put some measures in place for managing trackback spam. This post outlines the issue and roughly documents the measures we're working on.

The only definition I ever found that created the lightbulb moment I was feeling was “Social software is stuff that gets spammed.” Not a perfect definition, but serviceable in its way.

Many-to-Many: Tags run amok!

I hate spammers and dirty, venal link-farming vandals. I really, really hate them.

But they are out there, and they have noticed how useful two-way trackback linking has been in raising the Google page ranking of well-connected weblogs.

The link-farming spammers may also have noticed that the boost to the page ranks of well-connected weblogs has, through the development of blogging, been a good thing both for seekers after good content and for Google: it helped people to find what they were looking for by googling it, and it helped them to browse around communities of discourse in fascinating and useful ways. They may have noticed this, but it has not stopped them from adding noise, aggravation and ugliness to the system by perpetrating trackback spam.

In this post, I briefly explain trackback spam, and outline the measures we are planning to help the managers of KNotes content to resist the spammers by keeping spam links out of their own content.




What is trackback spam?
A spam trackback is a trackback ping sent to your server which pretends to report some content on another server which links to some content on your server. Like a respectable ping from a real item of content which adds value to your own, it will tell your server about a link, a trackback title, some excerpt text and a weblog source. Unlike a respectable ping, everything it tells your server is a lie -- except for the link. Oh yes - the link really is a link, or at least the domain it points to is real. But that link has nothing whatsoever to do with your content - it is only pinged at you to fool your server into showing it as a trackback link, which will notch up the Google page ranking of the domain being linked to.
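
To make the four items in a ping concrete, here is a minimal sketch of reading them out of a form-encoded ping body. It assumes the usual TrackBack field names (url, title, excerpt, blog_name); the function name parse_trackback_ping is made up for illustration and is not KNotes code.

```python
from urllib.parse import parse_qs

# Field names assumed from the common TrackBack ping format.
PING_FIELDS = ("url", "title", "excerpt", "blog_name")

def parse_trackback_ping(body: str) -> dict:
    """Return the four ping fields from a form-encoded request body.

    Hypothetical helper for illustration only. Nothing here checks that
    the claimed title, excerpt or weblog name is true; a spam ping parses
    exactly like an honest one.
    """
    fields = parse_qs(body, keep_blank_values=True)
    ping = {name: fields.get(name, [""])[0] for name in PING_FIELDS}
    if not ping["url"]:
        raise ValueError("a trackback ping must carry a url")
    return ping

body = ("url=http%3A%2F%2Fexample.com%2Fpost&title=Hello"
        "&excerpt=Nice+entry&blog_name=Example")
ping = parse_trackback_ping(body)
```

The point of the sketch is that the receiving server has only the sender's word for everything except the URL itself, which is why the measures below rely on people noticing and removing bad links.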
What does it look like?
Nasty, usually. When I first started noticing trackback spam in our own site, the titles and excerpts were nondescript - usually just quoting our own content. Thus they were hard to spot unless you looked at the URLs - which were often conspicuously non-legit. Since then the spam we get has become far more offensive - the titles and excerpts are explicit and would outrage many visitors. On the other hand, this makes them easy to spot if you happen to set eyes on their titles or excerpts.
Why do they do it?
Trackback spammers do what they do as part of an ongoing cat-and-mouse game with Google's engineers, creating large numbers of spurious links to their nasty no-no sites which look real to Google because they appear in respectable sites. They prefer to place their links in sites with high page ranks, of course - though they are not afraid of upping your own page rank with spurious links into your content from the other acres in their link-farms.

I am not sure whether the spammers' link-farm links actually help to raise their page ranks. I guess they likely do in some cases - enough for them to sell their services to the owners of the sites they point to. They seem quite happy to have a low impact from each link they place - though one horrible link from them can disfigure your content and shock your users.

What can we do about it?

Monitor it closely - KNotes will make this easy
Step one is to make sure that you notice spam trackbacks before your users do. Since the links are often buried deep within your older content, they are hard to notice unless you look for them. So we have been working on special RSS feeds which will notify you of every trackback coming into your site (or blog) so that you can quickly notice offensive material. We run trackback monitoring feeds on our own older content, and will refine these and deploy them into KNotes very soon. We are still working out policy issues - for instance, we feel these feeds ought to require manager-role authentication. We have also written a new template for viewing all trackbacks within a content level through the web (TTW); this will be deployed into the distribution this week.
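
As a rough illustration of what such a monitoring feed could contain, here is a sketch that turns a list of received trackbacks into an RSS 2.0 document, newest first. The record keys (received, blog_name, title, url, excerpt) and the function name trackbacks_to_rss are assumptions made for the example; the real KNotes feeds may be built quite differently, and authentication is left out entirely.

```python
import xml.etree.ElementTree as ET

def trackbacks_to_rss(site_title, site_url, trackbacks):
    """Build an RSS 2.0 string listing every trackback, newest first.

    Hypothetical sketch: each trackback is a dict whose 'received' value
    is an ISO-8601 timestamp string, so plain string sorting orders them.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Incoming trackbacks: {site_title}"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Every trackback received"
    newest_first = sorted(trackbacks, key=lambda tb: tb["received"], reverse=True)
    for tb in newest_first:
        item = ET.SubElement(channel, "item")
        # Title and excerpt go in as plain text so a manager can scan them.
        ET.SubElement(item, "title").text = f'{tb["blog_name"]}: {tb["title"]}'
        ET.SubElement(item, "link").text = tb["url"]
        ET.SubElement(item, "description").text = tb["excerpt"]
    return ET.tostring(rss, encoding="unicode")

feed = trackbacks_to_rss("KNotes", "http://example.org", [
    {"received": "2005-09-01T10:00:00", "blog_name": "A", "title": "old",
     "url": "http://a.example/1", "excerpt": "first"},
    {"received": "2005-09-05T10:00:00", "blog_name": "B", "title": "new",
     "url": "http://b.example/2", "excerpt": "second"},
])
```

Putting the claimed weblog name and title into each item title means offensive pings stand out in a feed reader without anyone having to open the entry they were attached to.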
Make it easy to delete them
Having noticed a spam link, a manager needs to be able to get rid of it very easily. To that end, we will be adding delete widgets and making sure these are accessible wherever you might spot a trackback. The first step will be to add delete to the TTW listing (this is almost done). The next step is to render delete-me links next to the display of all trackbacks within blog content, recent trackbacks, and even RSS feeds. These would call quick and simple delete-one-item or delete-a-list scripts. Of course, you would have to have manager role over the content to do this -- as a policy I reckon this can include the poster of an entry being linked to, even when the poster has only contributor role.
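
The delete policy floated above can be written down as a small check. This is only a sketch of the rule as described (managers may delete, and so may the poster of the entry being linked to); the role name "Manager" and the function name can_delete_trackback are assumptions for illustration, not the actual KNotes or Plone permission machinery.

```python
def can_delete_trackback(user_id, user_roles, entry_poster_id):
    """Decide whether a user may delete a trackback on an entry.

    Hypothetical policy sketch: anyone holding the assumed "Manager" role
    over the content may delete; otherwise only the person who posted the
    entry may, whatever lesser role they hold.
    """
    if "Manager" in user_roles:
        return True
    return user_id == entry_poster_id

# The poster of an entry, holding only a contributor role, may delete.
poster_allowed = can_delete_trackback("mike", {"Contributor"}, "mike")
# Another contributor may not delete trackbacks on someone else's entry.
other_allowed = can_delete_trackback("anna", {"Contributor"}, "mike")
```

Keeping the rule in one function means the TTW listing, the in-content delete-me links and any feed links can all ask the same question before rendering a delete widget.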
Make it easy to report them
As well as rendering delete-this-trackback links for managers, we should have links written next to trackbacks for reporting offensive material. We have to be careful that these links do not themselves become an entry point for spam - as a first pass I'd be inclined to have them written in JavaScript so that sniffers and scrapers cannot see them. For an example of this kind of report-this-trackback link, see the Guardian news blog (blogs.guardian.co.uk/news and open an entry). Again, there are policy decisions to be made here, since it is important to balance ease of reporting against avoiding false reports and, at all costs, avoiding spammed reports.

We've got most of these ingredients in place on our test server now, and will be finishing, refining and deploying them in the next two weeks, ahead of the publicised release of KNotes.

Please note that at this time we are not planning or recommending a textual-analysis or blacklist approach to automatically rejecting trackbacks. Before resorting to the heavy weapons we thought it best to try the more focused, fine-grained approach :O)


Mike Malloch; 06-September-2005 07:28:21

Linking and trackbacks

When linking to this weblog entry, please use the 'permalink', which is http://www.knownet.com/Members/mmalloch/blog/entries/8228256372

Some weblog systems will ask you for a "trackback link" (most systems will find this special 'hook' automatically, in the code for this page).

The trackback link for this entry is http://www.knownet.com/Members/mmalloch/blog/entries/8228256372/tb