Spoofed Referral Traffic in Google Analytics

The continued spoofing of referral traffic in Analytics highlights a couple of things:

  • Shortcomings in one of Google’s flagship products
  • The shift away from old-skool SEO for spammers to more subtle ways of gaining traffic

My hobby site (weirdisland.co.uk – go visit it now, please), even with its paltry visitor numbers (just shy of a couple of hundred per day), gets a small but noticeable trickle of traffic from fake sources such as:

  • Semalt.com
  • buttons-for-website.com
  • ilovevitaly.com
  • swagbucks.com
  • priceg.com
  • darodar.com

These are covered in good detail over at Refugeek and by Dave Buesing (both sources have some good tips for removing these sites from appearing in Analytics if you want clean, realistic visitor numbers).

The basic method relies on the fact that Analytics can be spoofed – tricking the unwary site owner into thinking they are getting actual human visitors from these sources. In fact, these are just faked visits by bots posing as browsers and passing through false headers.
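If you would rather clean up exported referral data yourself than rely on filters inside Analytics, the idea can be sketched in a few lines of Python. This is only a rough illustration: the function names are my own invention, the blocklist is simply the list of fake sources above, and it works on referrer strings you have already exported, not on Analytics itself.

```python
from urllib.parse import urlparse

# Hostnames taken from the list of fake sources above; extend as needed.
SPAM_REFERRERS = {
    "semalt.com",
    "buttons-for-website.com",
    "ilovevitaly.com",
    "swagbucks.com",
    "priceg.com",
    "darodar.com",
}


def referrer_host(referrer):
    """Return the lowercased hostname of a referrer, without a leading 'www.'."""
    # urlparse only recognises a hostname when a scheme or '//' is present.
    if "//" not in referrer:
        referrer = "//" + referrer
    host = urlparse(referrer).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_spam_referrer(referrer, blocklist=SPAM_REFERRERS):
    """True if the referrer's host is a blocklisted domain or a subdomain of one."""
    host = referrer_host(referrer)
    return any(host == d or host.endswith("." + d) for d in blocklist)


if __name__ == "__main__":
    for ref in ["http://semalt.semalt.com/crawler.php", "https://www.google.co.uk/"]:
        print(ref, "->", "spam" if is_spam_referrer(ref) else "ok")
```

Matching on the hostname rather than on the whole URL means subdomains of a blocklisted site are caught too, while an unrelated domain that merely contains the same letters is left alone.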

The motivation seems to be (as far as I can tell) to get site owners to visit these sites to see where their link is. Personal example: I started getting traffic from Semalt.com and visited their site to see where/how/why they were linking to me. I couldn’t find anything, but noticed that they had some on-the-face-of-things useful SEO tools. I signed up for a ‘free account’ and then promptly forgot all about them, but they still send me emails asking if I want to upgrade to their pro package.

It’s a cunning sleight of hand when you look at it this way. In an easily scalable way, they can effectively drive reasonable levels of traffic to their site by bringing themselves to the attention of anyone with Google Analytics installed. Once those people are on semalt.com, the bait and switch takes place, and a certain number of people will thus sign up to their product. I imagine it’s probably profitable.

That’s obviously a deceitful practice, but it highlights how the nature of scamming has changed. As Google has made it harder and harder to spam the SERPs, innovators/black hats (delete as per your prejudice) are looking for new routes.


A current fake referrer to my site disguises itself as Huffington Post. At first, I was briefly excited – perhaps I’d got a link from HuffPo! In fact, the referral itself was spoofed: the Huffington Post link – when clicked in Analytics – actually redirected to some Chinese shopping site, presumably dropping some affiliate cookies along the way to capture revenue from me should I ever do any shopping on Aliexpress.com (which is where the link actually redirected).

Update: on closer inspection, I’ve noticed that the URL is actually “hulfingtonpost.com”, which also explains how the redirect works.
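A lookalike like this is easy to miss by eye but easy to catch mechanically. As a rough sketch (the watch list, the function names and the distance threshold are all assumptions of mine, not anything Analytics offers), you can compare each referring domain against the domains you would actually expect links from and flag anything that is only a character or two away:

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character from b
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# Hypothetical watch list of domains worth protecting against imitation.
KNOWN_DOMAINS = ["huffingtonpost.com"]


def lookalike_of(domain, known=KNOWN_DOMAINS, max_distance=2):
    """Return the known domain that `domain` appears to imitate, or None."""
    domain = domain.lower()
    for real in known:
        distance = edit_distance(domain, real)
        # Distance 0 is the genuine domain; a small non-zero distance is suspicious.
        if 0 < distance <= max_distance:
            return real
    return None


if __name__ == "__main__":
    print(lookalike_of("hulfingtonpost.com"))
```

The threshold is a judgment call: set it too high and short, legitimately similar domains start to trip it.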

It’s cunning stuff, to be sure, but I find it hard to believe that it’s a sustainable or large enough niche for anyone to make more than a few quid from. As I mentioned a couple of posts ago, it adds to my belief that black hat/affiliate sites are finally being shuttered by Google and the glory days of such operations are now behind us.

As such, we should actually tip a hat to Google in thanks. For many years, spammers and scammers tried – and managed – to keep the SERPs cluttered with affiliate links dressed as content. Google announced their intention to do away with this years ago and now – if you want to go down that route – you have to go big on site quality and content. Of course, the high price of doing that makes most affiliate programs unsustainable because building the necessary traffic levels can’t simply be left to content spinning and XRumer any more.


Keyword Data back in Analytics

One of trad SEO’s biggest gripes for the last couple of years has been the obscuring of keyword data in Analytics. Of course, much of that data has actually been available in Webmaster Tools for quite a while now.


Until today (so far as I’ve seen – it’s probably been rolled out all over the place in stages) the nearest equivalent data in Analytics was found under the Acquisition > Keywords > Organic screen.

But now? That’s gone, and the data from GWT is showing up in Analytics.


This is a nice move, as it puts back a little context into the job rather than educated guesswork based on landing page URLs. It still means a bit of legwork if you want to do detailed analysis, but for most SEO purposes it is a long-overdue move. The only critical issue is that, assuming it follows the pattern used in Webmaster Tools, the data will only be available for the last 90 days and won’t include the last 2 days – which will obviously limit analysis.
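To make that limitation concrete, here is a small sketch of the date window it implies. The 90-day and 2-day figures are the ones quoted above; exactly where the boundaries fall (whether "today" counts as one of the two missing days, for instance) is my assumption, and the function names are made up for illustration.

```python
from datetime import date, timedelta


def reporting_window(today, history_days=90, lag_days=2):
    """Return (oldest, newest) dates assumed to have keyword data.

    Assumes data reaches back `history_days` from today and that today
    and the day before it are the two days not yet reported.
    """
    oldest = today - timedelta(days=history_days)
    newest = today - timedelta(days=lag_days)
    return oldest, newest


def has_keyword_data(day, today):
    """True if `day` falls inside the assumed reporting window."""
    oldest, newest = reporting_window(today)
    return oldest <= day <= newest


if __name__ == "__main__":
    oldest, newest = reporting_window(date(2015, 1, 10))
    print("Data assumed available from", oldest, "to", newest)
```

In practice this means any year-on-year or seasonal comparison needs the data exporting and storing somewhere else before it drops off the back of the window.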

Schema: Addenda

Schema was (you may recall) backed by Microsoft, Yahoo (in the days when it still had its own search tech) and Google. As I’ve described, I’m not seeing anything in Google’s treatment of the site beyond the adoption of aggregate ratings in the SERPs. However, in Bing (and thus Yahoo!) the site has seen a positive bounce in traffic and (one assumes) ranking.

Sadly, the overwhelming majority of searches in the UK are done on Google, so don’t expect any sudden transformation.

Is Blackhat SEO Dead?

I’m not plugged into the SEO grid any more these days. At my end of the market, there is very little point in engaging with anything remotely dodgy and much of the work is curatorial or carefully technical. From time to time though, I descend from my ivory tower to pop onto the blackhat forums to look for interesting snippets that might inform decisions we take inside the business.

And I can’t remember the last useful lesson I took away from these forays.

Example: at one time, Bluehatseo was a must-read site, packed with interesting ways to leverage content and build at industrial scale. It wasn’t something I ever did myself, but it gave me ideas and also meant I could talk sensibly with the more aggressive side of the SEO community (god, I hate that phrase). It also seemed to work. Whether or not Eli was kidding us all, I knew anecdotally of several people making a good living on the margins of Google – moving from market to market, building one from the other till their second incomes became their first incomes and even living off the results of their affiliate schemes.

I don’t get that vibe any more. It seems that times have changed – perhaps even that Google have won their long war of attrition against the “spammers” (as they defined them). You still get the odd lonely ranter complaining in the comments under every Searchengineland blog post about how crazy it is that some of their pages have lost traffic while others have gained, but somehow you know that their “25% drop” in traffic means a dip from 18 people to 14 people or whatever.

I had the honour of working alongside Dave Naylor at Bronco – a one time King of the Black Hats whose ability to spot and exploit a hole in Google’s algorithm was peerless. Today I don’t think he’d touch black hat with a bargepole – not merely because he now occupies a different space, but because the margins just aren’t there any more. Even while I was at Bronco, at least half the work coming in was from people trying to escape from under penalties they’d brought down on themselves.

(As an aside: get me that job at Searchengineland that consists purely of rewriting each Google announcement and transcribing their Webmaster videos – that’s some serious value-add right there, my friends)

And you know what? I welcome that change. I rarely visited an affiliate site and felt enriched by the experience. It annoyed the hell out of me to sit next to my wife while she was shopping and to see her going to click on what would clearly be an affiliate site before trying to find what she actually wanted.

And as an SEO, what could be worse than negotiating link prices from a faceless Estonian blogfarm owner?

Of course, the legacy of the spam wars is still with us. There are still bots mindlessly plugging Ugg boots on comment threads everywhere (I conceded defeat on my own blog recently and installed Disqus) and people buying and selling links by the thousand, but the more I look the more it feels like these are the last shots in a war that has concluded: a sort of digital version of the Continuity IRA.

I know I have (by approximation) zero readers, but if you are a blackhat making good dollar from it as we turn the corner into 2015, I’d be interested to hear about it.

Revisiting Schema

I have always been pretty dismissive of Schema. To me, it was and is an exercise in futility. The number of webmasters with the time, knowledge and inclination to enact Schema tags is a tiny fraction of the publishing audience, and thus its impact was never going to change the face of the world.

In addition, Schema itself is pitifully incomplete and actually retrograde, in that it attempts to force responsibility for telling search engines what a page is about onto site owners, rather than forcing the search engines to get better at what they’re supposed to be getting better at.

There’s also the notable potential side effect of Google slowly taking “ownership” of information away from sites. Google’s “Knowledge box” has, for some time now, slowly been taking traffic away from Wikipedia. Once Google “knows” something (the classic example being the height of the Eiffel Tower) it increasingly displays that information itself up front and centre rather than merely giving you a bunch of links to parse for yourself.

And actually, if you are sat in Mountain View that makes sense. The height of the Eiffel tower is known and is exactly the kind of thing that people could try to “spam” to get some AdSense revenue. Once you’re confident you have the right answer in your “knowledge box” database why run the risk of polluting your own reputation by sending people to potentially disreputable sources through the SERPs?

Everyone has experienced that moment when you’ve Googled something, clicked the first link and taken the answer as true, only to find later on that it was actually just rubbish. Not for nothing is Yahoo! Answers a poisoned chalice.

All of which brings us down to the subject of Schema. Schema is sold as a way for you to add a structure to your site to allow Google to get that knowledge directly, thus contributing to their knowledge graph.

As an SEO by trade, it would be remiss of me not to be dabbling with it to see what benefits and potential pitfalls it could have.

Enacting Schema

Firstly, it is worth pointing out that Schema isn’t brilliantly documented. To say it is backed by some of the biggest names in tech, it is (ironically) presented in such a way as if the internet hasn’t changed since 2004. It is text-heavy. There are no walk-through videos explaining the potential benefits. In this sense, it reminds me a lot of the W3C website – probably appealing to geeks, but lacking the sense of ‘fizz’ that is necessary to draw in the casual users who will absolutely define the success or failure of Schema or web standards (I once wrote at length about why there is no such thing as “web standards” but my ex-employer has deleted the post – I will revisit the subject in the future).

As such, the technically-minded can pick their way through to discover what it is you have to do to enact Schema. It’s basically adding a load of additional attributes to HTML elements and (disappointingly) adding additional <span> tags around things to fulfill Schema’s structure.

To see some examples, check out the source code of this page (www.weirdisland.co.uk/people/murders/peter-sutcliffe-the-yorkshire-ripper.html). Apologies for my poor coding standards overall – it’s been many years since I considered myself to be a developer. As you can see, there are various additions to the code like:

<span itemscope itemtype="http://schema.org/Answer" itemprop="suggestedAnswer">

and

<div itemprop="aggregateRating" itemscope itemtype="http://schema.org/AggregateRating" id="rating">

That sort of thing.
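If you want a quick inventory of what you have actually marked up, without trawling through the source by hand, something like the following rough sketch will list the itemtype and itemprop values in a page. It uses only Python's standard html.parser module; the class and function names are my own, and it makes no attempt to check that the markup is valid Schema, only that the attributes are there.

```python
from html.parser import HTMLParser


class MicrodataScanner(HTMLParser):
    """Collect the itemtype and itemprop values found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.itemtypes = []
        self.itemprops = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a bare attribute such as
        # itemscope arrives with a value of None.
        attrs = dict(attrs)
        if "itemscope" in attrs and attrs.get("itemtype"):
            self.itemtypes.append(attrs["itemtype"])
        if attrs.get("itemprop"):
            self.itemprops.append(attrs["itemprop"])


def scan_microdata(html):
    """Return (itemtypes, itemprops) in the order they appear in `html`."""
    scanner = MicrodataScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.itemtypes, scanner.itemprops


if __name__ == "__main__":
    page = '<div itemprop="aggregateRating" itemscope itemtype="http://schema.org/AggregateRating" id="rating"></div>'
    print(scan_microdata(page))
```

Run against a saved copy of a page, this gives a rough checklist to compare with what the structured data tester reports.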

Obviously, it’s fairly trivial to add additional bits of code, but it’s another layer of work to add to your to-do list and does add to nesting and tag redundancy, which runs counter to everything we’ve been told to strive for for the last decade.

And as such, it must compete with other less trivial matters like writing content, maintaining a database, running plug-ins, refreshing the design, promotion etc. So I imagine that unless there is a strong imperative, deploying Schema is going to be well down the list.

Secondly, while deploying the bits of code necessary to enact Schema is fairly trivial, understanding the way that a Schema ‘object’ is constructed is often very frustrating. In my original critique of Schema, I harangued the void about the limited range of things available. The Schema for ‘person’ for example barely touches on what a person can be and is heavily skewed towards the professions.

And just try understanding why “diet” is a property of “person”.

The best way to actually test your Schema code is basically trial and error: make a change, run it through the structured data tester in Google Webmaster Tools, and repeat. Some of the “errors” are baffling – for example when you are told that an “event” object must be in the future (a proviso I got over by simply ignoring it).


So. Having “done” Schema for my site, what are my findings? Honestly, it’s hard to tell. Positive aggregate star ratings always look pretty attractive in the results, so I have no doubt that my primitive voting system allied to Schema is helping to improve my clickthrough rates. Aside from that though? I can’t say there’s been any notable benefit. I am deliberately not promoting the site as part of my experimentation, beyond automatically tweeting each new post and contributing a couple of comments to threads on other sites. As such, there is little wonder that my traffic remains below 150 visits a day (itself a riposte to those who would have you believe that active SEO promotion is a waste of time). Since enacting Schema there has been no noticeable leap in the gentle upward slope of traffic, so claims that “doing Schema” is going to transform your website’s performance in itself are probably misplaced.

Nonetheless, enacting Schema has made me think more deeply about the way that data is structured and how I build content. I couldn’t recommend it in all good conscience, but as part of a broad effort to give Google what it claims to want, it is probably a tick worth having if your data in any way fits into any of the available schemas.

And, at the back of it all lurks the suspicion about what happens if all your markup and the trust it helps to build leads to your site getting hijacked by Google itself. For argument’s sake, let’s pretend that my article on the Yorkshire Ripper becomes the definitive overview. With so much Schema data in there – geo co-ordinates and dates of his attacks, properly attributed images, factually correct dates etc – Google could, unilaterally, decide to take that info as gospel and simply pull it through into their knowledge box. And what then for my traffic…?

In conclusion. As an exercise, Schema is worth thinking about and experimenting with, but as a long-term venture it comes with risks that probably at least equal the potential benefits.

Fare well, Google Authorship…

So it’s a fond farewell to the occasional little portrait that accompanied things you submitted via or wrote on Google+. It’s officially dead in the water.

I’ll just pause here for a moment for you to dry your eyes.

The whole thing was always a little bit shonky. The take-up was low, the benefits seemingly minimal, and it became yet another thing used solely by SEOs to try and improve their rankings in organic listings.

In some ways, this highlights once again the shortcomings of Google’s mission to ‘organise the world’s information.’ It was easy enough to set up a profile if you could be bothered, but doing so didn’t somehow magically confer authority on either you or your content. In effect, some no-mark from Leeds like me could get their fizzog into the rankings alongside Polly Toynbee or Robert Scoble or whoever.

But that was just an attribution and a tiny sprinkle of glitz in the SERPs. It didn’t make you suddenly an expert in whatever you were talking about. It wasn’t a signal of quality or…. anything really. Just occasionally an extremely mild tingle of delight at seeing your face (or that of a friend) in the rankings.

And in a way, this only further serves to highlight the problem that Google has in the social space. I predicted (in a spirit of larkfulness) that Google+ would be dead in the water by 2013. I was obviously wrong, but only by a matter of time. Google cannot attract content users to seriously engage with its space. The game has been lost, and really all that is left is a disorganised retreat. The abandonment of Google Authorship is merely a waymarker on that long and dismal road.

Random thoughts about ‘brand’ in the online space

In one of his regular broadcasts, Matt Cutts mused on the problems of ‘real world’ companies competing in the online space in response to a question posed by one of his viewers. It’s a problem as old as the commercial web – and something of a philosophical conundrum for Google and marketers alike: if a big, household name exists in a market, should it ‘naturally’ get traffic from Google, even if its online presence is poorly built, optimised and/or marketed?

Most obviously, this is reflected in the nature of what search marketers like to call ‘brand signal’. If a company receives hundreds of thousands of searches for its brand name then surely that site should do well for the products it sells, almost regardless of how well the site is built from a technical perspective?

I’ve seen this in effect myself in a previous role. The company we were doing work for had around 2 million searches for their brand name every month. The site itself was appallingly built – with duplicate content issues, constantly spiralling redirects and broken internal links and riddled with empty pagination and search filters. Despite that, we could rank the site for hugely competitive two and three word head phrases with a relatively small amount of spadework. The conclusion? Branding works in Google, with sufficient volume.

Tangled up in this is the whole, quasi-religious debate about exact match domains: if someone searches for ‘cheap car insurance’ are they looking for cheap car insurance, a company called cheap car insurance or cheapcarinsurance.com? The vagaries inherent in this line of thought lie behind a lot of zig-zagging debate and attendant strategising.

At the opposite end of the spectrum, you also have companies (such as my own) which don’t actually have any ‘real world’ presence but are online-only brands. One such brand that I am aware of is motors.co.uk – which is a used car listings site, owned in the past by the Daily Mail Group and currently under the auspices of Manheim – mainly of note to me as a bellwether for the industry as it has always been a huge online brand.

I am not privy to what promotional work Motors have been engaged in but I do know that they spent millions in the past on radio, local advertising, offline promotion and print work – as well as untold sums in content production and website development. For the past couple of weeks however, they have ceased to rank for their own brand name.


In SEO terms, this is a colossal slap. I don’t know what it is that Motors have ‘done’ in Google’s eyes to deserve this – most likely some ancient linkbuilding campaign has come back to bite them (a recurring problem I have alluded to here before in relation to our own site). Doubtless there is a huge disavowal exercise going on behind the scenes now to recover what they’ve lost. Hopefully for them they will extricate themselves from that particular hole.

This illustrates the flipside of the ‘big company with poor web presence’ paradigm, namely: ‘a company with nothing but a web presence’.

Assuming that this evident penalty has struck motors.co.uk across the board in relation to their SEO, I can only assume they’re suffering from a big drop in traffic and thus revenue. Luckily, they are backed by Manheim, so will have resources to weather the storm – but many companies aren’t so lucky. Such is the complexity of Google’s algorithm (and the competing internal imperatives of Google as a business in and of itself) these days that I’m no longer sure that anyone can really claim to understand the market any more – regardless of the noise, fog and general sturm-und-drang of the SEO community. In the same market – and I will not name names – I know for absolute fact that some of the big players are spending £5-10,000 a month on aggressive link buying and haven’t (as yet) seen any penalisation.

The truth is that being wholly reliant on ‘natural’ search traffic is actually a dangerous place to be in. If your only focus is SEO I would strongly advise starting to siphon some of your revenues into other channels as a bulwark against potential penalisation – either in terms of building up a ‘fighting fund’ for a rainy day, or spend on social/offline channels to build up “brand traffic” as best you can.

None of these options is cheap.

Any way you look at it, the days when you could truly view the web as ‘a level playing field’ seem laughably distant today.