Sunday, 26 February 2006

Technorati tag pages problems, revisited

It's well known that the blogosphere search engine Technorati sometimes doesn't show tagged posts on their tag pages even when they've been correctly tagged - I've blogged about these Technorati problems before (and that post has been mentioned e.g. on Om Malik's blog).

I still encounter that problem myself. For instance my recent post on Technorati favorites isn't on Technorati's Improbulus tag page, whereas it's on Icerocket's Improbulus tag page. So it's clearly something to do with how Technorati are picking up (or rather not picking up) my posts or tags from my posts, or how they're not displaying tagged posts properly.

Consistent with my previous experience, it's not that Technorati aren't indexing the posts at all - e.g. my Technorati favorites page clearly includes that post, so the post has in fact been indexed by Technorati. So it has to be that they're not picking up the tags from that post, or that their tags database doesn't return the correct info for certain tags (or tags from certain types of posts). I've gone into this in more detail before.

Now Niall Kennedy (then, though not now, of Technorati) commented that one thing Technorati consider important is how valid the behind-the-scenes HTML or XHTML of your posts is - the less "valid", the less likely that Technorati will index them properly. But I'm sure that validation isn't it, in my case at least - I know my template throws up some warnings or errors as far as validation goes (though I've tried to make it as valid as I can within reason), but that applies to all my posts; yet some are on Technorati's tag pages, and some aren't.

I've pretty much given up asking Technorati for help on this point as, despite their recently recruiting a new full time customer support specialist Janice Myint, my latest emails on this subject still go unanswered.

Now I'd rather not further hassle their no doubt extremely busy CEO David Sifry, who's been kind enough to help sort out a problem in the past when none of my posts were getting indexed on Technorati at all (they then tweaked something at their end and it was fine), though I do think this issue is something Technorati need to sort out if they want to maintain user confidence and trust in their service in the longer-term.

But I suspect there are a few limitations to Technorati's system which, for whatever reason, their competitor Icerocket doesn't share, though Icerocket certainly has issues of its own. ("System" is vague, I know, but I don't know whether it's their spider, their database or indeed their tag searcher that's choking, so I'll just say "system".)

Leaving aside the "validity" question (which I believe may be a red herring in my case), my guess is Technorati's system particularly doesn't like:
  • long posts (this post never got picked up properly for example, whereas Google's spider loves lots of text)
  • posts with lots of code examples (so, this post isn't on their tag pages - and it's long)
  • posts with forms or lots of other HTML that's not just text and links/pics (e.g. my original post on the problems!)
Now, it could be that when Blogger publishes posts containing lots of more complex HTML, it translates the code into non-valid XHTML, and that could be why Technorati doesn't like them (though if so, I think their spider is way too sensitive).
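To compare posts against the suspected criteria above, a small script can profile each post's published HTML. The following Python is only a rough sketch, not anything Technorati is known to do. It assumes tags are marked up as links carrying rel="tag", with the tag name as the last segment of the link URL. The names PostProfile and profile_post are made up for illustration. It reports the tags it finds, the amount of visible text, and how many forms and code blocks the post contains.

```python
from html.parser import HTMLParser
from urllib.parse import unquote_plus


class PostProfile(HTMLParser):
    """Collects rel="tag" links and rough complexity signals from post HTML."""

    def __init__(self):
        super().__init__()
        self.tags = []         # tag names taken from rel="tag" link URLs
        self.form_count = 0    # number of <form> elements
        self.code_count = 0    # number of <pre> and <code> elements
        self.text_length = 0   # characters of text, ignoring surrounding whitespace

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").lower().split()
        if tag == "a" and "tag" in rel_values:
            href = (attrs.get("href") or "").rstrip("/")
            if href:
                # Assumption: the last URL path segment is the tag name.
                self.tags.append(unquote_plus(href.rsplit("/", 1)[-1]))
        elif tag == "form":
            self.form_count += 1
        elif tag in ("pre", "code"):
            self.code_count += 1

    def handle_data(self, data):
        self.text_length += len(data.strip())


def profile_post(html):
    """Return the tags and rough size/complexity figures for one post."""
    parser = PostProfile()
    parser.feed(html)
    parser.close()
    return {
        "tags": parser.tags,
        "text_chars": parser.text_length,
        "forms": parser.form_count,
        "code_blocks": parser.code_count,
    }
```

Running profile_post over the HTML of posts that do and don't appear on the tag pages gives figures to compare side by side. If the tags list comes back empty for a "missing" post, the tag markup itself is the first thing to check.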

But, I'm going to try an experiment. I'm going to split my previous post, which happens to be long, has lots of code examples and a form, into different individual posts, and republish them - then see which posts' tags get picked up, and which don't.

If you too have some posts not displaying on Technorati's tag pages, I'd be interested to know which posts, whether there are any common factors, and whether they fit any of the suspected criteria I've listed above.

And if your long post or post with forms etc doesn't show up properly on Technorati's tag pages, why not try doing a short post with no forms or any code other than links, but with exactly the same Technorati tags, which links to your "missing" post? That way the new post (assuming it doesn't get missed too) could at least be a way to lead people to the original post.
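The short "pointer" post suggested above can be put together by hand, but here is a minimal Python sketch of the idea. It assumes tag links take the form http://technorati.com/tag/NAME with rel="tag". The function name build_pointer_post and the example URL in the usage note are hypothetical. The output is nothing but plain links: one to the missing post, then the same tags.

```python
from html import escape
from urllib.parse import quote_plus

# Assumed tag URL pattern; change this if your tag links point elsewhere.
TAG_BASE = "http://technorati.com/tag/"


def build_pointer_post(title, url, tags):
    """Build a short post body that links to the original post and repeats its tags."""
    link = '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(title))
    tag_links = ", ".join(
        '<a href="{}{}" rel="tag">{}</a>'.format(TAG_BASE, quote_plus(tag), escape(tag))
        for tag in tags
    )
    # Only text and links here: no forms, no code samples, nothing long.
    return "<p>See my post: {}</p>\n<p>Technorati Tags: {}</p>".format(link, tag_links)
```

For example, build_pointer_post("Technorati favorites", "http://example.com/post.html", ["Improbulus", "Technorati tags"]) returns two short paragraphs that can be pasted into a new post.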

I'm about to follow my own suggestion too, as well as doing the split posts I mentioned. I'll of course report on the results of my experiment.

Update 13 March 2006: My test results are here - interesting but puzzling. After I posted them, Dave Sifry, the Technorati CEO, emailed me to say they're on it - see this post. If people regularly report this problem to Technorati when they encounter it, it might help them fix it faster.



David said...


As usual, a great post. I'm sending this to all the folks at Technorati for required reading. We'll get to the bottom of your tag issues, and hopefully get everyone else sorted out as well...


mark said...

What kind of problems are you experiencing with IceRocket?


Improbulus said...

Thanks very much David, it's great to know that you're looking at the tag issues. I'll report back on the results of my experiments when done but it's rather puzzling so far.

Mark - Icerocket didn't index my blog at all for a period of about a month (i.e. the whole of December 2005 pretty much) despite pings etc. Not just tags, but all my posts. Nada.

Anonymous said...

I noticed yesterday my posts stopped appearing on Technorati. I don't recall my blog's numbers, but it was like 312 something and 121 links... all GONE... checked my Technorati account profile and found my main blog has vanished... at least the name "Capital Region People" has... the rest of my info is there, but the blog title is listed as "Untitled" and when I try to go in the admin panel and change it, the box where one would input the information does not appear.
AAAAAUUGGH! Always when I post something timely and relevant (about MySpace dangers)! I show up on Google's blogsearch, but that is not that widely used (I can tell by incoming links) so I have emailed T'rati and gotten back the 'form' responses.

QUESTION::: are we too dependent on Technorati? Are there alternative ways to get our blogs into public areas where they can be found immediately, that is, the same day a timely topic is posted?

Anonymous said...

Hey! back to say I looked for alternatives to T'rati, found them, implemented them, and at the same time, got email back from T'rati that they FIXED the problem...YAY! So, I am happy back with T'rati and happy to have discovered alternatives... never be without a backup! Never rely on just one thing, because if it disappears tomorrow, what then?

Improbulus said...

Thanks for your comments Dave. Very glad they've fixed your problem.

Out of interest, what alternatives did you discover? I'd recommend using Google Sitemaps to get your updated blog reindexed by Google - most of my visitors come from straight Google (as opposed to Blogsearch) searches.

Improbulus said...

Just to say I've updated my original post to say I've heard from Dave Sifry - see this post.

Vijay said...

I was having similar problems on Technorati.

I just dropped a note to their customer service about the two problems I encountered:

Two problems:
1. Technorati shows that my blog was last updated 73 days ago. I must have written at least 40 posts in those 73 days.
2. I tag my posts regularly with tags that are quite unique to my blog such as "India-IT-Pulse". However when I search for the tag "India-IT-Pulse", Technorati says search results = 0. This happens for all tags that I use on my blogs.
Please advise.

Gotta see how soon the problem gets rectified.


Improbulus said...

Vijay, the key problem is that Technorati may not be indexing your blog (last update 73 days ago), but they've admitted they have an indexing issue - see this post.

If they don't index your blog (have you tried to see if a straight search of text - not tags - on Technorati turns up any recent posts?) then they won't index your tags either.

But if they've indexed your blog but not your tags, then it'll be the good ol' tags problem I've been banging on about for yonks, I'm afraid.