Our company has around half a million products which will be on our new site. Each one can be represented by its own page/URL.
However, many products are near-identical to others: they form part of a 'family', with only minor variations from one to the next.
For example, we could have 200 products with 20 fields each, all identical apart from the SKU and perhaps one other field.
We can't present these in any other way than as individual products.
Does anyone know if this could cause problems with search engines? For example, could it be seen as index spamming?
And if so, what could we do so as not to be penalised?
Last edited by KatDog; 11-06-2009 at 11:28 AM.
As far as I know, the real penalty for duplicate content comes into play when content is copied between two different domains: for instance, when xyz.com has 10+ paragraphs of content that are exactly the same as those on zyx.com. In any case, check out this post on Google's Webmaster Central Blog; it demystifies the 'duplicate content penalty'.
If the pages look identical or near-identical, Google will most likely choose to serve only one version. The main "penalty" is that PageRank/importance gets spread across near-duplicate pages that Google is unlikely to serve anyway. It's better to concentrate that traffic, those links and that PageRank on a single page.
If you can't consolidate the near-identical pages, I see a couple of options.
(1) Differentiate the pages more. Make sure the differentiation appears in the page's meta description and near the beginning of the HTML body content, or near the keywords users are likely to search with for that product. Make it obvious to both users and robots how the near-duplicate pages differ.
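To illustrate option (1), here is a minimal Python sketch that builds a title and meta description for one variant, putting the distinguishing field first and the family's shared text after it. The function name variant_meta, its parameters and the 155-character budget are all assumptions for illustration; they are not from any particular CMS or from Google's guidelines.

```python
from html import escape


def variant_meta(family_name: str, sku: str, diff_label: str, diff_value: str,
                 shared_summary: str, max_len: int = 155) -> dict[str, str]:
    """Hypothetical helper: title and meta description for one product variant.

    The differentiating field leads, and the text shared by the whole family follows.
    """
    # Lead with what makes this variant different from its siblings.
    lead = f"{family_name} ({diff_label}: {diff_value}, SKU {sku})"
    description = f"{lead}. {shared_summary}"
    # Assumed length budget. Because the differentiator comes first, trimming cuts
    # the shared text rather than the distinguishing part (as long as the lead fits).
    if len(description) > max_len:
        description = description[: max_len - 3].rstrip() + "..."
    # Escape both strings so they are safe to drop into <title> and content="...".
    return {
        "title": escape(f"{family_name} - {diff_value} ({sku})"),
        "meta_description": escape(description),
    }


if __name__ == "__main__":
    meta = variant_meta("Steel Hinge", "HG-1042", "Finish", "Matt black",
                        "Heavy-duty steel hinge for interior doors.")
    print(meta["title"])
    print(meta["meta_description"])
```

The length check runs on the unescaped text, so a summary containing characters such as & can come out slightly longer after escaping.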
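(2) If the differentiation is more for internal purposes and isn't relevant to users, a better approach might be the canonical tag: a <link rel="canonical"> element in each page's head, pointing at the preferred URL. This tells search engines which version you prefer, so the near-duplicate pages are consolidated and indexed as one.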
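For option (2), here is a minimal Python sketch that gives every variant in a family the same <link rel="canonical"> element. It assumes a family is held as a mapping of SKU to absolute URL, and it picks the variant with the lowest SKU as the representative page. Both the function canonical_tags and that selection rule are hypothetical; in practice you would choose whichever page best represents the family.

```python
from html import escape


def canonical_tags(family: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: map each variant URL to the canonical <link> for its <head>.

    `family` maps SKU -> absolute URL for every variant in one product family.
    """
    if not family:
        return {}
    # Assumption: the variant with the lowest SKU is the family's representative page.
    canonical_url = family[min(family)]
    tag = f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'
    # Every variant, including the canonical page itself, points at the same URL.
    return {url: tag for url in family.values()}


if __name__ == "__main__":
    hinges = {
        "HG-1043": "https://www.example.com/p/hg-1043",
        "HG-1042": "https://www.example.com/p/hg-1042",
    }
    for page, link in canonical_tags(hinges).items():
        print(page, "->", link)
```

A self-referencing canonical on the preferred page is harmless. Keep in mind that the other variants will then generally not appear in results individually, which is why this suits differences that don't matter to searchers.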
Excellent, thanks Memidex. (1) will happen partly as a matter of course and partly because of the nature of the content, so that's fine. I'll look at the canonical tag; I must admit it's a new one on me, but it looks promising. Cheers again.
I think the title tag is the most important factor. Try to make each one different, either by using variants of your main keywords or by including the SKU. This will help your on-page SEO, too.
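Here is a minimal Python sketch of the title-tag advice: each variant's title combines a keyword phrase with its SKU, and the function fails loudly if two pages would end up with the same title. The name build_titles, the "phrase - SKU | site name" pattern and the 60-character budget are assumptions for illustration, not a rule any search engine publishes.

```python
def build_titles(variants: list[tuple[str, str]], site_name: str,
                 max_len: int = 60) -> dict[str, str]:
    """Hypothetical helper: a distinct <title> text for each (sku, keyword_phrase) pair."""
    titles: dict[str, str] = {}
    used: set[str] = set()
    for sku, phrase in variants:
        if sku in titles:
            raise ValueError(f"SKU listed twice: {sku}")
        title = f"{phrase} - {sku} | {site_name}"
        # Assumed length budget: drop the site name before touching the phrase or SKU.
        if len(title) > max_len:
            title = f"{phrase} - {sku}"
        # Catch duplicate titles before the pages go live.
        if title in used:
            raise ValueError(f"duplicate title: {title!r}")
        used.add(title)
        titles[sku] = title
    return titles


if __name__ == "__main__":
    for sku, title in build_titles(
        [("HG-1042", "Matt black steel door hinge"),
         ("HG-1043", "Polished chrome steel door hinge")],
        "Example Hardware",
    ).items():
        print(sku, title)
```

Dropping the site name first keeps the parts that tell near-identical products apart, which is the point of varying the titles.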