How Search Engines Work: Learn About Search Engines and How They Work
It’s been 10 years since I wrote the second edition of a book about search engines called “Search Engine Marketing: The Essential Best Practice Guide”. It was a very big seller and, in fact, it carried on selling through to the beginning of 2010 when I took it offline.
I’ve decided to start this year by revisiting the chapter in the book about how search engines work. I’ve said many times over the years that most books about SEO have a section called “how search engines work,” but rarely (if ever) do they describe the interdisciplinary approach to information retrieval (IR), covering mathematics, computer science, library science, information architecture, cognitive psychology, linguistics, and statistics – to name but a few.
Previously, I had written mainly about methods of manipulating rankings by keyword stuffing and other black hat type techniques of the time. But as I began to realize the importance of linkage data and even more so, link anchor text, I became more and more inquisitive as to what it was exactly that search engines used in their ranking technologies.
After talking to one of the pioneers in web search (Brian Pinkerton of WebCrawler) I was introduced to the work of foremost information retrieval scientist, Gerard Salton. This was a major breakthrough for me.
Salton’s work was cited in just about every IR research paper I read at the time. So I tracked down and bought a copy of his seminal work “Modern Information Retrieval” (written back in 1983, though Salton’s work in the field goes back to the early 1970s).
As a marketer, not a scientist, I found it no easy read. Yet as I began to grasp the basic concepts and drivers behind information retrieval (and the way it is applied to the web), I was better able to understand the major challenges involved. That led me not just to abandon the amateurish and spammy techniques I’d used previously, but to think about SEO in an entirely different way.
And to this day, I still firmly believe that a basic understanding of the science of information retrieval on the web goes a long way toward helping search marketers dispel myths and do their jobs more professionally and proficiently.
Of course, 10 years later my personal library has grown to include a very large section of information retrieval and data mining texts as more and more become available. This is also largely due to the fact that the subject matter is so fascinating it’s hard not to become engrossed.
As I revisited the chapter I’d written a decade ago on how search engines work, I expected it to be a bit stale, but it wasn’t at all. Although, I dare say, to an IR scientist, if not stale, it probably seems about as elementary as it gets. I wrote the chapter placing great emphasis on trying to make it non-mathematical; by that I mean highlighting concepts and background theory rather than matrices and formulae. That said, it’s extremely hard to cover the subject without references to linear algebra and other mind-numbing math.
Anyway, if you’re genuinely interested in how search engines work (and not just the anecdotal stuff generally bandied around), then it’s as good a place as any to start. As I mention in the introduction, it retains the very quirky, very British flavor it had when it was first published. A few pages were eliminated purely because they were totally irrelevant a decade later. There are a few little gems in it which I’d forgotten about.
? !

No, the subhead above isn’t a typo or a spelling mistake. It’s actually a conversation.
When the French author Victor Hugo had Les Miserables published, he was not living in Paris at the time. He was waiting to hear from his publisher about the kind of reception his new book was having. When he could wait for news no longer, he sent his publisher a letter containing only the character: ?
On receiving this, his publisher knew exactly what it meant and returned a note containing only the character: ! This let Victor Hugo know that his book was a huge success. It is said to be the shortest correspondence in history.
What’s that got to do with anything? It was actually a good analogy I used for the length of the average query at search engines at the time, and for how difficult short queries are to deal with.
I’m seriously thinking about trying to find the time to update the entire book this year and make it available free to Search Engine Watch and ClickZ subscribers. More recently I’ve been reading about a feature-centric view of information retrieval, as well as learning to rank for information retrieval and natural language processing (both very hot research topics). I’ll be writing a couple of follow-up columns covering these subjects, combined with fascinating insights into the strength of end-user data, and, of course, weaving all of that into the update of the book once I get time to make a start.
But right now, feel free to download the PDF of “How Search Engines Work”. If nothing else, I hope it acts as a very basic introduction.
Better Page Titles in Search Results: Google Webmaster Central Blog Guidelines
Page titles are an important part of our search results: they’re the first line of each result and they’re the actual links our searchers click to reach websites. Our advice to webmasters has always been to write unique, descriptive page titles (and meta descriptions for the snippets) to describe to searchers what the page is about.
We use many signals to decide which title to show to users, primarily the <title> tag if the webmaster specified one. But for some pages, a single title might not be the best one to show for all queries, and so we have algorithms that generate alternative titles to make it easier for our users to recognize relevant pages. Our testing has shown that these alternative titles are generally more relevant to the query and can substantially improve the clickthrough rate to the result, helping both our searchers and webmasters. About half of the time, this is the reason we show an alternative title.
Other times, alternative titles are displayed for pages that have no title or a non-descriptive title specified by the webmaster in the HTML. For example, a title using simply the word “Home” is not really indicative of what the page is about. Another common issue we see is when a webmaster uses the same title on almost all of a website’s pages, sometimes exactly duplicating it and sometimes using only minor variations. Lastly, we also try to replace unnecessarily long or hard-to-read titles with more concise and descriptive alternatives.
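If you want to check your own pages for these problems before Google flags them, a short script will do. Below is a minimal sketch in Python, assuming you supply your own list of page URLs; the example URLs and the set of placeholder titles are illustrative assumptions, not part of any Google tool.

```python
# Minimal title audit sketch: flag pages whose <title> is missing, a known
# placeholder, or duplicated across the site. The URL list and the
# PLACEHOLDERS set below are illustrative assumptions.
import re
import urllib.request
from collections import defaultdict

PAGES = ["https://example.com/", "https://example.com/about"]  # your URLs
PLACEHOLDERS = {"home", "untitled", "new page"}

def get_title(url):
    with urllib.request.urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", "replace")
    match = re.search(r"<title[^>]*>(.*?)</title>", html,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None

seen = defaultdict(list)  # title text -> pages that use it
for url in PAGES:
    title = get_title(url)
    if not title:
        print(f"{url}: missing <title>")
    elif title.lower() in PLACEHOLDERS:
        print(f"{url}: non-descriptive title {title!r}")
    else:
        seen[title].append(url)

for title, urls in seen.items():
    if len(urls) > 1:
        print(f"duplicated title {title!r} on: {', '.join(urls)}")
```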
For more information about how you can write better titles and meta descriptions, and to learn more about the signals we use to generate alternative titles, see our recently updated Help Center article on this topic. We also try to notify webmasters when we discover titles on their websites that can be improved, via the HTML Suggestions feature in Webmaster Tools; you can find this feature in the Diagnostics section of the menu on the left-hand side.
Google Page Layout Algorithm Improvement: Google Algorithm Update
As we’ve mentioned previously, we’ve heard complaints from users that if they click on a result and it’s difficult to find the actual content, they aren’t happy with the experience. Rather than scrolling down the page past a slew of ads, users want to see content right away. So sites that don’t have much content “above-the-fold” can be affected by this change. If you click on a website and the part of the website you see first either doesn’t have a lot of visible content above-the-fold or dedicates a large fraction of the site’s initial screen real estate to ads, that’s not a very good user experience. Such sites may not rank as highly going forward.
We understand that placing ads above-the-fold is quite common for many websites; these ads often perform well and help publishers monetize online content. This algorithmic change does not affect sites that place ads above-the-fold to a normal degree, but it does affect sites that load the top of the page with an excessive number of ads, or that make it hard to find the actual original content on the page. This new algorithmic improvement tends to impact sites where there is only a small amount of visible content above-the-fold, or where relevant content is persistently pushed down by large blocks of ads.
This algorithmic change noticeably affects less than 1% of searches globally. That means that in less than one in 100 searches, a typical user might notice a reordering of results on the search page. If you believe that your website has been affected by the page layout algorithm change, consider how your web pages use the area above-the-fold and whether the content on the page is obscured or otherwise hard for users to discern quickly. You can use our Browser Size tool, among many others, to see how your website would look under different screen resolutions.
If you decide to update your page layout, the page layout algorithm will automatically reflect the changes as we re-crawl and process enough pages from your site to assess the changes. How long that takes will depend on several factors, including the number of pages on your site and how efficiently Googlebot can crawl the content. On a typical website, it can take several weeks for Googlebot to crawl and process enough pages to reflect layout changes on the site.
Overall, our advice for publishers continues to be to focus on delivering the best possible user experience on your websites and not to focus on specific algorithm tweaks. This change is just one of the over 500 improvements we expect to roll out to search this year. As always, please post your feedback and questions in our Webmaster Help forum.
SEO Tips and Tricks: Things You Should and Shouldn’t Do!
Search Engine Optimization (SEO) is a basic necessity for businesses to survive in the online marketplace. SEO is widely used by individuals and businesses to optimize their websites in order to achieve a prominent place on the search engine results page (SERP), thereby drawing the attention of users looking for their products and services.
This technique serves to drive traffic to a website, improves its business revenue, and helps maintain or improve its SERP ranking. Although the concept and its implementation may seem quite simple to a novice in this domain, there is more to it than meets the eye. The optimization process usually spans several steps, from analyzing your website’s current traffic and ranking, to optimizing keywords, content, website structure, links, and meta tags, to building inbound links and a social media presence.
All these steps should be well planned before execution. Online marketing certainly has a wider reach than conventional marketing techniques, and one wrong step is enough to mar your reputation with both search engines and your potential customers. You certainly will have to watch each step of your SEO journey. Here are a few tips to help you with your SEO campaign.
List of Must Do’s
Use relevant keywords in the right proportion to convey your message in a crisp, meaningful manner. A single theme per page helps you communicate better with the user.
Titles, headings, content, anchor text for links, URLs, and meta tags should all be keyword-rich and convey the nature of your business to the end user. The descriptive text used in these cases also helps the search engine match user queries to your description and return your website on the SERP. (A simple audit of these on-page elements, including image alt text, is sketched in the first example after this list.)
Make sure that all images and pictures on your website have descriptive, keyword-rich alt attributes. These are not only used by search engines to look up results but are also displayed to users whose browsers cannot render the image.
Generate quality inbound links to your website from authentic external websites that already rank high on the SERP. These links are bound to drive traffic to your website and also serve to promote your business and ranking.
Ensure that the file names of your web pages include relevant keywords, and use hyphens rather than underscores to separate them. Hyphens are treated as word separators, whereas underscores typically join the surrounding words into a single token, so hyphenated file names let each keyword be read correctly (the second example after this list illustrates the difference).
Refresh your website content, update blogs, and submit articles at regular intervals; this encourages search engines to re-crawl and re-evaluate your website.
Create a good sitemap, keep your robots.txt file up to date to facilitate crawling, and fix broken links, blank pages, distorted images, and similar issues (a minimal broken-link check is sketched in the third example after this list).
Opt for social media marketing, affiliate marketing and other online marketing techniques (paid and free), which also serve to improve your SERP ranking.
Continuously analyze, track and improve your rankings by constantly optimizing your content and keeping pace with the evolving SEO scenario.
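To make the tips about titles, meta descriptions, and alt text concrete, here is a minimal on-page audit sketch using only the Python standard library. Feed it the raw HTML of a page you want to check; the sample HTML at the bottom is just an illustration.

```python
# On-page audit sketch: extracts the <title>, the meta description, and
# any images with missing or empty alt text. The sample HTML fed in at
# the bottom is an illustrative placeholder.
from html.parser import HTMLParser

class TagAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta_description = None
        self.images_missing_alt = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content")
        elif tag == "img" and not attrs.get("alt"):
            self.images_missing_alt.append(attrs.get("src", "(no src)"))

    def handle_data(self, data):
        if self.in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

audit = TagAudit()
audit.feed("<html><head><title>SEO Tips</title></head>"
           "<body><img src='logo.png'></body></html>")
print("title:", audit.title or "MISSING")
print("meta description:", audit.meta_description or "MISSING")
print("images without alt text:", audit.images_missing_alt)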
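And a short illustration of the hyphen-versus-underscore point, using a simplified word-boundary split; real search engine tokenizers are more sophisticated, but the principle is the same.

```python
# Simplified illustration: the regex \W+ treats a hyphen as a word
# boundary but an underscore as part of the word, so hyphenated file
# names split into separate keywords while underscored ones do not.
import re

print(re.split(r"\W+", "seo-tips-and-tricks.html"))
# -> ['seo', 'tips', 'and', 'tricks', 'html']
print(re.split(r"\W+", "seo_tips_and_tricks.html"))
# -> ['seo_tips_and_tricks', 'html']
```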
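Finally, the promised broken-link check: it sends a HEAD request to each URL and reports anything that errors or returns a failure status. The URL list is an illustrative placeholder, and note that a few servers refuse HEAD requests.

```python
# Minimal broken-link check sketch. The LINKS list is an illustrative
# placeholder; swap in the URLs your pages actually link to.
import urllib.request
import urllib.error

LINKS = ["https://example.com/", "https://example.com/old-page"]

for url in LINKS:
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            print(f"{url}: {response.status}")
    except urllib.error.HTTPError as err:
        print(f"{url}: BROKEN ({err.code})")
    except urllib.error.URLError as err:
        print(f"{url}: UNREACHABLE ({err.reason})")
```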
Things to Avoid
Steer clear of verbose content stuffed with irrelevant keywords. Used in the right manner, keywords improve your website’s ranking; an overdose, however, may prove fatal, as search engines can penalize or black-list your site. Visitors too are bound to shy away from confusing content.
Never resort to link, content, or keyword spamming techniques, or to websites that promote such techniques, to achieve a high ranking.
Avoid plagiarized content; unique content alone is effective in capturing the attention of search engines and users. Do not repeat the same or copied content across different pages of your website, or in article and blog submissions.
Minimize the use of animations as they will not contribute to your website’s ranking even though they may appeal to users visiting your page.
Avoid multiple URLs for your website. A single main URL, with traffic redirected from other URLs, prevents your website from competing with itself.
Complacency may not really help you as the indexing algorithms of search engines are constantly evolving. You will have to keep pace with the latest guidelines and make sure that your website meets all the standard requirements in order to achieve a high ranking on the SERP.
By following these tips you will certainly be able to do a good job on your SEO implementation, without having to burn a hole in your pocket by opting for expensive services.
Alternatives to the Google Analytics Website Analysis Tool
Since its launch in 2005, Google Analytics has become almost unassailable in the world of website analytics, with 57% of the world’s 10,000 most popular websites using the site statistics suite.
Prior to the arrival of Google Analytics, the choice was largely between the inferior data of ‘server stats’ packages, lightweight third-party services, or paying several hundred dollars a month for an enterprise-level solution. Google Analytics brought powerful, accurate analytics to the masses, and as site owners we lapped it up.
Tools such as Google Analytics give us the data to make smarter decisions about our websites and our businesses. Whether we are looking to increase traffic, improve conversions, conceive content ideas or do any of a myriad of other tasks, our analytics suite will often be the starting point.
However, Analytics != Google Analytics. There are alternatives out there, and what’s more, some of them are rather good.
Why wouldn’t you use Google Analytics?
The Ubiquitous Google
There are growing concerns among many about the all-seeing and all-knowing nature of Google. For some, the idea of one company controlling so many parts of their online operation is uncomfortable. For those working in the areas of online marketing that might require head-wear of a darker shade, it just isn’t an option.
Complexity
One complaint that I hear frequently, particularly from new clients, concerns the complexity of Google Analytics. For the occasional user, and for those just seeking fast answers to simple questions, that complexity can be off-putting. With Google Analytics version 5 likely to become compulsory early in the new year, the complexity is only going to increase, which will undoubtedly put some users off.
Feature Set
Finally there is the simple fact that you sometimes either need or want something a little different. Whilst Google Analytics certainly offers a lot, it doesn’t offer everything or suit everyone. Between them, the alternatives below offer a host of features that are not available in Google’s product. For some projects that might be just what is needed.
What alternatives are there?
Google Analytics fulfills different needs for different people, so your choice of alternatives will really be governed by what you are actually looking for from a solution. However there is little that Google Analytics offers that is truly unique, which leaves plenty of interesting alternatives to look at. I’ve picked 7 alternatives that should offer something for everyone.
1. Clicky
Clicky prides itself on being easy to use, in fact they confidently claim to be the easiest analytics service you have ever used. Ease of use often means simplicity and Clicky certainly doesn’t provide the depth of data that an expert Google Analytics user might expect, but to Clicky and its loyal users that is one of the key advantages! Most site owners never look at most of the in depth data and the addition of live data makes Clicky appealing to many.
Despite its claimed simplicity, Clicky does offer useful click-stream data: visitor-level information that Google doesn’t share through Analytics. Clicky is free for up to 3,000 daily page views, with paid packages starting at under $5/month.
2. Mixpanel
Mixpanel is another package that headlines with real-time analytics; however, it is their handling of conversion funnels that stands out for me – in particular, the ability to create and analyse funnels retroactively in a way that is both quick and elegant.
Cohort analysis in Mixpanel allows you to track the retention of your product, graphing how often customers return after their initial visit – a great metric for building a stickier (and more profitable) site.
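Mixpanel does this bookkeeping for you, but the arithmetic behind cohort retention is simple enough to sketch in a few lines of Python. The visit data below is a made-up placeholder.

```python
# Toy illustration of the arithmetic behind cohort retention: group users
# by the week of their first visit, then measure what fraction of each
# cohort came back in later weeks. The visit data is a made-up placeholder.
from collections import defaultdict

# user -> weeks (numbered from launch) in which that user visited
visits = {
    "alice": [0, 1, 2],
    "bob":   [0, 2],
    "carol": [1],
    "dave":  [1, 2],
}

cohorts = defaultdict(set)  # first-visit week -> users in that cohort
for user, weeks in visits.items():
    cohorts[min(weeks)].add(user)

last_week = max(max(weeks) for weeks in visits.values())
for first_week, users in sorted(cohorts.items()):
    for later in range(first_week + 1, last_week + 1):
        returned = sum(1 for user in users if later in visits[user])
        print(f"week-{first_week} cohort, week {later}: "
              f"{returned}/{len(users)} returned")
```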
Mixpanel’s pricing structure is based on how many events you track, which might seem confusing to some. However they have a free package allowing you to track up to 25,000 data points which can be increased to 200,000 if you give them a footer link in return. For most sites that would be more than adequate to at least test this innovative offering.
3. FoxMetrics
FoxMetrics gives you the ability to track metrics that are specific to your business, in the form of events. Using their API you could, for instance, track software installs, newsletter views, media consumption, or almost any event that you can get to trigger an API call.
These events, along with more standard metrics, can be used to trigger personalization of your website based on user behaviour. Simple examples of this might be displaying a “subscribe” call to action to visitors who have read multiple pages, or a different banner to newsletter subscribers.
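For illustration only, a generic server-side event call might look like the sketch below. The endpoint URL, key parameter, and payload shape are hypothetical placeholders and not the actual FoxMetrics API; consult their documentation for the real interface.

```python
# Generic event-tracking sketch: POST a custom event as JSON to a
# collector endpoint. The URL, API key, and payload shape below are
# hypothetical placeholders, NOT the actual FoxMetrics API; check their
# documentation for the real interface.
import json
import urllib.request

def track_event(name, properties):
    payload = json.dumps({"event": name, "properties": properties}).encode()
    request = urllib.request.Request(
        "https://collector.example.com/events?key=YOUR_API_KEY",  # hypothetical
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status

# e.g. record a software install as a custom event
track_event("software_install", {"version": "2.1", "os": "windows"})
```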
FoxMetrics offers a free package for up to 25,000 events and premium packages from under $10/month.
4. Open Web Analytics
Open Web Analytics is the open source community’s answer to Google Analytics, and has a look and feel that will be rather familiar to many. Rather than being a hosted solution, OWA is a downloadable program that you install on your own server. Whilst this means some extra work at the outset, it also means retaining control and ownership of your site’s analytics data.
In terms of features, OWA does its best to mimic Google Analytics and covers the key features quite well. OWA adds tracking of mouse movements and visual heatmaps to the feature set, which will be of use to those with a casual interest in usability. However, the key selling point of Open Web Analytics is not really its feature set, but its offer of a self-hosted, open source alternative to Google Analytics.
5. Kissmetrics
Many site owners will be aware of Kissmetrics thanks to the excellent blog they run, yet I suspect far fewer have tried their analytics solution. Kissmetrics tries to make analytics more personal by tracking, and allowing you to easily visualise, the user life-cycle. If you’ve ever found yourself wondering why some of your site visitors are so much more valuable than others, Kissmetrics allows you to drill down to see the behaviour of individual visitors on your site and how it has changed over time.
The other great draw of Kissmetrics is the ability to analyse conversion funnels in real time and produce clear visualisations of your site’s ability to convert visitors to whatever goals you define.
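Kissmetrics draws these funnels for you, but the underlying arithmetic is worth seeing once. A toy sketch, with made-up step names and counts:

```python
# Toy funnel arithmetic: for each step, what fraction of the previous
# step's visitors carried on, plus the overall conversion rate. Step
# names and counts are made-up placeholders.
funnel = [
    ("viewed product", 1000),
    ("added to cart", 300),
    ("started checkout", 120),
    ("purchased", 90),
]

for (prev_name, prev_count), (name, count) in zip(funnel, funnel[1:]):
    print(f"{prev_name} -> {name}: {count / prev_count:.0%}")
print(f"overall conversion: {funnel[-1][1] / funnel[0][1]:.1%}")
```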
Kissmetrics don’t offer a free service level, but their focus on actionable data should mean that the $30/month starting subscription isn’t too difficult to recoup for any commercial site.
6. Log File Analysis
Depending on your hosting set-up, you might already have an alternative to Analytics installed and collecting data, in the form of log file analysis. Webservers collect masses of usage data by default, and many hosts include software to analyse this data for their customers at no charge. Popular choices include AWStats and Webalizer.
The log file data that these packages analyse (and the way that they collect it), does differ from what you might be used to through the likes of Google Analytics, but will still give you valuable information on who is using your website and how they are using it.
One advantage of log file analysis programs is speed for the user: because these solutions analyse data that your server is already collecting, there is no overhead at all for the site visitor. Another aspect that appeals to some is data security: because you are not generating additional data, and in particular not sharing that data with a third party, information about your site stays safely on your own server.
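To give a flavour of what packages like AWStats do under the hood, here is a sketch that parses Apache’s standard “combined” log format and counts the most-requested pages. It assumes the standard format; adjust the regex if your server logs differ.

```python
# Sketch of what log-file analysers do under the hood: parse Apache
# "combined" format lines and count the most-requested pages. Assumes the
# standard combined log format; adjust the regex if your logs differ.
import re
from collections import Counter

LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

hits = Counter()
with open("access.log") as log:  # path to your server's access log
    for line in log:
        match = LINE.match(line)
        if match and match.group("status") == "200":
            hits[match.group("path")] += 1

for path, count in hits.most_common(10):
    print(f"{count:6d}  {path}")
```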
7. Website Tracking Tools
Third-party tracking tools are the precursor to modern analytics suites and share many similarities with them. In most cases you are using an embedded snippet of code to pass data to a third-party service, which collects, collates, and presents it to you in a meaningful way.
The line between what I would term “website tracking” and “analytics” is blurry at best, and technically the terms are synonymous. However, there is a clear difference between the likes of Google Analytics or Kissmetrics and Sitemeter or eXTReMe Tracking. Both groups of products deal with website visitors and their behaviour, but those that position themselves as analytics tend to offer additional dimensions, such as conversion tracking, segmentation, and campaign tracking, that are essential to the professional marketer.
However, many site owners not only don’t want those features but are actively put off by them. Being able to log in and see visitor numbers, the most popular pages, and what the last X visitors to the site did is all that is wanted and needed. For such projects, website tracking tools such as eXTReMe Tracking, Sitemeter, GoStats, or Statcounter are ideal solutions.
Which Analytics Package Will I Use Next?
With all these options available, it does beg the question of which I will use on my next project. The answer will really depend on what that project is, but if I were a betting man I would put my money on Google Analytics. Whilst there are brilliant alternatives out there, Google Analytics provides a solution that is ideal for most projects. It’s also the package that I (like most people) am most familiar with, so the one I can pull actionable data from most quickly.
However there are undoubtedly times when Google’s offering is not the best analytics product for the job and in those cases I am more than happy to turn to one of the options above to understand my site visitors better.