Wednesday, May 21, 2008

SEO-What are Google's design and technical guidelines?

Design and content guidelines
• Make a site with a clear hierarchy and text links. Every page should be reachable from at least one static text link.
• Offer a site map to your users with links that point to the important parts of your site. If the site map is larger than 100 or so links, you may want to break the site map into separate pages.
• Create a useful, information-rich site, and write pages that clearly and accurately describe your content.
• Think about the words users would type to find your pages, and make sure that your site actually includes those words within it.
• Try to use text instead of images to display important names, content, or links. The Google crawler doesn't recognize text contained in images.
• Make sure that your TITLE and ALT tags are descriptive and accurate.
• Check for broken links and correct HTML.
• If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.
• Keep the links on a given page to a reasonable number (fewer than 100).
Technical guidelines
• Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.
• Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.
• Make sure your web server supports the If-Modified-Since HTTP header. This feature allows your web server to tell Google whether your content has changed since we last crawled your site. Supporting this feature saves you bandwidth and overhead.
• Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it's current for your site so that you don't accidentally block the Googlebot crawler. Visit http://www.robotstxt.org/wc/faq.html to learn how to instruct robots when they visit your site.
• If your company buys a content management system, make sure that the system can export your content so that search engine spiders can crawl your site.
• Don't use "&id=" as a parameter in your URLs, as we don't include these pages in our index.
What are Google's quality guidelines?
These quality guidelines cover the most common forms of deceptive or manipulative behavior, but Google may respond negatively to other misleading practices not listed here (e.g. tricking users by registering misspellings of well-known websites). It's not safe to assume that just because a specific deceptive technique isn't included on this page, Google approves of it. Webmasters who spend their energies upholding the spirit of the basic principles will provide a much better user experience and subsequently enjoy better ranking than those who spend their time looking for loopholes they can exploit. If you believe that another site is abusing Google's quality guidelines, please report that site at http://www.google.com/contact/spamreport.html. Google prefers developing scalable and automated solutions to problems, so we attempt to minimize hand-to-hand spam fighting. The spam reports we receive are used to create scalable algorithms that recognize and block future spam attempts.
Quality guidelines - basic principles
• Make pages for users, not for search engines. Don't deceive your users or present different content to search engines than you display to users, which is commonly referred to as "cloaking."
• Avoid tricks intended to improve search engine rankings. A good rule of thumb is whether you'd feel comfortable explaining what you've done to a website that competes with you. Another useful test is to ask, "Does this help my users? Would I do this if search engines didn't exist?"
• Don't participate in link schemes designed to increase your site's ranking or PageRank. In particular, avoid links to web spammers or "bad neighborhoods" on the web, as your own ranking may be affected adversely by those links.
• Don't use unauthorized computer programs to submit pages, check rankings, etc. Such programs consume computing resources and violate our Terms of Service. Google does not recommend the use of products such as WebPosition Gold™ that send automatic or programmatic queries to Google.
Quality guidelines - specific guidelines
• Avoid hidden text or hidden links.
• Don't employ cloaking or sneaky redirects.
• Don't send automated queries to Google.
• Don't load pages with irrelevant words.
• Don't create multiple pages, subdomains, or domains with substantially duplicate content.
• Don't create pages that install viruses, trojans, or other badware.
• Avoid "doorway" pages created just for search engines, or other "cookie cutter" approaches such as affiliate programs with little or no original content.
• If your site participates in an affiliate program, make sure that your site adds value. Provide unique and relevant content that gives users a reason to visit your site first.
If a site doesn't meet our quality guidelines, it may be blocked from the index. If you determine that your site doesn't meet these guidelines, you can modify your site so that it does and request reinclusion.
What is a WWW robot?
A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
Note that "recursive" here doesn't limit the definition to any specific traversal algorithm; even if a robot applies some heuristic to the selection and order of documents to visit and spaces out requests over a long space of time, it is still a robot.
Normal Web browsers are not robots, because they are operated by a human, and don't automatically retrieve referenced documents (other than inline images).
Web robots are sometimes referred to as Web Wanderers, Web Crawlers, or Spiders. These names are a bit misleading as they give the impression the software itself moves between sites like a virus; this not the case, a robot simply visits sites by requesting documents from them.
________________________________________
What is an agent?
The word "agent" is used for lots of meanings in computing these days. Specifically:
Autonomous agents
are programs that do travel between sites, deciding themselves when to move and what to do. These can only travel between special servers and are currently not widespread in the Internet.
Intelligent agents
are programs that help users with things, such as choosing a product, or guiding a user through form filling, or even helping users find things. These have generally little to do with networking.
User-agent
is a technical name for programs that perform networking tasks for a user, such as Web User-agents like Netscape Navigator and Microsoft Internet Explorer, and Email User-agent like Qualcomm Eudora etc.
________________________________________
What is a search engine?
A search engine is a program that searches through some dataset. In the context of the Web, the word "search engine" is most often used for search forms that search through databases of HTML documents gathered by a robot.
________________________________________

No comments: