Monday, May 30, 2011

How To Make Your AJAX Site Crawlable: 3 Simple Steps

One of the most important aspects of any search engine optimized website is that its content is accessible to search engine bots. If your site uses JavaScript, there is a good chance that some of your content is not being indexed, which will result in lower rankings on Google and other search engines.

A Google Solution
Luckily, Google has a specification for Making AJAX Applications Crawlable. My implementation of the specification is outlined in the three steps below:

1) Hash Fragments That Begin With !

The first requirement is that all JavaScript-based links (i.e., links that perform an AJAX call and return content as a result) contain '!' as the first character of the hash fragment:

<!-- example link from my site -->
<a href="#!about">about</a>

For history management I'm using jQuery Address with 'crawlable' set to true. You will need to make sure that your JavaScript code can handle links of this form.

2) Create Static HTML Pages

In order for this solution to work, you need some HTML to serve when requests are made by Googlebot. For each AJAX link from step 1, generate a separate HTML page. For simplicity, name the HTML page after the hash fragment (i.e., #!about => about.html). Each .html page should contain whatever content results from clicking the AJAX link. NOTE: These static pages will be used only by bots.
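As a rough sketch of step 2 (this is not the code from my site; the renderer stub and directory layout are placeholders), snapshot generation could look something like this:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SnapshotGenerator {

    // Stub renderer: a real site would produce the same markup the AJAX
    // call renders for this fragment. The markup below is a placeholder.
    static String renderContentFor(String fragment) {
        return "<html><body><h1>" + fragment + "</h1></body></html>";
    }

    // Write <fragment>.html into outputDir (e.g. #!about => about.html).
    static Path writeSnapshot(Path outputDir, String fragment) throws IOException {
        Path page = outputDir.resolve(fragment + ".html");
        Files.write(page, renderContentFor(fragment).getBytes(StandardCharsets.UTF_8));
        return page;
    }
}
```

You could run this as a build step, once per hash fragment your site exposes.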

3) Handle "_escaped_fragment_" Server Side

There is a slight bit of ambiguity in Google's specification about the mapping between #! and _escaped_fragment_. It boils down to this: when Googlebot scans your site, URLs containing #! are rewritten to use ?_escaped_fragment_=. In other words, if you have a URL containing #!<some_value>, you can be sure that when Googlebot attempts to index that URL, your server will receive a request with the parameter '_escaped_fragment_' set to '<some_value>'. My primary JSP, which handles all requests, first looks for this parameter and redirects if necessary:


<%@ page import="java.net.URLDecoder" %>
<%
    // If Googlebot sent the _escaped_fragment_ parameter, redirect to the
    // static HTML snapshot generated in step 2.
    if (request.getParameter("_escaped_fragment_") != null) {
        String escapedFragment = request.getParameter("_escaped_fragment_");
        String decodedEscapedFragment = URLDecoder.decode(escapedFragment, "UTF-8");
        response.sendRedirect(App.context() + decodedEscapedFragment + ".html");
        return;
    }
%>



In the above code snippet, notice that I append '.html' to the end of the decoded escaped fragment. The code takes the hash fragment and redirects to the corresponding static HTML page. For instance, if the clicked link has href="#!about", then request.getParameter("_escaped_fragment_") will return "about", and the resulting redirect will point to "/about.html".
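The same mapping can be written as a small plain-Java helper, which makes the decode-then-append logic easy to unit test (a sketch; the class and method names here are mine, not from my site's code):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class EscapedFragmentMapper {

    // Decode the raw _escaped_fragment_ value and build the path of the
    // matching static page, following the naming convention from step 2.
    static String staticPagePath(String escapedFragment) throws UnsupportedEncodingException {
        String decoded = URLDecoder.decode(escapedFragment, "UTF-8");
        return "/" + decoded + ".html";
    }
}
```

Note that the parameter value arrives URL-encoded, which is why the decode step matters for fragments containing special characters.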

Test It Out!
Now that you are finished, there are a couple of ways to check whether your content is accessible. For each URL containing your new hash fragment, replace "#!" with "?_escaped_fragment_=" and make sure your server returns the static HTML page. You can try this out on my website. You'll notice that http://dsswebdesign.com/#!/about can be changed to http://dsswebdesign.com/?_escaped_fragment_=about, and the page looks the same.
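If you want to derive those test URLs programmatically, the rewrite Googlebot performs can be sketched as follows (my own illustration, simplified to assume the URL has no existing query string; per the spec, the fragment value is URL-encoded in the rewritten URL):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class CrawlerUrlMapper {

    // Rewrite "...#!<value>" into "...?_escaped_fragment_=<encoded value>",
    // the form Googlebot actually requests from the server.
    static String toEscapedFragmentUrl(String url) throws UnsupportedEncodingException {
        int bang = url.indexOf("#!");
        if (bang < 0) {
            return url; // no crawlable fragment; URL is unchanged
        }
        String base = url.substring(0, bang);
        String fragment = url.substring(bang + 2);
        return base + "?_escaped_fragment_=" + URLEncoder.encode(fragment, "UTF-8");
    }
}
```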

A second way to test this is to add your site to Webmaster Tools. Under Diagnostics there is an option called 'Fetch as Googlebot'. From there you can verify that each of your #! links is reached successfully by Googlebot.

17 comments:

  1. Hi,
    cool article.
    I am currently playing around with jQuery Address and wanted to ask you a few questions.
    1. You wrote that the links in the HTML have to look like "#!index", but on their demo pages the links all look like this: "/crawling?_escaped_fragment_=%2F%3Fpage%3D%2Fgetting-started".
    I am a bit confused.

    2. The JSP code you use to redirect the requests - could it be written inside a .htaccess file? If so, what would it look like?
    Thnx!

  2. 1) If you inspect the code from the jQuery Address crawling demo (http://www.asual.com/jquery/address/samples/crawling) you will see that the href attributes begin with "#!" (i.e., #!/?page=/getting-started). You should never see "_escaped_fragment_=" in the URL unless you request it explicitly. The example on the Asual site paints an unclear picture of how to make your site crawlable because it unnecessarily passes "_escaped_fragment_" to the server as a request parameter via JavaScript. The key points to remember about Googlebot are that it will not execute JavaScript, but it will send the "_escaped_fragment_" request parameter when it finds hrefs containing "#!". Therefore, your code should only deal with "_escaped_fragment_" on the server (it's not needed in your client code).

    2) I believe there is a way to specify redirects in .htaccess based on request parameters. You should be able to use a RewriteRule with a conditional that includes %{QUERY_STRING}. Have a look at this Apache documentation for more details: http://httpd.apache.org/docs/trunk/mod/mod_rewrite.html#rewriterule
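    As a rough, untested sketch (the patterns and flags here are my assumptions, and you would need to adjust them to your URL layout):

    ```apache
    # Rewrite ?_escaped_fragment_=<name> on the root URL to the
    # static /<name>.html snapshot from step 2.
    RewriteEngine On
    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.+)$
    RewriteRule ^$ /%1.html? [R=302,L]
    ```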

  3. Thanks for your reply!
