Crawljax

Crawling Ajax-based Web Applications


About

Crawljax is an open source Java tool for automatically crawling and testing modern web applications. Crawljax can explore any JavaScript-based Ajax web application through an event-driven dynamic crawling engine. It automatically creates a state-flow graph of the dynamic DOM states and the event-based transitions between them. This inferred state-flow graph provides a powerful basis for automating many types of web analysis and testing techniques:

  • Test generation
  • Invariant-based testing
  • Non-functional testing (accessibility, validation, I18n, security, …)
  • Detecting broken links/images/tooltips
  • Detecting unused code, and much more…
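To make the idea of a state-flow graph concrete, here is a toy sketch of how such a graph can be inferred. It is not Crawljax's implementation: it assumes a hypothetical, in-memory "application" in which each DOM state is just a string mapped to its clickable events and the states they lead to, whereas Crawljax fires real events in a browser and compares the resulting DOMs. The names `StateFlowSketch`, `Edge`, `crawl` and `demoApp` are illustrative only, and the exploration is breadth-first for simplicity.

```java
import java.util.*;

/** Toy sketch (hypothetical model): infer a state-flow graph by firing events. */
public class StateFlowSketch {

    /** A transition: firing an event on one DOM state leads to another state. */
    record Edge(String from, String event, String to) {}

    /**
     * Explores a hypothetical application model breadth-first. Each key is a
     * DOM state; its value maps clickable events to the state reached after
     * firing them. Returns every transition discovered.
     */
    static List<Edge> crawl(Map<String, Map<String, String>> app, String index) {
        List<Edge> edges = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(index);
        queue.add(index);
        while (!queue.isEmpty()) {
            String state = queue.poll();
            for (Map.Entry<String, String> e : app.getOrDefault(state, Map.of()).entrySet()) {
                String next = e.getValue();
                edges.add(new Edge(state, e.getKey(), next));
                // Only explore states not seen before, so duplicate states are merged.
                if (visited.add(next)) {
                    queue.add(next);
                }
            }
        }
        return edges;
    }

    /** A tiny hypothetical application with three states and deterministic ordering. */
    static Map<String, Map<String, String>> demoApp() {
        Map<String, Map<String, String>> app = new LinkedHashMap<>();
        Map<String, String> index = new LinkedHashMap<>();
        index.put("click #news", "news");
        index.put("click #about", "about");
        app.put("index", index);
        Map<String, String> news = new LinkedHashMap<>();
        news.put("click #home", "index");
        app.put("news", news);
        return app;
    }

    public static void main(String[] args) {
        for (Edge e : crawl(demoApp(), "index")) {
            System.out.println(e.from() + " --" + e.event() + "--> " + e.to());
        }
    }
}
```

Running `main` prints the three inferred transitions: `index --click #news--> news`, `index --click #about--> about` and `news --click #home--> index`. Analyses such as invariant checking or broken-link detection can then walk this graph instead of the live application.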

Crawljax can easily be extended through its plugin architecture.
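As a rough illustration of how a plugin architecture of this kind typically works, the sketch below shows a crawler core notifying registered plugins whenever a new DOM state is found. The `StatePlugin` interface, `MissingAltPlugin` and `Crawler` classes are hypothetical and do not reproduce Crawljax's actual plugin API; the example plugin performs a naive accessibility check (images without an `alt` attribute) using plain string scanning instead of real DOM parsing.

```java
import java.util.*;

/** Hypothetical sketch of a crawler plugin hook; not Crawljax's actual API. */
public class PluginSketch {

    /** Hypothetical extension point, called once per newly discovered DOM state. */
    interface StatePlugin {
        void onNewState(String stateName, String dom);
    }

    /** Example plugin: records states containing an <img> tag without an alt attribute. */
    static class MissingAltPlugin implements StatePlugin {
        final List<String> violations = new ArrayList<>();

        @Override
        public void onNewState(String stateName, String dom) {
            // Naive string scan for illustration; a real check would parse the DOM.
            int i = dom.indexOf("<img");
            while (i >= 0) {
                int end = dom.indexOf('>', i);
                String tag = end < 0 ? dom.substring(i) : dom.substring(i, end + 1);
                if (!tag.contains("alt=")) {
                    violations.add(stateName);
                    break;
                }
                i = dom.indexOf("<img", i + 1);
            }
        }
    }

    /** Hypothetical crawler core that notifies every registered plugin. */
    static class Crawler {
        private final List<StatePlugin> plugins = new ArrayList<>();

        void register(StatePlugin plugin) {
            plugins.add(plugin);
        }

        void reportNewState(String name, String dom) {
            for (StatePlugin plugin : plugins) {
                plugin.onNewState(name, dom);
            }
        }
    }

    public static void main(String[] args) {
        Crawler crawler = new Crawler();
        MissingAltPlugin altCheck = new MissingAltPlugin();
        crawler.register(altCheck);
        crawler.reportNewState("index", "<div><img src=\"a.png\" alt=\"logo\"></div>");
        crawler.reportNewState("news", "<div><img src=\"b.png\"></div>");
        System.out.println(altCheck.violations); // prints [news]
    }
}
```

Keeping the checks in plugins means the crawling engine stays unchanged while new analyses (validation, security, broken-link detection) are added as separate classes.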

History

The idea of crawling Ajax-based web applications started in 2007 as part of the PhD research of Ali Mesbah working with Arie van Deursen. The initial idea, written in a technical report titled Exposing the Hidden-Web Induced by Ajax, was to automatically generate a static linked mirror of the dynamic DOM states, to make the web application accessible to search engines. Later in 2008, the event-based crawling technique was published at ICWE’08 and many applications for Crawljax began to emerge. Extending and using Crawljax for automatically testing modern Web 2.0 applications through structural DOM invariants was the next step, which resulted in an ICSE’09 paper. Furthermore, we have been using Crawljax for security testing (ESEC/FSE’09), regression testing (ICST’10), cross-browser compatibility testing (ICSE’11) and many other interesting domains of web analysis and testing.

General support and technical discussion

If you encounter any problems or have questions, don’t hesitate to use our mailing list, which can be found at:

http://groups.google.com/group/crawljax

Bug Tracker

If you find bugs or would like to propose a new feature, you can open an issue in our bug tracker, which can be found at:

https://github.com/crawljax/crawljax/issues