Ingest/Index static and dynamic web pages

What is the recommended way to index/ingest both classic static HTML pages and pages whose content is rendered client-side by JavaScript? Is there a native web crawler/indexer for "dynamic" web page content?

4 Replies

@Search720 There's no built-in indexer for crawling web pages, so customers often use an open-source crawler such as Apache Nutch to extract content from web pages. From there, you can land the content in a supported data source such as Blob storage, Cosmos DB, or ADLS Gen2 and index it. You can also push the data directly to the index via the Push API as described here.
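
For the Push API route, here is a minimal Python sketch of what the glue code could look like once a crawler has handed you the HTML. It assumes the service is Azure Cognitive Search (inferred from the data sources mentioned above) and uses its "Add, Update or Delete Documents" REST call. The index fields `id`, `url`, and `content`, the helper names, and the `api-version` value are assumptions for illustration, so adjust them to your own index schema.

```python
import base64
import json
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Reduce an HTML page to plain text for a searchable field."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def make_key(url):
    """Derive a document key from the URL (URL-safe base64 is an assumption
    chosen here because raw URLs contain characters keys may not allow)."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")


def build_batch(pages):
    """pages: iterable of (url, html) pairs produced by the crawler."""
    return {
        "value": [
            {
                "@search.action": "mergeOrUpload",
                "id": make_key(url),          # hypothetical key field
                "url": url,                   # hypothetical field
                "content": html_to_text(html),  # hypothetical field
            }
            for url, html in pages
        ]
    }


def push_batch(service, index, api_key, batch, api_version="2020-06-30"):
    """POST the batch to the index; requires an admin API key."""
    endpoint = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/index?api-version={api_version}"
    )
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

You would call `build_batch(...)` on each chunk of crawled pages and pass the result to `push_batch(...)` with your own service name, index name, and key. Note that this only handles HTML you already have; for JavaScript-rendered pages the crawler itself has to render the page before the text is extracted.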


@Search720 You can use the Norconex HTTP connector for dynamic webpages.


