Browser-based DAST fails to retrieve response bodies of cached requests
Problem
DAST hangs or returns an error when it attempts to retrieve a response body from Chromium after the browser has transitioned to a new page. Only resources that Chromium serves from its cache are affected.
Underlying cause
DAST must stop sending the Chromium DevTools message Network.getResponseBody to retrieve response bodies for cached resources. The call is unreliable because DAST cannot know whether Chromium has already transitioned to a new page at the time the message is sent. An alternative solution must be provided.
Figure: Network.getResponseBody fails on previously loaded pages (image: network-get-response-body)
Proposal
Given that every resource in the Chromium cache has, by definition, been requested before, it stands to reason that DAST has already intercepted the original request. DAST should therefore be able to retrieve the response body from the DAST database, assuming that all HTTP messages are persisted.
At the time of writing, DAST does not persist all HTTP messages, so the proposal must also address this gap. Only the response body needs to be persisted, not the entire HTTP message.
Response bodies are often large, so to avoid saving the same response body multiple times, DAST persists only a single copy of each response body. This mechanism can be extended to persist every response body. When DAST calls Fetch.GetResponseBody, the response body should be hashed immediately and saved to a response body repository, unless a body with the same hash already exists there. This ensures that all response bodies are persisted, and it is reliable because Chromium halts while the persistence takes place (until Fetch continue is called). Performance is expected to be largely unaffected because this only moves the saving of response bodies earlier in the crawl process. There may be some performance impact if Chromium is halted for long periods of time, but this is unlikely.
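The hash-then-save step described above can be sketched as follows. This is a minimal illustration and not DAST's actual code: `BodyStore`, `Save`, `Get`, and `Len` are hypothetical names, the store is an in-memory map rather than the DAST database, and SHA-256 is an assumed choice of hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// BodyStore is a hypothetical in-memory stand-in for the response body
// repository. A real implementation would write to the DAST database.
type BodyStore struct {
	mu     sync.Mutex
	bodies map[string][]byte // hash -> body, one copy per distinct body
}

func NewBodyStore() *BodyStore {
	return &BodyStore{bodies: make(map[string][]byte)}
}

// Save hashes the body and stores it only if no body with the same hash
// exists yet. It returns the hash and whether a new copy was written.
func (s *BodyStore) Save(body []byte) (hash string, stored bool) {
	sum := sha256.Sum256(body) // assumed hash function
	hash = hex.EncodeToString(sum[:])

	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.bodies[hash]; exists {
		return hash, false
	}
	// Copy so that later changes to the caller's slice cannot alter the stored body.
	s.bodies[hash] = append([]byte(nil), body...)
	return hash, true
}

// Get returns the body stored under the given hash, if any.
func (s *BodyStore) Get(hash string) ([]byte, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	body, ok := s.bodies[hash]
	return body, ok
}

// Len reports how many distinct bodies are stored.
func (s *BodyStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.bodies)
}

func main() {
	store := NewBodyStore()
	h1, first := store.Save([]byte("<html>hello</html>"))
	h2, second := store.Save([]byte("<html>hello</html>"))
	// The second save is a no-op because the hash already exists.
	fmt.Println(h1 == h2, first, second, store.Len()) // true true false 1
}
```

In this sketch the save would run while the intercepted request is still paused, so the cost added per response is one hash computation plus, for bodies not seen before, one write.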
Implementation plan
- If possible, convert FetchEventHandler and other event handlers to be injected using dependency injection.
- Create a response body store. This can be extracted from the current logic in store.HTTPResponseStore.
- Inject the response body store into the fetch event handler.
- When the response is intercepted in the fetch event handler and the response body is returned using Fetch.GetResponseBody, persist the response body using the store.
- When persisting a response body, also persist the request method and URL.
- When receiving a Network.requestServedFromCache event, HTTPMessageService.FinalizeHTTPMessage is called, and the container will not have a body for the response. If the response body is not in the browserker cache, search the response body store for the body using the request method and URL.
- Network.GetResponseBody should no longer need to be called in the HTTPMessageService, and can be removed.
- May need to deal with import cycle issues between the database and browserk namespaces.
- Remove the absurdly named builders with names such as WithHTTPResponseBodyHashHash.
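The store, injection, and lookup steps of the plan above can be sketched as follows. This is a rough illustration under stated assumptions, not the real DAST types: `ResponseBodyStore`, `memoryStore`, `fetchEventHandler`, `onResponseIntercepted`, and `finalizeBody` are hypothetical names, and the in-memory maps stand in for the database.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// requestKey identifies a request by method and URL.
type requestKey struct {
	Method string
	URL    string
}

// ResponseBodyStore is a hypothetical interface for the response body store.
type ResponseBodyStore interface {
	Save(method, url string, body []byte)
	Find(method, url string) ([]byte, bool)
}

// memoryStore is an in-memory stand-in for a database-backed store.
type memoryStore struct {
	bodies map[string][]byte     // hash -> body, one copy per distinct body
	index  map[requestKey]string // method and URL -> hash of the latest body
}

func newMemoryStore() *memoryStore {
	return &memoryStore{
		bodies: make(map[string][]byte),
		index:  make(map[requestKey]string),
	}
}

// Save stores the body once per hash and records which request produced it.
func (s *memoryStore) Save(method, url string, body []byte) {
	sum := sha256.Sum256(body) // assumed hash function
	hash := hex.EncodeToString(sum[:])
	if _, ok := s.bodies[hash]; !ok {
		s.bodies[hash] = append([]byte(nil), body...)
	}
	s.index[requestKey{Method: method, URL: url}] = hash
}

// Find looks up the most recently saved body for a method and URL.
func (s *memoryStore) Find(method, url string) ([]byte, bool) {
	hash, ok := s.index[requestKey{Method: method, URL: url}]
	if !ok {
		return nil, false
	}
	body, ok := s.bodies[hash]
	return body, ok
}

// fetchEventHandler receives the store through its constructor
// (dependency injection) instead of creating it itself.
type fetchEventHandler struct {
	store ResponseBodyStore
}

func newFetchEventHandler(store ResponseBodyStore) *fetchEventHandler {
	return &fetchEventHandler{store: store}
}

// onResponseIntercepted stands in for the point where the body has been
// read from the paused response and can be persisted.
func (h *fetchEventHandler) onResponseIntercepted(method, url string, body []byte) {
	h.store.Save(method, url, body)
}

// finalizeBody mirrors the fallback in the plan: use the body already in
// hand if there is one, otherwise search the store by method and URL.
func finalizeBody(store ResponseBodyStore, method, url string, inHand []byte) ([]byte, bool) {
	if inHand != nil {
		return inHand, true
	}
	return store.Find(method, url)
}

func main() {
	store := newMemoryStore()
	handler := newFetchEventHandler(store)

	// First load: the response is intercepted and its body persisted.
	handler.onResponseIntercepted("GET", "https://example.test/app.js", []byte("console.log(1)"))

	// Later load served from cache: no body in hand, so fall back to the store.
	body, found := finalizeBody(store, "GET", "https://example.test/app.js", nil)
	fmt.Println(found, string(body)) // true console.log(1)
}
```

Keeping the handler dependent on an interface rather than a concrete store type is one possible way to limit import cycles between packages, although whether it resolves the cycle mentioned in the plan depends on how the real packages are laid out.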