This page contains various non-HTTP(S) protocol links that should NOT be crawled as regular URLs. These links use special URI schemes like mailto:, tel:, javascript:, and data:.
This prevents links such as mailto:info@example.com from being incorrectly resolved as /issues/non-http-protocols/info@example.com.
No issues should be detected: all of these protocol links should be filtered out during the link extraction phase.
Code location: apps/worker/src/crawler.ts:1604-1643 (extractLinks method)
The fix reads the raw attribute value with getAttribute('href') and, before URL normalization, drops any link whose explicit scheme is something other than http: or https:.
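The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the actual extractLinks code in crawler.ts: the helper name filterCrawlableLinks, the SCHEME_RE pattern, and the choice to take already-collected raw href strings (rather than DOM elements) are all assumptions made so the example runs without a browser.

```typescript
// Hypothetical helper (not the real extractLinks implementation).
// Input: raw href values as getAttribute('href') would return them
// (null when the attribute is absent), plus the URL of the page.
// Output: absolute http(s) URLs only.

// Matches an explicit URI scheme at the start of a string, e.g. "mailto:".
const SCHEME_RE = /^([a-zA-Z][a-zA-Z0-9+.-]*):/;

function filterCrawlableLinks(
  rawHrefs: Array<string | null>,
  pageUrl: string
): string[] {
  const links: string[] = [];
  for (const raw of rawHrefs) {
    if (raw === null) continue; // anchor without an href attribute
    const href = raw.trim();
    if (href === "") continue;

    // Check the scheme on the RAW value, before any resolution, so that
    // mailto:, tel:, javascript: and data: links are never turned into paths.
    const match = SCHEME_RE.exec(href);
    if (match) {
      const scheme = match[1].toLowerCase();
      if (scheme !== "http" && scheme !== "https") continue;
    }

    try {
      // Relative links (no scheme) are resolved against the page URL.
      links.push(new URL(href, pageUrl).href);
    } catch {
      // Unparseable href: skip it rather than fail the whole page.
    }
  }
  return links;
}
```

In this sketch the scheme check runs on the untouched attribute value, which is the point of using getAttribute('href'): the decision to skip a link is made before anything can rewrite mailto:info@example.com into a path under the current page.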
To verify the fix is working correctly:
http://localhost:3030
These should be crawled normally: