Python mechanize HTTP Error 403 request disallowed by robots.txt

When running the following Python script, I get the error “HTTP Error 403 request disallowed by robots.txt”:

from mechanize import Browser

a = ['https://google.com', 'https://serverok.in', 'https://msn.com']

br = Browser()

for url in a:
    br.open(url)
    print("Website title: ")
    print(br.title())
    print("\n")
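The error comes from mechanize's robots.txt check: before opening a page it consults the site's robots.txt, and refuses the request if the rules disallow it. As a rough, offline illustration of how robots.txt rules decide what is allowed, the sketch below uses Python's standard-library urllib.robotparser rather than mechanize's own code. The robots.txt text, the ExampleBot name and the example.com URLs are made up for the example; nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one bot is banned entirely,
# every other agent is only kept out of /search.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /search
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("ExampleBot/1.0", "https://example.com/"),
        ("Mozilla/5.0", "https://example.com/"),
        ("Mozilla/5.0", "https://example.com/search?q=python"),
    ]
    for agent, url in checks:
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "disallowed"
        print(f"{agent:15} {url:40} {verdict}")
```

Because a robots.txt file can hold separate rules per user agent, the agent name a client sends can change which block of rules applies.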

To fix this, find the line

br = Browser()

Add these two lines below it:


br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36')]

set_handle_robots(False) disables mechanize's robots.txt checking, which is what raises the error. The second line sets a browser-like User-agent header, so the remote server is less likely to block you as a robot.
