python - Using urlparse to remove a certain string?
I have a URL:
www.domain.com/a/b/c/d,authorised=false.html and want to convert it to
www.domain.com/a/b/c/d.html. Please note that I am using Python 2.7.
    from urlparse import urlparse

    # urlparse only fills in hostname when the URL includes a scheme,
    # and only splits off the query string at a "?"
    url = "http://www.domain.com/a/b/c/d,authorised=false.html?_i_location=http%3a%2f%2fwww.domain.com%2fcms%2fs%2f0%2ff416e134-2484-11e4-ae78-00144feabdc0.html%3fsiteedition%3dintl&siteedition=intl&_i_referer=http%3a%2f%2fwww.domain.com%2fhome%2fus"
    o = urlparse(url)
    url = o.hostname + o.path
    print url

This returns www.domain.com/a/b/c/d,authorised=false.html, and I don't know how to remove the ,authorised=false part from the URL.
    import re

    # Replace everything from the first comma up to the last dot with a single dot
    print re.sub(r',.+\.', '.', 'www.domain.com/a/b/c/d,authorised=false.html')
    # www.domain.com/a/b/c/d.html
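If you would rather keep urlparse in the picture, one possible variation is to parse first and apply the substitution to the path only, so the query string can never be touched by the regex. This is just a sketch: the function name strip_path_parameter is made up for illustration, and the pattern assumes the unwanted part always looks like ",something" sitting directly before the file extension of the last path segment. The import fallback is only there so the same snippet also runs on Python 3.

```python
import re

try:
    from urlparse import urlparse  # Python 2.7, as in the question
except ImportError:
    from urllib.parse import urlparse  # Python 3


def strip_path_parameter(url):
    """Return hostname + path with a ',key=value' run removed before the extension.

    Hypothetical helper, not part of urlparse or re.
    """
    parsed = urlparse(url)
    # Match a comma followed by characters that are neither '/' nor '.',
    # but only when what follows is the final '.ext' of the path.
    path = re.sub(r',[^/.]+(?=\.[^/.]*$)', '', parsed.path)
    # hostname is None when the URL has no scheme; in that case the host
    # is already part of parsed.path, so fall back to an empty prefix.
    return (parsed.hostname or '') + path


print(strip_path_parameter(
    'http://www.domain.com/a/b/c/d,authorised=false.html?siteedition=intl'))
# www.domain.com/a/b/c/d.html
```

Unlike the greedy `,.+\.` pattern, this one stops at slashes and dots, so a comma earlier in the path or a dot inside the query string should not change the result.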