Making wget bypass the index.html file

July 15, 2010

I am trying to download images from this link. I want to download only the images from the hydraulics section, so I used --no-parent, but when I run the command

wget -r --no-parent -e robots=off --user-agent="Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0" -A png http://indiabix.com/civil-engineering/hydraulics/

it downloads only index.html.

I searched for this issue on the web, and Stack Overflow has two questions about it:

wget downloads only one index.html file instead of the other 500 html files
Why does wget only download the index.html for some websites?

But they did not help. I started a bounty on the latter question, but I wonder if anyone can suggest a workaround in this case?

Quite simple:

There are no images on the link you provided.

The tiny icons ("View Answer" etc.) are part of the CSS definition for the anchor (background-image). As of now, wget does not parse external CSS and pick up the images from there.
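To see which images a stylesheet actually references, the url(...) values can be pulled out of the CSS text by hand. Below is a minimal sketch, assuming a shell with grep -o and sed -E available; the function name extract_css_urls and the sample rule are made up for illustration and are not taken from the site in question.

```shell
# Hypothetical helper: read CSS on stdin and print each url(...) target,
# one per line, with the url( ) wrapper and any surrounding quotes removed.
extract_css_urls() {
    grep -oE 'url\([^)]*\)' |
        sed -E -e 's/^url\(//' -e 's/\)$//' -e "s/^[\"']//" -e "s/[\"']\$//"
}

# Sample rule, invented for this example.
printf '%s\n' 'a.view { background-image: url("/images/view.png"); }' | extract_css_urls
```

The printed paths are relative to the stylesheet, so they would still need to be resolved against the stylesheet's URL before being handed to a downloader.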

With -A png, wget stops at the first file (.html), since it doesn't match.

I've succeeded in downloading with

   lwp-rget --hier --nospace http://indiabix.com/civil-engineering/hydraulics/

The LWP Perl packages from CPAN need to be installed; to search for them with zypper: zypper se libwww
