Making wget bypass the index.html file
I am trying to download images from this link. I want to download only the images in the hydraulics section, so I used --no-parent,
and when I run the command
wget -r --no-parent -e robots=off --user-agent="Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0" -A png http://indiabix.com/civil-engineering/hydraulics/
it downloads only index.html.
I searched for this issue on the web, and Stack Overflow has two questions about it:
- wget downloads only one index.html file instead of the other 500 HTML files
- Why does wget download only index.html for websites?
But they did not help. I started a bounty on the latter question, but I wonder if anyone can suggest a workaround in my case?
It's quite simple:
- There are no images on the link you provided.
The tiny icons ("View answer" etc.) are part of the CSS definition of the anchor (background-image). As of now, wget does not parse the external CSS and pick up the images from there.
With -A png, wget stops at the first file (.html) since it doesn't match the accept list.
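If the icons are all you are after, one possible workaround is to pull the url(...) references out of the stylesheet yourself. The following is only a rough sketch: the function name extract_css_urls and the sample CSS (including its paths) are made up for illustration and are not taken from the site, and the pattern only handles simple url(...) values.

```shell
#!/bin/sh
# Hypothetical helper: print every url(...) target found in CSS read from stdin.
# It strips the url( ) wrapper and any surrounding single or double quotes.
extract_css_urls() {
    grep -o 'url([^)]*)' |
        sed -e 's/^url(//' -e 's/)$//' -e "s/^[\"']//" -e "s/[\"']\$//"
}

# Sample stylesheet standing in for the real one (assumed content and paths).
sample_css='a.view-answer { background-image: url("/images/view-answer.png"); }
a.discuss { background: url(/images/discuss.png) no-repeat; }'

# Prints one extracted path per line.
printf '%s\n' "$sample_css" | extract_css_urls
```

To actually fetch the files, you would first save the site's stylesheet, run it through the function, turn the relative paths into absolute URLs (for example by prefixing the site address with sed), and feed the result to wget -i -, which reads the URL list from standard input.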
I've succeeded in downloading it with
lwp-rget --hier --nospace http://indiabix.com/civil-engineering/hydraulics/
The LWP Perl packages from CPAN need to be installed; you can search for them with: zypper se libwww