C# - Extracting data from a large file with regex
I have a file of close to 800 MB that consists of several records (a header followed by content). A header looks like m=013;x=rast;645.jpg
while the content is the binary data of the JPG file.
So the file looks like this:
m=013;x=rast;645.jpgnulœdüŠˆ.....m=217;x=rast;113.jpgnulÿñÿÿ&åbÿås....m=217;x=rast;1108.jpgnul]_ÿ×ÉcË/...
The header can occur on one line or span two lines.
I need to parse the file and pull out the individual JPG images.
Since it is a big file, could you please suggest an efficient way? I was hoping to use StreamReader, but I do not have enough experience with regular expressions to use them with it.
Regex:
/(m=.+?;x=.+?;.+?\.jpg)(.+?(?=(?1)|$))/gs
*with recursion (not supported in .NET)
.NET regex workaround:
/(m=.+?;x=.+?;.+?\.jpg)(.+?(?=m=.+?;x=.+?;.+?\.jpg|$))/gs
Here the (?1) recursion group is replaced
with the contents of the 1st capture group.
Live demo and explanation of the regexp: http://regex101.com/r/nq3pe0/1
You'll want to use the 2nd capture group for the binary contents. The 1st group matches the header, which the expression also needs in order to know where to stop.
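For what it's worth, here is a minimal sketch of how the workaround pattern might be used from C#. It is not a definitive implementation: the class name JpgRecordSplitter and the method Split are made up for illustration, and the sketch assumes the data is already in memory as a byte array. The bytes are decoded as ISO-8859-1 so that each byte maps to exactly one char and can be converted back without loss. RegexOptions.Singleline stands in for the /s flag and Regex.Matches for the /g flag.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class JpgRecordSplitter
{
    // ISO-8859-1 maps every byte 0..255 to the char with the same code,
    // so bytes -> string -> bytes round-trips unchanged.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // The .NET workaround pattern from above, without the /.../gs delimiters.
    // Singleline lets '.' match newlines, like the /s flag.
    private static readonly Regex Record = new Regex(
        @"(m=.+?;x=.+?;.+?\.jpg)(.+?(?=m=.+?;x=.+?;.+?\.jpg|$))",
        RegexOptions.Singleline);

    // Returns one (header, content bytes) pair per record found in data.
    public static List<KeyValuePair<string, byte[]>> Split(byte[] data)
    {
        string text = Latin1.GetString(data);
        var records = new List<KeyValuePair<string, byte[]>>();

        // Matches() returns every match in turn, like the /g flag.
        foreach (Match m in Record.Matches(text))
        {
            string header = m.Groups[1].Value;                     // 1st group: header
            byte[] content = Latin1.GetBytes(m.Groups[2].Value);   // 2nd group: binary contents
            records.Add(new KeyValuePair<string, byte[]>(header, content));
        }
        return records;
    }
}
```

Each returned pair could then be written out with File.WriteAllBytes, using a file name taken from the header. Two caveats under the assumptions above: the 2nd group starts right after ".jpg", so it includes any separator byte that follows the header (the sample shows a NUL there), and decoding the whole file into a string takes roughly two bytes of memory per input byte, so for an 800 MB file this is best treated as a starting point rather than the efficient solution.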
*edited in italic