Recent changes to bugs

https://sourceforge.net/p/pdftohtml/bugs/
Tue, 19 Jul 2016 00:37:33 -0000


#94 Missing letters from converted PDF
https://sourceforge.net/p/pdftohtml/bugs/94/?limit=25#d72b

Is this project dead? It has been almost ten years and this is still not fixed. :(

Daniel Bair, Tue, 19 Jul 2016 00:37:33 -0000


wrong XML open/close <a> tag in xml generation
https://sourceforge.net/p/pdftohtml/bugs/97/

The opening tag is <A> while the closing tag is </a>.
This is not valid XML.

Thank you for this great application, by the way.

Louis Hoefler, Wed, 27 Oct 2010 15:17:49 -0000


Fails to extract most of text from angled magazine columns
https://sourceforge.net/p/pdftohtml/bugs/96/

When trying to extract text from angled magazine columns, the leading words of paragraphs may be extracted, but most of the text is lost. The errors reported are 'Illegal entry in bfrange block in ToUnicode CMap'.

MBM, Tue, 12 Jan 2010 12:13:48 -0000


Title property is not populated into all <title> tags
https://sourceforge.net/p/pdftohtml/bugs/95/

I have seen this in many PDF documents: KPDF shows a title in the file properties that is not populated into all <title> tags, only the one in the .html file, but many people need it in the _s.html file as well. When indexing PDF documents, most people use the _s.html file, as this is the one that has content; the .html file has only the frameset.
I have attached a document that is publicly available at info:http://www.scubenewmedia.biz/Modulo_domanda_riduzione_tasse_2008_.pdf (in case the document goes away some day)
Versions: 0.33a and 0.36

Miguel Ángel Vilela, Wed, 04 Feb 2009 11:05:07 -0000


Missing letters from converted PDF
https://sourceforge.net/p/pdftohtml/bugs/94/

After converting a simple PDF, I find a number of letters are missing, most notably the second 'l' in double-l words like 'fellow'.

As far as I can tell, the missing characters are represented normally in the PDF file. Certainly, I can copy and paste the relevant text from Acrobat into Text Edit and the letters show up as expected.

I have attached a one-page sample file that demonstrates the issue.

Kevin Yank, Thu, 18 Dec 2008 04:59:56 -0000


Building fails with GCC 4.2.3
https://sourceforge.net/p/pdftohtml/bugs/93/

make[1]: Entering directory `/home/milanb/devel/temp/pdftohtml-0.39/src'
g++ -g -O2 -DHAVE_CONFIG_H -DHAVE_DIRENT_H=1 -I.. -DHAVE_REWINDDIR=1 -DHAVE_POPEN=1 -I.. -I../goo -I../xpdf -I../fofi -I../splash -I -I/usr/X11R6/include -c HtmlOutputDev.cc
In file included from HtmlOutputDev.h:22,
from HtmlOutputDev.cc:31:
HtmlLinks.h:22: error: extra qualification 'HtmlLink::' on member 'isEqualDest'
HtmlOutputDev.cc: In member function 'void HtmlPage::dumpComplex(FILE*, int)':

Anonymous, Thu, 02 Oct 2008 16:19:02 -0000


it crashes when filename has % in it.
https://sourceforge.net/p/pdftohtml/bugs/92/

I tried this on both 0.36 and 0.40a. When I run "pdftohtml -c filename%1.pdf" or "pdftohtml -c filename%1.pdf filename%1.html", it crashes with the following errors.
There is no error if I rename the file to filename1.pdf.

$ pdftohtml -c tb-%027.pdf
Page-1
Page-2
Page-3
Page-4
Page-5
Page-6
Page-7
Page-8
Page-9
Error: /undefinedfilename in --showpage--
Operand stack:
1 true
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1889 1 3 %oparray_pop 1888 1 3 %oparray_pop 1872 1 3 %oparray_pop 1755 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- 1761 0 5 %oparray_pop --nostringval-- --nostringval--
Dictionary stack:
--dict:1153/1684(ro)(G)-- --dict:0/20(G)-- --dict:93/200(L)-- --dict:65/75(L)-- --dict:18/25(L)--
Current allocation mode is local
Last OS error: 2
Current file position is 38845
GPL Ghostscript SVN PRE-RELEASE 8.61: Unrecoverable error, exit code 1
Error: Failed to launch Ghostscript!

dealmaker, Sat, 28 Jun 2008 04:49:51 -0000


half-baked when convert pdf to xml using pdftohtml-0.39-win3
https://sourceforge.net/p/pdftohtml/bugs/91/

I have been puzzled by this issue. When I use pdftohtml-0.39-win32 to convert PDF to XML (command: pdftohtml.exe -xml jstl-1_0-fr-spec.pdf test),
sometimes the file is converted perfectly, but most times the conversion is only partial and some information is missing, like:
<text top="211
tion</b></text>
<text top="143" left="168" width="23" height="10" font="23">key</text>

I tried other PDF files, and the issue still occurs sometimes.
If anybody has a solution, contact me:
huamarco@gmail.com
Thank you very much :)

Marco, Fri, 06 Jun 2008 10:28:38 -0000


Fix for unquoted < > in XML/HTML output
https://sourceforge.net/p/pdftohtml/bugs/90/

In HtmlFont.cc, the HtmlFilter function does not quote " & > < if they are returned from the mapUnicode function.

I've patched that function to first map Unicode and then quote the characters that are forbidden in XML.

The method is marked with: // this method if plain wrong todo
... anyway.

Probably related bug reports:
874352

jensq, Fri, 30 May 2008 11:52:13 -0000


letter case of closing tags in windows binary
https://sourceforge.net/p/pdftohtml/bugs/89/

I see this has been fixed in CVS, but for some reason the fix hasn't made it into the Windows 0.39 release.

<A></a>

Benjamin Faden, Wed, 14 May 2008 13:45:42 -0000
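

The "Fix for unquoted < > in XML/HTML output" report above describes quoting the characters " & > < only after the Unicode mapping step has run. The actual patch is not included in the report, so the following is only a minimal sketch of that quoting step. The names escapeXml and demoEscapeXml are hypothetical and are not taken from HtmlFont.cc; the input is assumed to be text that has already been through the mapping step.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not the project's HtmlFilter): replaces the four
// characters named in the bug report with their XML entity references.
// It is meant to run on text that has ALREADY been mapped to its final
// characters, so that nothing produced by the mapping step slips through.
std::string escapeXml(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Small illustration of the expected behaviour.
void demoEscapeXml() {
    assert(escapeXml("a < b & c > d") == "a &lt; b &amp; c &gt; d");
    assert(escapeXml("plain text") == "plain text");
}
```

Escaping in a single pass over the input handles '&' correctly: the ampersands inside the emitted entities are never re-scanned, so there is no double escaping.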