<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to feature-requests</title><link>https://sourceforge.net/p/htdig/feature-requests/</link><description>Recent changes to feature-requests</description><atom:link href="https://sourceforge.net/p/htdig/feature-requests/feed.rss" rel="self"/><language>en</language><lastBuildDate>Fri, 21 Nov 2014 22:28:15 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/htdig/feature-requests/feed.rss" rel="self" type="application/rss+xml"/><item><title>#91 separate layout from controller code</title><link>https://sourceforge.net/p/htdig/feature-requests/91/?limit=250#dd17</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;We feed the XML output into other apps so that they can consume the search data.  Each app has its own output and cannot use the htsearch HTML output.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">JT Moree</dc:creator><pubDate>Fri, 21 Nov 2014 22:28:15 -0000</pubDate><guid>https://sourceforge.net970138793ddfc3dd1edf41fb61c2f4e95696fa4d</guid></item><item><title>separate layout from controller code</title><link>https://sourceforge.net/p/htdig/feature-requests/91/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;htsearch uses very old-school template generation.  I propose breaking the output into its two parts: data and HTML.  htsearch should ONLY return data that can be consumed by other services (such as a template engine to generate HTML).  I have hacked htdig to output XML, but it still has deficiencies. Specifically, passing parameters to htsearch is error-prone and difficult when URL and shell encoding are needed to handle special characters.  I'll attach the changes I have made, but the engine itself needs rewriting.  
Perhaps a program htxml that takes all available arguments individually and returns xml output is warranted?&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">JT Moree</dc:creator><pubDate>Fri, 21 Nov 2014 22:26:35 -0000</pubDate><guid>https://sourceforge.net3700d25cb8c169ae14a541fe2b71b26f9b860226</guid></item><item><title>htstat gets sigsegv when there are no words in database</title><link>https://sourceforge.net/p/htdig/feature-requests/90/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;# htload&lt;br /&gt;
DocumentDB::LoadDB: opening /var/lib/htdig/db.docs for reading: No such file or directory&lt;br /&gt;
WordList::Load: opening /var/lib/htdig/db.worddump for reading: No such file or directory&lt;br /&gt;
# htstat&lt;br /&gt;
htstat: Total documents: 0&lt;br /&gt;
htstat: Total words: 0&lt;br /&gt;
Segmentation fault&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Adam Tkac</dc:creator><pubDate>Thu, 12 Feb 2009 14:35:26 -0000</pubDate><guid>https://sourceforge.net332d2a01943b79dc8e56a33920e421558fe41b56</guid></item><item><title>Mediawiki wrapper for ht://Dig</title><link>https://sourceforge.net/p/htdig/feature-requests/89/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;We've developed a MediaWiki extension that can be used as an htdig wrapper (over the htsearch CGI URL). With it you can integrate htsearch into your wiki while respecting its skins.&lt;/p&gt;
&lt;p&gt;details : &lt;a href="http://www.mediawiki.org/wiki/Extension:HtdigSearch" rel="nofollow"&gt;http://www.mediawiki.org/wiki/Extension:HtdigSearch&lt;/a&gt;&lt;br /&gt;
download page : &lt;a href="http://www.nozicaa.com/en/page.content/Downloads" rel="nofollow"&gt;http://www.nozicaa.com/en/page.content/Downloads&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Cedric Chantepie</dc:creator><pubDate>Thu, 03 Jul 2008 07:28:19 -0000</pubDate><guid>https://sourceforge.net5ae9a0429843e6b9893bbee8c04568816b479f2e</guid></item><item><title>UTF8 support</title><link>https://sourceforge.net/p/htdig/feature-requests/88/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;UTF-8 (and broader Unicode) encoding support for htdig and its associated tools.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Cedric Chantepie</dc:creator><pubDate>Wed, 02 Jul 2008 19:32:26 -0000</pubDate><guid>https://sourceforge.net89dfba68e5c8d01fd2edbdc61b63fb41ce9436dd</guid></item><item><title>more META information recognized</title><link>https://sourceforge.net/p/htdig/feature-requests/87/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;On the site of the French Mathematical Society, article abstracts are indexed by ht://Dig. The problem is that although these files are sometimes modified, they should appear in ht://Dig search results with their original creation date. &lt;/p&gt;
&lt;p&gt;To do this it would be nice if one could add an htdig-* keyword in the META tag for specifying the creation date of the document.&lt;/p&gt;
&lt;p&gt;thanks in advance&lt;/p&gt;
&lt;p&gt;Yannis&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Anonymous</dc:creator><pubDate>Mon, 07 Apr 2008 20:10:43 -0000</pubDate><guid>https://sourceforge.net029d6bc852e66f0def7d666ab2b842094b02f618</guid></item><item><title>search mediawiki </title><link>https://sourceforge.net/p/htdig/feature-requests/86/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Currently, digging MediaWiki pages with htdig is possible. The problem is that there are a lot of "Edit" buttons which open the page in edit mode, so htdig runs into senseless loops. &lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Helmut Arnold</dc:creator><pubDate>Wed, 31 Jan 2007 07:52:52 -0000</pubDate><guid>https://sourceforge.net863903fb88f61bd66d4b2bd7d489b421b48a7a77</guid></item><item><title>Format epoch for sorting.</title><link>https://sourceforge.net/p/htdig/feature-requests/85/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Sort results by meta tag date in epoch format.&lt;/p&gt;
&lt;p&gt;This would use use_doc_date and gather the date from the meta tags, but in epoch format rather than MM-DD-YYYY.&lt;/p&gt;
&lt;p&gt;It would also give the option to choose whichever sorting date you want in the config file.&lt;/p&gt;
&lt;p&gt;Thanks&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">John Madrid</dc:creator><pubDate>Mon, 18 Dec 2006 12:21:23 -0000</pubDate><guid>https://sourceforge.net8bcdd8f19c21642fa87dade1539b30c51353cc5d</guid></item><item><title>URL rewriting not quite adequate.</title><link>https://sourceforge.net/p/htdig/feature-requests/84/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I'm having a problem with some URL rewriting when it &lt;br /&gt;
comes to the domain-name part.  Some servers have &lt;br /&gt;
aliases which are "partially separate" virtual hosts, &lt;br /&gt;
where some URL-paths are the same regardless of domain &lt;br /&gt;
actually used, while others are not.  For example, all &lt;br /&gt;
files under the URL location /foo/ are the same, but &lt;br /&gt;
under /bar/, they are not.  Of course, I don't want to &lt;br /&gt;
multiply index these identical resources.&lt;/p&gt;
&lt;p&gt;Server_aliases cannot be used, as these virtual hosts &lt;br /&gt;
do return different information for some URLs.  Also, &lt;br /&gt;
in some [other] cases (more than 5 aliases), regex &lt;br /&gt;
support would be nice, but it is not available.  Note &lt;br /&gt;
also that it doesn't do subdomains (i.e. a convenience &lt;br /&gt;
to strip off a leading but optional "www."), so this &lt;br /&gt;
actually doubles the ruleset size needed. I mention &lt;br /&gt;
this because there are also some hostnames in the &lt;br /&gt;
spider-set which are identical virtual hosts which are &lt;br /&gt;
sometimes "wildcard-CNAMEd" at the DNS level.&lt;/p&gt;
&lt;p&gt;URL_rewrite_rules does support regex, but isn't &lt;br /&gt;
working in my case as it appears to be applied to pre-&lt;br /&gt;
normalized URLs, not post-normalized.  Therefore, &lt;br /&gt;
relative URLs escape their action, and as my usage is &lt;br /&gt;
indexing multiple domains, I cannot limit the rules to &lt;br /&gt;
just the domain it pertains to - I need the full, &lt;br /&gt;
normalized URL to rewrite.  If it were to apply to &lt;br /&gt;
post-normalized URLs, I could limit it.  I can't split &lt;br /&gt;
that one site off into its own htdig configuration &lt;br /&gt;
file as my intent is to crawl a set of sites that have &lt;br /&gt;
interreferencing URLs.&lt;/p&gt;
&lt;p&gt;I'm about to hack my source for htdig to give me &lt;br /&gt;
a "url_rewrite_normalized" equivalent, but would &lt;br /&gt;
prefer some sort of official change or consideration &lt;br /&gt;
for adding this feature.  It may be useful to someone &lt;br /&gt;
else.&lt;/p&gt;
&lt;p&gt;I'm using HTDig 3.1.6 with the https/SSL patch applied.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Anonymous</dc:creator><pubDate>Sat, 18 Mar 2006 00:10:33 -0000</pubDate><guid>https://sourceforge.nete52da414d37b5c1334a783009e90cc3f5c1fe308</guid></item><item><title>URL rewriting not quite adequate.</title><link>https://sourceforge.net/p/htdig/feature-requests/83/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I'm having a problem with some URL rewriting when it &lt;br /&gt;
comes to the domain-name part.  Some servers have &lt;br /&gt;
aliases which are "partially separate" virtual hosts, &lt;br /&gt;
where some URL-paths are the same regardless of domain &lt;br /&gt;
actually used, while others are not.  For example, all &lt;br /&gt;
files under the URL location /foo/ are the same, but &lt;br /&gt;
under /bar/, they are not.  Of course, I don't want to &lt;br /&gt;
multiply index these identical resources.&lt;/p&gt;
&lt;p&gt;Server_aliases cannot be used, as these virtual hosts &lt;br /&gt;
do return different information for some URLs.  Also, &lt;br /&gt;
in some [other] cases (more than 5 aliases), regex &lt;br /&gt;
support would be nice, but it is not available.  Note &lt;br /&gt;
also that it doesn't do subdomains (i.e. a convenience &lt;br /&gt;
to strip off a leading but optional "www."), so this &lt;br /&gt;
actually doubles the ruleset size needed. I mention &lt;br /&gt;
this because there are also some hostnames in the &lt;br /&gt;
spider-set which are identical virtual hosts which are &lt;br /&gt;
sometimes "wildcard-CNAMEd" at the DNS level.&lt;/p&gt;
&lt;p&gt;URL_rewrite_rules does support regex, but isn't &lt;br /&gt;
working in my case as it appears to be applied to pre-&lt;br /&gt;
normalized URLs, not post-normalized.  Therefore, &lt;br /&gt;
relative URLs escape their action, and as my usage is &lt;br /&gt;
indexing multiple domains, I cannot limit the rules to &lt;br /&gt;
just the domain it pertains to - I need the full, &lt;br /&gt;
normalized URL to rewrite.  If it were to apply to &lt;br /&gt;
post-normalized URLs, I could limit it.  I can't split &lt;br /&gt;
that one site off into its own htdig configuration &lt;br /&gt;
file as my intent is to crawl a set of sites that have &lt;br /&gt;
interreferencing URLs.&lt;/p&gt;
&lt;p&gt;I'm about to hack my source for htdig to give me &lt;br /&gt;
a "url_rewrite_normalized" equivalent, but would &lt;br /&gt;
prefer some sort of official change or consideration &lt;br /&gt;
for adding this feature.  It may be useful to someone &lt;br /&gt;
else.&lt;/p&gt;
&lt;p&gt;I'm using HTDig 3.1.6 with the https/SSL patch applied.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Anonymous</dc:creator><pubDate>Fri, 17 Mar 2006 23:48:40 -0000</pubDate><guid>https://sourceforge.net017b273869944b9c0e9e5e35414180dd87c60ae6</guid></item></channel></rss>