Recent changes to feature-requests

#91 separate layout from controller code

JT Moree — Fri, 21 Nov 2014 22:28:15 -0000

We feed the XML output into other apps so that they can consume the search data. Each app has its own output and cannot use the htsearch HTML output.

separate layout from controller code

JT Moree — Fri, 21 Nov 2014 22:26:35 -0000

htsearch uses very old-school template generation. I propose breaking the output into its two parts: data and HTML. htsearch should ONLY return data that can be consumed by other services (such as a template engine that generates HTML). I have hacked htdig to output XML, but it still has deficiencies. Specifically, passing parameters to htsearch is error-prone, and difficult when both URL and shell encoding are needed to handle special characters. I'll attach the changes I have made, but the engine itself needs rewriting. Perhaps a program, htxml, that takes all available arguments individually and returns XML output is warranted?
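The double-encoding problem described above can be illustrated with a minimal Python sketch. It assumes htsearch is run as an ordinary CGI program that reads its parameters from the QUERY_STRING environment variable; the parameter names `words` and `method` and the helper names are used for illustration only, and nothing here is part of ht://Dig itself.

```python
import shlex
from urllib.parse import parse_qs, urlencode


def build_query_string(params):
    """First layer: URL-encode the parameters into a CGI-style query string."""
    return urlencode(params)


def build_shell_command(params, program="htsearch"):
    """Second layer: shell-quote the already URL-encoded query string.

    Assumption: the program is a CGI binary that reads QUERY_STRING when
    REQUEST_METHOD is GET. The default program name is a placeholder.
    """
    query = build_query_string(params)
    return "QUERY_STRING={} REQUEST_METHOD=GET {}".format(
        shlex.quote(query), shlex.quote(program)
    )


def decode_query_string(query):
    """Reverse the URL encoding, as the receiving side would."""
    return parse_qs(query)


if __name__ == "__main__":
    # Special characters such as '+', '&' and spaces need both layers.
    print(build_shell_command({"words": "C++ & templates", "method": "and"}))
```

Doing the two encodings in separate, fixed steps (URL encoding first, shell quoting second) avoids the hand-escaping mistakes the post mentions. A dedicated program that accepts each argument individually would remove the first layer entirely.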

htstat gets sigsegv when there are no words in database

Adam Tkac — Thu, 12 Feb 2009 14:35:26 -0000

# htload
DocumentDB::LoadDB: opening /var/lib/htdig/db.docs for reading: No such file or directory
WordList::Load: opening /var/lib/htdig/db.worddump for reading: No such file or directory
# htstat
htstat: Total documents: 0
htstat: Total words: 0
Segmentation fault

MediaWiki wrapper for ht://Dig

Cedric Chantepie — Thu, 03 Jul 2008 07:28:19 -0000

We've developed a MediaWiki extension that can be used as an htdig wrapper (over the htsearch CGI URL). With it you can integrate htsearch into your wiki while respecting its skins.

details : http://www.mediawiki.org/wiki/Extension:HtdigSearch
download page : http://www.nozicaa.com/en/page.content/Downloads

UTF8 support

Cedric Chantepie — Wed, 02 Jul 2008 19:32:26 -0000

UTF-8 (and other Unicode) encoding support for htdig and its associated tools.

more META information recognized

Anonymous — Mon, 07 Apr 2008 20:10:43 -0000

On the site of the French Mathematical Society, article abstracts are indexed by ht://Dig. The problem is that although these files are sometimes modified, they should appear in ht://Dig search results with their original creation date.

To do this it would be nice if one could add an htdig-* keyword in the META tag for specifying the creation date of the document.

thanks in advance

Yannis
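As a rough illustration of the request above, the following Python sketch reads a creation date from a META tag. The tag name `htdig-creation-date` and the ISO date format are hypothetical; the request only asks for "an htdig-* keyword" and does not name one, and this is not how ht://Dig itself parses documents.

```python
from datetime import date
from html.parser import HTMLParser

# Hypothetical META name; no such ht://Dig keyword is confirmed by the request.
META_NAME = "htdig-creation-date"


class CreationDateParser(HTMLParser):
    """Collects the first parseable creation date found in a META tag."""

    def __init__(self):
        super().__init__()
        self.creation_date = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.creation_date is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != META_NAME:
            return
        try:
            # Assumed format: YYYY-MM-DD.
            self.creation_date = date.fromisoformat(attributes.get("content") or "")
        except ValueError:
            pass  # Unparseable date: keep looking at later META tags.


def extract_creation_date(html_text):
    """Return the declared creation date, or None if the page has none."""
    parser = CreationDateParser()
    parser.feed(html_text)
    return parser.creation_date
```

An indexer could fall back to the file's modification time whenever `extract_creation_date` returns `None`, so pages without the tag behave as they do today.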

search mediawiki

Helmut Arnold — Wed, 31 Jan 2007 07:52:52 -0000

Digging MediaWiki pages with htdig is currently possible. The problem is that there are a lot of "Edit" buttons which open the page in edit mode, and htdig runs into senseless loops.
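One way to picture a fix is a URL filter that skips edit-mode links before they are queued. The Python sketch below assumes MediaWiki marks those links with an `action=edit` query parameter. The function name and the example host are illustrative, and this is not an existing ht://Dig feature.

```python
from urllib.parse import parse_qs, urlsplit

# Query-string actions the crawler should not follow (assumed set; extend as needed).
SKIPPED_ACTIONS = {"edit"}


def should_crawl(url):
    """Return False for URLs whose query string requests a skipped action."""
    query = parse_qs(urlsplit(url).query)
    return not any(action in SKIPPED_ACTIONS for action in query.get("action", []))


if __name__ == "__main__":
    for candidate in (
        "http://wiki.example.org/index.php?title=Main_Page",
        "http://wiki.example.org/index.php?title=Main_Page&action=edit",
    ):
        print(candidate, should_crawl(candidate))
```

Matching on the parsed query parameter rather than on the substring "edit" keeps ordinary pages whose titles happen to contain that word.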

Format epoch for sorting.

John Madrid — Mon, 18 Dec 2006 12:21:23 -0000

Sort results by a META tag date given in epoch format.

This would use use_doc_date and gather the date from the META tags, but in epoch format rather than MM-DD-YYYY.

It would also give the option to choose, in the config file, which date to sort by.

Thanks
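A minimal Python sketch of the requested behaviour, assuming the META date is a string of whole seconds since 1970-01-01 UTC and that each result is a `(url, meta_date)` pair. Both the data shape and the function names are assumptions made for illustration.

```python
from datetime import datetime, timezone


def parse_epoch(value):
    """Convert an epoch-seconds string to an aware UTC datetime, or None."""
    try:
        seconds = int(str(value).strip())
        return datetime.fromtimestamp(seconds, tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        return None


def sort_by_epoch(results, newest_first=True):
    """Sort (url, meta_date) pairs by date; undated entries go last."""
    dated = [r for r in results if parse_epoch(r[1]) is not None]
    undated = [r for r in results if parse_epoch(r[1]) is None]
    dated.sort(key=lambda r: parse_epoch(r[1]), reverse=newest_first)
    return dated + undated
```

Because epoch values are plain integers, they sort correctly without the ambiguity of formats like MM-DD-YYYY. Entries with a missing or malformed date are kept at the end rather than dropped.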

URL rewriting not quite adequate.

Anonymous — Sat, 18 Mar 2006 00:10:33 -0000

I'm having a problem with some URL rewriting when it
comes to the domain-name part. Some servers have
aliases which are "partially separate" virtual hosts,
where some URL-paths are the same regardless of domain
actually used, while others are not. For example, all
files under the URL location /foo/ are the same, but
under /bar/, they are not. Of course, I don't want to
multiply index these identical resources.

Server_aliases cannot be used, as these virtual hosts
do return different information for some URLs. Also,
in some other cases (more than 5 aliases), a regular
expression would be nice, but is not supported. Note
also that it doesn't handle subdomains (e.g. a convenience
to strip off a leading but optional "www."), so this
actually doubles the size of the ruleset needed. I mention
this because the spider set also contains some
hostnames that are identical virtual hosts, some of
them "wildcard-CNAMEd" at the DNS level.

URL_rewrite_rules does support regex, but isn't
working in my case, as it appears to be applied to
pre-normalized URLs, not post-normalized ones. Therefore,
relative URLs escape the rules, and as I am indexing
multiple domains, I cannot limit a rule to just the
domain it pertains to; I need the full, normalized
URL to rewrite. If the rules were applied to
post-normalized URLs, I could limit them. I can't split
that one site off into its own htdig configuration
file, as my intent is to crawl a set of sites that have
URLs referencing one another.
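The difference between rewriting before and after normalization can be shown with a short Python sketch. The hostnames, the single rule, and the function names are made up for illustration; this is not the ht://Dig implementation, only the ordering being asked for (resolve the link to an absolute URL first, then apply the rules).

```python
import re
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize(base_url, link):
    """Resolve a possibly relative link and lowercase its scheme and host."""
    parts = urlsplit(urljoin(base_url, link))
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


# Hypothetical rule: /foo/ is identical on both aliases, so fold it onto one host.
REWRITE_RULES = [
    (
        re.compile(r"^http://(?:www\.)?(?:alias1|alias2)\.example\.com/foo/"),
        "http://main.example.com/foo/",
    ),
]


def rewrite(url, rules=REWRITE_RULES):
    """Apply the first matching rule; rules expect a full, absolute URL."""
    for pattern, replacement in rules:
        url, count = pattern.subn(replacement, url, count=1)
        if count:
            break
    return url


def canonical_url(base_url, link):
    """Normalize first, then rewrite, so domain-scoped rules can match."""
    return rewrite(normalize(base_url, link))
```

Calling `rewrite("../foo/doc.html")` directly leaves the relative link unchanged because the rule has no hostname to match, which is the failure described above. `canonical_url` resolves the link against its base page first, so the same rule can apply while URLs under /bar/ stay on their own host.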

I'm about to hack my htdig source to give me
a "url_rewrite_normalized" equivalent, but would
prefer some sort of official change, or consideration
of adding this feature. It may be useful to someone
else.

I'm using HTDig 3.1.6 with the https/SSL patch applied.

URL rewriting not quite adequate.

Anonymous — Fri, 17 Mar 2006 23:48:40 -0000

I'm using HTDig 3.1.6 with the https/SSL patch applied.