annotate pylons_app/lib/indexers/daemon.py @ 527:a9e50dce3081 celery
Removed config names from whoosh and celery.
Celery is now configured based on the config name it's using
in celeryconfig, and whoosh uses its own logger configured just for whoosh.
Tests now create a fresh whoosh index for more accurate checks.
Fixed tests for searching.
author | Marcin Kuzminski <marcin@python-works.com> |
---|---|
date | Fri, 17 Sep 2010 22:54:30 +0200 |
parents | e01a85f9fc90 |
children | fefffd6fd5f4 |
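
The changeset note says whoosh now uses its own logger. As a rough illustration only, a dedicated logger of that kind could be set up as sketched here. The helper name `get_indexer_logger`, the `LOG_FORMAT` constant, and the duplicate-handler guard are assumptions made for this sketch and are not part of daemon.py; the logger name and the format string are the ones used in the source listing.

```python
import logging

# Format string as used in the listing; the constant name is hypothetical.
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def get_indexer_logger(name='whooshIndexer'):
    """Return a logger with exactly one console handler attached.

    Hypothetical helper: daemon.py configures its logger at module level.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)

    # Attach the console handler only once, so that repeated calls do not
    # make every message appear several times.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setLevel(logging.DEBUG)
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    return logger


log = get_indexer_logger()
log.debug('indexer logger ready')
```

Calling the helper a second time returns the same logger object, because `logging.getLogger` caches loggers by name, so the guard keeps the handler list at a single entry.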
#!/usr/bin/env python
# encoding: utf-8
# whoosh indexer daemon for hg-app
# Copyright (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; version 2
# of the License or (at your option) any later version of the license.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA  02110-1301, USA.
"""
Created on Jan 26, 2010

@author: marcink
A daemon will read from task table and run tasks
"""
import sys
import os
from os.path import dirname as dn
from os.path import join as jn

# to get the pylons_app import
project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
sys.path.append(project_path)

from pidlock import LockHeld, DaemonLock
from pylons_app.model.hg_model import HgModel
from pylons_app.lib.helpers import safe_unicode
from whoosh.index import create_in, open_dir
from shutil import rmtree
from pylons_app.lib.indexers import INDEX_EXTENSIONS, IDX_LOCATION, SCHEMA, IDX_NAME

import logging

# create logger
log = logging.getLogger('whooshIndexer')
log.setLevel(logging.DEBUG)

# create console handler and set level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# create formatter
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

# add formatter to ch
ch.setFormatter(formatter)

# add ch to logger
log.addHandler(ch)

def scan_paths(root_location):
    return HgModel.repo_scan('/', root_location, None, True)

class WhooshIndexingDaemon(object):
    """Daemon for atomic jobs"""

    def __init__(self, indexname='HG_INDEX', repo_location=None):
        self.indexname = indexname
        self.repo_location = repo_location
        self.initial = False
        if not os.path.isdir(IDX_LOCATION):
            os.mkdir(IDX_LOCATION)
            log.info('Cannot run incremental index since it does not'
                     ' yet exist, running full build')
            self.initial = True

    def get_paths(self, root_dir):
        """Recursively walk root_dir and return a set of all file paths
        in it, excluding files in the .hg dir"""
        index_paths_ = set()
        for path, dirs, files in os.walk(root_dir):
            # skip every directory whose path contains '.hg'
            if '.hg' not in path:
                for f in files:
                    index_paths_.add(jn(path, f))

        return index_paths_

    def add_doc(self, writer, path, repo):
        """Add a document to the writer"""

        ext = unicode(path.split('/')[-1].split('.')[-1].lower())
        # we just index the content of chosen files
        if ext in INDEX_EXTENSIONS:
            log.debug(' >> %s [WITH CONTENT]' % path)
            fobj = open(path, 'rb')
            content = fobj.read()
            fobj.close()
            u_content = safe_unicode(content)
        else:
            log.debug(' >> %s' % path)
            # just index file name without its content
            u_content = u''

        try:
            # os.stat raises OSError for a missing path or a broken symlink
            os.stat(path)
            writer.add_document(owner=unicode(repo.contact),
                                repository=u"%s" % repo.name,
                                path=u"%s" % path,
                                content=u_content,
                                modtime=os.path.getmtime(path),
                                extension=ext)
        except OSError, e:
            import errno
            if e.errno == errno.ENOENT:
                log.debug('path %s does not exist or is a broken symlink' % path)
            else:
                # re-raise with the original traceback
                raise


    def build_index(self):
        if os.path.exists(IDX_LOCATION):
            log.debug('removing previous index')
            rmtree(IDX_LOCATION)

        if not os.path.exists(IDX_LOCATION):
            os.mkdir(IDX_LOCATION)

        idx = create_in(IDX_LOCATION, SCHEMA, indexname=IDX_NAME)
        writer = idx.writer()

        for cnt, repo in enumerate(scan_paths(self.repo_location).values()):
            log.debug('building index @ %s' % repo.path)

            for idx_path in self.get_paths(repo.path):
                self.add_doc(writer, idx_path, repo)
        writer.commit(merge=True)

        log.debug('>>> FINISHED BUILDING INDEX <<<')


    def update_index(self):
        log.debug('STARTING INCREMENTAL INDEXING UPDATE')

        idx = open_dir(IDX_LOCATION, indexname=self.indexname)
        # The set of all paths in the index
        indexed_paths = set()
        # The set of all paths we need to re-index
        to_index = set()

        reader = idx.reader()
        writer = idx.writer()

        # Loop over the stored fields in the index
        for fields in reader.all_stored_fields():
            indexed_path = fields['path']
            indexed_paths.add(indexed_path)

            if not os.path.exists(indexed_path):
                # This file was deleted since it was indexed
                log.debug('removing from index %s' % indexed_path)
                writer.delete_by_term('path', indexed_path)

            else:
                # Check if this file was changed since it
                # was indexed
                indexed_time = fields['modtime']

                mtime = os.path.getmtime(indexed_path)

                if mtime > indexed_time:

                    # The file has changed, delete it and add it to the list of
                    # files to reindex
                    log.debug('adding to reindex list %s' % indexed_path)
                    writer.delete_by_term('path', indexed_path)
                    to_index.add(indexed_path)
                    #writer.commit()

        # Loop over the files in the filesystem
        # Assume we have a function that gathers the filenames of the
        # documents to be indexed
        for repo in scan_paths(self.repo_location).values():
            for path in self.get_paths(repo.path):
                if path in to_index or path not in indexed_paths:
                    # This is either a file that's changed, or a new file
                    # that wasn't indexed before. So index it!
                    self.add_doc(writer, path, repo)
                    log.debug('reindexing %s' % path)

b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
192 writer.commit(merge=True) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
193 #idx.optimize() |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
194 log.debug('>>> FINISHED <<<') |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
195 |
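The incremental update above can be read as a two-pass decision: first mark indexed files whose mtime is newer than the stored index time, then walk the files on disk and index anything that is either marked or new. The following is a minimal sketch of that decision only, with no Whoosh calls. `plan_reindex`, its parameters, and the injectable `getmtime` hook are hypothetical names for illustration, and the sketch assumes every indexed path still exists on disk (handling of deleted files is left out).

```python
import os


def plan_reindex(indexed, current_paths, getmtime=os.path.getmtime):
    """Return (to_delete, to_add) for an incremental index update.

    indexed       -- dict mapping path -> modification time stored in the index
    current_paths -- iterable of paths currently present on disk
    getmtime      -- callable returning a path's current mtime (hypothetical
                     hook so the logic can be exercised without real files)
    """
    to_delete = set()
    for path, indexed_time in indexed.items():
        # A file modified after it was indexed is stale: its old document
        # must be dropped and the path scheduled for reindexing.
        if getmtime(path) > indexed_time:
            to_delete.add(path)

    to_add = []
    for path in current_paths:
        # Index files that changed, plus files never indexed before.
        if path in to_delete or path not in indexed:
            to_add.append(path)
    return to_delete, to_add
```

In the daemon itself, the first set corresponds to the paths passed to `writer.delete_by_term('path', ...)` and the second to the paths passed to `self.add_doc(...)`.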
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
196 def run(self, full_index=False): |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
197 """Run daemon""" |
508
e01a85f9fc90
fixed initial whoosh indexer. Build full index on first run even with incremental flag
Marcin Kuzminski <marcin@python-works.com>
parents:
494
diff
changeset
|
198 if full_index or self.initial: |
439
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
199 self.build_index() |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
200 else: |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
201 self.update_index() |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
202 |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
203 if __name__ == "__main__": |
493
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
482
diff
changeset
|
204 arg = sys.argv[1:] |
494
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
493
diff
changeset
|
205 if len(arg) != 2: |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
493
diff
changeset
|
206 sys.stderr.write('Please specify indexing type [full|incremental]' |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
493
diff
changeset
|
207 ' and path to repositories as script args \n') |
493
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
482
diff
changeset
|
208 sys.exit(1) |
494
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
493
diff
changeset
|
209 |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
493
diff
changeset
|
210 |
493
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
482
diff
changeset
|
211 if arg[0] == 'full': |
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
482
diff
changeset
|
212 full_index = True |
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
482
diff
changeset
|
213 elif arg[0] == 'incremental': |
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
482
diff
changeset
|
214 # False means looking just for changes |
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
482
diff
changeset
|
215 full_index = False |
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
482
diff
changeset
|
216 else: |
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
482
diff
changeset
|
217 sys.stderr.write('Please use [full|incremental]' |
494
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
493
diff
changeset
|
218 ' as script first arg \n') |
493
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
482
diff
changeset
|
219 sys.exit(1) |
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
482
diff
changeset
|
220 |
494
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
493
diff
changeset
|
221 if not os.path.isdir(arg[1]): |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
493
diff
changeset
|
222 sys.stderr.write('%s is not a valid path \n' % arg[1]) |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
493
diff
changeset
|
223 sys.exit(1) |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
493
diff
changeset
|
224 else: |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
493
diff
changeset
|
225 if arg[1].endswith('/'): |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
493
diff
changeset
|
226 repo_location = arg[1] + '*' |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
493
diff
changeset
|
227 else: |
f19d3ee89335
updated whoosh indexer to take path as second argument
Marcin Kuzminski <marcin@python-works.com>
parents:
493
diff
changeset
|
228 repo_location = arg[1] + '/*' |
493
d726f62f886e
updated whoosh indexer to take index building argument type
Marcin Kuzminski <marcin@python-works.com>
parents:
482
diff
changeset
|
229 |
439
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
230 try: |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
231 l = DaemonLock() |
474
28f19fa562df
updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents:
447
diff
changeset
|
232 WhooshIndexingDaemon(repo_location=repo_location)\ |
28f19fa562df
updated config files,
Marcin Kuzminski <marcin@python-works.com>
parents:
447
diff
changeset
|
233 .run(full_index=full_index) |
439
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
234 l.release() |
527
a9e50dce3081
Removed config names from whoosh and celery,
Marcin Kuzminski <marcin@python-works.com>
parents:
508
diff
changeset
|
235 reload(logging) |
439
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
236 except LockHeld: |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
237 sys.exit(1) |
b153a51b1d3b
Implemented search using whoosh. Still as experimental option.
Marcin Kuzminski <marcin@python-works.com>
parents:
diff
changeset
|
238 |
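The argument handling in the `__main__` block can also be expressed as a small function that validates its input and reports errors through one path. This is only a sketch under assumptions: `parse_args` and `main` are hypothetical names, not part of the module, and the `run` callback stands in for `WhooshIndexingDaemon(...).run(...)`. It keeps the same messages and the same `<path>/*` glob suffix as the script, but returns a non-zero status on bad input.

```python
import os
import sys


def parse_args(argv):
    """Validate `[full|incremental] <path>`; return (full_index, repo_location).

    Raises ValueError carrying a user-facing message on bad input.
    """
    if len(argv) != 2:
        raise ValueError('Please specify indexing type [full|incremental] '
                         'and path to repositories as script args')
    mode, path = argv
    if mode not in ('full', 'incremental'):
        raise ValueError('Please use [full|incremental] as script first arg')
    if not os.path.isdir(path):
        raise ValueError('%s is not a valid path' % path)
    # Build the same glob pattern as the script: "<path>/*"
    repo_location = path + ('*' if path.endswith('/') else '/*')
    return mode == 'full', repo_location


def main(argv, run):
    """Parse argv, call run(full_index, repo_location), return an exit code."""
    try:
        full_index, repo_location = parse_args(argv)
    except ValueError as e:
        sys.stderr.write('%s\n' % e)
        return 2
    run(full_index, repo_location)
    return 0
```

A caller would typically finish with `sys.exit(main(sys.argv[1:], run))`, so that a usage error is visible to the shell or to a cron job as a non-zero status.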