support support for ngram indexer by jomart1985 · Pull Request #197 · luceneplusplus/LucenePlusPlus

jomart1985 · 2023-04-11T07:15:19Z

I hope you can add support for an n-gram indexer.

Alain Barbet <alian@amisw.com>

Also added a note regarding the current incompatibility with Boost 1.50 and newer (issue #30). See: #30

Improved the README file's build instructions

Update for boost-1.5

- Thanks to kojik1010.

closes #40

closes #39

Commit e3f8992 ported the code to use Boost.Filesystem V3 API, so the warning about not being able to use Boost 1.50+ is no longer true.

Remove outdated note about Boost incompatibility from the README.

I've observed crashes where one thread is in the middle of initializing ZZ_CMAP and another is trying to use the partially initialized array and crashes. Use boost::once to ensure that only one thread handles the initialization and no thread uses the data until it is fully initialized.
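The once-only initialization described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: it uses std::call_once from the standard library as a stand-in for the Boost facility, and the class name CharMap, the table size, and the placeholder contents are all hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for a lazily built lookup table such as ZZ_CMAP.
class CharMap {
public:
    static constexpr std::size_t kSize = 256;

    // Returns the table; the first caller builds it, and every other caller
    // blocks inside call_once until the build has completed.
    static const std::array<int, kSize>& table() {
        std::call_once(initFlag(), &CharMap::build);
        return storage();
    }

private:
    static std::once_flag& initFlag() {
        static std::once_flag flag;
        return flag;
    }

    static std::array<int, kSize>& storage() {
        static std::array<int, kSize> data{};
        return data;
    }

    // Takes no arguments: it works directly on the static storage.
    static void build() {
        std::array<int, kSize>& data = storage();
        for (std::size_t i = 0; i < kSize; ++i) {
            data[i] = static_cast<int>(i % 7);  // placeholder contents only
        }
    }
};
```

Because readers only reach the table through table(), no thread can observe a half-filled array.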

Fix races when initializing static arrays in StandardTokenizerImpl

The argument should be named startOffset, not endOffset, otherwise the function is a no-op.
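A simplified sketch of why a misnamed parameter turns the setter into a no-op. The class OffsetInfo below is a hypothetical stand-in, and the buggy body is an assumption about the shape of the original code, not a copy of it.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for TermVectorOffsetInfo.
class OffsetInfo {
public:
    OffsetInfo(std::int32_t start, std::int32_t end)
        : startOffset(start), endOffset(end) {}

    // Assumed buggy shape: the parameter is called endOffset, so the name
    // startOffset on the right-hand side resolves to the member itself.
    void setStartOffsetBuggy(std::int32_t endOffset) {
        (void)endOffset;                  // the argument is never used
        this->startOffset = startOffset;  // member assigned to itself
    }

    // Fixed shape: the parameter shadows the member, so the value is stored.
    void setStartOffset(std::int32_t startOffset) {
        this->startOffset = startOffset;
    }

    std::int32_t getStartOffset() const { return startOffset; }
    std::int32_t getEndOffset() const { return endOffset; }

private:
    std::int32_t startOffset;
    std::int32_t endOffset;
};
```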

Use StringUtils::toString() before trying to concatenate or use operator<< which is more type-safe.
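To illustrate the pitfall: adding an integer to a string literal is pointer arithmetic, not concatenation. The sketch below shows two safe alternatives. It uses std::to_string and narrow strings as stand-ins for StringUtils::toString() and the wide strings used in Lucene++; the function names are hypothetical.

```cpp
#include <sstream>
#include <string>

// Pitfall: the expression "doc " + 2 advances the literal's pointer by two
// characters and yields "c ", not "doc 2".

// Option 1: convert the number to a string first, then concatenate.
std::string describeDoc(int docId) {
    return std::string("doc ") + std::to_string(docId);
}

// Option 2: stream each piece with operator<<, which formats the integer.
std::string describeDocStream(int docId) {
    std::ostringstream out;
    out << "doc " << docId;
    return out.str();
}
```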

Fix some issues identified using clang's static analysis

Add support for compiling with -std=c++11.

Before 1.54, there was no support for variadic calls to boost::call_once(), so make the arrays static members to avoid the need to pass them to the static init methods.

Fix compatibility with Boost versions before 1.54

8628278 broke compilation due to a typo (boost:call_once instead of boost::call_once). Additionally, VC++ compilation with precompiled header was broken, because LuceneInc.h must be included as the very first header.

There was a typo in the output expression, appending a number to a string, instead of concatenating them as intended.

Fix accidental use of operator+ instead of operator<<.

Lucene++ keeps paths around as wide strings, but uses narrow char APIs (e.g. std::ifstream) when accessing files, using conversion to UTF-8 to get char* strings. This is correct on OS X and usually(!) correct on modern Unix systems, but is completely wrong on Windows, which _never_ uses UTF-8 for filenames.

Fix this using boost::filesystem classes (path and streams) and appropriate conversions. In one place, use a Windows-specific workaround to deal with lack of wide char boost API.

In particular:

- Use boost::filesystem::*fstream classes that accept Unicode paths.
- Use boost::filesystem::(w)path for manual conversion when needed.
- When using char* only API (interprocess::file_lock), use GetShortPathName() as a workaround.
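The idea of letting a path class handle the platform's native encoding can be sketched as below. This is an illustration only: it assumes a C++17 compiler and uses std::filesystem, the standard-library successor to boost::filesystem, rather than the Boost classes the commit mentions. The function name roundTrip is hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

// Writes text to the file named by a wide-character path and reads it back.
// The path object converts the wide string to whatever the platform uses
// natively, so no manual UTF-8 conversion is needed at the call site.
std::string roundTrip(const std::wstring& widePath, const std::string& text) {
    const std::filesystem::path path(widePath);
    {
        std::ofstream out(path, std::ios::binary);  // stream opened from a path
        out << text;
    }
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```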

Fix incorrect paths handling on Windows.

…letest

Boost.System has been header only since Boost 1.69.0

Handle file enumeration exceptions in FileUtils::listDirectory

Fix build new cmake

Fix typo in MAX_VARINT32_LENGTH constant in BufferedIndexInput.cpp

Update DefaultSimilarity.cpp

Fix old comment about C++ standard

BitSet: Partial fix for Boost 1.90

Use conditional compilation to support both old and new Boost.Bind API:

- Boost >= 1.73.0: Use boost/bind/bind.hpp
- Boost < 1.73.0: Use boost/bind.hpp

This approach maintains backward compatibility while fixing deprecation warnings in newer Boost versions.

Use new Boost.Bind API to fix deprecation warnings

Also remove Boost_SYSTEM_LIBRARIES, removed in #219

Several tests have custom mock classes. Unfortunately these frequently have identical names across tests, which creates problems when building with LTO, as everything is merged into a single test executable. GCC rightfully complains about this, since classes with the same name are assumed to have the same shape, and that is just not true here. Therefore rename the mock classes with the initials of the containing test. With this the entire test suite compiles and passes when built with LTO.

While trying to verify tests previously excluded in Gentoo (gentoo/gentoo@b9d1c7a) I noticed that ParallelMultiSearcherTest & SortTest would work, but hang in ~ThreadPool() on threadGroup.join_all(), preventing the test executable from terminating cleanly. Stopping the io_context makes join_all() work immediately.

Stop io_context before joining threads in ThreadPool destructor

Use unique class names for inner test mock classes

1. Added NGramAnalyzer, NGramTokenFilter, and NGramTokenizer classes for n-gram text analysis
2. Implemented configurable min/max gram sizes with validation
3. Added preserve original token option to NGramTokenFilter
4. Included comprehensive test cases for all new components
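For readers unfamiliar with n-gram analysis, the following sketch shows what min/max gram sizes, validation, and a preserve-original option might look like in isolation. The function makeNGrams is a hypothetical helper, not part of the classes added by this change; the emission order (shortest grams first) and the rule for when the original token is kept are assumptions of the sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns every substring of token whose length lies
// in [minGram, maxGram], emitting all grams of one length before the next.
std::vector<std::wstring> makeNGrams(const std::wstring& token,
                                     std::size_t minGram,
                                     std::size_t maxGram,
                                     bool preserveOriginal = false) {
    if (minGram < 1) {
        throw std::invalid_argument("minGram must be at least 1");
    }
    if (minGram > maxGram) {
        throw std::invalid_argument("minGram must not exceed maxGram");
    }

    std::vector<std::wstring> grams;
    for (std::size_t n = minGram; n <= maxGram && n <= token.size(); ++n) {
        for (std::size_t start = 0; start + n <= token.size(); ++start) {
            grams.push_back(token.substr(start, n));
        }
    }

    // Assumed rule: keep the whole token only when its length falls outside
    // the gram range, since otherwise it is already among the grams.
    const bool outsideRange = token.size() < minGram || token.size() > maxGram;
    if (preserveOriginal && !token.empty() && outsideRange) {
        grams.push_back(token);
    }
    return grams;
}
```

With these assumptions, the token "abc" with sizes 1 to 2 yields a, b, c, ab, bc.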

Update dependencies.cmake for new boost 1.90 without system-libraries

Add N-Gram analyzer components

alanw · 2026-05-25T18:01:58Z

Closed by #228

Alan Wright and others added 30 commits March 11, 2013 13:49

Fix for copy of compound file >2 GB.

ca102b0

Alain Barbet <alian@amisw.com>

Improved the README file's build instructions.

21645ed

Also added a note regarding the current incompatibility with Boost 1.50 and newer (issue #30). See: #30

Merge pull request #34 from bendiken/readme-improvements

d63ba39

Improved the README file's build instructions

Update for boost-1.5

e3f8992

Update include/Config.h.cmake to have correct boost filesystem version.

f185807

Add CMake artifacts to gitignore

6251374

Merge pull request #35 from upthere/boost-1.5x

33e5aba

Update for boost-1.5

The right hand side needs to be cast to int64_t before being shifted.

99facd7

- Thanks to kojik1010.
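The cast-before-shift fix above addresses a common pitfall: shifting a 32-bit operand by 32 or more bits is undefined behaviour, so the operand has to be widened first. A minimal sketch, with a hypothetical function name and no claim about where in Lucene++ the original expression lived:

```cpp
#include <cstdint>

// Moves a 32-bit value into the upper half of a 64-bit integer.
// Assumes value is non-negative; the cast widens it BEFORE the shift, so
// the shift is performed on a 64-bit operand.
std::int64_t toHighWord(std::int32_t value) {
    return static_cast<std::int64_t>(value) << 32;
}
```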

Fixed unit tests.

5958503

Disable pch in waf.

ea93d35

Release version 3.0.4.

3b76d7c

Fix segfault when recording directory listing (Marcin Junczys-Dowmunt).

277b8d1

Turn off custom allocator by default.

65c63d0

closes #40

Strip symbols from shared libraries.

3d8a008

closes #39

Add support for compiling with c++11.

f4f7a4e

Remove outdated note about Boost incompatibility from the README.

4777666

Commit e3f8992 ported the code to use Boost.Filesystem V3 API, so the warning about not being able to use Boost 1.50+ is no longer true.

Merge pull request #41 from vslavik/patch-1

147c555

Remove outdated note about Boost incompatibility from the README.

Merge pull request #42 from upthere/tokenizer-init-race

ce10f9a

Fix races when initializing static arrays in StandardTokenizerImpl

Fix copy-and-paste error in TermVectorOffsetInfo::setStartOffset()

07d8d7b

The argument should be named startOffset, not endOffset, otherwise the function is a no-op.

Fix some cases of adding integers to string literals.

9386377

Use StringUtils::toString() before trying to concatenate or use operator<< which is more type-safe.

Merge pull request #43 from upthere/clang-static-analysis

5911693

Fix some issues identified using clang's static analysis

Merge pull request #44 from upthere/c++11-support

8b174c5

Add support for compiling with -std=c++11.

Fix compatibility with Boost versions before 1.54

e82a0fc

Before 1.54, there was no support for variadic calls to boost::call_once(), so make the arrays static members to avoid the need to pass them to the static init methods.

Merge pull request #45 from upthere/boost-once-backwards-compat

d009b33

Fix compatibility with Boost versions before 1.54

Fix StandardTokenizerImpl.cpp compilation.

523d07a

8628278 broke compilation due to a typo (boost:call_once instead of boost::call_once). Additionally, VC++ compilation with precompiled header was broken, because LuceneInc.h must be included as the very first header.

Fix accidental use of operator+ instead of operator<<.

2ac8183

There was a typo in the output expression, appending a number to a string, instead of concatenating them as intended.

Merge pull request #48 from vslavik/patch-1

7a9c11e

Fix accidental use of operator+ instead of operator<<.

Merge pull request #50 from vslavik/windows-paths-fixes

6a3254a

Fix incorrect paths handling on Windows.

LocutusOfBorg and others added 28 commits September 8, 2025 15:05

Bump minimum std-version to 17, fixing FTBFS with new gcc-15 and goog…

6e678a9

…letest

Boost.System has been header only since Boost 1.69.0

07d8426

Boost.System has been header only since Boost 1.69.0

8c9dca6

Update CMakeLists.txt

32039e5

Update CMakeLists.txt

7425cb1

Update CMakeLists.txt

2adbf0d

Update CMakeLists.txt

6af2547

Update CMakeLists.txt

39a3d7c

Merge pull request #219 from arjendekorte/patch-1

d65b887

Boost.System has been header only since Boost 1.69.0

Merge pull request #217 from wangrong1069/pr0821

cfd01d8

Handle file enumeration exceptions in FileUtils::listDirectory

Merge pull request #218 from LocutusOfBorg/fix-build-new-cmake

c5df275

Fix build new cmake

Merge pull request #216 from Johnson-zs/fix_bug_0

cbf7074

Fix typo in MAX_VARINT32_LENGTH constant in BufferedIndexInput.cpp

Merge pull request #200 from LocutusOfBorg/patch-1

26685c4

Update DefaultSimilarity.cpp

Fix old comment about C++ standard

913e7ad

Merge pull request #220 from ryonakano/fix-comment

47ce8e7

Fix old comment about C++ standard

BitSet: Partial fix for Boost 1.90

146a62a

BitSet: prefer builtin boost function

55a1238

Merge pull request #222 from sgn/boost-1.90

f11e089

BitSet: Partial fix for Boost 1.90

Merge pull request #223 from wineee/bind

29fcfed

Use new Boost.Bind API to fix deprecation warnings

Update dependencies.cmake

5b670a0

Also remove Boost_SYSTEM_LIBRARIES, removed in #219

Merge pull request #227 from hhoffstaette/threadpool-shutdown

5448b3d

Stop io_context before joining threads in ThreadPool destructor

Merge pull request #226 from hhoffstaette/odr-fixes

36c2a6f

Use unique class names for inner test mock classes

Merge pull request #224 from LocutusOfBorg/patch-1

090499e

Update dependencies.cmake for new boost 1.90 without system-libraries

Merge pull request #228 from Johnson-zs/ngram

48040f7

Add N-Gram analyzer components

alanw closed this May 25, 2026

jomart1985 wants to merge 274 commits into dev from master

20 participants