In both cases create_function accepts up to four arguments: Performance-wise collations have some peculiarities, which we will discuss further.Īnother way to achieve case-insensitive search is to create an application-defined SQL function (documentation): It is especially easy to get if you define the collation in the table schema. EXPLAIN QUERY PLAN SELECT * FROM test WHERE text = 'something' - Output: SCAN TABLE test EXPLAIN QUERY PLAN SELECT * FROM test WHERE text = 'something' COLLATE NOCASE - Output: SEARCH TABLE test USING COVERING INDEX idx1 (text=?)Ĭollation provides case-insensitive sorting with ORDER BY out of the box. In a particular query: SELECT * FROM items WHERE text = "Text in AnY case" COLLATE UNICODE_NOCASE - In an index: CREATE INDEX IF NOT EXISTS idx1 ON test ( text COLLATE UNICODE_NOCASE ) - Word of caution: your query and index - must match exactly,including collation, - otherwise, SQLite will perform a full table scan. You can specify collation in the table schema and it will be automatically applied to all queries and indexes on this field unless you specify otherwise:ĬREATE TABLE test ( text VARCHAR COLLATE UNICODE_NOCASE ) įor the sake of completeness, let's look at two more ways to use collations: For the sake of an example we will use a new name:Ĭollations have several advantages compared to the next solutions: Here you have a choice – overload the built-in NOCASE or create your own – we will discuss the pros and cons below. To perform a case-insensitive search for all Unicode symbols we need to define a new collation in the application after connecting to the database (documentation). Therefore, below we will use the str.casefold() function for all conversions and comparisons. Since it is already lowercase, lower() would do nothing to 'ß' casefold() converts it to "ss". For example, the German lowercase letter 'ß' is equivalent to "ss". Casefolded strings may be used for caseless matching.Ĭasefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. It will work in most circumstances, but it is not the proper way. My first instinct was to use str.lower() for this. To perform case-insensitive comparison and search we need to normalize strings to one case. ICU does not itself implement Unicode conversions, but relies on the underline operating system – I have seen multiple mentions of OS-specific issues, especially with Windows and macOS.Īll other solutions will depend on your Python code to perform the comparison, so it is important to choose the right approach to converting and comparing strings.Ĭhoosing the right python function for case-insensitive comparison ICU needs to be compiled before use, potentially for different OS and platforms (not tested). It is a new type of dependency: not a Python library, but an extension that should be distributed together with the application. It may even be faster than some of the later solutions since it is written in C and is more tightly integrated with SQLite. ICU solves the problems of both case-insensitive LIKE and comparison/search, plus adds support for different collations for a good measure. ICU stands for International Components for Unicode. Official SQLite documentation mentions the ICU extension as a way to add complete support for Unicode in SQLite. We will look at the pros and cons of each approach, implementation details, and, finally, at indexes and performance considerations. Some of these solutions can be adapted to other databases and for implementing Unicode-aware LIKE, REGEXP, MATCH, and other functions, although these topics are out of the scope of this post. I needed the full Unicode spectrum, so a better solution was in order.īelow I summarize five ways to achieve case insensitive search/comparison in SQLite for all Unicode symbols. It is not a problem if you plan to work with strings that contain only English alphabet letters, numbers, etc. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE. The LIKE operator is case sensitive by default for unicode characters that are beyond the ASCII range. SQLite only understands upper/lower case for ASCII characters by default. Second, and more importantly, they have a rather limited understanding of what case-insensitive mean: More on the use of indexes for case-insensitive queries is below. First, without special considerations they do not make use of indexes on the field they are working on, with LIKE being the worst offender: in most cases it is incapable of using indexes. Enter fullscreen mode Exit fullscreen modeĪll these approaches are not ideal.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |