SearchCop

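SearchCop extends your ActiveRecord models to support search-engine-like queries via simple query strings and hash-based queries. Assume you have a Book model with attributes such as title, author, stock, price and available. Using SearchCop you can perform: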

Book.search("Joanne Rowling Harry Potter")
Book.search("author: Rowling title:'Harry Potter'")
Book.search("price > 10 AND price < 20 -stock:0 (Potter OR Rowling)")
# ...

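Thus, you can hand a search query string to your models, and you, your app's admins and/or your users get powerful query features without integrating an additional third-party search server. SearchCop can use the fulltext index capabilities of your RDBMS in a database-agnostic way (currently MySQL and PostgreSQL fulltext indices are supported) and optimizes the queries to make the best use of them. Read more below.

Complex hash-based queries are supported as well: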

Book.search(author: "Rowling", title: "Harry Potter")
Book.search(or: [{author: "Rowling"}, {author: "Tolkien"}])
Book.search(and: [{price: {gt: 10}}, {not: {stock: 0}}, or: [{title: "Potter"}, {author: "Rowling"}]])
Book.search(or: [{query: "Rowling -Potter"}, {query: "Tolkien -Rings"}])
Book.search(title: {my_custom_sql_query: "Rowl"})
# ...

Installation

Add this line to your application's Gemfile:

gem 'search_cop'

And then execute:

$ bundle

Or install it yourself:

$ gem install search_cop

Usage

To enable SearchCop for a model, include SearchCop and specify the attributes you want to expose to search queries within a search_scope:

class Book < ActiveRecord::Base
  include SearchCop

  search_scope :search do
    attributes :title, :description, :stock, :price, :created_at, :available
    attributes comment: ["comments.title", "comments.message"]
    attributes author: "author.name"
    # ...
  end

  has_many :comments
  belongs_to :author
end

You can of course specify multiple search_scope blocks as needed:

search_scope :admin_search do
  attributes :title, :description, :stock, :price, :created_at, :available

  # ...
end

search_scope :user_search do
  attributes :title, :description

  # ...
end

How does it work

SearchCop parses the query and maps it to an SQL query in a database-agnostic way. Thus, SearchCop is not bound to a specific RDBMS.

Book.search("stock > 0")
# ... WHERE books.stock > 0

Book.search("price > 10 stock > 0")
# ... WHERE books.price > 10 AND books.stock > 0

Book.search("Harry Potter")
# ... WHERE (books.title LIKE '%Harry%' OR books.description LIKE '%Harry%' OR ...) AND (books.title LIKE '%Potter%' OR books.description LIKE '%Potter%' ...)

Book.search("available:yes OR created_at:2014")
# ... WHERE books.available = 1 OR (books.created_at >= '2014-01-01 00:00:00.00000' and books.created_at <= '2014-12-31 23:59:59.99999')

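SearchCop uses ActiveSupport's beginning_of_year and end_of_year methods to obtain the values used to build the SQL query for this case.

Of course, these LIKE '%...%' queries won't achieve optimal performance. See the section on SearchCop's fulltext capabilities below to learn how the resulting queries can be optimized.

As Book.search(...) returns an ActiveRecord::Relation, you are free to pre- or post-process the search results in every possible way: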

Book.where(available: true).search("Harry Potter").order("books.id desc").paginate(page: params[:page])

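Security

When you pass a query string to SearchCop, it gets parsed, analyzed and mapped to finally build an SQL query. To be more precise, when SearchCop parses the query, it creates objects (nodes) which represent the query expressions (And-, Or-, Not-, String-, Date-, etc. nodes). To build the SQL query, SearchCop uses the concept of visitors, similar to the one used in Arel, such that for every node there must be a visitor which transforms the node to SQL. When there is no visitor, an exception is raised when the query builder tries to "visit" the node. The visitors are responsible for sanitizing the user-supplied input. This is primarily done via quoting (strings, table names, column references, etc.). SearchCop uses the methods provided by the ActiveRecord connection adapter for sanitizing/quoting to prevent SQL injection.

While we can never be 100% safe from security issues, SearchCop takes security issues seriously. Please report responsibly via security at flakks dot com in case you find any security-related issues.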
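The node/visitor flow described above can be sketched in plain Ruby. This is only an illustration of the idea, not SearchCop's actual code: the node structs, the SketchVisitor class and its quote helper are hypothetical, and the quoting shown is a simplistic stand-in for the quoting methods of the ActiveRecord connection adapter.

```ruby
# Hypothetical node types standing in for parsed query expressions.
AndNode     = Struct.new(:left, :right)
OrNode      = Struct.new(:left, :right)
MatchesNode = Struct.new(:column, :value)

# Hypothetical visitor: one visit_* method per node type.
class SketchVisitor
  def visit(node)
    method_name = "visit_#{node.class.name}"
    # No visitor for this node type: refuse to build SQL for it.
    raise ArgumentError, "no visitor for #{node.class}" unless respond_to?(method_name, true)

    send(method_name, node)
  end

  private

  def visit_AndNode(node)
    "(#{visit(node.left)} AND #{visit(node.right)})"
  end

  def visit_OrNode(node)
    "(#{visit(node.left)} OR #{visit(node.right)})"
  end

  def visit_MatchesNode(node)
    "#{node.column} LIKE #{quote("%#{node.value}%")}"
  end

  # Simplistic stand-in for adapter quoting: doubles single quotes.
  def quote(string)
    "'#{string.gsub("'", "''")}'"
  end
end

tree = AndNode.new(
  MatchesNode.new("books.title", "Harry"),
  MatchesNode.new("books.title", "O'Brien")
)
sql = SketchVisitor.new.visit(tree)
```

Because every value passes through the visitor's quoting step, the embedded single quote in the second term ends up escaped rather than terminating the SQL string.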

json/jsonb/hstore

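SearchCop supports json fields for MySQL, as well as json, jsonb and hstore fields for postgres. Currently, field values are always expected to be strings and arrays are not supported. You can specify json attributes via: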

search_scope :search do
  attributes user_agent: "context->browser->user_agent"

  # ...
end

where context is a json/jsonb column which, for example, contains:

{
  "browser": {
    "user_agent": "Firefox ..."
  }
}
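To illustrate how the arrow-separated path addresses a nested value, the following plain-Ruby sketch resolves such a path against a parsed JSON document. It only mimics the lookup in Ruby; SearchCop itself translates the path into SQL. The dig_path helper and the row hash are hypothetical.

```ruby
require "json"

# Hypothetical helper: the first path segment names the column,
# the remaining segments are nested keys inside its JSON value.
def dig_path(row, path)
  column, *keys = path.split("->")
  value = row.fetch(column)
  keys.empty? ? value : value.dig(*keys)
end

row = { "context" => JSON.parse('{"browser": {"user_agent": "Firefox ..."}}') }
user_agent = dig_path(row, "context->browser->user_agent")
```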

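Fulltext index capabilities

By default, i.e. if you don't tell SearchCop about your fulltext indices, SearchCop will use LIKE '%...%' queries. Unfortunately, unless you create a trigram index (postgres only), these queries cannot use SQL indices, so your RDBMS has to scan every row when you run Book.search("Harry Potter") or similar. To avoid the performance penalty of LIKE queries, SearchCop can exploit the fulltext index capabilities of MySQL and PostgreSQL. To use already existing fulltext indices, simply tell SearchCop to use them via: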

class Book < ActiveRecord::Base
  # ...

  search_scope :search do
    attributes :title, :author

    options :title, :type => :fulltext
    options :author, :type => :fulltext
  end

  # ...
end

SearchCop will then transparently change its SQL queries for the attributes having fulltext indices to:

Book.search("Harry Potter")
# MySQL: ... WHERE (MATCH(books.title) AGAINST('+Harry' IN BOOLEAN MODE) OR MATCH(books.author) AGAINST('+Harry' IN BOOLEAN MODE)) AND (MATCH(books.title) AGAINST ('+Potter' IN BOOLEAN MODE) OR MATCH(books.author) AGAINST('+Potter' IN BOOLEAN MODE))
# PostgreSQL: ... WHERE (to_tsvector('simple', books.title) @@ to_tsquery('simple', 'Harry') OR to_tsvector('simple', books.author) @@ to_tsquery('simple', 'Harry')) AND (to_tsvector('simple', books.title) @@ to_tsquery('simple', 'Potter') OR to_tsvector('simple', books.author) @@ to_tsquery('simple', 'Potter'))

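Obviously, these queries won't always return the same results as wildcard LIKE queries, because we search for words instead of substrings. However, fulltext indices will usually provide better performance.

Moreover, the query above is not yet perfect. SearchCop tries to optimize the queries to make the best use of fulltext indices while still allowing you to mix them with non-fulltext attributes. To improve queries further, you can group attributes and specify a default field to search in, such that SearchCop no longer has to search within all fields: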

search_scope :search do
  attributes all: [:author, :title]

  options :all, :type => :fulltext, default: true

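  # Use default: true to explicitly enable fields as default fields (whitelist approach)
  # Use default: false to explicitly disable fields as default fields (blacklist approach)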
end

Now SearchCop can optimize the following, not yet optimal query:

Book.search("Rowling OR Tolkien stock > 1")
# MySQL: ... WHERE ((MATCH(books.author) AGAINST('+Rowling' IN BOOLEAN MODE) OR MATCH(books.title) AGAINST('+Rowling' IN BOOLEAN MODE)) OR (MATCH(books.author) AGAINST('+Tolkien' IN BOOLEAN MODE) OR MATCH(books.title) AGAINST('+Tolkien' IN BOOLEAN MODE))) AND books.stock > 1
# PostgreSQL: ... WHERE ((to_tsvector('simple', books.author) @@ to_tsquery('simple', 'Rowling') OR to_tsvector('simple', books.title) @@ to_tsquery('simple', 'Rowling')) OR (to_tsvector('simple', books.author) @@ to_tsquery('simple', 'Tolkien') OR to_tsvector('simple', books.title) @@ to_tsquery('simple', 'Tolkien'))) AND books.stock > 1

to the following, more performant query:

Book.search("Rowling OR Tolkien stock > 1")
# MySQL: ... WHERE MATCH(books.author, books.title) AGAINST('Rowling Tolkien' IN BOOLEAN MODE) AND books.stock > 1
# PostgreSQL: ... WHERE to_tsvector('simple', books.author || ' ' || books.title) @@ to_tsquery('simple', 'Rowling | Tolkien') AND books.stock > 1

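What is happening here? We specified all as the name of an attribute group that consists of author and title. As we additionally specified all to be a fulltext attribute, SearchCop assumes there is a compound fulltext index on author and title, and the query is optimized accordingly. Finally, we specified all to be the default attribute to search in, such that SearchCop can ignore other attributes, such as stock, as long as they are not specified directly within the query (as in stock > 0).

Other queries will be optimized in a similar way, such that SearchCop tries to minimize the number of fulltext constraints within a query, namely MATCH() AGAINST() for MySQL and to_tsvector() @@ to_tsquery() for PostgreSQL.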

Book.search("(Rowling -Potter) OR Tolkien")
# MySQL: ... WHERE MATCH(books.author, books.title) AGAINST('(+Rowling -Potter) Tolkien' IN BOOLEAN MODE)
# PostgreSQL: ... WHERE to_tsvector('simple', books.author || ' ' || books.title) @@ to_tsquery('simple', '(Rowling & !Potter) | Tolkien')
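To make the collapsing step more concrete, here is a minimal sketch of how a parsed expression tree could be folded into a single to_tsquery string like the PostgreSQL one above. The array-based tree format and the to_tsquery_string helper are hypothetical and not SearchCop's internals, and the sketch does not escape tsquery special characters.

```ruby
# Hypothetical tree format: a String is a term, an Array is
# [operator, *operands] with operator being :and, :or or :not.
def to_tsquery_string(node, nested: false)
  return node if node.is_a?(String)

  operator, *operands = node
  case operator
  when :not
    "!#{to_tsquery_string(operands.first, nested: true)}"
  when :and, :or
    separator = operator == :and ? " & " : " | "
    joined = operands.map { |operand| to_tsquery_string(operand, nested: true) }.join(separator)
    # Only nested groups need parentheses.
    nested ? "(#{joined})" : joined
  else
    raise ArgumentError, "unknown operator: #{operator.inspect}"
  end
end

# Tree for the query string "(Rowling -Potter) OR Tolkien"
tsquery = to_tsquery_string([:or, [:and, "Rowling", [:not, "Potter"]], "Tolkien"])
```

The point of the design is that one string is built for the whole group of fulltext attributes, so the database evaluates a single fulltext constraint instead of one per term and column.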

To create a fulltext index on books.title in MySQL, simply use:

add_index :books, :title, :type => :fulltext

For compound indices, such as the one used for the default field all we specified above, use:

add_index :books, [:author, :title], :type => :fulltext

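Please note that MySQL supports fulltext indices for MyISAM and, as of MySQL version 5.6+, for InnoDB as well. For more details about MySQL fulltext indices visit http://dev.mysql.com/doc/refman/5.6/en/fulltext-search.html

Regarding PostgreSQL, there are more ways to create a fulltext index. However, one of the easiest ways is: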

ActiveRecord::Base.connection.execute "CREATE INDEX fulltext_index_books_on_title ON books USING GIN(to_tsvector('simple', title))"

Moreover, for PostgreSQL you should change the schema format in config/application.rb:

config.active_record.schema_format = :sql

Regarding compound indices for PostgreSQL, use:

ActiveRecord::Base.connection.execute "CREATE INDEX fulltext_index_books_on_title ON books USING GIN(to_tsvector('simple', author || ' ' || title))"

To handle NULL values with PostgreSQL correctly, use COALESCE both at index creation time and when specifying the search_scope:

ActiveRecord::Base.connection.execute "CREATE INDEX fulltext_index_books_on_title ON books USING GIN(to_tsvector('simple', COALESCE(author, '') || ' ' || COALESCE(title, '')))"

plus:

search_scope :search do
  attributes :title

  options :title, :type => :fulltext, coalesce: true
end

To use a PostgreSQL dictionary other than simple, you have to create the index accordingly and tell SearchCop about it:

search_scope :search do
  attributes :title

  options :title, :type => :fulltext, dictionary: "english"
end

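For more details about PostgreSQL fulltext indices visit http://www.postgresql.org/docs/9.3/static/textsearch.html

Other indices

If you expose non-fulltext attributes to search queries (price, stock, etc.), the respective queries, like Book.search("stock > 0"), will profit from the usual non-fulltext indices. Thus, you should add a regular index on every column you expose to search queries, plus a fulltext index for every fulltext attribute.

If you can't use fulltext indices, e.g. because you are still on MySQL 5.5 with InnoDB or on another RDBMS without fulltext support, you can make your RDBMS use regular non-fulltext indices for string columns, provided you don't need the left wildcard within LIKE queries. Simply supply the following option: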

class User < ActiveRecord::Base
  include SearchCop

  search_scope :search do
    attributes :username

    options :username, left_wildcard: false
  end

  # ...
end

such that SearchCop will omit the left-most wildcard.

User.search("admin")
# ... WHERE users.username LIKE 'admin%'

Similarly, you can disable the right wildcard as well:

search_scope :search do
  attributes :username

  options :username, right_wildcard: false
end
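The effect of the two wildcard options on the generated LIKE pattern can be sketched as follows. The like_pattern helper is hypothetical and only mirrors the option names used above; it is not SearchCop's implementation and performs no escaping or quoting.

```ruby
# Hypothetical helper mirroring the left_wildcard / right_wildcard options.
# Both wildcards are on by default, as in the LIKE '%...%' queries above.
def like_pattern(term, left_wildcard: true, right_wildcard: true)
  "#{'%' if left_wildcard}#{term}#{'%' if right_wildcard}"
end

default_pattern = like_pattern("admin")
prefix_pattern  = like_pattern("admin", left_wildcard: false)
suffix_pattern  = like_pattern("admin", right_wildcard: false)
```

Only the pattern without a leading wildcard (the prefix_pattern case) lets a regular B-tree index on the column narrow down the rows.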

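Default operator

When you define multiple fields on a search scope, SearchCop will by default use the AND operator to concatenate the conditions, e.g.: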

class User < ActiveRecord::Base
  include SearchCop

  search_scope :search do
    attributes :username, :fullname
  end

  # ...
end

So a search like User.search("something") will generate a query with the following conditions:

... WHERE username LIKE '%something%' AND fullname LIKE '%something%'

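However, there are cases where using AND as the default operator is not desired, so SearchCop allows you to override it and use OR as the default operator instead. A query like User.search("something", default_operator: :or) will generate the query using OR to concatenate the conditions: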

... WHERE username LIKE '%something%' OR fullname LIKE '%something%'

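Finally, please note that you can apply it to fulltext indices/queries as well.

Associations

If you specify searchable attributes from another model, like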

class Book < ActiveRecord::Base
  # ...

  belongs_to :author

  search_scope :search do
    attributes author: "author.name"
  end

  # ...
end

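SearchCop will by default eager_load the referenced associations when you perform Book.search(...). If you don't want the automatic eager_load or need to perform special operations, specify a scope: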

class Book < ActiveRecord::Base
  # ...

  search_scope :search do
    # ...

    scope { joins(:author).eager_load(:comments) } # etc.
  end

  # ...
end

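SearchCop will then skip any association auto loading and use the specified scope instead. You can also use scope together with aliases to perform arbitrarily complex joins and search in the joined models/tables: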

class Book < ActiveRecord::Base
  # ...

  search_scope :search do
    attributes similar: ["similar_books.title", "similar_books.description"]

    scope do
      joins "left outer join books similar_books on ..."
    end

    aliases similar_books: Book # Tell SearchCop how to map SQL aliases to models
  end

  # ...
end

Associations of associations can be referenced and used as well:

class Book < ActiveRecord::Base
  # ...

  has_many :comments
  has_many :users, :through => :comments

  search_scope :search do
    attributes user: "users.username"
  end

  # ...
end

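Custom table names and associations

SearchCop tries to infer a model's class name and SQL alias from the specified attributes to autodetect datatype definitions, etc. This usually works well. However, if you use custom table names via self.table_name = ... or if a model is associated multiple times, SearchCop can't infer the class and SQL alias names, e.g.: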

class Book < ActiveRecord::Base
  # ...

  has_many :users, :through => :comments
  belongs_to :user

  search_scope :search do
    attributes user: ["user.username", "users_books.username"]
  end

  # ...
end

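Here, for queries to work you have to use users_books.username, because ActiveRecord assigns a different SQL alias for users within its SQL queries, since the user model is associated multiple times. However, as SearchCop now can't infer the User model from users_books, you have to add: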

class Book < ActiveRecord::Base
  # ...

  search_scope :search do
    # ...

    aliases :users_books => :users
  end

  # ...
end

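to tell SearchCop about the custom SQL alias and mapping. In addition, you can always do the joins yourself via a scope {} block plus aliases and use your own custom SQL aliases, so that you do not depend on the names auto-assigned by ActiveRecord.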

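Supported operators

Query string queries support AND/and, OR/or, :, =, !=, <, <=, >, >=, NOT/not/-, (), "..." and '...'. The default operators are AND and matches, and OR has precedence over AND. NOT can only be used as an infix operator regarding a single attribute.

Hash-based queries support and: [...] and or: [...], which take an array of not: {...}, matches: {...}, eq: {...}, not_eq: {...}, lt: {...}, lteq: {...}, gt: {...}, gteq: {...} and query: "..." arguments. Moreover, query: "..." makes it possible to create sub-queries. The other rules for query string queries apply to hash-based queries as well.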

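Custom operators (hash-based queries)

SearchCop also lets you define custom operators by defining a generator in search_scope. They can then be used in hash-based queries. This is useful when you want to use database operators that are not supported by SearchCop.

Please note that when using generators, you are responsible for sanitizing/quoting the values (see the example below). Otherwise your generator will allow SQL injection. Thus, please only use generators if you know what you're doing.

For example, if you want to perform a LIKE query where a book title starts with a string, you can define the search scope like so: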

search_scope :search do
  attributes :title

  generator :starts_with do |column_name, raw_value|
    pattern = "#{raw_value}%"
    "#{column_name} LIKE #{quote pattern}"
  end
end

When you want to perform the search, use it like this:

Book.search(title: { starts_with: "The Great" })

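Security note: the query returned from the generator is interpolated directly into the query that goes to your database. This opens up a potential SQL injection point in your app. If you use this feature, make sure the query you return is safe to execute.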
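To see why the quoting step in a generator matters, the following standalone sketch runs the same string-building logic outside of SearchCop. The quote method here is a simplistic, hypothetical stand-in for the quote helper used inside the generator block, and the lambda merely imitates the block's body.

```ruby
# Simplistic stand-in for the quote helper available in a generator block:
# wraps the value in single quotes and doubles embedded single quotes.
def quote(value)
  "'#{value.gsub("'", "''")}'"
end

# Same body as the starts_with generator, as a plain lambda.
starts_with = lambda do |column_name, raw_value|
  pattern = "#{raw_value}%"
  "#{column_name} LIKE #{quote(pattern)}"
end

harmless = starts_with.call("books.title", "The Great")
hostile  = starts_with.call("books.title", "x' OR '1'='1")
```

With quoting in place, the hostile input stays inside a single SQL string literal. Interpolating raw_value without quote would let it terminate the literal and append its own condition.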

Mapping

When searching in boolean, datetime, timestamp, etc. fields, SearchCop performs some mapping. The following queries are equivalent:

Book.search("available:true")
Book.search("available:1")
Book.search("available:yes")

as well as

Book.search("available:false")
Book.search("available:0")
Book.search("available:no")
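The boolean mapping can be pictured with a small sketch. The value lists are taken from the equivalent queries above; the map_boolean helper, the case-insensitive comparison and the error raised for other values are assumptions for illustration, not SearchCop's code.

```ruby
TRUE_VALUES  = %w[true 1 yes].freeze
FALSE_VALUES = %w[false 0 no].freeze

# Hypothetical helper: maps a query value to a Ruby boolean.
# Case-insensitive matching is an assumption of this sketch.
def map_boolean(value)
  normalized = value.to_s.downcase
  return true if TRUE_VALUES.include?(normalized)
  return false if FALSE_VALUES.include?(normalized)

  raise ArgumentError, "incompatible boolean value: #{value.inspect}"
end

available_flags = %w[true 1 yes].map { |value| map_boolean(value) }
unavailable_flags = %w[false 0 no].map { |value| map_boolean(value) }
```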

For datetime and timestamp fields, SearchCop expands certain values to ranges:

Book.search("created_at:2014")
# ... WHERE created_at >= '2014-01-01 00:00:00' AND created_at <= '2014-12-31 23:59:59'

Book.search("created_at:2014-06")
# ... WHERE created_at >= '2014-06-01 00:00:00' AND created_at <= '2014-06-30 23:59:59'

Book.search("created_at:2014-06-15")
# ... WHERE created_at >= '2014-06-15 00:00:00' AND created_at <= '2014-06-15 23:59:59'
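The range expansion shown above can be reproduced with Ruby's standard library alone. The expand_date helper below is a hypothetical sketch that handles only the three formats listed; SearchCop itself relies on ActiveSupport helpers for this and may accept additional formats.

```ruby
require "date"

# Hypothetical helper: expands "YYYY", "YYYY-MM" or "YYYY-MM-DD"
# into the [from, to] boundaries used in the WHERE clause.
def expand_date(value)
  case value
  when /\A(\d{4})\z/
    first = Date.new($1.to_i, 1, 1)
    last  = Date.new($1.to_i, 12, 31)
  when /\A(\d{4})-(\d{1,2})\z/
    first = Date.new($1.to_i, $2.to_i, 1)
    last  = Date.new($1.to_i, $2.to_i, -1) # day -1 is the last day of the month
  when /\A(\d{4})-(\d{1,2})-(\d{1,2})\z/
    first = last = Date.new($1.to_i, $2.to_i, $3.to_i)
  else
    raise ArgumentError, "unsupported date format: #{value.inspect}"
  end

  ["#{first.iso8601} 00:00:00", "#{last.iso8601} 23:59:59"]
end

year_range  = expand_date("2014")
month_range = expand_date("2014-06")
day_range   = expand_date("2014-06-15")
```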

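Chaining

Chaining of searches is possible. However, chaining currently does not allow SearchCop to optimize the individual queries for fulltext indices.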

Book.search("Harry").search("Potter")

will generate

# MySQL: ... WHERE MATCH(...) AGAINST('+Harry' IN BOOLEAN MODE) AND MATCH(...) AGAINST('+Potter' IN BOOLEAN MODE)
# PostgreSQL: ... WHERE to_tsvector(...) @@ to_tsquery('simple', 'Harry') AND to_tsvector(...) @@ to_tsquery('simple', 'Potter')

instead of

# MySQL: ... WHERE MATCH(...) AGAINST('+Harry +Potter' IN BOOLEAN MODE)
# PostgreSQL: ... WHERE to_tsvector(...) @@ to_tsquery('simple', 'Harry & Potter')

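Thus, if you use fulltext indices, you should avoid chaining.

Debugging

When using Model#search, SearchCop conveniently prevents certain exceptions from being raised in case the query string passed to it is invalid (parse errors, incompatible datatype errors, etc.). Instead, Model#search returns an empty relation. However, if you need to debug certain cases, use Model#unsafe_search, which will raise them.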

Book.unsafe_search("stock: None") # => raises SearchCop::IncompatibleDatatype
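The relationship between the two methods can be sketched as a raising method plus a rescuing wrapper. Everything in this block is hypothetical (the SketchSearch module, the SketchParseError class and the tiny stock-only parser) and works on an in-memory array instead of an ActiveRecord relation; it only illustrates the pattern.

```ruby
class SketchParseError < StandardError; end

module SketchSearch
  # Raises on input it cannot handle, like Model#unsafe_search.
  def self.unsafe_search(records, query)
    match = query.match(/\Astock\s*>\s*(\d+)\z/)
    raise SketchParseError, "cannot parse #{query.inspect}" unless match

    records.select { |record| record[:stock] > match[1].to_i }
  end

  # Swallows the error and returns an empty result, like Model#search.
  def self.search(records, query)
    unsafe_search(records, query)
  rescue SketchParseError
    []
  end
end

records = [{ stock: 0 }, { stock: 3 }]
in_stock = SketchSearch.search(records, "stock > 0")
invalid  = SketchSearch.search(records, "stock: None")
```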

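Reflection

SearchCop provides reflective methods, namely #attributes, #default_attributes, #options and #aliases. You can use these methods to, for example, provide an individual search help widget for your models that lists the attributes to search in as well as the default ones.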

class Product < ActiveRecord::Base
  include SearchCop

  search_scope :search do
    attributes :title, :description

    options :title, default: true
  end
end

Product.search_reflection(:search).attributes
# {"title" => ["products.title"], "description" => ["products.description"]}

Product.search_reflection(:search).default_attributes
# {"title" => ["products.title"]}

# ...
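As a rough idea of such a help widget, the sketch below turns the two hashes shown above into a list of help lines. The hashes are copied literally from the example output so that the snippet runs on its own; in an application they would come from Product.search_reflection(:search). The wording of the help lines is made up.

```ruby
# Hash shapes copied from the reflection output above.
attributes         = { "title" => ["products.title"], "description" => ["products.description"] }
default_attributes = { "title" => ["products.title"] }

# One help line per searchable attribute, marking the default ones.
help_lines = attributes.keys.map do |name|
  default_attributes.key?(name) ? "#{name} (searched by default)" : name
end
```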