UWInfo Blog

搜尋

搜尋意見

文章分類-#Author#

[所有文章分類]

最新回應

Newtonsoft.Json.JsonConvert.DeserializeObject 失敗的情況
test...more
dotnet ef dbcontext scaffold
...more
[ASP.NET] 利用 aspnet_regiis 加密 web.config
...more
IIS ARR (reverse proxy) 服務安裝
...more
[錯誤訊息] 請加入 ScriptResourceMapping 命名的 jquery (區分大小寫)
...more
用 Javascript 跨網頁讀取 cookie (Cookie cross page, path of cookie)
...more
線上客服 - MSN
本人信箱被盜用以致資料外洩，是否可以請貴平台予以協助刪除該信箱之使用謝謝囉...more
插入文字到游標或選取處
aaaaa...more
IIS 配合 AD (Active Directory) 認証, 使用 .Net 6.0
太感謝你了~~~你救了我被windows 認證卡了好幾天QQ...more
PostgreSQL 的 monitor trigger
FOR EACH ROW 可能要改為 FOR EACH STATEMENT ...more

標籤

搜尋 ddd 結果：

mysql 使用 EF model 要注意的地方

近日處理我們公司某個使用 my sql 的專案，已經被雷的幾次，紀錄一下

1. 使用 EF Scaffolding 指令去更新 Model ，有些欄位原本是明確型別變成可null型別
也就是本來是 datetime 變成 datetime?, int 變成 int?
看起來是一些 view 的欄位會有這些狀況，這時就要去 git revert ，避免build專案又一堆錯誤
2. 使用 IQueryable 語法處理時間要小心，不要在裡面計算時間。

DateTime shouldBeGiveCoinDateTime = DateTime.Now.AddDays(-14);

return dbContext.ViewOrderForCoinCronJob.AsNoTracking().Where(x =>

    x.CompletedAt <= shouldBeGiveCoinDateTime

    && x.CreatedAt >= DateTime.Now.AddYears(-1) // -> 這裡轉換mysql指令會出錯

    && x.DeletedAt == null

    && x.CoinAmount > 0

    && x.Status == (byte)ORDER_STATUS.COMPLETED

    && dbContext.PackingMain.Any(y => y.OrderId == x.Id)

    && x.IsGiveCoin != 0

    && !dbContext.CoinLog.Any(y =>

        y.OrderId == x.Id

        && y.StrKey.Contains(COIN_LOG.STRING_KEY_PREFIX.ORDER_ACCUMULATION)

    )

).ToList();

要把時間計算先算出來，再帶入式子

DateTime shouldBeGiveCoinDateTime = DateTime.Now.AddDays(-14);

DateTime createDateStartAt = DateTime.Now.AddYears(-1); // -> 先算出日期

return dbContext.ViewOrderForCoinCronJob.AsNoTracking().Where(x =>

    x.CompletedAt <= shouldBeGiveCoinDateTime

    && x.CreatedAt >= createDateStartAt // -> 帶入算好的日期

    && x.DeletedAt == null

    && x.CoinAmount > 0

    && x.Status == (byte)ORDER_STATUS.COMPLETED

    && dbContext.PackingMain.Any(y => y.OrderId == x.Id)

    && x.IsGiveCoin != 0

    && !dbContext.CoinLog.Any(y =>

        y.OrderId == x.Id

        && y.StrKey.Contains(COIN_LOG.STRING_KEY_PREFIX.ORDER_ACCUMULATION)

    )

).ToList();

More...

darren, 2025/3/10 下午 05:43:38

使用Lucene.Net達成全文檢索！基礎解說(二)

上一集當中我們完成了Lucene基本操作中的Create與Read，這一集會將CRUD中的Update與Delete的操作方法告訴你，並且本集會著重於講解關於"Norms"與權重(Boost)在Lucene中的運作概念。

首先我們建立一個.Net 6的主控台應用程式

建立好後於右側專案右鍵選擇"管理Nuget套件"，並選擇"瀏覽">於搜索列中搜尋"Lucene">安裝3.0.3最新穩定版與 "System.Configuration.ConfigurationManager"

安裝好後就可以於專案內使用Lucene套件囉!
再來依照上一篇的教學建立一套簡單的Lucene查詢

using Lucene.Net.Analysis.Standard;

using Lucene.Net.Documents;

using Lucene.Net.Index;

using Lucene.Net.QueryParsers;

using Lucene.Net.Search;

using Lucene.Net.Store;



var _dir = new DirectoryInfo("LuceneDocument");

if (!File.Exists(_dir.FullName))

{

    System.IO.Directory.CreateDirectory(_dir.FullName);

}

var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_CURRENT);

CreateIndex(GetProductsInformation(), _dir, analyzer);



while (true)

{

    Console.Write("請輸入欲查詢字串 :");

    var searchValue = Console.ReadLine();

    Search(searchValue, _dir, analyzer);

}



void CreateIndex(List<Product> information, DirectoryInfo dir, StandardAnalyzer analyzer)

{

    using (var directory = FSDirectory.Open(dir))

    {

        using (var indexWriter = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED))

        {

            foreach (var index in information)

            {

                var document = new Document();

                document.Add(new Field("Id", index.Id.ToString(), Field.Store.YES, Field.Index.NO));

                document.Add(new Field("Name", index.Name, Field.Store.YES, Field.Index.ANALYZED));

                document.Add(new Field("Description", index.Description, Field.Store.YES, Field.Index.ANALYZED));

                indexWriter.AddDocument(document);

            }

            indexWriter.Optimize();

            indexWriter.Commit();

        }

    }

}

void Search(string searchValue, DirectoryInfo dir, StandardAnalyzer analyzer)

{

    using (var directory = FSDirectory.Open(_dir))

    {

        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_CURRENT, "Description", analyzer).Parse(searchValue);

        using (var indexSearcher = new IndexSearcher(directory))

        {

            var queryLimit = 20;

            var hits = indexSearcher.Search(parser, queryLimit);

            if (!hits.ScoreDocs.Any())

            {

                Console.WriteLine("查無相關結果。");

                return;

            }

            Document doc;

            foreach (var hit in hits.ScoreDocs)

            {

                doc = indexSearcher.Doc(hit.Doc);

                Console.WriteLine("Score :" + hit.Score + ", Id :" + doc.Get("Id") + ", Name :" + doc.Get("Name") + ", Description :" + doc.Get("Description"));

            }

        }

    }

}



List<Product> GetProductsInformation()

{

    return new List<Product> {

            new Product{ Id = 1, Name = "蘋果", Description = "一天一蘋果，醫生遠離我。"},

            new Product{ Id = 2, Name = "橘子", Description = "醫生給娜美最珍貴的寶藏。"},

            new Product{ Id = 3, Name = "梨子", Description = "我是梨子，比蘋果好吃多囉!"},

            new Product{ Id = 4, Name = "葡萄", Description = "吃葡萄不吐葡萄皮，不吃葡萄倒吐葡萄皮"},

            new Product{ Id = 5, Name = "榴槤", Description = "水果界的珍寶!好吃一直吃。"}

        };

}

class Product

{

    public long Id { get; set; }

    public string Name { get; set; } = null!;

    public string Description { get; set; } = null!;

}

好囉! 接下來我們要如何更新索引呢?
更新其實就是將存在的索引刪除並重新建立Document，不存在的則直接新增。
首先準備一組資料準備更新

List<Product> GetUpdateProductsInformation()

{

    return new List<Product>

    {

        new Product{ Id = 6, Name = "香蕉", Description = "運動完後吃根香蕉補充養分。"},

        new Product{ Id = 2, Name = "橘子", Description = "橘子跟柳丁你分得出來嗎?"}

    };

}

*欲更新的Document必須與創建所引時使用的Document欄位相同*

void Update(string key, List<Product> information, DirectoryInfo dir, StandardAnalyzer analyzer)

{

    using( var directory = FSDirectory.Open(dir))

    {

        using(var indexWriter = new IndexWriter(directory, analyzer, false, IndexWriter.MaxFieldLength.LIMITED))

        {

            foreach (var index in information)

            {

                var document = new Document();

                document.Add(new Field("Id", index.Id.ToString(), Field.Store.YES, Field.Index.NO));

                document.Add(new Field("Name", index.Name, Field.Store.YES, Field.Index.ANALYZED));

                document.Add(new Field("Description", index.Description, Field.Store.YES, Field.Index.ANALYZED));

                indexWriter.UpdateDocument(new Term("Name", key) ,document);

            }

        }

    }

}

來測試看看

可以看見 Name = 橘子的索引已經改為我們新準備的資料囉。
再來是刪除!
與更新非常相似，只需要使用deleteDocument()就可以了。

void Delete(string key, DirectoryInfo dir, StandardAnalyzer analyzer)

{

    using (var directory = FSDirectory.Open(dir))

    {

        using (var indexWriter = new IndexWriter(directory, analyzer, false, IndexWriter.MaxFieldLength.LIMITED))

        {

            indexWriter.DeleteDocuments(new Term("Name", key));

            indexWriter.Optimize();

            indexWriter.Commit();

        }

    }

}

再來看看輸出結果
可以發現 Score :0.7554128, Id :2, Name :橘子, Description :醫生給娜美最珍貴的寶藏。這筆索引已經被移除囉!

可以發現筆者於更新或刪除時都是輸入單一字來做異動，除了表達可以對索引做複合更動外，
是因為更新與刪除索引同樣會使用到分詞器(analyzer)，
*所輸入的索引值非ID等數值時必須要配合分詞器的分詞能力*才能取得所想異動的索引喔!

Boost是什麼呢?
Boost 分為 :
1. Index Time Boost : 在建立索引時就計算好的值。例如上一篇中提到的(NORMS)
2. Query Time Boost : 查詢時賦與搜尋條件不同的值以影響結果。
我們先來測試Index Time Boost的部分

void CreateIndexWithBoost(List<Product> information, DirectoryInfo dir, StandardAnalyzer analyzer)

{

    using (var directory = FSDirectory.Open(dir))

    {

        using (var indexWriter = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED))

        {

            foreach (var index in information)

            {

                var document = new Document();

                document.Add(new Field("Id", index.Id.ToString(), Field.Store.YES, Field.Index.NO));

                document.Add(new Field("Name", index.Name, Field.Store.YES, Field.Index.ANALYZED));

                document.Add(new Field("Description", index.Description, Field.Store.YES, Field.Index.ANALYZED));

                document.GetField("Name").Boost = 1.5F;

                document.GetField("Description").Boost = 0.5F;



                indexWriter.AddDocument(document);

            }

            indexWriter.Optimize();

            indexWriter.Commit();

        }

    }

}

並記得重新CreateIndex才能刷新欄位的權重值喔。

很明顯的搜尋出來的Score分數變動了! 但是有沒有發現明明Name欄位的Boost改成了1.5，蘋果的數值卻仍然只有一半呢?
這是因為我們的Search中所參照的欄位為Description，所以在計算Score的時候其實是完全沒有參與的喔!
另外要記得，使用Index Time Boost的時候，欲給予銓重分配的欄位Field.Index不能使用NO_NORMS，不然這個欄位並不會紀錄權重的資料。

再來我們試試看Query Time Boost

void SearchWithBoost(string searchValue, DirectoryInfo dir, StandardAnalyzer analyzer)

{

    using (var directory = FSDirectory.Open(_dir))

    {

        using (var indexSearcher = new IndexSearcher(directory))

        {

            var query = new QueryParser(Lucene.Net.Util.Version.LUCENE_CURRENT, "Name", analyzer).Parse(searchValue);

            var query2 = new QueryParser(Lucene.Net.Util.Version.LUCENE_CURRENT, "Description", analyzer).Parse(searchValue);



            query.Boost = 2.0F;

            query2.Boost = 0.5F;



            BooleanQuery booleanQuery = new BooleanQuery();

            booleanQuery.Add(query, Occur.SHOULD);

            booleanQuery.Add(query2, Occur.SHOULD);



            var hits = indexSearcher.Search(booleanQuery, 20);

            if (!hits.ScoreDocs.Any())

            {

                Console.WriteLine("查無相關結果。");

                return;

            }

            Document doc;

            foreach (var hit in hits.ScoreDocs)

            {

                doc = indexSearcher.Doc(hit.Doc);

                Console.WriteLine("Score :" + hit.Score + ", Id :" + doc.Get("Id") + ", Name :" + doc.Get("Name") + ", Description :" + doc.Get("Description"));

            }

        }

    }

}

這次我們搜尋兩個欄位"Name"與"Description"，並使用 BooleanQuery來將其組合。
BooleanQuery中的 Occur有三種參數 : "MUST","MUST_NOT","SHOULD"，功能與字面上的意思一樣為"必須要有","必須沒有"與"有無都包含"。

查詢出來的分數就不一樣囉!

以上就是這一次的分享，Lucene是一款容易入門但是要實際上戰場卻又十分複雜的功能，想要達成真正高效能的全文檢索，在前期的文件規畫配置與資料的權重配比都是一個巨大的挑戰。未來會繼續分享關於Lucene的其他有趣功能，還請繼續期待呦!
另外也可以到GitHub下載我的範例來參考呦!
GitHub: https://github.com/g13579112000/Lucene

參考文件:
1. 黑暗大大的全文檢索筆記 : https://blog.darkthread.net/blog/lucene-net-notes-1/
2. Makble : http://makble.com/lucene-field-boost-example
3. CSDN Jack2013tong 文章 : https://blog.csdn.net/huwei2003/article/details/53408388

More...

梨子, 2022/4/20 下午 09:34:03

使用Lucene.Net達成全文檢索！基礎解說(一)

Lucene.Net是一套C#開源全文索引庫，其主要包含了:
· Index : 提供索引的管理與詞組的排序
· Search : 提供查詢相關功能
· Store : 支援資料儲存管理，包括I/O操作
· Util : 共用套件
· Documents : 負責描述索引儲存時的文件結構管理
· QueryParsers : 提供查詢語法
· Analysis : 負責分析內容
要達到高效能的全文檢索讓機器可以明白我們的語言，最重要的關鍵就是"分詞器"了。
試想一下這一句話你會如何拆分成一段一段的關鍵字呢?
"一天一蘋果，醫生遠離我"
還有英文版本
"An apple a day, doctor keep me away."
中文版本的拆分:
"一天"、"一"、"蘋果"、"醫生"、"遠離"、"我"
英文版本的拆分:
"apple"、"day"、"doctor"、"keep"、"me"、"away"
有沒有注意到不同語系所分析出來的關鍵字有一點不一樣呢?
而在Lucene中分詞的工作會交給Analysis來完成，
不過我們可以依照不同的語系去選擇想使用的分詞器(Analyzer)！

首先簡單說明一下Lucene的實作流程
1. 確認主要搜尋的語系來決定使用的分詞器(analyzer)
2. 建立Document依照analyzer匯入資料
(前置完成)
3. 建立IndexSearcher導入準備好的Document
4. 建立Parser來分析SearchValue
5. 使用IndexSearcher分析Parser取得結果(Hits)
*本專案使用的是Lucene.Net 3.0.3*
接下來我們來建立一個提供查詢使用的Document。

 // 取得或建立Lucene文件資料夾

        if (!File.Exists(_dir.FullName))

        {

            System.IO.Directory.CreateDirectory(_dir.FullName);

        }

        // Asp.Net Core需要於Nuget安裝System.Configuration.ConfigurationManager提供用戶端應用程式的組態檔存取

        Lucene.Net.Store.Directory directory = FSDirectory.Open(_dir);

        // 選擇分詞器

        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_CURRENT);

        // 資料來源

        var repository = new Repository();

        // 依照指定的文件結構來建立

        var indexWriter = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);

        foreach (var index in repository)

        {

            var document = new Document();

            document.Add(new Field("Id", index.Id.ToString(), Field.Store.YES, Field.Index.NO));

            document.Add(new Field("Name", index.Name, Field.Store.YES, Field.Index.ANALYZED));

            document.Add(new Field("Description", index.Description, Field.Store.NO, Field.Index.ANALYZED));

            indexWriter.AddDocument(document);

        }

        indexWriter.Optimize();

        indexWriter.Commit();

        indexWriter.Dispose();

如此一來我們就建立好Lucene的基本配備囉！
其中analyzer的部分我們使用Lucene.Net預設，
要特別注意的是，其處理中文語系的能力非常之爛！
之後再寫一篇文章深入探討。
再來值得一提的是

document.Add(new Field("Id", index.Id.ToString(), Field.Store.YES, Field.Index.NO));

前兩個參數就是Key跟Value，可以簡單理解為欄位與其內容。
後面兩個參數是重點！
Store: 代表是否儲存這個Key的Value
例如在google打上台南美食會搜索出許多不同的文章連結，
不過google給你的資料中最重要的不是文章內容(Description)，
而是哪一篇文章(Name)與台南美食最有關係。
假如今天我只要回傳一個列表而不用提示文章中有哪些內容，
那麼我就可以選擇給"Description" Field.Store.No來節省空間。
Index:
· NO - 不加入索引，這個內容只需要隨著結果出爐，不需要在查詢的時候被考慮。
· ANALYZED、NOT_ANALYZED - 是否使用分詞
· NO_NORMS - 關閉權重功能
或許許多人會對權重功能(NORMS)感到疑惑，
簡單的舉個例子
{ Id=1, Key="蘋果", Value="一天一蘋果，醫生遠離我。"}
{ Id=2, Key="橘子", Value="醫生給娜美最珍貴的寶藏。"}
{ Id=3, Key="梨子", Value="我是梨子，比蘋果蘋果好吃多囉！"}
當我搜尋"蘋果"的時候結果會是
{ Id=1, MatchKey=1, MatchValue=1, Score=(1*5) + (1*2) = 7}
{ Id=3, MatchKey=0, MatchValue=1, Score=(0*5) + (2*2) = 4}
有發現了嗎？
雖然同樣都對中兩個結果但是Id 1的資料Key值中有包含關鍵字，
因此得到較高的分數排在Id 3前方
準備好Document了，我們可以開始來實際使用看看囉！

// 決定所要搜索的欄位

        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_CURRENT, "Description", analyzer).Parse(searchValue);

        // 提供剛剛建立的Document

        var indexSearcher = new IndexSearcher(directory);

        // 搜尋取出結果的數量

        var queryLimit = 20;

        // 開始搜尋！

        var hits = indexSearcher.Search(parser, queryLimit);

        if (!hits.ScoreDocs.Any())

        {

            Console.WriteLine("查無相關結果。");

            return;

        }

        Document doc;

        foreach (var hit in hits.ScoreDocs)

        {

            doc = indexSearcher.Doc(hit.Doc);

            Console.WriteLine("Score :" + hit.Score + ", Id :" + doc.Get("Id") + ", Name :" + doc.Get("Name") + ", Description :" + doc.Get("Description"));

        }

最後的結果(Hits)，是需要再回到Document去撈出對應的資料喔！
是不是非常簡單呢？
筆者寫了一個簡單的範例在GitHub上，秉持著追求新技術的心使用了.Net 6，還請各位大大多多包涵。
有中英文兩種Repository，只需要在上方的DI注入切換就可以囉！
GitHub連結: https://github.com/g13579112000/Lucene
筆者第一次撰寫這種教學文章，有哪邊錯誤的非常歡迎一起來討論指教。
之後有機會再撰寫Lucene更深入的應用方面，
例如權重的分配與分詞器的選擇與使用。
感謝您的閱讀。

參考文獻：
1.黑暗大大的全文檢索筆記: https://blog.darkthread.net/blog/lucene-net-notes-1/
2.使用.Net實現全文檢索: https://blog.csdn.net/huwei2003/article/details/53408388
3.伊凡的部落格: http://irfen.me/5-lucene4-9-learning-record-lucene-analysis-tokenizer/
4.純淨天空代碼範例: https://vimsky.com/zh-tw/examples/detail/csharp-ex-Lucene.Net.Documents-Document---class.html

More...

梨子, 2022/2/24 下午 08:23:46