UWInfo Blog

搜尋

搜尋意見

最新回應

標籤

搜尋結果：

改寫成可輸入多參數，效能也比較好的版本。
以下為測試碼，請自行依照專案需求做修改。

var root = "C://wdqd/qwewq";

var addPath = @"//\\/fwef/qwf";

var addPath2 = @"5fwfef/qwf";

var addPath3 = @"//fwef/qwf";

var addPath4 = @"\\\fwef/qwf";

var addPath5 = @"\\\\\/fwef/qwf";





var result = root.AddPath(addPath, addPath2, addPath3, addPath4, addPath5);



Console.WriteLine(result);



public static class Helper

{

    public static string AddPath(this string value, params string[] addPaths)

    {

        if (string.IsNullOrEmpty(value))

        {

            throw new Exception("起始目錄不可以為空字串");

        }



        if (value.Contains("..") || addPaths.Any(x => x.Contains("..")))

        {

            throw new Exception($"value: {value}, addPaths: {addPaths.Where(x => x.Contains("..")).ToOneString()} 檔名與路徑不可包含 ..");

        }



        var paths = addPaths.Select(x => x.Substring(x.FindLastContinuousCharPosition('/', '\\') + 1).SafeFilename()).ToList();



        if (paths.Any(x => System.IO.Path.IsPathRooted(x)))

        {

            throw new Exception("不可併入完整路徑 ..");

        }



        paths.Insert(0, value.SafeFilename());



        return System.IO.Path.Combine(paths.ToArray());

    }



    public static string ToOneString<T>(this IEnumerable<T> list, string separator = ",")

    {

        var strList = list.Select(x => x.ToString());

        return string.Join(separator, strList);

    }



    public static int FindLastContinuousCharPosition(this string input, params char[] targets)

    {

        int lastPosition = -1;



        for (int i = 0; i < input.Length; i++)

        {

            if (targets.Contains(input[i]))

            {

                lastPosition = i;

            }

            else

            {

                break;

            }

        }



        return lastPosition;

    }



    public static string SafeFilename(this string value)

    {

        return GetValidFilename(value);

    }



    public static string GetValidFilename(string value)

    {

        string ValidFilenameCharacters = @"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\-_$.@:/# ";

        if (value.Contains(".."))

        {

            throw new Exception("路徑中不可包含 .. ");

        }



        string newUrl = "";

        for (int i = 0; i < value.Length; i++)

        {

            var c = value.Substring(i, 1);

            int k = ValidFilenameCharacters.IndexOf(c);

            if (k < 0)

            {

                throw new Exception($"檔名 '{value}' 中有非法的字元 '" + c + "'。");

            }

            newUrl += ValidFilenameCharacters.Substring(k, 1);

        }



        return newUrl;

    }

}

More...

梨子, 2023/8/28 上午 09:43:49

使用Parallel來平行處理迭代!!!

發現了一個很新奇的東西 Parallel
他可以做到平行處理迭代這件事
只要迴圈邏輯上不需要有先後順序的情況就可以使用來大幅度提高效能!

輸出值

需要特別注意如果於非同步中使用lock應使用SemaphoreSlim
如果要使用Dictionay應使用ConcurrentDictionary

More...

梨子, 2023/4/11 下午 12:49:47

使用SemaphoreSlim來對非同步情況下使用Lock

由於在使用Await時，系統的Thread並不保證會與該function使用同一個Thread。
因此使用lock將單一Thread鎖住會導致出現 'Cannot await in the body of a lock statement'的錯誤

將Lock物件改成使用SemaphoreSlim，於function起始與結束加入WaitAsync與Release即可達到Lock的效果。

SemaphoreSlim完整介紹 : 用 SemaphoreSlim 來做 async/await 的鎖定
使用如下圖↓

More...

梨子, 2023/3/23 下午 03:00:41

Swashbuckle XML檔設定顯示文件註解

在註冊Swagger時如圖中
1. 將各自action名稱代用不同編碼，避免名稱相撞導致swagger不能正確辨識。
2. 將專案XML輸出改為專案名稱($(AssemblyName))供註冊時指定反射用
↓ (對專案點擊右鍵 => 屬性 => 生成(Build) => 輸出(Output) => XML 文件名稱(XML documentation file path))

More...

梨子, 2023/3/23 上午 10:20:46

C# 元組與分解 (Tuple And Deconstruct)

*分解元組(Deconstructing tuples)的支援必須要至少C# 7.0以上*

以前一個funtion如果需要回傳複數個參數你會怎麼做呢?

1.可能會定義一個物件，在回傳的時候宣告並將值塞進去return。(為了一個function而去定義，好醜...)
2.又或者是將回傳參數型別直接改成dynamic，回傳時直接使用匿名物件。(performance炸裂!!!)

如果你使用了上面兩種方式...請一定!一定!!一定!!!要把這篇文章看完。

先來說說在C#7.0以前的處理方法，在呼叫function之前先將欲接收的參數宣告好並傳入。

但是這樣做有幾個缺點，
1. 當傳入的參數與傳出的參數一多起來，function定義時的參數會又臭又長。
2. 使用這個function前每次都需要宣告變數，不累嗎?
3. 不管你要不要使用這個參數，你都一定要傳入，因為ref與out不能有預設值。

接下來...讓我與各位介紹 " 元組元素 "!!! (Tuple)

早上好台灣，現在我有Tuple。我很喜歡Tuple，但是速度與激情9比...咳咳...沒事。

先來個簡單的範例

定義一個funciton JohnCena

然後我們就可以很漂亮的一次性取得JohnCena的姓名、出生日期、國籍、年齡等等資料!

假如你的參數需要在外面先行宣告也沒問題!

而從C#10開始，甚至可以混合使用也沒關係!

這樣我們就輕鬆解決了以前的兩個缺點了，而要解決最後一個缺點更是簡單，
只需要將你想要丟棄(discard)或是用不到的那個參數使用UnderLine( _ )就可以囉!

元組元素的基礎講完了再來說說解構元素的部分
建構式想必大家都已經很熟悉了

但是'分解式'你有聽過嗎?
顧名思義就是可以定義物件被指派出去的參數!
驚不驚喜! 意不意外?

使用的方法就是在class底下新增一個Deconstruct function

如此一來我們就可以來試試看將John Cena解ㄊ一.... 分解!

建構式可以多載，分解式當然也可以!

以上就是這次的內容，有沒有覺得Coding時的可玩性又更好了呢?
不過要提醒一點，目前解構式並無法使用查看定義來移置相對應的位置，
對於可讀性會有不小的影響，但如果你想要把某個東西藏起來不讓你同事找到... (請不要這樣做!)
這次的範例也歡迎到我的GitHub參考囉!

06/09/2022 由Bike 補充

可以將function的回傳參數先行命名!!
直接當成物件使用，對開發效率與可讀性有著明顯的提升，相信會是未來的趨勢。

另外要道歉並修正之前將Deconstruct翻譯成解構式的錯誤
Destruct (解構式)指的是class生命周期結束前執行的function
使用方式是class名稱為dunction名稱並在前面加上波浪符號 '~'
Deconstruct的翻譯應該為分解式

More...

梨子, 2022/5/26 下午 07:58:00

使用Lucene.Net達成全文檢索！基礎解說(二)

上一集當中我們完成了Lucene基本操作中的Create與Read，這一集會將CRUD中的Update與Delete的操作方法告訴你，並且本集會著重於講解關於"Norms"與權重(Boost)在Lucene中的運作概念。

首先我們建立一個.Net 6的主控台應用程式

建立好後於右側專案右鍵選擇"管理Nuget套件"，並選擇"瀏覽">於搜索列中搜尋"Lucene">安裝3.0.3最新穩定版與 "System.Configuration.ConfigurationManager"

安裝好後就可以於專案內使用Lucene套件囉!
再來依照上一篇的教學建立一套簡單的Lucene查詢

using Lucene.Net.Analysis.Standard;

using Lucene.Net.Documents;

using Lucene.Net.Index;

using Lucene.Net.QueryParsers;

using Lucene.Net.Search;

using Lucene.Net.Store;



var _dir = new DirectoryInfo("LuceneDocument");

if (!File.Exists(_dir.FullName))

{

    System.IO.Directory.CreateDirectory(_dir.FullName);

}

var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_CURRENT);

CreateIndex(GetProductsInformation(), _dir, analyzer);



while (true)

{

    Console.Write("請輸入欲查詢字串 :");

    var searchValue = Console.ReadLine();

    Search(searchValue, _dir, analyzer);

}



void CreateIndex(List<Product> information, DirectoryInfo dir, StandardAnalyzer analyzer)

{

    using (var directory = FSDirectory.Open(dir))

    {

        using (var indexWriter = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED))

        {

            foreach (var index in information)

            {

                var document = new Document();

                document.Add(new Field("Id", index.Id.ToString(), Field.Store.YES, Field.Index.NO));

                document.Add(new Field("Name", index.Name, Field.Store.YES, Field.Index.ANALYZED));

                document.Add(new Field("Description", index.Description, Field.Store.YES, Field.Index.ANALYZED));

                indexWriter.AddDocument(document);

            }

            indexWriter.Optimize();

            indexWriter.Commit();

        }

    }

}

void Search(string searchValue, DirectoryInfo dir, StandardAnalyzer analyzer)

{

    using (var directory = FSDirectory.Open(_dir))

    {

        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_CURRENT, "Description", analyzer).Parse(searchValue);

        using (var indexSearcher = new IndexSearcher(directory))

        {

            var queryLimit = 20;

            var hits = indexSearcher.Search(parser, queryLimit);

            if (!hits.ScoreDocs.Any())

            {

                Console.WriteLine("查無相關結果。");

                return;

            }

            Document doc;

            foreach (var hit in hits.ScoreDocs)

            {

                doc = indexSearcher.Doc(hit.Doc);

                Console.WriteLine("Score :" + hit.Score + ", Id :" + doc.Get("Id") + ", Name :" + doc.Get("Name") + ", Description :" + doc.Get("Description"));

            }

        }

    }

}



List<Product> GetProductsInformation()

{

    return new List<Product> {

            new Product{ Id = 1, Name = "蘋果", Description = "一天一蘋果，醫生遠離我。"},

            new Product{ Id = 2, Name = "橘子", Description = "醫生給娜美最珍貴的寶藏。"},

            new Product{ Id = 3, Name = "梨子", Description = "我是梨子，比蘋果好吃多囉!"},

            new Product{ Id = 4, Name = "葡萄", Description = "吃葡萄不吐葡萄皮，不吃葡萄倒吐葡萄皮"},

            new Product{ Id = 5, Name = "榴槤", Description = "水果界的珍寶!好吃一直吃。"}

        };

}

class Product

{

    public long Id { get; set; }

    public string Name { get; set; } = null!;

    public string Description { get; set; } = null!;

}

好囉! 接下來我們要如何更新索引呢?
更新其實就是將存在的索引刪除並重新建立Document，不存在的則直接新增。
首先準備一組資料準備更新

List<Product> GetUpdateProductsInformation()

{

    return new List<Product>

    {

        new Product{ Id = 6, Name = "香蕉", Description = "運動完後吃根香蕉補充養分。"},

        new Product{ Id = 2, Name = "橘子", Description = "橘子跟柳丁你分得出來嗎?"}

    };

}

*欲更新的Document必須與創建所引時使用的Document欄位相同*

void Update(string key, List<Product> information, DirectoryInfo dir, StandardAnalyzer analyzer)

{

    using( var directory = FSDirectory.Open(dir))

    {

        using(var indexWriter = new IndexWriter(directory, analyzer, false, IndexWriter.MaxFieldLength.LIMITED))

        {

            foreach (var index in information)

            {

                var document = new Document();

                document.Add(new Field("Id", index.Id.ToString(), Field.Store.YES, Field.Index.NO));

                document.Add(new Field("Name", index.Name, Field.Store.YES, Field.Index.ANALYZED));

                document.Add(new Field("Description", index.Description, Field.Store.YES, Field.Index.ANALYZED));

                indexWriter.UpdateDocument(new Term("Name", key) ,document);

            }

        }

    }

}

來測試看看

可以看見 Name = 橘子的索引已經改為我們新準備的資料囉。
再來是刪除!
與更新非常相似，只需要使用deleteDocument()就可以了。

void Delete(string key, DirectoryInfo dir, StandardAnalyzer analyzer)

{

    using (var directory = FSDirectory.Open(dir))

    {

        using (var indexWriter = new IndexWriter(directory, analyzer, false, IndexWriter.MaxFieldLength.LIMITED))

        {

            indexWriter.DeleteDocuments(new Term("Name", key));

            indexWriter.Optimize();

            indexWriter.Commit();

        }

    }

}

再來看看輸出結果
可以發現 Score :0.7554128, Id :2, Name :橘子, Description :醫生給娜美最珍貴的寶藏。這筆索引已經被移除囉!

可以發現筆者於更新或刪除時都是輸入單一字來做異動，除了表達可以對索引做複合更動外，
是因為更新與刪除索引同樣會使用到分詞器(analyzer)，
*所輸入的索引值非ID等數值時必須要配合分詞器的分詞能力*才能取得所想異動的索引喔!

Boost是什麼呢?
Boost 分為 :
1. Index Time Boost : 在建立索引時就計算好的值。例如上一篇中提到的(NORMS)
2. Query Time Boost : 查詢時賦與搜尋條件不同的值以影響結果。
我們先來測試Index Time Boost的部分

void CreateIndexWithBoost(List<Product> information, DirectoryInfo dir, StandardAnalyzer analyzer)

{

    using (var directory = FSDirectory.Open(dir))

    {

        using (var indexWriter = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED))

        {

            foreach (var index in information)

            {

                var document = new Document();

                document.Add(new Field("Id", index.Id.ToString(), Field.Store.YES, Field.Index.NO));

                document.Add(new Field("Name", index.Name, Field.Store.YES, Field.Index.ANALYZED));

                document.Add(new Field("Description", index.Description, Field.Store.YES, Field.Index.ANALYZED));

                document.GetField("Name").Boost = 1.5F;

                document.GetField("Description").Boost = 0.5F;



                indexWriter.AddDocument(document);

            }

            indexWriter.Optimize();

            indexWriter.Commit();

        }

    }

}

並記得重新CreateIndex才能刷新欄位的權重值喔。

很明顯的搜尋出來的Score分數變動了! 但是有沒有發現明明Name欄位的Boost改成了1.5，蘋果的數值卻仍然只有一半呢?
這是因為我們的Search中所參照的欄位為Description，所以在計算Score的時候其實是完全沒有參與的喔!
另外要記得，使用Index Time Boost的時候，欲給予銓重分配的欄位Field.Index不能使用NO_NORMS，不然這個欄位並不會紀錄權重的資料。

再來我們試試看Query Time Boost

void SearchWithBoost(string searchValue, DirectoryInfo dir, StandardAnalyzer analyzer)

{

    using (var directory = FSDirectory.Open(_dir))

    {

        using (var indexSearcher = new IndexSearcher(directory))

        {

            var query = new QueryParser(Lucene.Net.Util.Version.LUCENE_CURRENT, "Name", analyzer).Parse(searchValue);

            var query2 = new QueryParser(Lucene.Net.Util.Version.LUCENE_CURRENT, "Description", analyzer).Parse(searchValue);



            query.Boost = 2.0F;

            query2.Boost = 0.5F;



            BooleanQuery booleanQuery = new BooleanQuery();

            booleanQuery.Add(query, Occur.SHOULD);

            booleanQuery.Add(query2, Occur.SHOULD);



            var hits = indexSearcher.Search(booleanQuery, 20);

            if (!hits.ScoreDocs.Any())

            {

                Console.WriteLine("查無相關結果。");

                return;

            }

            Document doc;

            foreach (var hit in hits.ScoreDocs)

            {

                doc = indexSearcher.Doc(hit.Doc);

                Console.WriteLine("Score :" + hit.Score + ", Id :" + doc.Get("Id") + ", Name :" + doc.Get("Name") + ", Description :" + doc.Get("Description"));

            }

        }

    }

}

這次我們搜尋兩個欄位"Name"與"Description"，並使用 BooleanQuery來將其組合。
BooleanQuery中的 Occur有三種參數 : "MUST","MUST_NOT","SHOULD"，功能與字面上的意思一樣為"必須要有","必須沒有"與"有無都包含"。

查詢出來的分數就不一樣囉!

以上就是這一次的分享，Lucene是一款容易入門但是要實際上戰場卻又十分複雜的功能，想要達成真正高效能的全文檢索，在前期的文件規畫配置與資料的權重配比都是一個巨大的挑戰。未來會繼續分享關於Lucene的其他有趣功能，還請繼續期待呦!
另外也可以到GitHub下載我的範例來參考呦!
GitHub: https://github.com/g13579112000/Lucene

參考文件:
1. 黑暗大大的全文檢索筆記 : https://blog.darkthread.net/blog/lucene-net-notes-1/
2. Makble : http://makble.com/lucene-field-boost-example
3. CSDN Jack2013tong 文章 : https://blog.csdn.net/huwei2003/article/details/53408388

More...

梨子, 2022/4/20 下午 09:34:03

使用Lucene.Net達成全文檢索！基礎解說(一)

Lucene.Net是一套C#開源全文索引庫，其主要包含了:
· Index : 提供索引的管理與詞組的排序
· Search : 提供查詢相關功能
· Store : 支援資料儲存管理，包括I/O操作
· Util : 共用套件
· Documents : 負責描述索引儲存時的文件結構管理
· QueryParsers : 提供查詢語法
· Analysis : 負責分析內容
要達到高效能的全文檢索讓機器可以明白我們的語言，最重要的關鍵就是"分詞器"了。
試想一下這一句話你會如何拆分成一段一段的關鍵字呢?
"一天一蘋果，醫生遠離我"
還有英文版本
"An apple a day, doctor keep me away."
中文版本的拆分:
"一天"、"一"、"蘋果"、"醫生"、"遠離"、"我"
英文版本的拆分:
"apple"、"day"、"doctor"、"keep"、"me"、"away"
有沒有注意到不同語系所分析出來的關鍵字有一點不一樣呢?
而在Lucene中分詞的工作會交給Analysis來完成，
不過我們可以依照不同的語系去選擇想使用的分詞器(Analyzer)！

首先簡單說明一下Lucene的實作流程
1. 確認主要搜尋的語系來決定使用的分詞器(analyzer)
2. 建立Document依照analyzer匯入資料
(前置完成)
3. 建立IndexSearcher導入準備好的Document
4. 建立Parser來分析SearchValue
5. 使用IndexSearcher分析Parser取得結果(Hits)
*本專案使用的是Lucene.Net 3.0.3*
接下來我們來建立一個提供查詢使用的Document。

 // 取得或建立Lucene文件資料夾

        if (!File.Exists(_dir.FullName))

        {

            System.IO.Directory.CreateDirectory(_dir.FullName);

        }

        // Asp.Net Core需要於Nuget安裝System.Configuration.ConfigurationManager提供用戶端應用程式的組態檔存取

        Lucene.Net.Store.Directory directory = FSDirectory.Open(_dir);

        // 選擇分詞器

        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_CURRENT);

        // 資料來源

        var repository = new Repository();

        // 依照指定的文件結構來建立

        var indexWriter = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);

        foreach (var index in repository)

        {

            var document = new Document();

            document.Add(new Field("Id", index.Id.ToString(), Field.Store.YES, Field.Index.NO));

            document.Add(new Field("Name", index.Name, Field.Store.YES, Field.Index.ANALYZED));

            document.Add(new Field("Description", index.Description, Field.Store.NO, Field.Index.ANALYZED));

            indexWriter.AddDocument(document);

        }

        indexWriter.Optimize();

        indexWriter.Commit();

        indexWriter.Dispose();

如此一來我們就建立好Lucene的基本配備囉！
其中analyzer的部分我們使用Lucene.Net預設，
要特別注意的是，其處理中文語系的能力非常之爛！
之後再寫一篇文章深入探討。
再來值得一提的是

document.Add(new Field("Id", index.Id.ToString(), Field.Store.YES, Field.Index.NO));

前兩個參數就是Key跟Value，可以簡單理解為欄位與其內容。
後面兩個參數是重點！
Store: 代表是否儲存這個Key的Value
例如在google打上台南美食會搜索出許多不同的文章連結，
不過google給你的資料中最重要的不是文章內容(Description)，
而是哪一篇文章(Name)與台南美食最有關係。
假如今天我只要回傳一個列表而不用提示文章中有哪些內容，
那麼我就可以選擇給"Description" Field.Store.No來節省空間。
Index:
· NO - 不加入索引，這個內容只需要隨著結果出爐，不需要在查詢的時候被考慮。
· ANALYZED、NOT_ANALYZED - 是否使用分詞
· NO_NORMS - 關閉權重功能
或許許多人會對權重功能(NORMS)感到疑惑，
簡單的舉個例子
{ Id=1, Key="蘋果", Value="一天一蘋果，醫生遠離我。"}
{ Id=2, Key="橘子", Value="醫生給娜美最珍貴的寶藏。"}
{ Id=3, Key="梨子", Value="我是梨子，比蘋果蘋果好吃多囉！"}
當我搜尋"蘋果"的時候結果會是
{ Id=1, MatchKey=1, MatchValue=1, Score=(1*5) + (1*2) = 7}
{ Id=3, MatchKey=0, MatchValue=1, Score=(0*5) + (2*2) = 4}
有發現了嗎？
雖然同樣都對中兩個結果但是Id 1的資料Key值中有包含關鍵字，
因此得到較高的分數排在Id 3前方
準備好Document了，我們可以開始來實際使用看看囉！

// 決定所要搜索的欄位

        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_CURRENT, "Description", analyzer).Parse(searchValue);

        // 提供剛剛建立的Document

        var indexSearcher = new IndexSearcher(directory);

        // 搜尋取出結果的數量

        var queryLimit = 20;

        // 開始搜尋！

        var hits = indexSearcher.Search(parser, queryLimit);

        if (!hits.ScoreDocs.Any())

        {

            Console.WriteLine("查無相關結果。");

            return;

        }

        Document doc;

        foreach (var hit in hits.ScoreDocs)

        {

            doc = indexSearcher.Doc(hit.Doc);

            Console.WriteLine("Score :" + hit.Score + ", Id :" + doc.Get("Id") + ", Name :" + doc.Get("Name") + ", Description :" + doc.Get("Description"));

        }

最後的結果(Hits)，是需要再回到Document去撈出對應的資料喔！
是不是非常簡單呢？
筆者寫了一個簡單的範例在GitHub上，秉持著追求新技術的心使用了.Net 6，還請各位大大多多包涵。
有中英文兩種Repository，只需要在上方的DI注入切換就可以囉！
GitHub連結: https://github.com/g13579112000/Lucene
筆者第一次撰寫這種教學文章，有哪邊錯誤的非常歡迎一起來討論指教。
之後有機會再撰寫Lucene更深入的應用方面，
例如權重的分配與分詞器的選擇與使用。
感謝您的閱讀。

參考文獻：
1.黑暗大大的全文檢索筆記: https://blog.darkthread.net/blog/lucene-net-notes-1/
2.使用.Net實現全文檢索: https://blog.csdn.net/huwei2003/article/details/53408388
3.伊凡的部落格: http://irfen.me/5-lucene4-9-learning-record-lucene-analysis-tokenizer/
4.純淨天空代碼範例: https://vimsky.com/zh-tw/examples/detail/csharp-ex-Lucene.Net.Documents-Document---class.html

More...

梨子, 2022/2/24 下午 08:23:46