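Background:
The system has several kinds of entities: hospitals, cases, doctors, and special-offer products. These entities need to be tagged, meaning a tag field is stored in the database. For example, some hospitals and cases are given the tag "double eyelid" (双眼皮). The tags are used by the app's search. Right now, operations staff add tags to these entities by hand through the CMS, which is inefficient. How can the entities be tagged automatically, so that operations staff only need to configure the tags? A tagging rule could be string matching against text such as a hospital's introduction or its name. The hard part is a requirement like tagging double-eyelid cases with 杨庆峰 (Yang Qingfeng, a doctor who is very good at double-eyelid surgery). There are currently about 8,000 records across these entity types. So that every record has roughly the same chance of being found in search, having operations staff manually tag only some of the records is not good enough: most records would never show up in search results.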
Technical question:
What approach should be used for this problem, and what technologies should be used to implement it?
You can treat tagging as a text classification problem.
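One way to apply text classification here: learn from the records operations staff have already tagged by hand, then predict tags for the rest. Below is a minimal multinomial naive Bayes sketch in plain Python. It assumes each record's text has already been segmented into a token list and carries a single tag; the class name `NaiveBayesTagger` and the sample data are hypothetical. Records with several tags would need one binary classifier per tag instead.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Minimal multinomial naive Bayes text classifier (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.tag_counts = Counter()              # number of training docs per tag
        self.word_counts = defaultdict(Counter)  # tag -> token frequencies
        self.vocab = set()

    def fit(self, docs, tags):
        # docs: list of token lists; tags: one tag per document
        for tokens, tag in zip(docs, tags):
            self.tag_counts[tag] += 1
            self.word_counts[tag].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.tag_counts.values())
        vocab_size = len(self.vocab)
        best_tag, best_score = None, float("-inf")
        for tag, doc_count in self.tag_counts.items():
            # Log prior plus the sum of smoothed log likelihoods
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[tag].values())
            for token in tokens:
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[tag][token]
                score += math.log((count + self.alpha) / (total_words + self.alpha * vocab_size))
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


# Hypothetical training data: segmented case descriptions with manual tags
docs = [["双眼皮", "埋线", "自然"], ["双眼皮", "全切", "修复"], ["隆鼻", "假体"], ["鼻综合", "隆鼻"]]
tags = ["双眼皮", "双眼皮", "隆鼻", "隆鼻"]
tagger = NaiveBayesTagger().fit(docs, tags)
print(tagger.predict(["全切", "双眼皮"]))
```

Naive Bayes is used here only because it is short and copes reasonably with little training data; in practice a machine-learning library would usually be used instead of hand-rolled code.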
Use a word segmentation algorithm to segment the text, then take the high-frequency words, plus certain specified words, as the text's tags.
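The frequency-based approach can be sketched as follows, assuming the text has already been segmented (for example by a Chinese segmenter such as jieba). The stop-word list, `extract_tags`, and the sample tokens are hypothetical. Specified words that occur in the text are always kept, and the most frequent remaining content words are added after them.

```python
from collections import Counter

# Hypothetical stop words: function words plus generic terms that make poor tags
STOP_WORDS = {"的", "了", "和", "是", "在", "我们", "医院", "手术"}


def extract_tags(tokens, specified_words, top_k=3, min_len=2):
    """Return specified words found in the text, then the top-k frequent content words."""
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) >= min_len)
    frequent = [word for word, _ in counts.most_common(top_k)]
    present = set(tokens)
    specified = [w for w in specified_words if w in present]
    # Merge, preserving order and dropping duplicates
    return list(dict.fromkeys(specified + frequent))


tokens = ["医院", "双眼皮", "手术", "双眼皮", "修复", "眼袋", "眼袋", "双眼皮", "杨庆峰", "的", "环境"]
print(extract_tags(tokens, ["杨庆峰", "隆鼻"], top_k=2))
```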
In my personal opinion, it is best to maintain a tag library and match its entries against the hospital introduction text, hospital name, and other fields you mentioned; regular expressions can handle this. If you want double-eyelid records to match a person's name, the only way is to define that matching rule yourself.
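A minimal sketch of the tag-library approach, assuming the library maps each tag to a few regular expressions and that the "double eyelid → doctor name" association is written by hand as a separate rule table. `TAG_LIBRARY`, `DERIVED_TAGS`, `tag_record`, and the patterns are hypothetical examples.

```python
import re

# Hypothetical tag library configured by operations staff: tag -> regex patterns
TAG_LIBRARY = {
    "双眼皮": [r"双眼皮", r"重睑"],
    "隆鼻": [r"隆鼻", r"鼻综合"],
}

# Hypothetical hand-written rules: a record with the key tag also gets these tags
DERIVED_TAGS = {
    "双眼皮": ["杨庆峰"],
}

# Compile one alternation per tag, once
COMPILED = {tag: re.compile("|".join(pats)) for tag, pats in TAG_LIBRARY.items()}


def tag_record(fields):
    """fields: the text fields of one record, e.g. its name and introduction."""
    text = "\n".join(fields)
    tags = [tag for tag, pattern in COMPILED.items() if pattern.search(text)]
    # Apply the derived-tag rules to the tags found by matching
    for tag in list(tags):
        for extra in DERIVED_TAGS.get(tag, []):
            if extra not in tags:
                tags.append(extra)
    return tags


print(tag_record(["示例医院", "本院开展重睑术、鼻综合整形"]))
```

Keeping both tables as configuration data means operations staff only edit the tag library and the rules, which is what the original question asks for.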
Just sharing some of my own thoughts; I don't know Java, though.
What you need is word segmentation: segment the text based on whatever useful information you can get. I have looked briefly at Python's natural language processing libraries before, and they should be able to solve the original poster's problem.
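Segmenting "based on the useful information you can get" usually means giving the segmenter a custom dictionary so that domain terms such as 双眼皮 and doctor names stay whole. Python libraries such as jieba support this (for example via `jieba.load_userdict`). To show the idea without a dependency, here is a hypothetical `fmm_segment` function implementing forward maximum matching, a simple dictionary-based technique that greedily takes the longest dictionary word at each position; the dictionary contents are illustrative.

```python
def fmm_segment(text, dictionary, max_len=None):
    """Forward maximum matching: take the longest dictionary word at each position."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical custom dictionary built from the tag library and doctor names
dictionary = {"杨庆峰", "双眼皮", "医生", "擅长", "修复"}
print(fmm_segment("杨庆峰医生擅长双眼皮修复", dictionary))
```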
I'm not sure what the trouble you mentioned actually is. Is it that you can't get the doctor information for a given case, or is something wrong with your data structure?
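If the difficulty is only that cases and doctors are not joined, and each case record stores a reference to its doctor, the doctor's name can simply be copied onto the case as a tag. This sketch assumes a hypothetical `doctor_id` field on each case; the `Doctor` and `Case` classes stand in for whatever the real tables look like.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Doctor:
    id: int
    name: str


@dataclass
class Case:
    id: int
    doctor_id: Optional[int]  # hypothetical foreign key to Doctor.id
    tags: List[str] = field(default_factory=list)


def propagate_doctor_names(cases, doctors):
    """Append each case's doctor name to its tags when the doctor is known."""
    by_id = {d.id: d for d in doctors}
    for case in cases:
        doctor = by_id.get(case.doctor_id)
        if doctor is not None and doctor.name not in case.tags:
            case.tags.append(doctor.name)
    return cases


doctors = [Doctor(1, "杨庆峰")]
cases = [Case(101, 1, ["双眼皮"]), Case(102, None, ["隆鼻"])]
propagate_doctor_names(cases, doctors)
```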
Finally, about where the vocabulary comes from. Besides segmenting the existing information as described above, you can also use industry-related search terms from search engines, your own on-site search terms, and related search terms collected from competitors. In practice, as long as you cover the 80% of words that carry the largest search volume, the user experience will improve considerably.
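One reading of the 80% guideline is: take the smallest set of most-searched terms whose combined search volume reaches 80% of the total. A sketch, assuming you already have a term → search count mapping from your search logs; `terms_covering` and the counts below are hypothetical.

```python
def terms_covering(search_counts, share=0.8):
    """Return the most-searched terms whose cumulative volume reaches `share` of the total."""
    total = sum(search_counts.values())
    selected, covered = [], 0
    for term, count in sorted(search_counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= share * total:
            break  # target share already reached
        selected.append(term)
        covered += count
    return selected


counts = {"双眼皮": 500, "隆鼻": 250, "玻尿酸": 150, "祛斑": 60, "杨庆峰": 40}
print(terms_covering(counts))
```

Terms selected this way are good candidates to add to the tag library first.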