
Using pymongo to Work with a MongoDB Database


Jun 07, 2016 03:17 PM


from pymongo import MongoClient

# Connect to the database
def get_db():
    client = MongoClient('localhost:27017')
    # 'examples' is the database name; if it does not exist yet,
    # MongoDB creates it when the first document is written to it.
    db = client.examples
    return db

# Insert: insert_one() inserts a single dict as a document
# (the old insert() method was removed in PyMongo 4)
def add_city(db):
    db.cities.insert_one({'name': 'Chicago'})

# Read: find_one() returns one document from cities (or None if it is empty)
def get_city(db):
    return db.cities.find_one()

if __name__ == '__main__':
    db = get_db()
    add_city(db)
    print(get_city(db))


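The above is only the simplest possible example of working with MongoDB.

What is the relationship between our MongoDB-based application (APP), the pymongo module, and the MongoDB database?

I think it can be drawn as: APP <-(dict)-> pymongo <-(BSON)-> mongoDB
That is, the application hands Python dicts to pymongo, and pymongo converts them to and from BSON (Binary JSON) when talking to the server.
Once you have this picture, you will understand why MongoDB belongs to the "dictionary family".

So when working with MongoDB, always keep one basic idea in mind: everything is a dictionary.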

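Now to the main topic, starting with queries:

query = {'manufacturer': 'Porsche'}

projection = {'_id': 0, 'name': 1}  # 1 = include the field, 0 = exclude it

db.myautos.find(query, projection)  # documents made by Porsche, showing 'name' but not '_id'

db.myautos.count_documents(query)  # number of matching documents (Cursor.count() was removed in PyMongo 4)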
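To see what a projection does to each returned document, here is a minimal pure-Python sketch that mimics top-level inclusion and exclusion projections on plain dicts. `apply_projection` and the sample car documents are hypothetical, for illustration only; this is not pymongo's API, and real projections also handle nested paths and projection operators.

```python
def apply_projection(doc, projection):
    """Mimic a top-level MongoDB projection on a plain dict (hypothetical helper)."""
    # '_id' is returned unless it is explicitly set to 0
    include_id = projection.get('_id', 1) != 0
    fields = {k: v for k, v in projection.items() if k != '_id'}
    if any(fields.values()):
        # inclusion mode: keep only the fields marked 1
        result = {k: doc[k] for k, v in fields.items() if v and k in doc}
    else:
        # exclusion mode: keep every field not marked 0
        result = {k: v for k, v in doc.items() if k != '_id' and k not in fields}
    if include_id and '_id' in doc:
        result['_id'] = doc['_id']
    return result


cars = [
    {'_id': 1, 'name': '911', 'manufacturer': 'Porsche'},
    {'_id': 2, 'name': 'Golf', 'manufacturer': 'Volkswagen'},
]
query = {'manufacturer': 'Porsche'}
projection = {'_id': 0, 'name': 1}

# Equality match on every query key, then project, like find(query, projection)
found = [apply_projection(d, projection) for d in cars
         if all(d.get(k) == v for k, v in query.items())]
print(found)  # [{'name': '911'}]
```

Leaving out '_id' from the projection keeps it in the output, which is why the original example sets it to 0 explicitly.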


Importing a JSON file into a database:

In a terminal:

$ mongoimport --db dbname -c collectionname --file inputfile.json
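If you would rather import from Python, the sketch below reads a file laid out the way mongoimport expects by default (one JSON document per line; a file holding a single top-level array needs mongoimport's --jsonArray flag instead) and hands the documents to pymongo's insert_many(). The function names are hypothetical. Plain json.loads does not turn MongoDB Extended JSON such as {"$oid": ...} into BSON types; bson.json_util.loads, shipped with pymongo, does.

```python
import json


def load_json_documents(path):
    """Read one JSON document per line, skipping blank lines (hypothetical helper)."""
    docs = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                docs.append(json.loads(line))
    return docs


def import_file(collection, path):
    """Insert every document from `path` into a pymongo Collection; return how many."""
    docs = load_json_documents(path)
    if docs:  # insert_many() rejects an empty list
        collection.insert_many(docs)
    return len(docs)
```

For example, `import_file(MongoClient('localhost:27017').dbname.collectionname, 'inputfile.json')` would roughly mirror the command above.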


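Comparison operators:

$gt, $lt, $lte, $gte and $ne mean, respectively:

$gt: greater than
$lt: less than
$lte: less than or equal to
$gte: greater than or equal to
$ne: not equal to

query = {'population': {'$gt': 10000}}  # population greater than 10000

query = {'population': {'$gt': 10000, '$lte': 20000}}  # population greater than 10000 and at most 20000

query = {'name': {'$gt': 'X', '$lte': 'Z'}}  # names sorting after 'X' and no later than 'Z'
# in practice: names starting with X or Y (plus the exact string 'Z'); 'Zurich' sorts after 'Z' and is excluded

from datetime import datetime

query = {'foundationDate': {'$gt': datetime(1840, 1, 1), '$lte': datetime(2049, 10, 1)}}
# dates after 1840-01-01 and on or before 2049-10-01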
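For values of the same type, these operators behave like Python's own comparisons, so a minimal pure-Python sketch can show which values a condition dict accepts, including the string-range caveat above. `matches` is a hypothetical helper, not part of pymongo; MongoDB additionally orders values of different BSON types, which this sketch does not attempt, and its $ne also matches documents that lack the field.

```python
import operator
from datetime import datetime

# Each MongoDB comparison operator mapped to the Python comparison it mirrors
OPS = {
    '$gt': operator.gt,
    '$gte': operator.ge,
    '$lt': operator.lt,
    '$lte': operator.le,
    '$ne': operator.ne,
}


def matches(value, condition):
    """True if `value` satisfies every operator in `condition` (they are ANDed)."""
    return all(OPS[op](value, bound) for op, bound in condition.items())


print(matches(15000, {'$gt': 10000, '$lte': 20000}))   # True
print(matches('Zurich', {'$gt': 'X', '$lte': 'Z'}))    # False: 'Zurich' sorts after 'Z'
print(matches(datetime(1949, 10, 1),
              {'$gt': datetime(1840, 1, 1), '$lte': datetime(2049, 10, 1)}))  # True
```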


The existence operator $exists

query = {'governmentType': {'$exists': True}}   # the field is present (1 also works)
query = {'governmentType': {'$exists': False}}  # the field is absent (0 also works)
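$exists only asks whether the key is present, not what its value is: a field set to null still exists. The following pure-Python sketch applies that rule to a top-level field (the `exists_filter` helper and the sample countries are hypothetical). By contrast, a MongoDB query such as {'governmentType': None} matches documents where the field is null or missing.

```python
def exists_filter(docs, field, flag=True):
    """Mimic {field: {'$exists': flag}} for a top-level field: keep documents
    that contain the key (flag true) or lack it (flag false), whatever its value."""
    return [d for d in docs if (field in d) == bool(flag)]


countries = [
    {'name': 'A', 'governmentType': 'republic'},
    {'name': 'B', 'governmentType': None},   # present, even though the value is null
    {'name': 'C'},                           # field absent
]
print([d['name'] for d in exists_filter(countries, 'governmentType', 1)])  # ['A', 'B']
print([d['name'] for d in exists_filter(countries, 'governmentType', 0)])  # ['C']
```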


The regular expression operator $regex

query = {'motto':{'$regex':'[Ff]riendship'}}
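The pattern '[Ff]riendship' matches "friendship" with either a capital or a lower-case F, anywhere in the string (the match is not anchored). The sketch below shows two other query forms pymongo accepts for a similar search, a compiled Python pattern and the '$options': 'i' flag, plus a hypothetical `regex_filter` helper that uses Python's re module to preview which sample mottos would match. MongoDB uses PCRE-style regular expressions; for a simple pattern like this one, Python's re behaves the same way.

```python
import re

pattern = '[Ff]riendship'

# Query forms pymongo accepts (plain dicts passed to find())
query_regex = {'motto': {'$regex': pattern}}
query_icase = {'motto': {'$regex': 'friendship', '$options': 'i'}}  # case-insensitive
query_compiled = {'motto': re.compile(pattern)}  # pymongo encodes compiled patterns as BSON regexes


def regex_filter(docs, field, pat, flags=0):
    """Keep documents whose string `field` contains a match for `pat` (unanchored, like $regex)."""
    rx = re.compile(pat, flags)
    return [d for d in docs if isinstance(d.get(field), str) and rx.search(d[field])]


mottos = [
    {'motto': 'Friendship and peace'},
    {'motto': 'Unity, friendship, progress'},
    {'motto': 'FRIENDSHIP'},
    {'motto': 'Liberty'},
]
print(len(regex_filter(mottos, 'motto', pattern)))                      # 2
print(len(regex_filter(mottos, 'motto', 'friendship', re.IGNORECASE)))  # 3
```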


$in and $all

query = {'modelYears': {'$in': [1965, 1967, 1977, 1987]}}   # matches if any one of these values is present
query = {'modelYears': {'$all': [1965, 1967, 1977, 1987]}}  # all four values must be present
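When the field holds an array, $in succeeds if at least one element appears in the given list, while $all succeeds only if every listed value is in the array. A pure-Python sketch of that logic (the helper names and model-year data are hypothetical; a scalar field is treated as a one-element array):

```python
def _as_list(value):
    # A scalar field value behaves like a one-element array here
    return value if isinstance(value, list) else [value]


def match_in(field_value, candidates):
    """Mimic {'$in': candidates}: any element of the field is among the candidates."""
    return any(v in candidates for v in _as_list(field_value))


def match_all(field_value, required):
    """Mimic {'$all': required}: every required value is in the field."""
    values = _as_list(field_value)
    return all(r in values for r in required)


wanted = [1965, 1967, 1977, 1987]
print(match_in([1965, 1970, 1977], wanted))               # True: 1965 is listed
print(match_all([1965, 1970, 1977], wanted))              # False: 1967 and 1987 are missing
print(match_all([1965, 1967, 1977, 1987, 1990], wanted))  # True
```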


If the document structure is:

{'dimension': {'width': 25,
               'height': 30,
               'length': 89},
 ...
}


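the query dict can use dot notation to reach the nested field:

query = {'dimension.width': 25}

cities = db.cities.find(query)

for ele in cities:
    ele['dimension'] = 66
    # save the change: replace the stored document with the modified one
    # (Collection.save() was removed in PyMongo 4)
    db.cities.replace_one({'_id': ele['_id']}, ele)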
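Dot notation tells MongoDB to walk into embedded documents: 'dimension.width' means "the width key inside the dimension subdocument". A minimal pure-Python sketch of that lookup, using hypothetical `get_path` and `find_by_path` helpers; unlike MongoDB, it does not descend into arrays and it treats a missing key the same as None.

```python
def get_path(doc, dotted):
    """Follow a dotted path such as 'dimension.width' through nested dicts;
    return None if any step is missing (simplified: arrays are not traversed)."""
    current = doc
    for key in dotted.split('.'):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find_by_path(docs, dotted, value):
    """Keep documents whose nested field equals `value`, like find({dotted: value})."""
    return [d for d in docs if get_path(d, dotted) == value]


boxes = [
    {'name': 'a', 'dimension': {'width': 25, 'height': 30, 'length': 89}},
    {'name': 'b', 'dimension': {'width': 10, 'height': 30, 'length': 40}},
]
print(get_path(boxes[0], 'dimension.height'))                           # 30
print([b['name'] for b in find_by_path(boxes, 'dimension.width', 25)])  # ['a']
```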


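Update operations

db.cities.update_one({'name': 'michael',
                      'country': 'china'},     # filter
                     {'$set': {'iso': 1978}})  # in the first matching document, set 'iso' to 1978 (created if missing)

db.cities.update_one({'name': 'michael',
                      'country': 'china'},     # filter
                     {'$unset': {'iso': ''}})  # in the first matching document, remove 'iso'; the value given is ignored
# Update every matching document (replaces the old update(..., multi=True) form)
db.cities.update_many({'name': 'michael',
                       'country': 'china'},    # filter
                      {'$set': {'iso': 1978}})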
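To make the two update operators concrete, here is a pure-Python sketch that applies a {'$set': ..., '$unset': ...} update to a copy of a dict, for top-level fields only. `apply_update` is a hypothetical helper; on the server, update_one() applies the same change to the first matching document and update_many() to all of them.

```python
import copy


def apply_update(doc, update):
    """Return a copy of `doc` with '$set' and '$unset' applied (top-level fields only)."""
    new_doc = copy.deepcopy(doc)
    for field, value in update.get('$set', {}).items():
        new_doc[field] = value       # $set overwrites the field, or creates it if missing
    for field in update.get('$unset', {}):
        new_doc.pop(field, None)     # $unset removes the field; the given value is ignored
    return new_doc


city = {'name': 'michael', 'country': 'china'}
with_iso = apply_update(city, {'$set': {'iso': 1978}})
print(with_iso)                                         # {'name': 'michael', 'country': 'china', 'iso': 1978}
print(apply_update(with_iso, {'$unset': {'iso': ''}}))  # {'name': 'michael', 'country': 'china'}
```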


Aggregate operations

Consider the following document structure (a tweet, written in mongo shell notation):

{
    "_id" : ObjectId("5304e2e3cc9e684aa98bef97"),
    "text" : "First week of school is over :P",
    "in_reply_to_status_id" : null,
    "retweet_count" : null,
    "contributors" : null,
    "created_at" : "Thu Sep 02 18:11:25 +0000 2010",
    "geo" : null,
    "source" : "web",
    "coordinates" : null,
    "in_reply_to_screen_name" : null,
    "truncated" : false,
    "entities" : {
        "user_mentions" : [ ],
        "urls" : [ ],
        "hashtags" : [ ]
    },
    "retweeted" : false,
    "place" : null,
    "user" : {
        "friends_count" : 145,
        "profile_sidebar_fill_color" : "E5507E",
        "location" : "Ireland :)",
        "verified" : false,
        "follow_request_sent" : null,
        "favourites_count" : 1,
        "profile_sidebar_border_color" : "CC3366",
        "profile_image_url" : "http://a1.twimg.com/profile_images/1107778717/phpkHoxzmAM_normal.jpg",
        "geo_enabled" : false,
        "created_at" : "Sun May 03 19:51:04 +0000 2009",
        "description" : "",
        "time_zone" : null,
        "url" : null,
        "screen_name" : "Catherinemull",
        "notifications" : null,
        "profile_background_color" : "FF6699",
        "listed_count" : 77,
        "lang" : "en",
        "profile_background_image_url" : "http://a3.twimg.com/profile_background_images/138228501/149174881-8cd806890274b828ed56598091c84e71_4c6fd4d8-full.jpg",
        "statuses_count" : 2475,
        "following" : null,
        "profile_text_color" : "362720",
        "protected" : false,
        "show_all_inline_media" : false,
        "profile_background_tile" : true,
        "name" : "Catherine Mullane",
        "contributors_enabled" : false,
        "profile_link_color" : "B40B43",
        "followers_count" : 169,
        "id" : 37486277,
        "profile_use_background_image" : true,
        "utc_offset" : null
    },
    "favorited" : false,
    "in_reply_to_user_id" : null,
    "id" : NumberLong("22819398300")
}


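The $group operation

group = {'$group': {'_id': '$user.screen_name', 'count': {'$sum': 1}}}

# A $group stage must have an '_id' key, which specifies what to group by; '$sum' adds up values

# The line above counts the documents (tweets) for each 'user.screen_name'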


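The $sort operation, as the name suggests, sorts documents by a key in ascending (1) or descending (-1) order.

# continuing from the code above
sort = {'$sort': {'count': -1}}  # sort by the value of 'count' in descending order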


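Putting group and sort into the aggregate function gives the result we want:

pipeline = [group, sort]
result = list(db.tweets.aggregate(pipeline))
# aggregate() returns a cursor; list() collects the resulting documents
# (in PyMongo 2.x it returned a dict whose 'result' key held this list)
# Overall: count the tweets per user.screen_name, sorted in descending order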
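Conceptually, [group, sort] counts documents per key and orders the counts. The pure-Python sketch below (a hypothetical `count_by_screen_name` helper, with sample tweets trimmed to the one field it needs) produces documents of the same shape as the pipeline's output, which can help when checking a pipeline against a small sample. MongoDB does not guarantee the order of groups with equal counts.

```python
from collections import Counter


def count_by_screen_name(tweets):
    """Pure-Python stand-in for [group, sort]: one {'_id', 'count'} dict per
    user.screen_name, highest count first."""
    counts = Counter(t['user']['screen_name'] for t in tweets)              # $group with $sum: 1
    return [{'_id': name, 'count': n} for name, n in counts.most_common()]  # $sort by count, -1


tweets = [
    {'user': {'screen_name': 'Catherinemull'}},
    {'user': {'screen_name': 'someone_else'}},
    {'user': {'screen_name': 'Catherinemull'}},
]
print(count_by_screen_name(tweets))
# [{'_id': 'Catherinemull', 'count': 2}, {'_id': 'someone_else', 'count': 1}]
```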


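The above is only the simplest example.

Next, let's look at other operations:

$match: as the name suggests, it selects documents; I prefer to call it a "filter".

OK, let me give an example: I want to find out who is the most popular person in the database! Give me a suggestion: how do I find this joker?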

(Think it over carefully...)

Is it the "nation's husband" Wang Sicong? Or Yang Mi, a.k.a. "Smelly Feet"?

Some minor Weibo celebrity who shows a lot of skin? Or media personality Gu Dabaihua?

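Well, all I could come up with is a ratio. Yes, a ratio, nothing else.

ratio = number of followers / number of friends

Of course, the follower count alone would also do, but I'm just this willful, like Uncle Jiang Wen. So what?

Oh, you say that if I'm going to be like this, you'll stop reading!

Fine! Stop reading if you like. I watched Uncle Jiang Wen's movies and picked up a trick from him.

Let me whisper it to you. What Jiang Wen means is: I don't make movies for you to watch, I make them for myself. I'm no talent and have no money to make movies, so: I don't write this blog for others, I write it for myself. Right! I'm talking to myself. Another me is reading the blog.

Back to $match. No, back to finding the ratio: let's see how I find it. Or rather, the largest ratio.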

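match = {'$match': {'user.friends_count': {'$gt': 0}, 'user.followers_count': {'$gt': 0}}}
# make sure both the friends count and the followers count are positive (this also avoids dividing by zero)
project = {'$project': {'ratio': {'$divide': ['$user.followers_count', '$user.friends_count']},
                        'screen_name': '$user.screen_name'}}
# create two keys, 'ratio' and 'screen_name'; 'ratio' uses '$divide' to divide the first
# value in the list by the second, so the order of the list matters
# next, sort
sort = {'$sort': {'ratio': -1}}  # descending order
# keep only the top entry
limit = {'$limit': 1}
# $limit caps the number of documents passed on

pipeline = [match, project, sort, limit]
result = list(db.tweets.aggregate(pipeline))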
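Each stage can be mirrored in plain Python, which makes it easy to see what the pipeline computes on a few hand-made documents. A minimal sketch, assuming a hypothetical `most_popular` helper and made-up sample users; it omits the '_id' field that $project would keep by default.

```python
def most_popular(tweets):
    """Mimic [match, project, sort, limit] from the ratio pipeline on plain dicts."""
    # $match: only users with positive friends and followers counts
    matched = [t for t in tweets
               if t['user']['friends_count'] > 0 and t['user']['followers_count'] > 0]
    # $project: ratio = followers / friends, plus the screen name
    projected = [{'ratio': t['user']['followers_count'] / t['user']['friends_count'],
                  'screen_name': t['user']['screen_name']}
                 for t in matched]
    # $sort by ratio, descending
    projected.sort(key=lambda d: d['ratio'], reverse=True)
    # $limit: keep the first document only
    return projected[:1]


tweets = [
    {'user': {'screen_name': 'Catherinemull', 'friends_count': 145, 'followers_count': 169}},
    {'user': {'screen_name': 'no_friends', 'friends_count': 0, 'followers_count': 50}},
    {'user': {'screen_name': 'popular', 'friends_count': 2, 'followers_count': 10}},
]
print(most_popular(tweets))  # [{'ratio': 5.0, 'screen_name': 'popular'}]
```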


The $unwind operation, by example:

Suppose we have a document like this:

{
	'id':'1',
	'author':'jone',
	'tags':['good','fun','good']
}


Run db.article.aggregate on it:

db.article.aggregate([{'$project': {'author': 1, 'tags': 1}}, {'$unwind': '$tags'}])


Iterating over the returned cursor (for example with list()) gives:

[{'_id': 'XXXX', 'author': 'jone', 'tags': 'good'},
 {'_id': 'XXXX', 'author': 'jone', 'tags': 'fun'},
 {'_id': 'XXXX', 'author': 'jone', 'tags': 'good'}]


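So $unwind operates on arrays. It replaces the array with each of its elements in turn, producing one document per element, so the number of new documents equals the length of the original array. (Older MongoDB versions raise an error when the field is not an array; since MongoDB 3.2 a non-array value is treated as a one-element array, and a missing field, null or an empty array produces no output.)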
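The splitting rule can be written as a short pure-Python sketch operating on a top-level field. `unwind` is a hypothetical helper following the MongoDB 3.2+ behaviour described above; the real stage also accepts options such as preserveNullAndEmptyArrays, which this sketch does not model.

```python
def unwind(docs, field):
    """Mimic {'$unwind': '$' + field}: emit one copy of each document per array element."""
    out = []
    for doc in docs:
        value = doc.get(field)
        if value is None or value == []:
            continue  # missing, null or empty arrays produce no output
        items = value if isinstance(value, list) else [value]  # scalar -> one-element array
        for item in items:
            copy_of_doc = dict(doc)
            copy_of_doc[field] = item
            out.append(copy_of_doc)
    return out


article = {'id': '1', 'author': 'jone', 'tags': ['good', 'fun', 'good']}
for d in unwind([article], 'tags'):
    print(d)
# {'id': '1', 'author': 'jone', 'tags': 'good'}
# {'id': '1', 'author': 'jone', 'tags': 'fun'}
# {'id': '1', 'author': 'jone', 'tags': 'good'}
```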

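More on $group

Consider the Twitter data from the beginning: if I want to find the hashtag whose tweets are retweeted the most on average, how should the aggregate pipeline be written?

First we need each tweet's hashtags. Note that in the Twitter data above, in the structure

"entities" : {
        "user_mentions" : [ ],
        "urls" : [ ],
        "hashtags" : [ ]
    }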


'entities.hashtags' is a list, so we can apply $unwind to it,

and 'retweet_count' records how many times a tweet was retweeted, so all that remains is to average it.

unwind = {'$unwind': '$entities.hashtags'}


group = {'$group': {'_id': '$entities.hashtags.text', 'retweet_avg': {'$avg': '$retweet_count'}}}


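Note: a $group stage must have an '_id' field. Apart from that, the output field name ('retweet_avg' above) is a name of your own choosing, e.g. 'txt'. '$avg' computes the average; similar accumulators include:

'$sum' '$first' '$last' '$max' '$min', etc.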

Then add a sort stage; the whole pipeline is:

unwind = {'$unwind': '$entities.hashtags'}
group = {'$group': {'_id': '$entities.hashtags.text', 'retweet_avg': {'$avg': '$retweet_count'}}}
sort = {'$sort': {'retweet_avg': -1}}
limit = {'$limit': 1}

pipeLine = [unwind, group, sort, limit]

result = list(db.tweets.aggregate(pipeLine))
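A pure-Python sketch of the same four stages helps check the logic on a couple of hand-made tweets. It assumes, as the '$entities.hashtags.text' path implies, that each hashtag entry is a dict with a 'text' key (the sample tweet above has an empty hashtag list); `top_hashtag_by_avg_retweets` is a hypothetical helper. Like $avg, it ignores non-numeric retweet counts such as null; unlike MongoDB, it drops hashtags that have no numeric count at all instead of reporting a null average.

```python
from collections import defaultdict


def top_hashtag_by_avg_retweets(tweets):
    """Mimic [unwind, group, sort, limit]: the hashtag with the highest average retweet_count."""
    sums, counts = defaultdict(int), defaultdict(int)
    for tweet in tweets:
        retweets = tweet.get('retweet_count')
        if not isinstance(retweets, (int, float)):
            continue                                  # $avg ignores null / non-numeric values
        for tag in tweet['entities']['hashtags']:     # $unwind: one pass per hashtag
            sums[tag['text']] += retweets             # $group by hashtag text...
            counts[tag['text']] += 1
    averages = [{'_id': text, 'retweet_avg': sums[text] / counts[text]}  # ...with $avg
                for text in sums]
    averages.sort(key=lambda d: d['retweet_avg'], reverse=True)  # $sort descending
    return averages[:1]                                           # $limit 1


tweets = [
    {'retweet_count': 10, 'entities': {'hashtags': [{'text': 'python'}, {'text': 'mongodb'}]}},
    {'retweet_count': 2, 'entities': {'hashtags': [{'text': 'python'}]}},
    {'retweet_count': None, 'entities': {'hashtags': [{'text': 'misc'}]}},
]
print(top_hashtag_by_avg_retweets(tweets))  # [{'_id': 'mongodb', 'retweet_avg': 10.0}]
```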


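As Chairman Mao taught, where there is a spear there is a shield: since $unwind takes arrays apart, there must be a tool that puts them back together. Can you guess what it is?

$push, $addToSet

As the names suggest, both $push and $addToSet (used as accumulators in a $group stage) collect values into an array. $addToSet goes a step further: a "set" holds no duplicates, so the array it builds has no repeated elements.

The array built by $push, on the other hand, can contain duplicates.

That is the difference between the two.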
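Tying this back to the $unwind example: the $group stage below, a plain dict you could append after {'$unwind': '$tags'} in db.article.aggregate, rebuilds each author's tags both ways. `group_tags` is a hypothetical pure-Python stand-in showing what each accumulator produces; note that MongoDB does not guarantee the element order of an $addToSet array.

```python
# $group stage: regroup the unwound documents by author
regroup = {'$group': {'_id': '$author',
                      'all_tags': {'$push': '$tags'},         # keeps duplicates
                      'unique_tags': {'$addToSet': '$tags'}}}  # no duplicates


def group_tags(unwound_docs):
    """Pure-Python stand-in for `regroup` (hypothetical helper)."""
    groups = {}
    for d in unwound_docs:
        g = groups.setdefault(d['author'], {'_id': d['author'], 'all_tags': [], 'unique_tags': []})
        g['all_tags'].append(d['tags'])           # $push: append every value
        if d['tags'] not in g['unique_tags']:     # $addToSet: append only new values
            g['unique_tags'].append(d['tags'])
    return list(groups.values())


unwound = [{'author': 'jone', 'tags': 'good'},
           {'author': 'jone', 'tags': 'fun'},
           {'author': 'jone', 'tags': 'good'}]
print(group_tags(unwound))
# [{'_id': 'jone', 'all_tags': ['good', 'fun', 'good'], 'unique_tags': ['good', 'fun']}]
```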

So, isn't working with MongoDB through pymongo simple? That's what everyone says.

I don't think it's hard either.
