Commit 9d293f1f authored by Dorian

go potelets

parent fb81b486
# work.potelets
# Potelet / Paaltje / Bollard
Creating a printed object from the potelet incidents of the [fixmystreet.brussels](http://fixmystreet.brussels) app, for [MAMA](http://mama.brussels).
Information on the API can be found here:
* https://data.gov.be/fr/dataset/01593a26-ed57-498e-bec0-13011a75a773
* https://api.brussels/store/apis/info?name=fixmystreet&version=1.0.0&provider=admin
## data structures
There are two subcategories of potelet: *damaged* and *missing*.
They contain *4515* and *1833* items respectively.
For comparison, the whole website contains *129781* items.
<!-- A request by category only works with a leaf subcategory (one that has no subcategories of its own).
So we have to make a separate request for every subcategory of potelet. -->
<!-- We can get all the potelets of each of those subcategories with one call. -->
For every potelet there is an **attachments** thread/section that contains the *pictures* and *comments*.
It reads like a thread in which different actors interact by posting images or text; every attachment has a publication date.
<!-- We have to do a request for every potelet to get the attachments and iterate through it. -->
For every potelet there is a **history** thread/section that contains its changes of status.
The status changes can tell us how long it took before a responsible party took charge of the incident. The history also records, as *updates*, the comments that were posted after the incident was created.
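As a rough sketch of how such a delay could be measured, the snippet below takes a history list and returns the time between two status entries. The keys `historyDate` and `historyType` appear in the script further down; the type values (`CREATED`, `ACCEPTED`, `CLOSED`), the ISO 8601 date format and the sample entries are assumptions made for illustration, not values confirmed by the API.

```python
from datetime import datetime

# Hypothetical history entries: the keys come from the script,
# the type values and the date format are assumed.
history = [
    {"historyDate": "2019-03-01T10:00:00", "historyType": "CREATED"},
    {"historyDate": "2019-03-04T16:30:00", "historyType": "ACCEPTED"},
    {"historyDate": "2019-03-20T09:00:00", "historyType": "CLOSED"},
]

def delay_between(history, start_type, end_type):
    """Return the timedelta between the first entries of two types, or None."""
    dates = {}
    for entry in history:
        # setdefault keeps the first entry of each type, in list order
        dates.setdefault(entry["historyType"],
                         datetime.fromisoformat(entry["historyDate"]))
    if start_type in dates and end_type in dates:
        return dates[end_type] - dates[start_type]
    return None

delay = delay_between(history, "CREATED", "ACCEPTED")
print(delay)
```

The function relies on the list being in chronological order, which would need to be checked against real responses.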
For every entry in the attachments or history section, there is an **actor**. Each actor is one of:
* *CITIZEN* (users with their private data hidden),
* *PROFESSIONAL* (a municipality or organisation that accepts to 'fix' the incident),
* *SYSTEM* (Région Bruxelles-Capitale).
This makes a structure of 4 tables plus an image folder.
![alt text](datastructure.png "Title")
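One possible reading of this structure, sketched with SQLite, is given below. The diagram itself is not reproduced here, so the choice of the four tables (potelets, attachments, history, actors) and every column name are assumptions for illustration only.

```python
import sqlite3

# Hypothetical layout: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE actors (
    id INTEGER PRIMARY KEY,
    type TEXT NOT NULL,          -- CITIZEN, PROFESSIONAL or SYSTEM
    name TEXT                    -- organisation name, empty for citizens
);
CREATE TABLE potelets (
    id INTEGER PRIMARY KEY,      -- incident id from the API
    subcategory TEXT,
    status TEXT,
    address TEXT
);
CREATE TABLE attachments (
    id INTEGER PRIMARY KEY,
    potelet_id INTEGER NOT NULL REFERENCES potelets(id),
    actor_id INTEGER REFERENCES actors(id),
    type TEXT,                   -- PICTURE or COMMENT
    created TEXT,
    content TEXT                 -- comment text, or file path in the image folder
);
CREATE TABLE history (
    id INTEGER PRIMARY KEY,
    potelet_id INTEGER NOT NULL REFERENCES potelets(id),
    actor_id INTEGER REFERENCES actors(id),
    type TEXT,
    date TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

Images would stay on disk, with the attachments table only storing their path.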
A special case is the **duplicate** [to be completed]
## stats on images
```
Total number of Potelet: 6349
Total number of Images: 6727
Max number of images by potelet: 23
Average number of images by potelet: 1.0595369349503858
```
* Most potelets - around 78% - have 0 or 1 image.
* Some - around 22% - have between 2 and 6 images.
* Only about 0.3% (21 potelets) have 7 images or more.
```
0: 1867
1: 3079
2: 930
3: 295
4: 96
5: 41
6: 20
7: 9
8: 5
10: 2
11: 1
12: 2
13: 1
23: 1
```
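The shares quoted above can be recomputed from this histogram. The following is a minimal sketch that only uses the counts listed in the block; the helper name `share` is introduced here for illustration.

```python
# Histogram copied from the listing above: images per potelet -> number of potelets
histogram = {0: 1867, 1: 3079, 2: 930, 3: 295, 4: 96, 5: 41, 6: 20,
             7: 9, 8: 5, 10: 2, 11: 1, 12: 2, 13: 1, 23: 1}

total = sum(histogram.values())
total_images = sum(images * n for images, n in histogram.items())

def share(low, high):
    """Percentage of potelets having between low and high images (inclusive)."""
    count = sum(n for images, n in histogram.items() if low <= images <= high)
    return 100 * count / total

print(total, total_images)
print(round(share(0, 1), 1))
print(round(share(2, 6), 1))
print(round(share(7, max(histogram)), 2))
```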
## classification and ordering possibilities
### classification
* separate by **category**: *damaged* and *missing*.
* separate by **responsible/process** (if there is one): *responsible organisation* and/or *responsible department*, and by status: *CREATED*, *PROCESSING*, *CLOSED*
* separate by **location**: *municipality*.
* separate by **type of location**: analysis of the picture and/or location to detect whether it is a crossroads, a small street, etc.
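The first three criteria amount to grouping incidents by one or several fields. A small sketch, assuming the incidents have already been flattened into plain dictionaries (the sample records, the key names and the helper `classify` are hypothetical):

```python
from collections import defaultdict

# Hypothetical, flattened incidents: the real API objects are nested and
# the values used here are placeholders.
incidents = [
    {"id": 1, "subcategory": "damaged", "status": "CREATED", "municipality": "1000"},
    {"id": 2, "subcategory": "missing", "status": "CLOSED", "municipality": "1050"},
    {"id": 3, "subcategory": "damaged", "status": "CLOSED", "municipality": "1000"},
    {"id": 4, "subcategory": "damaged", "status": "PROCESSING", "municipality": "1030"},
]

def classify(items, *keys):
    """Group incident ids by the tuple of values found under the given keys."""
    groups = defaultdict(list)
    for item in items:
        groups[tuple(item[k] for k in keys)].append(item["id"])
    return dict(groups)

by_category = classify(incidents, "subcategory")
by_category_and_status = classify(incidents, "subcategory", "status")
print(by_category)
print(by_category_and_status)
```

Passing several keys nests the criteria, for example category first and status second.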
### ordering
* order by **creation date**
* order by **last updated date**
* order as **a walk** that **minimises** the change of location and/or time.
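One simple way to approximate the walk ordering is a greedy nearest-neighbour pass: start somewhere and always go to the closest incident not yet visited. This is a heuristic sketch, not an optimal route, and it only considers location, not time. The sample points and planar coordinates are hypothetical.

```python
import math

# Hypothetical points: (id, x, y) in an arbitrary planar coordinate system.
points = [(1, 0.0, 0.0), (2, 5.0, 5.0), (3, 1.0, 0.0), (4, 5.0, 4.0), (5, 1.0, 1.0)]

def walk_order(points):
    """Greedy nearest-neighbour ordering, starting from the first point."""
    if not points:
        return []
    remaining = list(points)
    current = remaining.pop(0)
    order = [current[0]]
    while remaining:
        # pick the unvisited point closest to the current one
        nearest = min(remaining, key=lambda p: math.dist(current[1:], p[1:]))
        remaining.remove(nearest)
        order.append(nearest[0])
        current = nearest
    return order

order = walk_order(points)
print(order)
```

With real data the distance function would have to work on geographic coordinates, and a time difference could be added to it as an extra term.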
## other notes
For every incident there is already an automatically generated .pdf that anyone can download from the website.
We can get it with the API: ```https://fixmystreet.brussels/api/incidents/{id}/pdf?lang={lang}&addressLang={addressLang}&type={type}```
One important piece of data that does not appear in the individual information about potelet incidents is **the macro distribution** of the incidents on a map, with its *clusters* and *holes*.
This could be represented explicitly in the final object if there is one section per municipality: some sections would be nearly empty while others would be large.
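A rough sketch of how the section sizes could be estimated: count incidents per postal code and keep the codes that have none, so that the holes stay visible. The script below reads the postal code from `potelet['location']['address']['postalCode']`; the sample values and the list of sections are hypothetical.

```python
from collections import Counter

# Hypothetical sample: one postal code per incident.
incident_codes = ["1000", "1000", "1050", "1000", "1030", "1050", "1000"]
# Hypothetical list of the sections planned in the object, including empty ones.
all_codes = ["1000", "1030", "1050", "1180"]

counts = Counter(incident_codes)
# A Counter returns 0 for missing keys, so empty sections are kept.
sizes = {code: counts[code] for code in all_codes}
for code, n in sizes.items():
    print(code, "#" * n, n)

empty = [code for code, n in sizes.items() if n == 0]
print(empty)
```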
import json
import requests
# TODO:
#   detect duplicates
#   specify actor names
#   (?) separate creation comments and new comments
#   begin SQL database structures
#     (the goal is not to duplicate their database,
#      but to make the data easy for us to lay out)
url = 'http://fixmystreet.brussels/api/'
u_cat = 'categories'
u_inc = 'incidents'
headers = {'Accept': 'application/hal+json'}
originDate = '2000-01-01'
mobilierurbain_catid = 1007
potelet_catid = 2030
itemsbypages = 12
def getPoteletCat():
    # get the potelet category json object with its subcategories
    potelet_cat = {}
    r = requests.get(url+u_cat, headers = headers)
    categories = r.json()['response']['categories']
    for category in categories:
        if category['id'] == mobilierurbain_catid:
            # print(category['nameFr'] + str(category['id']))
            for subcategory in category['subCategories']:
                # print(subcategory['nameFr'] + str(subcategory['id']))
                if subcategory['id'] == potelet_catid:
                    potelet_cat = subcategory
    return potelet_cat
def getNumberOfIncidents(category=''):
    # return the number of incidents and the number of result pages,
    # optionally restricted to one category
    params = { 'startDate': originDate, 'size': itemsbypages }
    if category:
        params['category'] = category
    r = requests.get(url+u_inc, headers = headers, params = params)
    page = r.json()['page']
    number = page['totalElements']
    pages = page['totalPages']
    return number, pages
def getPotelets():
    # get all the potelets
    # the category in params must be a subcategory with no subcategory
    # so either we make a different request for every subcat of potelet <----
    # or we make a global request and then retrieve every potelet object from it
    # (with a big size number we get all the items in one request)
    # NOTE: reads the module-level potelet_cat set by the main script
    potelets = []
    # number = 1000
    for potelet_subcat in potelet_cat['subCategories']:
        total, pages = getNumberOfIncidents(potelet_subcat['id'])
        print(str(total) + ' / ' + str(pages) + ' / ' + str(itemsbypages))
        for i in range(pages):
            params = { 'startDate': originDate,
                       'category': potelet_subcat['id'],
                       'size': itemsbypages,
                       'page': i }
            print('requesting potelets (pages:' + str(i) + '/' + str(pages) + ').....')
            r = requests.get(url+u_inc, headers = headers, params = params)
            print('got it!')
            potelets += r.json()['_embedded']['response']
    return potelets
def getAttachments(id):
    # get the attachments list (COMMENTS and PICTURES) of a potelet
    # NOTE: the id argument is not used; the url is read from the
    # module-level potelet set by the main loop
    url_attachments = potelet['_links']['attachments']['href']
    attachments = requests.get(url_attachments, headers = headers).json()
    # sometimes it's just an empty dict...
    if attachments:
        attachments = attachments['response']
    else:
        # make it an empty list instead of an empty dict
        attachments = []
    return attachments
def getHistory(id):
    # get the history list (evolution of status and acceptances) of a potelet
    # NOTE: the id argument is not used; the url is read from the
    # module-level potelet set by the main loop
    url_history = potelet['_links']['history']['href']
    history = requests.get(url_history, headers = headers).json()
    # sometimes it's just an empty dict...
    if history:
        history = history['response']
    else:
        # make it an empty list instead of an empty dict
        history = []
    return history
def getActor(story):
    # for the attachments it's in ['reporter']
    # for the history it's in ['information']
    # there can be contact information further in
    actor = story['information']['actorType']
    if actor == 'PROFESSIONAL':
        corporation = story['information']['corporation']['nameEn']
        team = story['information']['team']['nameEn']
        actor = actor + ' (' + corporation + ' // ' + team + ')'
        # there can be more info!!
    elif actor == 'SYSTEM':
        corporation = story['information']['corporation']['nameEn']
        actor = actor + ' (' + corporation + ')'
    return actor
#-------------------------
print('~!~ POTELETS ~!~')
print('Total number of incidents: ' + str(getNumberOfIncidents()[0]))
potelet_cat = getPoteletCat()
# print(json.dumps(potelet_cat, indent=2))
for potelet_subcat in potelet_cat['subCategories']:
    print(potelet_subcat['nameEn'] + ': ' + str(getNumberOfIncidents(potelet_subcat['id'])[0]))
print('\n')
potelets = getPotelets()
# print(json.dumps(potelets, indent=2))
# number of images per potelet, one entry per potelet
img_list = []
for potelet in potelets:
    img_list += [0]
    # header
    id = potelet['id']
    subcat = potelet['category']['category']['nameEn']
    address = (potelet['location']['address']['streetNameFr'] + ' ' +
               potelet['location']['address']['streetNumber'] + ', ' +
               potelet['location']['address']['postalCode'])
    creationdate = potelet['creationDate']
    updateddate = potelet['updatedDate']
    print('Potelet id: ' + str(id))
    print('Status: ' + potelet['status'])
    print('Category: ' + subcat)
    print('Address: ' + address)
    # print('creation date: ' + creationdate)
    # print('updated date: ' + updateddate)
    # --> those are in the history!
    # attachments
    attachments = getAttachments(id)
    # print(json.dumps(attachments, indent=2))
    if attachments:
        print('---[ attachments ]---')
        for attachment in attachments:
            actor = attachment['reporter']['type']
            date = attachment['creationDate']
            if attachment['type']=='PICTURE':
                img = attachment['_links']['content']['href']
                img_list[-1] += 1
                print('• ' + date + ' | ' + actor + ': ' + str(img))
            elif attachment['type']=='COMMENT':
                comment = attachment['content']
                print('• ' + date + ' | ' + actor + ': ' + comment)
    # history
    # history = getHistory(id)
    # # print(json.dumps(history, indent=2))
    # if history:
    #     print('---[ history ]---')
    #     for story in history:
    #         actor = getActor(story)
    #         date = story['historyDate']
    #         type = story['historyType']
    #         print('• ' + date + ' | ' + actor + ': ' + type)
    #         if actor not in ['CITIZEN','PROFESSIONAL','SYSTEM']:
    #             print(str(id) + ' -----> ' + actor)
    #         else:
    #             print(actor)
    print('\n')
#-------------
print(img_list)
print( 'Total number of Potelets: ' + str(len(img_list)))
print( 'Total number of Images: ' + str(sum(img_list)))
print( 'Max number of images by potelet: ' + str(max(img_list)))
print( 'Average number of images by potelet: ' + str(sum(img_list) / len(img_list)))