Commit cdfdc49b authored by Dorian

update doc

parent 7ed6e965
......@@ -22,6 +22,8 @@ python3 extract.py
To look at db:
* ```sqlite3```
* ```.open potelets.db```
* ```.header on```
* ```.mode column```
* ```select * from potelets;```
* ```select * from attachments where potelet_id=265057;```
* ```select * from history where potelet_id=265057;```
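The same lookup can also be scripted from Python with the standard `sqlite3` module. The block below is only a minimal sketch: the helper name `rows_for_potelet` is hypothetical, and the demo builds a throwaway in-memory table with assumed columns (`potelet_id`, `type`, `text`) instead of opening the real `potelets.db`.

```python
import sqlite3

# Tables that carry a potelet_id column according to the queries above.
ALLOWED_TABLES = ("attachments", "history")


def rows_for_potelet(conn, table, potelet_id):
    """Return all rows of `table` for one potelet as a list of dicts."""
    if table not in ALLOWED_TABLES:
        raise ValueError("unexpected table name: " + table)
    # sqlite3.Row gives access to column names, like `.header on` in the shell.
    conn.row_factory = sqlite3.Row
    cur = conn.execute(
        "SELECT * FROM " + table + " WHERE potelet_id = ?", (potelet_id,)
    )
    return [dict(row) for row in cur.fetchall()]


def demo():
    # In-memory stand-in for potelets.db; the column layout is an assumption.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attachments (potelet_id INTEGER, type TEXT, text TEXT)"
    )
    conn.executemany(
        "INSERT INTO attachments VALUES (?, ?, ?)",
        [(265057, "COMMENT", "hello"), (1, "PICTURE", None)],
    )
    rows = rows_for_potelet(conn, "attachments", 265057)
    conn.close()
    return rows
```

Against the real database, `sqlite3.connect('potelets.db')` would replace the in-memory connection.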
......
# potelets re-opened case
#-----------------
# show all possible history types
SELECT type FROM history GROUP BY type;
# show all types of actors
SELECT actor_type, potelet_id FROM attachments GROUP BY actor_type;
SELECT actor_type, potelet_id FROM history GROUP BY actor_type;
# show all the text from COMMENTS
# or all the href of PICTURES
SELECT text
FROM attachments
WHERE type='COMMENT';
# sort the actors by number of cases
SELECT responsibleOrganisation, count(*) AS c
FROM potelets
GROUP BY responsibleOrganisation
ORDER BY c;
# sort the actors by 'answering the most'
SELECT organisation, count(*) AS c
FROM attachments
GROUP BY organisation
ORDER BY c;
#-----------------
# potelets re-opened case by looking at attachment
SELECT potelet_id FROM attachments WHERE type='REOPEN_COMMENT';
# potelets re-opened case by looking at history
SELECT potelet_id FROM history WHERE type='INCIDENT_ASKED_REOPENING';
SELECT potelet_id FROM history WHERE type='INCIDENT_REOPENED';
# potelets marked as duplicates or severalOccurrence
SELECT * FROM potelets WHERE duplicates>0;
SELECT * FROM potelets WHERE severalOccurrence>0;
# => no access to the other occurrences or duplicates...
# potelets where there is a lot of attachments
......@@ -12,8 +43,6 @@ SELECT potelet_id, count(*) AS c
FROM attachments
GROUP BY potelet_id
HAVING c > 5;
# then get the attachments
SELECT * FROM attachments WHERE potelet_id=199249;
# potelets where there is a lot of pictures/comment
SELECT potelet_id, count(*) AS c
......@@ -22,10 +51,10 @@ WHERE type='COMMENT'
GROUP BY potelet_id
HAVING c > 3;
# => got interesting result with this method (the one with a lot of comments)! look at 254080
# show all the text from COMMENTS
SELECT content
SELECT potelet_id, count(*) AS c
FROM attachments
WHERE type='COMMENT'
SORT BY potelet_id;
WHERE type='PICTURE'
GROUP BY potelet_id
HAVING c > 6;
# => got interesting result with this method (the one with a lot of comments)! look at 254080
......@@ -10,8 +10,8 @@ from sqlite3 import Error
db_file = 'potelets.db'
url = 'http://fixmystreet.brussels/api/'
ucat = 'categories'
uinc = 'incidents'
urlcat = 'categories'
urlinc = 'incidents'
headers = {'Accept': 'application/hal+json'}
originDate = '2000-01-01'
......@@ -403,7 +403,7 @@ def addActorFromHistory(conn, story):
def getPoteletCat():
# get the potelet category json object with their subcategory
potelet_cat = {}
r = requests.get(url+ucat, headers = headers)
r = requests.get(url+urlcat, headers = headers)
categories = r.json()['response']['categories']
for category in categories:
if category['id'] == mobilierurbain_catid:
......@@ -418,7 +418,7 @@ def getNumberOfIncidents(category=''):
params = { 'startDate': originDate, 'size': itemsbypages }
if category:
params['category'] = category
r = requests.get(url+uinc, headers = headers, params = params)
r = requests.get(url+urlinc, headers = headers, params = params)
number = r.json()['page']['totalElements']
pages = r.json()['page']['totalPages']
return number, pages
......@@ -442,7 +442,7 @@ def getPotelets(number_limit=0):
'size': itemsbypages,
'page': i }
print('requesting potelets (pages: ' + str(i+1) + '/' + str(pages) + ').....')
r = requests.get(url+uinc, headers = headers, params = params)
r = requests.get(url+urlinc, headers = headers, params = params)
potelets += r.json()['_embedded']['response']
print('')
return potelets
......@@ -491,6 +491,7 @@ if __name__ == '__main__':
print('Total number of potelets: ' + str(total_number_of_potelet))
potelets = getPotelets(poteletbycategory)
i = 0
for potelet in potelets:
addPotelet(conn, potelet)
......@@ -510,7 +511,8 @@ if __name__ == '__main__':
# we commit the changes to the db only once per potelet
# so as not to lose too much time
conn.commit()
print("(+) potelets added in db")
i += 1
print("+ added in db (" + str(i) + "/" + str(len(potelets)) + ")")
print('')
conn.close()
......
No preview for this file type
File added
......@@ -2,46 +2,63 @@
## data structures
There is two subcategories of potelet: *dommaged* and *missing*.
There are two subcategories of potelet: *damaged* and *missing*.
There are respectively *4515* and *1833* items in each.
For comparison, the whole website contains *129781* items.
<!-- We can only make a request by category if we send a subcategory with no subcategory.
So we have to make a different request for every subcategory of potelet. -->
<!-- We can get all the potelet in every of those category with one call. -->
Every incident mainly contains an address, a location in x,y coordinates, a status (*CREATED*, *PROCESSING* or *CLOSED*), a creation date and a last-updated date, and a responsible organisation/department linked to it.
### attachments
For every potelet there is an **attachments** thread/section, that contains the *pictures* and *comments*.
For every incident there is an **attachments** thread/section that contains the **pictures** and **comments**.
It reads like a thread of different actors interacting by posting images or texts; every attachment has a publication date.
<!-- We have to do a request for every potelet to get the attachments and iterate through it. -->
Types of attachment entries:
COMMENT, DISMISSAL_COMMENT, FILE, MERGE_COMMENT, PICTURE, REOPEN_COMMENT, SYSTEM_COMMENT.
<!-- SELECT type FROM attachments GROUP BY type; -->
### history
For every potelet there is a **history** thread/section that contains its status changes.
The status changes can give us information on how much times did it took before a responsible decided to take in charge the incident. This also contains the comments that were not the creation of the incident as *updates*.
The status changes can give us information about the progression of the incidents on a timeline.
<!-- makes it possible to find those that had updates or transfers, etc. -->
Types of history entries:
INCIDENT_ACCEPTED, INCIDENT_ASKED_REOPENING, INCIDENT_CLOSED, INCIDENT_CREATED, INCIDENT_DISMISSED, INCIDENT_DISMISSED_BY_MERGE, INCIDENT_GRANTEE_ASSIGNED, INCIDENT_MERGED, INCIDENT_PRIVATIZED, INCIDENT_PUBLISHED, INCIDENT_REOPENED, INCIDENT_UPDATE_PUBLISHED.
<!-- SELECT type FROM history GROUP BY type; -->
### actors
For every entries in the attachment or history section, there is an **actor**. Those actors are of three **types**:
* *CITIZEN* (users with their private data hidden),
* *PROFESSIONAL* (a municipality or organisation that accepts to 'fix' the incident)
* *SYSTEM* (Région Bruxelles-Capitale).
For every entry in the attachments or history section, there is an **actor**. Actors have one of the following **types**:
* *CITIZEN* (users with their private data hidden by the API),
* *PROFESSIONAL* (an *organisation* or a *department*)
* *SYSTEM* (the admins: "Région Bruxelles-Capitale (Bruxelles Mobilité)", "Bruxelles-Propreté" (?), "Sibelga Eclairage" (?))
* *UNKNOWN* (rare and only for early entries...)
For *PROFESSIONAL* and *SYSTEM*, there is two category of actors: *organisation* and *department* (**with different id**)
The departments are always linked to an organisation, giving more precision about who it is.
For *PROFESSIONAL*, there are two subcategories of actors: *organisation* and *department*.
Every *department* is linked to a certain *organisation*. Note that *organisation* and *department* have **separate id systems** and come from different tables.
<!-- in fact organisation and department must be two different tables.
https://fixmystreet.brussels/api/incidents/267692/ (resp org id = 4)
https://fixmystreet.brussels/api/incidents/266302/ (resp dep id = 4) -->
An actor is automatically assigned at the creation of an incident (*organisation* and *department*), even before it is marked *ACCEPTED/PROCESSING*.
<!-- There is no visible pointers of how the *department* and *organisation* are linked, but by iterating through the incidents we can create connections. -->
Attachment entries whose actor type is *PROFESSIONAL* are always marked as made by an *organisation*, without specifying the *department*, and are accompanied by the *organisation* id.
There are no *SYSTEM*-type entries in attachments.
<!-- SELECT * FROM attachments WHERE department is not null;-->
<!-- SELECT * FROM attachments GROUP BY actor_type; -->
History entries whose actor type is *PROFESSIONAL* are marked as made by both an *organisation* and a *department*, but no id is specified...
For the *SYSTEM* ones, only an *organisation* (and no *department*) is specified.
<!-- SELECT * FROM history WHERE actor_type="SYSTEM" and department is not null; -->
<!-- For the attachments, they only reference the *organisation*.
Two actors are also automatically assigned at the creation of an incident (one *organisation* and one *department*), even before it is marked as *PROCESSING* (when it is still marked as *CREATED*). They may be assigned automatically according to the postal code?
Sometimes the responsible *organisation* is the *SYSTEM* one ("Région Bruxelles-Capitale (Bruxelles Mobilité)"); in this case the *department* may still vary.
We also have separate contact information for the organisation and the department here (not yet extracted from the db).
<!-- select * from potelets where responsibleOrganisation="Région Bruxelles-Capitale (Bruxelles Mobilité)" GROUP BY responsibleDepartment; -->
For the history, they reference *department* linked to an *organisation*, but their id is not precised. -->
### schemes
......@@ -53,17 +70,16 @@ A special case is the **duplicate** [to be completed]
## stats on images
Most of potelet (around 77%) have 0 or 1 image.
Some (around 22%) have between 2 and 6 images.
Then only 0.003% have 7 images or more.
```
Total number of Potelet: 6349
Total number of Images: 6727
Max number of images by potelet: 23
Average number of images by potelet: 1.0595369349503858
```
* Most potelets - around 77% - have 0 or 1 image.
* Some - around 22% - have between 2 and 6 images.
* Then only 0.003% have 7 images or more.
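These figures can be recomputed from a list holding the number of images of each potelet. The following is a minimal sketch, not the script that produced the numbers in this document: the function name `image_stats` and the input format (one integer per potelet) are assumptions.

```python
from collections import Counter


def image_stats(images_per_potelet):
    """Summarise a list holding the number of images of each potelet."""
    n = len(images_per_potelet)
    if n == 0:
        raise ValueError("need at least one potelet")
    histogram = Counter(images_per_potelet)

    def share(low, high=None):
        # Fraction of potelets whose image count lies in [low, high].
        matching = sum(
            count
            for images, count in histogram.items()
            if images >= low and (high is None or images <= high)
        )
        return matching / n

    return {
        "potelets": n,
        "images": sum(images_per_potelet),
        "max": max(images_per_potelet),
        "average": sum(images_per_potelet) / n,
        "share_0_1": share(0, 1),
        "share_2_6": share(2, 6),
        "share_7_plus": share(7),
        "histogram": dict(sorted(histogram.items())),
    }
```

The shares are returned as fractions between 0 and 1; multiply by 100 to get percentages.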
```
0: 1867
1: 3079
......@@ -84,21 +100,30 @@ Average number of images by potelet: 1.0595369349503858
## classification and ordering possibilities
### classification
* separate by **categories**: *dommaged* and *missing*.
* separate by **by responsible/process** (if there is): *responsible organisation* and/or *responsible departement*, and/or by status: *CREATED*, *PROCESSING*, *CLOSED*
* separate by **categories**: *damaged* and *missing*.
* separate by **actors**: *responsible organisation* and/or *responsible department* in charge of the incident.
* separate by **status**: *CREATED*, *PROCESSING*, *CLOSED*
* separate by **location**: *municipality*.
* separate by **type of location**: analysis of the picture and/or location to detect whether it is a crossroads, a small street, etc.
### filtering
* only those marked with the **severalOccurrence** or **duplicates** special tag, though there seems to be no way to access the incidents they duplicate.
* only those at specific **urbanistic contexts** (needs to be computed in some way?)
* only those with certain attachment type ("REOPEN_COMMENT" for example)
* only those with certain history type ("INCIDENT_ASKED_REOPENING" for example)
* only those with a certain number of **attachments**, or number of comments?
* only those with at least an **attachment** from a *PROFESSIONAL*, meaning: a start of response/conversation.
* only those with marked as **re-opened** or **duplicates**
* only those with lots of **attachments**
* only those at specific **urbanistic contexts**
### ordering
* order by **creation date**
* order by **last updated date**
* order like **a walk** that **minimise** the change of location and/or time.
* order like **a walk** that **minimises** the change of location and/or time.
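One possible reading of the "walk" ordering is a greedy nearest-neighbour pass over the x,y coordinates. The sketch below is only an illustration of that idea: the function name `walk_order` and the `(id, x, y)` tuple format are assumptions, time is ignored, and a greedy pass is a heuristic that does not guarantee the shortest overall walk.

```python
import math


def walk_order(points):
    """Order incidents as a walk, always moving to the nearest unvisited one.

    `points` is a list of (incident_id, x, y) tuples; the walk starts at the
    first element. Returns the list of incident ids in visiting order.
    """
    if not points:
        return []
    current = points[0]
    remaining = list(points[1:])
    order = [current[0]]
    while remaining:
        # Pick the unvisited incident closest to the current position.
        nearest = min(
            remaining,
            key=lambda p: math.hypot(p[1] - current[1], p[2] - current[2]),
        )
        remaining.remove(nearest)
        order.append(nearest[0])
        current = nearest
    return order
```

For example, `walk_order([("a", 0, 0), ("b", 10, 0), ("c", 1, 0), ("d", 9, 0)])` visits `a`, then `c`, then `d`, then `b`.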
### WHICH ONES INTEREST US?
1. The ones with both images and text.
2. Comparing the ones in the same cluster/location (look where there is a concentration of incidents?)
## other notes
......