Understanding spatial patterns in the locations of Tokyo convenience stores
When strolling around Tokyo you'll often pass numerous convenience stores, known locally as "konbini", which makes sense given that there are over 56,000 convenience stores in Japan. Often, different chains will have branches located very close to one another; it's not unusual to see stores around the corner from each other or on opposite sides of the street. Given Tokyo's population density, it's understandable that competing businesses are forced closer together. But could there be any relationships between which chains of convenience store are found near one another?
The aim will be to collect location data for a number of convenience store chains in a Tokyo neighbourhood, to understand whether there are any relationships between which chains are co-located with one another. This will require:
- The ability to query the locations of different convenience stores in Tokyo, retrieving each store's name and position
- Finding which convenience stores are co-located with one another within a pre-defined radius
- Using the data on co-located stores to derive association rules
- Plotting and visualising the results for inspection
Let’s start!
For our use case we want to find convenience stores in Tokyo, so first we need to do some homework on the common store chains. A quick Google search tells me that the main chains are FamilyMart, Lawson, 7-Eleven, Ministop, Daily Yamazaki and NewDays.
Now that we know what we're looking for, let's turn to OSMnx, a great Python package for querying data from OpenStreetMap (OSM). According to OSM's schema, we should be able to find the store name in either the 'brand:en' or 'brand' field.
We can start by importing some useful libraries for getting our data, and defining a function that returns a table of locations for a given convenience store chain within a specified area:
import geopandas as gpd
from shapely.geometry import Point, Polygon
import osmnx
import shapely
import pandas as pd
import numpy as np
import networkx as nx

def place_finder(place, tags):
    '''
    Returns a dataframe of coordinates of an entity from OSM.
    Parameters:
        place (str): a location (e.g., 'Tokyo, Japan')
        tags (dict): OSM key of the entity attribute (e.g., 'brand:en') and its value (e.g., the chain name)
    Returns:
        results (DataFrame): table of latitude and longitude with the entity value
    '''
    gdf = osmnx.geocode_to_gdf(place)
    # Getting the bounding box of the gdf (bounds are minx, miny, maxx, maxy)
    bounding = gdf.bounds
    north, south, east, west = bounding.iloc[0, 3], bounding.iloc[0, 1], bounding.iloc[0, 2], bounding.iloc[0, 0]
    location = gdf.geometry.unary_union
    # Finding the points within the area polygon
    point = osmnx.geometries_from_bbox(north,
                                       south,
                                       east,
                                       west,
                                       tags=tags)
    point = point.set_crs(crs=4326)
    point = point[point.geometry.within(location)]
    # Making sure we are dealing with points
    point['geometry'] = point['geometry'].apply(lambda x: x.centroid if type(x) == Polygon else x)
    point = point[point.geom_type != 'MultiPolygon']
    point = point[point.geom_type != 'Polygon']
    results = pd.DataFrame({'name': list(point['name']),
                            'longitude': list(point['geometry'].x),
                            'latitude': list(point['geometry'].y)})
    results['name'] = list(tags.values())[0]
    return results

# Query a single chain at a time; 'brand:en' takes the chain name (e.g., 7-Eleven)
convenience_stores = place_finder(place='Shinjuku, Tokyo',
                                  tags={'brand:en': '7-Eleven'})
We can pass each convenience store name in turn and combine the results into a single table of store name, longitude and latitude, as sketched below. For our use case we will focus on the Shinjuku neighbourhood of Tokyo, and see what the abundance of each convenience store chain looks like:
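As a minimal sketch (assuming the chain names are spelled the way OSM's 'brand:en' tag records them), the per-chain queries can be combined like this:

chains = ['FamilyMart', 'Lawson', '7-Eleven', 'Ministop', 'Daily Yamazaki', 'NewDays']
# Query each chain separately and stack the results into one table
convenience_stores = pd.concat(
    [place_finder(place='Shinjuku, Tokyo', tags={'brand:en': chain}) for chain in chains],
    ignore_index=True)
# Frequency of each chain in the neighbourhood
print(convenience_stores['name'].value_counts())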
Clearly FamilyMart and 7-Eleven dominate in the frequency of stores, but how does this look spatially? Plotting geospatial data is pretty straightforward with Kepler.gl, which includes a nice interface for creating visualisations that can be saved as HTML objects or rendered directly in Jupyter notebooks:
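A minimal sketch of that workflow, assuming the keplergl package is installed and using the column names from our table:

from keplergl import KeplerGl

# Build an interactive map of the store locations
store_map = KeplerGl(height=600)
store_map.add_data(data=convenience_stores, name='convenience stores')
# Save as a standalone HTML file (or display store_map directly in a notebook cell)
store_map.save_to_html(file_name='shinjuku_convenience_stores.html')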
Now that we have our data, the next step is to find the nearest neighbours for each convenience store. To do this, we will use scikit-learn's BallTree class to find the names of the nearest convenience stores within a two-minute walking radius. We're not interested in how many stores count as nearest neighbours, only in which convenience store chains fall within the defined radius.
from sklearn.neighbors import BallTree
from collections import Counter

# Convert locations to radians
locations = convenience_stores[["latitude", "longitude"]].values
locations_radians = np.radians(locations)

# Create a ball tree to search locations
tree = BallTree(locations_radians, leaf_size=15, metric='haversine')

# Find nearest neighbours within a 2 minute walking radius
# (~168 m at a 1.4 m/s walking pace, divided by the Earth's radius to get radians)
is_within, distances = tree.query_radius(locations_radians, r=168/6371000, count_only=False, return_distance=True)

# Replace the neighbour indices with store names
df = pd.DataFrame({'indices': list(is_within)})
# Drop each store's own index from its list of neighbours
df['indices'] = [[val for val in row if val != idx] for idx, row in enumerate(df['indices'])]

# Create a temporary index column
convenience_stores = convenience_stores.reset_index()
# Set the temporary index column as the index
convenience_stores = convenience_stores.set_index('index')
# Create an index-to-name mapping
index_name_mapping = convenience_stores['name'].to_dict()
# Replace index values with names and remove duplicates
df['indices'] = df['indices'].apply(lambda lst: list(set(map(index_name_mapping.get, set(lst)))))

# Append back to the original df
convenience_stores['neighbours'] = df['indices']

# Flag stores that have no neighbours
convenience_stores['neighbours'] = [lst if lst else ['no-neighbours'] for lst in convenience_stores['neighbours']]

# Unique store names
unique_elements = set([item for sublist in convenience_stores['neighbours'] for item in sublist])

# Count each chain's frequency in the set of neighbours per location
counts = [dict(Counter(row)) for row in convenience_stores['neighbours']]

# Create a new dataframe with the counts
output_df = pd.DataFrame(counts).fillna(0)[sorted(unique_elements)]
If we wanted to improve the accuracy of this step, we could replace the haversine distance measure with something more accurate (e.g., walking times calculated using networkx), but we'll keep things simple here.
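For reference, a rough sketch of that alternative, assuming OSMnx's routing helpers: build a walkable street graph and measure network distance along the streets rather than straight-line distance between a pair of stores.

# Walkable street network for the neighbourhood
G = osmnx.graph_from_place('Shinjuku, Tokyo', network_type='walk')
# Pick any pair of stores from our table (here, simply the first two rows)
lat_a, lon_a = convenience_stores.loc[0, ['latitude', 'longitude']]
lat_b, lon_b = convenience_stores.loc[1, ['latitude', 'longitude']]
# Snap both stores to their nearest street-network nodes
orig = osmnx.distance.nearest_nodes(G, X=lon_a, Y=lat_a)
dest = osmnx.distance.nearest_nodes(G, X=lon_b, Y=lat_b)
# Shortest walking distance along the streets, in metres
walking_distance_m = nx.shortest_path_length(G, orig, dest, weight='length')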
This gives us a DataFrame where each row corresponds to a location, with a binary count of which convenience store chains are nearby:
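Purely as an illustration of the structure (the rows here are invented, not our actual results), output_df looks something like:

   7-Eleven  FamilyMart  Lawson  no-neighbours
0       1.0         1.0     0.0            0.0
1       0.0         0.0     0.0            1.0
2       1.0         0.0     1.0            0.0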
We now have a dataset ready for association rule mining. Using the mlxtend library, we can derive association rules with the Apriori algorithm. We set a minimum support of 5%, so that we examine only rules relating to frequent occurrences in our dataset (i.e., frequently co-located convenience store chains). We use the 'lift' metric when deriving rules; lift is the ratio of the proportion of locations that contain both the antecedent and the consequent relative to the support expected under the assumption of independence.
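Concretely, for a rule A → B, lift has the standard definition:

lift(A → B) = support(A ∪ B) / (support(A) × support(B))

A lift of exactly 1 is what we would expect if the two chains chose their locations independently, so values above 1 indicate that they co-occur more often than chance would predict.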
from mlxtend.frequent_patterns import association_rules, apriori

# Calculate frequent itemsets with the Apriori algorithm
frequent_set = apriori(output_df, min_support=0.05, use_colnames=True)
# Create the rules
rules = association_rules(frequent_set, metric='lift')
# Sort the rules by their support value
rules.sort_values(['support'], ascending=False)
This gives us the following results table:
We will now interpret these association rules and draw some high-level takeaways. To get the most out of this table, it's worth being comfortable with how support, confidence and lift are defined in association rule mining.
Okay, back to the table.
Support tells us how often the different convenience store chains are actually found together. We can therefore say that 7-Eleven and FamilyMart are found together in ~31% of the data. A lift above 1 indicates that the presence of the antecedent increases the likelihood of the consequent, suggesting that the locations of the two chains are partially dependent. The association between 7-Eleven and Lawson, on the other hand, shows a higher lift but a lower confidence.
Daily Yamazaki has low support, near our cutoff, and shows a weak relationship with the location of FamilyMart, indicated by a lift only slightly above 1.
Other rules refer to combinations of convenience stores. For example, where a 7-Eleven and a FamilyMart are already co-located, a high lift value of 1.42 suggests a strong association with Lawson.
If we had simply stopped at finding the nearest neighbours for each store location, we would not have been able to say anything about the relationships between these stores.
One example of why geospatial association rules can be insightful for businesses is in choosing new store locations. If a convenience store chain is opening a new branch, association rules help identify which other stores are likely to co-occur with it.
The value of this becomes clear when tailoring marketing campaigns and pricing strategies, since it provides quantitative relationships about which stores are likely to compete. Since we know that FamilyMart and 7-Eleven often co-occur, as demonstrated by the association rules, it would make sense for both chains to pay closer attention to how their products compete relative to other chains such as Lawson and Daily Yamazaki.
In this article we have created geospatial association rules for convenience store chains in a Tokyo neighbourhood. This was done by extracting data from OpenStreetMap, finding each store's nearest neighbouring chains, visualising the data on maps, and deriving association rules with the Apriori algorithm.
Thanks for reading!