Utility

These are mostly utility methods, written in Python, that live in pyscan.py.

pyscan.close_region(region)[source]

Ensures that this region starts and ends at the same point. This is useful when applying trajectory methods to regions.

Parameters:
  • region – List of points.
Returns:

List of points.
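As a rough illustration of the behaviour described above, the following pure-Python sketch closes a region without calling pyscan. The name close_region_sketch and the representation of points as (x, y) tuples are assumptions made for this example. Whether the real function modifies its argument in place is not stated here, so the sketch returns a new list.

```python
def close_region_sketch(region):
    """Return a copy of region whose last point equals its first point."""
    closed = list(region)
    # An empty or already-closed region is returned unchanged.
    if closed and closed[0] != closed[-1]:
        closed.append(closed[0])
    return closed


# Hypothetical input: an open unit square given as (x, y) tuples.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
closed = close_region_sketch(square)
print(closed[0] == closed[-1], len(closed))
```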

pyscan.evaluate_range(range, mp, bp, disc_f)[source]

Evaluates this range to compute the total discrepancy over a set of points.

Parameters:
  • range – Some arbitrary range.
  • mp – measured set
  • bp – baseline set
  • disc_f – Discrepancy function.
Returns:

Discrepancy function value.
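The idea of evaluating a range can be sketched in plain Python as follows. This is not pyscan's implementation: the range is modelled as a hypothetical predicate contains(x, y), points are assumed to be (x, y, weight) tuples, and the discrepancy function is assumed to take the measured and baseline fractions inside the range.

```python
import math


def evaluate_range_sketch(contains, mp, bp, disc_f):
    """Apply disc_f to the weighted fractions of mp and bp inside the range."""
    m_total = sum(w for _, _, w in mp)
    b_total = sum(w for _, _, w in bp)
    m_in = sum(w for x, y, w in mp if contains(x, y))
    b_in = sum(w for x, y, w in bp if contains(x, y))
    # Assumes both sets have positive total weight.
    return disc_f(m_in / m_total, b_in / b_total)


def in_unit_disk(x, y):
    # Hypothetical range: the closed disk of radius 1 around the origin.
    return math.hypot(x, y) <= 1.0


def abs_difference(m_frac, b_frac):
    # A simple illustrative discrepancy function.
    return abs(m_frac - b_frac)


measured = [(0.0, 0.0, 1.0), (0.5, 0.5, 1.0), (2.0, 2.0, 2.0)]
baseline = [(0.1, 0.0, 1.0), (3.0, 0.0, 3.0)]
value = evaluate_range_sketch(in_unit_disk, measured, baseline, abs_difference)
print(value)
```

Here half of the measured weight and a quarter of the baseline weight fall inside the disk, so the illustrative discrepancy is their absolute difference.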

pyscan.evaluate_range_trajectory(range, mp, bp, disc_f)[source]

Evaluates this range to compute the total discrepancy over a set of trajectories.

Parameters:
  • range – Some arbitrary range.
  • mp – measured set
  • bp – baseline set
  • disc_f – Discrepancy function.
Returns:

Discrepancy function value.

pyscan.max_disk_region(net, red_sample, red_weight, blue_sample, blue_weight, min_disk_r, max_disk_r, alpha, disc, fast_disk=True)[source]

Computes the highest discrepancy disk over a set of trajectories. Executes at multiple scales using the grid directional compression method and internally compresses the trajectories if fast_disk is enabled.

Parameters:
  • net – A list of trajectories
  • red_sample – A list of trajectories
  • blue_sample – A list of trajectories
  • min_disk_r – The minimum disk radius to consider.
  • max_disk_r – The maximum disk radius to consider.
  • alpha – The spatial error with which to approximate the trajectories.
  • disc – The discrepancy function to use.
  • fast_disk – Default True.
Returns:

pyscan.max_disk_trajectory(net, red_sample, blue_sample, min_disk_r, max_disk_r, alpha, disc, fast_disk=True)[source]

Computes the highest discrepancy disk over a set of trajectories. Executes at multiple scales using the grid directional compression method and internally compresses the trajectories if fast_disk is enabled.

Parameters:
  • net – A list of trajectories (scaled to be in a 0 by 1 box).
  • red_sample – A list of trajectories (scaled to be in a 0 by 1 box).
  • blue_sample – A list of trajectories (scaled to be in a 0 by 1 box).
  • min_disk_r – The minimum disk radius to consider.
  • max_disk_r – The maximum disk radius to consider.
  • alpha – The spatial error with which to approximate the trajectories.
  • disc – The discrepancy function to use.
  • fast_disk – Default True.
Returns:

pyscan.max_disk_trajectory_fixed(net, m_sample, b_sample, min_disk_r, max_disk_r, disc, fast_disk=True)[source]

Computes the highest discrepancy disk over a set of trajectories. Executes at multiple scales, but uses whatever set of points the trajectories have been compressed with.

Parameters:
  • net – A list of trajectories (scaled to be in a 0 by 1 box).
  • m_sample – A list of trajectories (scaled to be in a 0 by 1 box).
  • b_sample – A list of trajectories (scaled to be in a 0 by 1 box).
  • min_disk_r – The minimum disk radius to consider.
  • max_disk_r – The maximum disk radius to consider.
  • disc – The discrepancy function to use.
  • fast_disk – Default True.
Returns:

A tuple of the maximum disk and the corresponding maximum value.

pyscan.paired_plant_region(traj_start, traj_end, r, q, region_plant_f)[source]

Plants a region in which every trajectory lying completely outside or completely inside the region has an endpoint chosen at random. For every trajectory with one endpoint inside the region, the endpoint inside the region is chosen with probability q (exactly q fraction have one endpoint in the region). traj_start and traj_end should be the same length.

Parameters:
  • traj_start – List of points.
  • traj_end – List of points.
  • r – Fraction of points in the region
  • q – Fraction of points in the region that are anomalous
  • region_plant_f – Scanning function to use to find the region (e.g. max_disk)
Returns:

Red planted set, blue planted set, and the planted region.

pyscan.plant_disk(pts, r, p, q)[source]

Create a set of red and blue points with a random disk planted containing r fraction of the points with q fraction of the points in the region being red and p fraction of the points outside of the region being red.

Parameters:
  • pts – List of points.
  • r – Fraction of points contained in the planted region
  • p – Fraction of points outside region that are red.
  • q – Fraction of points inside region that are red.
Returns:

red set, blue set, planted region.
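A minimal sketch of the planting procedure described above, written without pyscan. Several choices here are assumptions rather than documented behaviour: the disk is centred on a randomly chosen input point, the disk is returned as a (center, radius) pair, and each point is coloured by an independent coin flip, so the red fractions match q and p only in expectation.

```python
import math
import random


def plant_disk_sketch(pts, r, p, q, rng):
    """Return (red, blue, (center, radius)) for a disk holding about r of pts."""
    center = rng.choice(pts)
    # The k nearest points to the center define the planted disk.
    k = max(1, round(r * len(pts)))
    distances = sorted(math.dist(pt, center) for pt in pts)
    radius = distances[k - 1]
    red, blue = [], []
    for pt in pts:
        inside = math.dist(pt, center) <= radius
        # Red with probability q inside the disk and p outside it.
        if rng.random() < (q if inside else p):
            red.append(pt)
        else:
            blue.append(pt)
    return red, blue, (center, radius)


rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(200)]
red, blue, (center, radius) = plant_disk_sketch(pts, 0.25, 0.1, 0.9, rng)
inside_count = sum(1 for pt in pts if math.dist(pt, center) <= radius)
print(inside_count, len(red), len(blue))
```

If several points are exactly equidistant from the center, the disk in this sketch can contain slightly more than r fraction of the points.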

pyscan.plant_full_disk(trajectories, r, p, q)[source]

Choose a point at random from a trajectory and then expand outward from there until we find a disk that contains r fraction of the trajectories.

Parameters:
  • trajectories – List of trajectories. This list may be rearranged.
  • r – Fraction of trajectories in region.
  • p – Fraction of red trajectories outside of region.
  • q – Fraction of red trajectories inside of region.
  • disc – Discrepancy function to measure exactly on region.
Returns:

red set, blue set, planted disk, maximum discrepancy.

pyscan.plant_full_disk_region(region_set, r, p, q)[source]

Choose a point at random from a region and then expand a disk out from this point till this disk contains r fraction of all the regions.

Parameters:
  • region_set – List of list of points. This argument is modified by adding a point at the end in some cases.
  • r – double between 0 and 1
  • p – double between 0 and 1
  • q – double between 0 and 1
Returns:

red set of regions, blue set of regions, the planted region

pyscan.plant_full_halfplane(trajectories, r, p, q)[source]

Choose a random direction and then find a halfplane with this normal that contains r fraction of the total trajectories.

Parameters:
  • trajectories – List of list of points.
  • r – Fraction of trajectories in region.
  • p – Fraction of red trajectories outside of region.
  • q – Fraction of red trajectories inside of region.
  • disc – Discrepancy function to measure exactly on region.
Returns:

red set of trajectories, blue set of trajectories, planted region, discrepancy function value.
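The halfplane search described above can be sketched as follows. This covers only the geometric step, not the red and blue labelling or the discrepancy value. It assumes that a trajectory counts as inside when any part of it touches the halfplane, and that the halfplane is represented as a (normal, offset) pair meaning all points x with dot(x, normal) <= offset. Both conventions are hypothetical.

```python
import math
import random


def project(pt, normal):
    """Signed position of pt along the normal direction."""
    return pt[0] * normal[0] + pt[1] * normal[1]


def full_halfplane_sketch(trajectories, r, rng):
    """Return (normal, offset) touching about r fraction of trajectories."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    normal = (math.cos(theta), math.sin(theta))
    # A polyline touches the halfplane exactly when its smallest vertex
    # projection is <= offset, since a linear function on a segment
    # attains its minimum at an endpoint.
    lows = sorted(min(project(pt, normal) for pt in traj) for traj in trajectories)
    k = max(1, round(r * len(trajectories)))
    return normal, lows[k - 1]


rng = random.Random(7)
trajs = [[(rng.random(), rng.random()) for _ in range(3)] for _ in range(40)]
normal, offset = full_halfplane_sketch(trajs, 0.25, rng)
touching = sum(
    1 for traj in trajs if min(project(pt, normal) for pt in traj) <= offset
)
print(touching, len(trajs))
```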

pyscan.plant_full_square(trajectories, r, p, q, max_count=32)[source]

Choose a point at random from a trajectory and then expand a square out from this point till this region contains r fraction of all the trajectories.

Parameters:
  • trajectories – List of list of points.
  • r – double between 0 and 1
  • p – double between 0 and 1
  • q – double between 0 and 1
  • disc – The discrepancy function to evaluate exactly on this region.
  • max_count – The maximum number of times we will attempt to find the right sized region.
Returns:

red set of trajectories, blue set of trajectories, the planted region, and the exact partial discrepancy.

pyscan.plant_full_square_region(regions, r, p, q, max_count=32)[source]

Choose a point at random from a region and then expand a square out from this point till this region contains r fraction of all the regions.

Parameters:
  • regions – List of list of points. These lists are modified. In some cases the regions will have a point appended at the end to close them.
  • r – double between 0 and 1
  • p – double between 0 and 1
  • q – double between 0 and 1
  • max_count – The maximum number of times we will attempt to find the right sized region.
Returns:

red set of regions, blue set of regions, the planted region

pyscan.plant_halfplane(pts, r, p, q)[source]

Create a set of red and blue points with a random halfplane planted containing r fraction of the points with q fraction of the points in the region being red and p fraction of the points outside of the region being red.

Parameters:
  • pts – List of points
  • r – Fraction of points contained in the planted region
  • p – Fraction of points outside region that are red.
  • q – Fraction of points inside region that are red.
Returns:

red set, blue set, planted region.

pyscan.plant_partial_disk(trajectories, r, p, q, eps)[source]

Choose a point at random from a trajectory and then expand outward from there. Computes the fraction of each segment's length that lies inside the disk and then bisects on the disk size, since this fraction is monotonic in the disk size, until the fraction is within eps of r.

Parameters:
  • trajectories – List of list of points.
  • r – double between 0 and 1
  • p – double between 0 and 1
  • q – double between 0 and 1
  • eps – This defines the maximum difference between r and the fraction of the trajectories that are in the found region.
  • disc – The discrepancy function to evaluate exactly on this region.
Returns:

red set of trajectories, blue set of trajectories, the planted region, and the exact partial discrepancy.
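The bisection described above can be sketched in plain Python. This is an illustration under stated assumptions, not pyscan's code: trajectories are lists of (x, y) tuples, the disk is a (center, radius) pair, the helper names are hypothetical, and the red and blue labelling and discrepancy evaluation are omitted.

```python
import math
import random


def length_in_disk(p0, p1, center, radius):
    """Length of the part of segment p0-p1 that lies inside the disk."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    fx, fy = p0[0] - center[0], p0[1] - center[1]
    a = dx * dx + dy * dy
    if a == 0.0:
        return 0.0
    # Solve |p0 + t * (p1 - p0) - center|^2 = radius^2 for t.
    b = 2.0 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - radius * radius
    disc = b * b - 4.0 * a * c
    if disc <= 0.0:
        return 0.0
    root = math.sqrt(disc)
    t_enter = max(0.0, (-b - root) / (2.0 * a))
    t_exit = min(1.0, (-b + root) / (2.0 * a))
    return math.sqrt(a) * max(0.0, t_exit - t_enter)


def fraction_in_disk(trajectories, center, radius):
    """Fraction of the total arc length that lies inside the disk."""
    total = inside = 0.0
    for traj in trajectories:
        for p0, p1 in zip(traj, traj[1:]):
            total += math.dist(p0, p1)
            inside += length_in_disk(p0, p1, center, radius)
    return inside / total


def partial_disk_sketch(trajectories, r, eps, rng):
    """Bisect on the radius until the arc-length fraction is within eps of r."""
    center = rng.choice(rng.choice(trajectories))
    lo = 0.0
    # A disk reaching the farthest vertex contains every segment.
    hi = max(math.dist(pt, center) for traj in trajectories for pt in traj)
    radius = hi
    for _ in range(64):  # cap on iterations; enough for double precision
        radius = 0.5 * (lo + hi)
        frac = fraction_in_disk(trajectories, center, radius)
        if abs(frac - r) <= eps:
            break
        if frac < r:
            lo = radius
        else:
            hi = radius
    return center, radius


rng = random.Random(1)
trajs = [[(rng.random(), rng.random()) for _ in range(5)] for _ in range(30)]
center, radius = partial_disk_sketch(trajs, 0.2, 0.01, rng)
found = fraction_in_disk(trajs, center, radius)
print(found)
```

Bisection works here because the arc length captured by the disk can only grow as the radius grows.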

pyscan.plant_partial_halfplane(trajectories, r, p, q, eps)[source]

Choose a random direction and expand the region along this direction till we contain r fraction of the total trajectory arc length.

Parameters:
  • trajectories – List of list of points.
  • r – double between 0 and 1
  • p – double between 0 and 1
  • q – double between 0 and 1
  • eps – This defines the maximum difference between r and the fraction of the trajectories that are in the found region.
  • disc – The discrepancy function to evaluate exactly on this region.
Returns:

red set of trajectories, blue set of trajectories, the planted region, and the exact partial discrepancy.

pyscan.plant_partial_rectangle(trajectories, r, p, q, eps)[source]

This plants a region containing r fraction of the total arc length of all trajectories. q fraction of the trajectories crossing this region are assigned to be anomalous and p fraction of trajectories not crossing this region are assigned to be anomalous.

Parameters:
  • trajectories – List of list of points.
  • r – double between 0 and 1
  • p – double between 0 and 1
  • q – double between 0 and 1
  • eps – This defines the maximum difference between r and the fraction of the trajectories that are in the found region.
Returns:

red set of trajectories, blue set of trajectories, the planted region.

pyscan.plant_rectangle(pts, r, p, q)[source]

Create a set of red and blue points with a random rectangle planted containing r fraction of the points with q fraction of the points in the region being red and p fraction of the points outside of the region being red.

Parameters:
  • pts – List of points.
  • r – Fraction of points contained in the planted region
  • p – Fraction of points outside region that are red.
  • q – Fraction of points inside region that are red.
Returns:

red set, blue set, planted region.

pyscan.plant_region(points, r, p, q, eps, scan_f)[source]

This takes a scanning function and two point sets and then computes a planted region that contains some fraction r of the points with some tolerance. This then computes a red and blue set of points based on the planted region.

Inside the region, q fraction of points are red. Outside the region, p fraction of points are red.

Parameters:
  • points – List of points.
  • r – Fraction of points inside the region.
  • p – Fraction of points outside the region that are red.
  • q – Fraction of points inside the region that are red.
  • eps – Difference between r and the fraction of points inside the region.
  • scan_f – The scan function to use (e.g. max_disk)
Returns:

red set, blue set, region planted.

pyscan.random_rect(points, r)[source]

Plants a random rectangle containing r fraction of the points.

Parameters:
  • points – List of points
  • r – Fraction of points inside of planted region.
Returns:

The planted region.
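One way such a rectangle could be chosen is sketched below. The strategy is an assumption made for illustration and may differ from what pyscan does: first pick a random vertical slab holding about sqrt(r) of the points, then pick a random band inside that slab holding r of all points. The rectangle is returned as a hypothetical (x_min, y_min, x_max, y_max) tuple.

```python
import math
import random


def random_rect_sketch(points, r, rng):
    """Return (x_min, y_min, x_max, y_max) holding about r fraction of points."""
    n = len(points)
    k_inner = max(1, round(r * n))
    k_outer = max(k_inner, round(math.sqrt(r) * n))
    # Random window of k_outer consecutive points in x order.
    by_x = sorted(points)
    start_x = rng.randrange(n - k_outer + 1)
    slab = by_x[start_x:start_x + k_outer]
    # Random window of k_inner consecutive slab points in y order.
    by_y = sorted(slab, key=lambda pt: pt[1])
    start_y = rng.randrange(k_outer - k_inner + 1)
    band = by_y[start_y:start_y + k_inner]
    return slab[0][0], band[0][1], slab[-1][0], band[-1][1]


rng = random.Random(3)
points = [(rng.random(), rng.random()) for _ in range(100)]
x_min, y_min, x_max, y_max = random_rect_sketch(points, 0.16, rng)
inside = sum(
    1 for x, y in points if x_min <= x <= x_max and y_min <= y <= y_max
)
print(inside, len(points))
```

With distinct coordinates the rectangle contains exactly round(r * n) points; repeated coordinates can add a few more.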

pyscan.evaluate_halfplane(reg, mpts, bpts, disc_f)
pyscan.evaluate_halfplane_labeled(reg, lmpts, lbpts, disc_f)
pyscan.evaluate_halfplane_trajectory(reg, mtrajs, btrajs, disc_f)
pyscan.evaluate_disk(reg, mpts, bpts, disc_f)
pyscan.evaluate_disk_alt(reg, mpts, bpts, disc_f)
pyscan.evaluate_disk_labeled(reg, lmpts, lbpts, disc_f)
pyscan.evaluate_disk_trajectory(reg, mtrajs, btrajs, disc_f)
pyscan.evaluate_rectangle(reg, mpts, bpts, disc_f)
pyscan.evaluate_rectangle_labeled(reg, lmpts, lbpts, disc_f)
pyscan.evaluate_rectangle_trajectory(reg, mtrajs, btrajs, disc_f)

These functions can be used to evaluate different regions on different kinds of data sets.

pyscan.size_region(fraction)

Creates a discrepancy function that can be used to find a region containing a certain fraction of the points.

Parameters:
  • fraction – A double between 0 and 1.
Return type:

A discrepancy function.

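To make the idea concrete, here is one plausible form such a function could take. The formula and the two-argument (measured fraction, baseline fraction) calling convention are assumptions for illustration, not taken from the pyscan source: the score is largest when the measured fraction equals the target fraction.

```python
def size_region_sketch(fraction):
    """Build a scoring function that peaks when m_frac equals fraction."""
    def disc(m_frac, b_frac):
        # The baseline fraction is ignored; only the region size matters.
        return 1.0 - abs(m_frac - fraction)
    return disc


disc = size_region_sketch(0.25)
# Hypothetical measured fractions for four candidate regions.
candidates = [0.05, 0.2, 0.27, 0.6]
best = max(candidates, key=lambda m_frac: disc(m_frac, 0.0))
print(best)
```

A scanning routine that maximizes this score would therefore favour regions whose size is close to the requested fraction.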
pyscan.evaluate(disc_f, m, m_total, b, b_total)

A utility function for explicitly evaluating a discrepancy function object.

Parameters:
  • disc_f – The discrepancy function object to evaluate.
  • m – A double between 0 and m_total
  • m_total – A double.
  • b – A double between 0 and b_total.
  • b_total – A double.
Return type:

A double returned by the discrepancy function.
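The following sketch shows what explicit evaluation might look like, using a Kulldorff-style statistic as the example function. Both helpers are stand-ins written for this example. It is an assumption that the counts are normalized to fractions before the function is applied, and pyscan's own discrepancy objects may be defined differently.

```python
import math


def _term(x, y):
    # Uses the convention 0 * log(0) = 0.
    return 0.0 if x == 0.0 else x * math.log(x / y)


def kulldorff_sketch(m_frac, b_frac):
    """m*log(m/b) + (1-m)*log((1-m)/(1-b)), with 0 at the boundary of b."""
    if b_frac <= 0.0 or b_frac >= 1.0:
        return 0.0
    return _term(m_frac, b_frac) + _term(1.0 - m_frac, 1.0 - b_frac)


def evaluate_sketch(disc_f, m, m_total, b, b_total):
    """Normalize the counts to fractions and apply the discrepancy function."""
    return disc_f(m / m_total, b / b_total)


balanced = evaluate_sketch(kulldorff_sketch, 5.0, 10.0, 5.0, 10.0)
skewed = evaluate_sketch(kulldorff_sketch, 8.0, 10.0, 2.0, 10.0)
print(balanced, skewed)
```

A region holding equal shares of the measured and baseline sets scores zero, while a region that is rich in measured weight and poor in baseline weight scores higher.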