Looking at the program on Pastebin, a few things look wrong. I would be surprised if this program runs at all.
In particular, you refer to other distributed data sets from inside the filter function, and you call collect() in every filter function. collect() actually triggers program execution, so this happens every time the filter function is invoked!
To make this work, call collect() once, outside the filter function, and use its result inside the filter.
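To make the cost concrete, here is a minimal plain-Python model (not the real API of any engine). `LazyDataSet`, `filter_collect_inside` and `filter_collect_hoisted` are hypothetical names. The model simply counts how often collect() "launches a job": calling it inside the filter launches one job per filtered element, while hoisting it out launches exactly one.

```python
class LazyDataSet:
    """Hypothetical stand-in for a distributed data set; collect() 'executes a job'."""

    def __init__(self, elements):
        self._elements = list(elements)
        self.jobs_triggered = 0

    def collect(self):
        # In a real engine, each collect() call launches a full program execution.
        self.jobs_triggered += 1
        return list(self._elements)

    def filter(self, predicate):
        # Applies the predicate to every element of this data set.
        return [x for x in self._elements if predicate(x)]


def filter_collect_inside(left, right):
    # Anti-pattern: right.collect() is re-evaluated for every element of left.
    return left.filter(lambda x: x in right.collect())


def filter_collect_hoisted(left, right):
    # Fix: materialize the other data set once, outside the filter function.
    allowed = set(right.collect())
    return left.filter(lambda x: x in allowed)
```

With a 10-element `left`, the first version triggers 10 "jobs" on `right`, and the hoisted version triggers one. Both return the same filtered elements.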
Also, consider using a join if you want to intersect data sets (or do a contains-check). Broadcast variables are also available to filter functions.
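The two alternatives can be sketched in the same plain-Python style. This is a model of the ideas, not engine code. `hash_join` shows an equi-join on a key, which expresses an intersection without any collect(). `BroadcastContainsFilter` (a hypothetical name) models a filter that receives a small data set once, in `open()`, and then does cheap contains-checks per element. Assuming this is Apache Flink's DataSet API, the corresponding constructs would be `join(...).where(...).equalTo(...)` and `withBroadcastSet(...)` combined with `getRuntimeContext().getBroadcastVariable(...)` inside a rich function's `open()`.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Equi-join: build a hash table on `right`, then probe it with each element of `left`."""
    table = defaultdict(list)
    for r in right:
        table[right_key(r)].append(r)
    # Emit one pair per matching (left, right) combination, like an inner join.
    return [(l, r) for l in left for r in table.get(left_key(l), [])]


class BroadcastContainsFilter:
    """Hypothetical model of a filter function that receives a broadcast data set.

    The broadcast values are turned into a set once in open(), before any
    element is filtered, instead of being re-fetched for each element.
    """

    def __init__(self, broadcast_values):
        self._broadcast_values = broadcast_values
        self._allowed = None

    def open(self):
        # Called once before filter() is used (in an engine: once per parallel instance).
        self._allowed = set(self._broadcast_values)

    def filter(self, value):
        return value in self._allowed


def first(t):
    # Key extractor: the first field of a tuple.
    return t[0]
```

For example, joining `[(1, "a"), (2, "b"), (3, "c")]` with `[(1, "x"), (3, "y")]` on the first field keeps the pairs for keys 1 and 3. Note that a plain join emits one pair per match, so if the right side has duplicate keys you would deduplicate it first to get pure contains-check semantics. The broadcast approach suits a small lookup set; the join scales to large data sets on both sides.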
Stephan